Latest Posts (20 found)
Jim Nielsen 29 days ago

You Might Debate It — If You Could See It

Imagine I’m the design leader at your org and I present the following guidelines I want us to adopt as a team for doing design work:

- Typography: Use expressive, purposeful fonts and avoid default stacks (Inter, Roboto, Arial, system).
- Motion: Use a few meaningful animations (page-load, staggered reveals) instead of generic micro-motions.
- Background: Don't rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere.
- Overall: Avoid boilerplate layouts and interchangeable UI patterns. Vary themes, type families, and visual languages.

How do you think that conversation would go? I can easily imagine a spirited debate where some folks disagree with any or all of my points, arguing that they should be struck as guidelines from our collective ethos of craft. Perhaps some are boring, or too opinionated, or too reliant on trends. There are lots of valid, defensible reasons. I can easily see this discussion being an exercise in frustration, where we debate for hours and get nowhere: “I suppose we can all agree to disagree.” And yet, thanks to a link to Codex’s front-end tool guidelines in Simon Willison’s article about how coding agents work, I see that these are exactly the kind of guidelines that are tucked away inside an LLM that’s generating output for many teams. It’s like a Trojan Horse of craft: guidelines you might never agree to explicitly are guiding LLM outputs, which means you are agreeing to them implicitly. It’s a good reminder about the opacity of the instructions baked into generative tools. We would debate an open set of guidelines for hours, but if they’re opaquely baked into a tool without our knowledge, does anybody even care? When you offload your thinking, you might be on-loading someone else’s you’d never agree to, personally or collectively.

iDiallo Today

Communication Is Surveillance by Design

In the very last scene of The Bourne Supremacy, Jason Bourne calls the CIA from what they presume is a public phone. Landy, who answers the call, instructs her team to trace it. Bourne says he wants to come in and asks for someone specific to meet with him. Landy stalls for time while her team tries to triangulate his exact location, so she asks how she can find the person he's referring to. That's when Bourne drops his famous line, "It's easy. She's standing right next to you," revealing that he's right in their vicinity. He hangs up seconds before the team can locate him. That's one badass ending. (֊⎚-⎚) It's not the only film where the protagonist, or antagonist, is clever enough to know exactly when to hang up before being pinpointed. There seems to be this universal piece of software that all law enforcement agencies use to triangulate calls in movies. It's some application built in the '90s, operating at modem speed, that just needs a little more time. A countdown clock. Tense music. Cut to black. What is that software actually doing? "Triangulate" implies three points: maybe three cell towers sending a ping and measuring the response time from each, then using the difference to calculate distance. Computers, even old ones, are very good at math. So why would that take a full minute? Well, mostly it doesn't. That's fiction. The moment your phone connects to a cell tower, it generates a Call Detail Record (CDR). This record includes who you're calling (the network needs to know in order to route the call), how long the call lasts, and which specific tower and sector handled it. Location data is captured and stored automatically from the instant the call begins. In other words, the moment Jason Bourne hits send, he's already been logged. When you connect to a single tower, location accuracy can still be within several hundred meters. But phones typically connect to multiple towers simultaneously, and triangulation narrows that down to tens of meters.
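The distance math is worth seeing, because it shows why the movie countdown is fiction: given three tower positions and three distance estimates, locating the phone is a couple of lines of algebra. Here is a minimal sketch with made-up coordinates; real networks use timing advance, signal strength, and far more measurements than this.

```python
import math

def trilaterate(towers, dists):
    """Solve for the (x, y) point that is dists[i] away from each tower.

    Subtracting the three circle equations pairwise turns the problem
    into a 2x2 linear system -- instant on any computer, even a '90s one.
    """
    (x1, y1), (x2, y2), (x3, y3) = towers
    d1, d2, d3 = dists
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1  # zero only if the towers are collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Hypothetical setup: phone at (3, 4), towers at known positions.
# In reality the distances come from signal timing: d = c * delay.
towers = [(0, 0), (10, 0), (0, 10)]
dists = [math.dist((3, 4), t) for t in towers]
x, y = trilaterate(towers, dists)
print(round(x, 6), round(y, 6))  # recovers the phone's position: 3.0 4.0
```

The expensive part was never the math; it's getting the distance measurements and the records in the first place.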
If you're calling from a payphone, there's no triangulation needed at all. The address of each payphone is already on record. The one advantage the protagonist realistically has is that CDR data isn't usually available in real time. Law enforcement needs to contact the telecom provider, obtain a court order, and wade through all the bureaucracy that entails. If there's a clock ticking, it should count the days it takes to gather that data, not how long the triangulation software takes to calculate. The moment you accessed this page, you left a trail. Your device asked your Internet Service Provider (ISP) to connect you to my website. That request generated a log, the digital equivalent of a CDR, recording that your IP address requested a connection to mine. When your ISP routed you to my server, it handed over your IP address so I'd know where to send the data back. From that IP address alone, I can make a rough guess at your location, usually accurate to your city or region. Your ISP, however, knows exactly where you are. They assigned you that IP address and are actively providing your connection. This is where HTTPS comes in. You've probably noticed the padlock icon in your browser. When you connect to a website over HTTPS, the content of your communication is encrypted in transit. Your ISP (or anyone listening on the network) can see that you connected to a particular domain, but they cannot read what you sent or received. The data looks like noise to them. The main distinction is that HTTPS hides the content, not the connection. Your ISP still sees the domain you visited. They still have a timestamp. They still have your IP address. The metadata is fully visible, even if the message itself is not. Using HTTPS wasn't something most people worried about until 2013, when Edward Snowden's leaked documents revealed that the NSA had been running programs like PRISM that compelled major technology companies to hand over user data.
They tapped directly into the fiber-optic cables connecting Google and Yahoo's data centers. At those interception points, traffic that hadn't yet been encrypted internally was flowing in the open. The NSA could read emails, messages, and files, not by breaking encryption, but by scooping up data before encryption was ever applied, or by accessing it at a point where it had already been decrypted. The content was exposed. You can partially obscure your activity from your ISP by using a VPN. A VPN tunnels your traffic through a third-party server, so your ISP sees only that you connected to the VPN, not where you went from there. But now the VPN provider holds that information instead. You haven't entirely eliminated the trail; you've relocated it. One way or another, when you use any electronic means of communication, you leave breadcrumbs. The connection is always recorded somewhere. That's why end-to-end encryption (E2EE) is important. Unlike HTTPS, which encrypts data in transit but means the server itself can read your messages, with end-to-end encryption only the sender and recipient can read the content. The service provider in the middle never holds the keys. In practice, when you send a message through an E2EE app like Signal, your device encrypts the message using your recipient's public key before it ever leaves your phone. The encrypted message travels through Signal's servers, but Signal cannot read it, because they don't have the private key needed to decrypt it. Only your recipient's device holds that key. Even if Signal were compelled by a government order to hand over your messages, all they could produce is scrambled data that's meaningless without the key. This is a meaningful protection. But it doesn't change the underlying reality: Signal still knows that your device contacted another device, at what time, and how often. The content is hidden. The connection is not. We cannot make communication invisible. We can only make it unreadable.
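The content-versus-metadata split is easy to sketch in code. This toy uses a shared-key XOR pad purely for illustration; it is not Signal's actual protocol (Signal uses X25519 key agreement plus the Double Ratchet), and the device IDs are made up. The point is what a relay server can and cannot log:

```python
import os
import time

def xor(key: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR against a random pad. Stand-in for real E2EE crypto.
    return bytes(k ^ b for k, b in zip(key, data))

# Sender and recipient share a key; the relay server never sees it.
key = os.urandom(64)
message = b"meet at the usual place"
ciphertext = xor(key, message)

# Everything the relay can record: the connection, not the content.
relay_log = {
    "from": "device-A",           # hypothetical device IDs
    "to": "device-B",
    "timestamp": time.time(),
    "size_bytes": len(ciphertext),
}

assert xor(key, ciphertext) == message  # only a key holder recovers the text
```

Even if the relay were compelled to hand over both `ciphertext` and `relay_log`, the content stays opaque. That is E2EE's whole value, and also its limit: the log itself is still there.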
In the real world, the only thing keeping Jason Bourne two steps ahead of law enforcement is the bureaucracy and legal delay involved in retrieving CDR data. It's not his cleverness, not the speed of triangulation software, not technology.


Reformed

We are, once again and inexplicably, seeing a conversation unfold about reforming the military force in our streets, with body cameras and training standing in for a moral reckoning about the kind of world we want to live in, the kind of world that is livable for more than the wealthy few. We know what such “reforms” accomplish, because we’ve seen this many times before: an armed, unaccountable force with body cameras is no less deadly or immoral than an armed, unaccountable force without. A trained secret police is still the secret police. A short walk from where I write this is the old Walnut Street Jail, the first penitentiary built in the US, a precursor to the more infamous Eastern State Penitentiary, which was designed and operated by the Quakers. The Quakers advocated for reforms to the old prison systems, in which deprivation and corporal punishment were the norm, arguing that solitude, cleanliness, and discipline were better methods for rehabilitation. More than 200 years after those “reforms,” our prisons remain locations of intense deprivation, physical violence, coerced labor, and, frequently, inhumane solitary confinement—the “penitence” the Quakers were after still in short supply. Reports from the detention centers built today to house people pulled from the streets without due process show that even those minimal standards are anything but: inedible food, overcrowding, lights on twenty-four hours a day, refusal of medical care, rape, and murder are all regular occurrences in these new prisons. This is the process that reform takes: the system is modified around the edges, often in ways that seem to cushion or obscure its real purpose, but the underlying conditions that maintain it remain unchanged. The old ways resurface, eventually. But if not reform, then what? What else can we do?
André Gorz proposes a concept of “non-reformist reforms”: reforms which “bring the future into the present…[that] make power tangible now by means of actions which demonstrate to the workers their positive strength.” For Gorz, a reform is non-reformist if it both exercises the power and agency of workers acting together and foreshadows the future world in the present. That is, a non-reformist reform requires both concrete, bottom-up action and the reflection of a different world within that action, the way a small fractal prefigures the large. Body cameras promise increased surveillance with no attendant increase in accountability, while training maintains the distribution of money and resources away from care and towards cops and prisons; both reforms represent business as usual, not a remade world. Only abolitionist demands—to defund militarized police forces in all their many forms, to invest instead in schools, libraries, homes, healthcare, childcare, and more—can both exercise that power and foreshadow a world where care overcomes criminalization. To put this another way: a reform maintains the old world, often under cover, while a non-reformist reform demands that we build a new world, one in which all humans and the more-than-human world can thrive. We must take small steps towards the future we want; there is no other way. But each step must point the way toward that future, a drop of water that heralds the wave.


3 Bullets and a Call to Action

☕ Welcome to The Coder Cafe! Today, we discuss an efficient communication method presented in the Debugging Teams book called 3 bullets and a call to action. I’ve been using it extensively over the past months, and I can confirm its efficiency. Get cozy, grab a coffee, and let’s begin! At Google, I recently switched to a new domain: Google Distributed Cloud Connected 1. Here, all the teams are very busy, and finding an efficient way to communicate over email or chat can be challenging, especially when asking someone to do something. Recently, I came across a simple technique: three bullets and one call to action. The idea is the following:

- Add three bullet points explaining the key context
- Follow with one clear call to action

Let’s look at a concrete example. Suppose you receive the following email:

“I recently wrote a design doc on how to save storage in the context of X, where I describe the current problem and the approach we could take to address it. In the document, I go through the main trade-offs involved and explain why the proposal focuses on solution Y in particular. I also included several open questions related to the deployment strategy and some areas where feedback would be especially helpful. It would be great if you could take a look at the document and leave comments by Friday.”

Quite a mouthful. It requires a non-trivial amount of brain time to understand both the context and what the person is actually asking for. Now let’s apply the three bullets and a call to action strategy:

- I recently wrote a design doc on how to save storage in the context of X.
- It highlights the main trade-offs and focuses on the solution Y.
- I’ve added open questions around the deployment strategy.

Could you please have a look and leave comments by Friday?

Much better, right? The call to action is clear, and the context is structured around short and easy-to-scan sentences. Why does it work?
When communicating via email or chat, people prefer short and memorable messages that do not require too much cognitive effort to process. Bullet points help break information into smaller chunks, which makes the message easier to scan quickly. Ideally, the bullet points and the call to action should be as short as possible. Another aspect is that 3 is often a magic number in communication. With 2 items, you often get a contrast. With 3 items, you start to get a small structure or rhythm that is easier for the mind to process. That is one of the reasons why the rule of three appears so often in writing, storytelling, and presentations, where it helps make ideas more engaging and convincing. Remember: to improve your chances of getting an answer to your request, use 3 short bullets and an efficient call to action. Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time. 10 Rules I Learned About Technical Writing · The XY Problem · Don’t Forget About Your Mental Health · Rule of three — Thinking Insights · Rule of three (writing) — Wikipedia. Only 2, sorry about that. ❤️ If you enjoyed this post, please hit the like button. 💬 What do you think about this strategy? Have you tried something similar? Leave a comment. That partially explains why I wasn’t so active with The Coder Cafe these days. It will get better, I promise.


Jensen Huang and Andy Grove, Groq LPUs and Vera CPUs, Hotel California

GTC 2026 marked an important inflection point for Nvidia, as the company is now selling multiple architectures instead of focusing on just one GPU. The motivation is to serve all needs and keep all customers.


Exploiting brain flaws

In my “closing thoughts” post about the phone usage experiment, I mentioned I had deeper thoughts I wanted to share. Here I am, sharing those thoughts. I’ve run various month-long life experiments over the years, many of which I chronicled here on this blog. For that reason, the outcome of this recent phone experiment wasn’t a surprise: if I make the conscious decision to pay attention to some specific aspect of my life, there’s a high likelihood I’ll manage to enact significant changes in that specific area. Or so I thought. You see, I am a flawed human being, like many—most?—of the people out there. If I were in therapy, there would be a plethora of issues I’d be discussing with my therapist, but in therapy I am not, and so I thought it would be fun—for me at least, not sure about you—to tackle one of them here, since it’s strictly related to this recent phone experiment. «Wait a second, if that’s the case, then why aren’t you in therapy, Manu?» Good question, I’m glad you asked. There are two main reasons. The first, and less important, reason is that I am a stubborn motherfucker, and the idea of asking someone else to help me fix my inner issues is something that doesn’t sit right with me. The second, and more important, reason is that I have a fundamental distrust of psychologists. Not of psychology as a field; I have no issues with that. I even considered going into psychology back when I was about to finish high school and was thinking about possible career paths. I’ve also read plenty of psychology books, and the book that had the most impact on me growing up was a psychology book written by a psychologist. The issue I have with psychologists is that all the ones I’ve had the pleasure to meet in person were deeply flawed and fucked up individuals, and that left an impression on me. Now I carry this fundamental (and partly irrational) distrust of them, which is a bit problematic, since it’s hard to go to therapy when you don’t trust the person on the other side.
Maybe this will change at some point in the future, who knows. I'm open to that possibility. Anyway, to get back on track, the issue I wanted to discuss is related to disappointment. Specifically, my issue with the concept of disappointing others. This is something I’ve had to deal with since I was a kid, and I’m not sure why that is. I don’t know if it was triggered by something specific that happened or if it’s just part of my character, but disappointing others, and especially the thought of seeing them disappointed because of something I did or didn’t do, is something I have always struggled with. To this day, I still do. The reason I think this is all related to my weird life experiments is that those experiments usually follow a pattern: I experiment with something, I blog about it, I get to enjoy the benefits of some positive change, the experiment ends, I stop blogging about it, and slowly but surely the old habits manage to creep back in. It happens every time, like clockwork. But this time around, I realised that the reason it happens is that I, fundamentally, do not give much of a fuck about myself. That itself is a topic for another time, but in the context of this discussion, the thing that matters is that as long as I’m blogging and sharing my experience, the irrational pressure of disappointing someone keeps me on track. At a rational level, I know that no one gives a fuck if I fail at these silly experiments, and yet, for some reason, that extra pressure is what keeps me in check. Now, is this a healthy way to exist in this world? Probably not. Do I care? Definitely not. But, having realised this, I’m now wondering how I can exploit it to my advantage. Because there are things I’d love to change in my life, and I’m starting to think leveraging the disappointment lever could be the way to go. My phone usage, for example, is still under control, and that’s because I know I’m gonna keep sharing those numbers.
Not weekly, because that’s boring, but probably every couple of months. And this fact alone, the decision to do this, is apparently enough to keep my brain on track. Brains are weird, what can I say? I’m still figuring out which changes I want to put in place in my life. The tricky part is that they need to be trackable and shareable somehow; otherwise, this will not work, but I’m sure I’ll manage to come up with a solution. Thank you for keeping RSS alive. You're awesome.

(think) Yesterday

One Year with the HHKB: A Mini Review

The keyboard is the most important tool for a programmer. Choose wisely. I’m a keyboard nerd. I’ve owned several great keyboards over the years, starting with the legendary Das Keyboard 3 Ultimate (blank keys and Cherry MX Blue switches – my co-workers loved me), then moving through the Das Keyboard 4, the excellent KUL ES-87, and eventually landing on what I considered my dream keyboard: the Leopold FC660C. The Leopold FC660C – my daily driver for almost a decade. The Leopold was a revelation. It’s where I discovered Topre switches – that glorious electrostatic capacitive feel that’s somewhere between membrane and mechanical, yet somehow better than both. After years of clacking away on Cherry MX Blues (much to the dismay of my wife and everyone within a 10-meter radius), the smooth, thocky Topre experience felt like coming home. The compact 65% layout was the cherry on top – small enough to save desk space, but with dedicated arrow keys and a few essential extras. I used the Leopold daily for almost a decade, from 2016 all the way to early 2025. That’s quite a run. So why change something that was working so well? Two words: wireless Topre. I wanted to cut the cord, and if you want a wireless keyboard with Topre switches your options are… well, pretty much just the HHKB (Happy Hacking Keyboard) Hybrid Type S. The HHKB Hybrid Type S – the object of today’s review. Not to mention I’d been exposed to the HHKB hype for as long as I can remember. The keyboard has an almost cult-like following among programmers, especially in the Unix and Lisp communities. I’m honestly not sure why I went for the Leopold instead of the HHKB back in 2016 – the HHKB was definitely on my radar even then – but in hindsight the Leopold served me incredibly well. When I finally pulled the trigger on the HHKB Hybrid Type S in early 2025, I had sky-high expectations. Now I’ve had it for over a year.
I deliberately avoided writing about it earlier – I think it’s important to live with a piece of hardware for a good while before passing judgment, especially when there’s an adjustment period involved. So let’s dig in. Looks. The HHKB is a handsome keyboard. The minimalist design, the clean lines, the elegant keycap legends – it’s a looker. I’d say it edges out the Leopold slightly in the aesthetics department, though the battery housing bump on the back is a bit of an eyesore. A minor quibble, though. Weight. It’s impressively light and portable. Some people complain this makes it feel “cheap” since the body is essentially all plastic, but I appreciate being able to toss it in a bag without thinking twice. 1 Keycaps and switches. The Topre experience is excellent, as expected. The keycaps are high quality PBT and the switches feel more or less identical to what I had on the Leopold. If you already know you love Topre, you’ll love the HHKB’s typing feel. The keys on the HHKB Type S are a bit quieter and lighter to press than those of the Leopold, but the difference is not big. Control key placement. This is probably the one aspect of the HHKB’s unconventional layout that I actually love. Control sits right where Caps Lock is on a standard keyboard – exactly where it belongs. On every other keyboard I’ve ever owned, the first thing I’d do is remap Caps Lock to Control anyway, so it’s nice to have a keyboard that gets this right out of the box. Wireless. Being able to pair with multiple devices via Bluetooth and switch between them is genuinely nice. No more cable clutter on the desk. That said, the wireless implementation comes with some significant caveats – more on that below. The layout. For a keyboard that markets itself as a “programmer’s keyboard,” some of the layout decisions are baffling. The tilde/backtick key is in a terrible position (top right corner, miles away from where your fingers expect it). 
For someone who lives in the terminal, that’s a real problem. I remapped it to the Escape key position almost immediately, since I don’t particularly care where Escape lives – I use a dual-mapping on Caps Lock (Control when held, Escape when tapped via Karabiner Elements ). The backslash placement is also awkward, and the Alt/Option keys are unnecessarily tiny even though there’s plenty of space to make them bigger. There’s no right Control key despite ample room for one (I compensate with a similar hold/tap mapping on the Return key). And the lack of dedicated arrow keys – while manageable when programming – is genuinely annoying in applications that make heavy use of them (browsers, document editors, Slack, etc.). I’ve mostly gotten used to using Fn+key combos for arrows, but I still miss the Leopold’s dedicated arrow keys on a regular basis. The firmware. For such an expensive and supposedly premium product, the firmware feels primitive. You get basic key remapping and a few DIP switches, but it’s nothing compared to the power and flexibility of QMK or VIA that you’ll find on many keyboards at half the price. HHKB recently released firmware 2.0 with some interesting updates, but I haven’t had a chance to try it yet. In the meantime, Karabiner Elements does the heavy lifting for me – but I shouldn’t have to rely on third-party software to make a $300+ keyboard work the way I want. Battery life. It’s mediocre at best, and the HHKB uses disposable AA batteries rather than a built-in rechargeable battery. In 2025. For a premium wireless keyboard. I’ll let that sink in. The sleep/wake behavior. This is my single biggest complaint and the thing that still drives me up the wall a year later. To save battery, the keyboard goes to sleep after 30 minutes of inactivity – that’s perfectly reasonable. What’s not reasonable is that pressing a key doesn’t wake it up. You have to press the power button to bring it back to life. Every. Single. Time. 
I still don’t understand why it can’t auto-wake like virtually every other wireless keyboard on the market. You come back from a coffee break, start typing, and… nothing. Then you remember, reach for the power button, wait a second for it to reconnect, and then you can start typing. It’s a small thing, but it’s also extremely annoying. The Das Keyboard 4 – where it all started for me (well, almost). Big, loud, and proud. So, is the HHKB the real deal, or is it mostly hype? After a year of daily use, I’d say it’s… a bit of both. It’s a good keyboard – the typing feel is fantastic (because Topre), it looks great on a desk, and the wireless capability is genuinely useful despite its rough edges. But I wouldn’t say it offers much over other Topre keyboards. The layout quirks, the primitive firmware, the battery situation, and that maddening sleep/wake behavior all hold it back from being the definitive keyboard it’s often made out to be. The fundamental problem is that there are so few Topre keyboards on the market that our options are extremely limited. For me it came down to either the HHKB or the Realforce R3 TKL . The Realforce has a better, more conventional layout for sure, but I didn’t love the aesthetics – it felt too big for what it offered, and visually it didn’t do much for me. 2 Despite its shortcomings, the HHKB has grown on me. I don’t think my typing experience has actually improved compared to the Leopold, but my desk certainly looks a bit nicer and I always smile when I look down at it. Sometimes that’s enough. I hope this won’t be the last Topre keyboard from HHKB, and that down the road they’ll release a version that addresses some of my frustrations. But I won’t be upset if I end up typing on my current HHKB for a very long time. If you have an HHKB or any other Topre keyboard, I’d love to hear about your experience in the comments. What do you love? What drives you crazy? Have you found clever workarounds for the layout quirks? 
And if you’re still on Cherry MX Blues… well, your co-workers would like a word with you. That’s all I have for you today. Keep typing! Not that I carry it around much.  ↩︎ If they tweak it a bit in the future I’ll definitely get one, though.  ↩︎

Kaushik Gopal Yesterday

Here’s my list of reasons for using Opencode

Here’s my list of reasons for using Opencode. I’m often experimenting with bleeding-edge models as they come out. I actively switch between models for tasks, and I use them all enough that I can tell the difference. Opencode lets me switch between models mid-task or mid-conversation. Fluidly. I wrote about this and agentic fluidity in more detail, but tldr: Opencode has the client/server architecture baked in. So I can just start an opencode server on one machine, expose it, and start using it on my phone or other machines. I talked about this on my podcast in some detail, but Opencode has the best implementation of subagents and modes. You can switch to a subagent definition as your primary mode, then operate other subagents from there. It makes orchestrator-type tasks super easy. I love that OpenCode is opinionated about their UX. They don’t try to be Claude Code or Codex. In the process they have some really nice UX patterns, like a sidebar with ongoing file changes, context/cost, MCPs connected, etc. It’s the first time I’ve not needed to worry about a custom statusline.sh or building one. The plugin ecosystem is highly customizable, to the point where you can add new features, integrate with external services, or even modify OpenCode’s default behavior. The wonderful Jesse Vincent mentioned this to me when I was stupidly contemplating a fork. It’s not all rainbows and sunshine. Anomaly — the team behind OpenCode — is small. Which sometimes shows, because there are definitely bugs and some missing features. But I will say… none of that has deterred me from using it for the last two months, exclusively. Go give it a shot. Many of the serious AI coders I know are really liking it and switching.


GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52

OpenAI today: Introducing GPT‑5.4 mini and nano. These models join GPT-5.4, which was released two weeks ago. OpenAI's self-reported benchmarks show the new 5.4-nano out-performing their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini. Here's how the pricing looks (all prices are per million tokens); the new nano is notably even cheaper than Google's Gemini 3.1 Flash-Lite. I used GPT-5.4 nano to generate a description of this photo I took at the John M. Mossman Lock Collection. Here's the output: The image shows the interior of a museum gallery with a long display wall. White-painted brick walls are covered with many framed portraits arranged in neat rows. Below the portraits, there are multiple glass display cases with dark wooden frames and glass tops/fronts, containing various old historical objects and equipment. The room has a polished wooden floor, hanging ceiling light fixtures/cords, and a few visible pipes near the top of the wall. In the foreground, glass cases run along the length of the room, reflecting items from other sections of the gallery. That took 2,751 input tokens and 112 output tokens, at a cost of 0.069 cents (less than a tenth of a cent). That means describing every single photo in my 76,000-photo collection would cost around $52.44. I released llm 0.29 with support for the new models. Then I had OpenAI Codex loop through all five reasoning effort levels and all three models and produce this combined SVG grid of pelicans riding bicycles (generation transcripts here). I do like the gpt-5.4 xhigh one the best: it has a good bicycle (with nice spokes) and the pelican has a fish in its beak!
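The collection-wide arithmetic is easy to check. A sketch using the per-photo figure reported above (the per-million-token prices are folded into that 0.069-cent number):

```python
# 2,751 input + 112 output tokens came to 0.069 cents per photo,
# i.e. $0.00069, as reported in the post.
per_photo_usd = 0.00069
photos = 76_000

total_usd = per_photo_usd * photos
print(f"${total_usd:.2f}")  # → $52.44
```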


Why Are We Still Doing This?

Hi! If you like this piece and want to support my work, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5000 to 185,000 words, including vast, extremely detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large .   I just put out a massive Hater’s Guide To The SaaSpocalypse — an urgent and in-depth analysis of the end of the hypergrowth era of software — and my Hater’s Guides To Private Equity , Anthropic , Oracle and Microsoft are huge (12k+ word) research projects priced lower than the cost of a cup of coffee, which is partly an inflation issue on the part of the coffee shop, but what I’m getting at is this is a ton of value. Where’s Your Ed At Premium is incredibly useful, read by hedge funds, private equity firms, Fortune 500 CEOs, a large chunk of the business and tech media, and quite a few CEOs of major tech firms. I am regularly several steps ahead in my coverage, and you get an absolute ton of value, several books’ worth of content a year. Subscribe today and support my work, I deeply appreciate it. Hey everyone! I know everybody is super excited about the supposed power of AI, but I think it’s time we set some fair ground rules going forward so we stop acting so crazy.   Let’s start with a simple one: AI boosters are no longer allowed to explain what’s good about AI using the future tense. You can no longer say “it will,” “could,” “might,” “likely,” “possible,” “estimated,” “promise,” or any other term that reviews today’s capabilities in the language of the future.  I am constantly asked to explain my opinions (not that anybody who disagrees with me actually reads them ) in the terms of the present, I am constantly harangued for proof of what I believe, and every time I hand it over there’s some sort of ham-fisted response of “it’s getting better” and “it will get even more better from here!’ That’s no longer permissible! 
I am no longer accepting any arguments that tell me something will happen, or that “things are trending” in a certain way. For an industry so thoroughly steeped in cold, hard rationality, AI boosters are so quick to jump to flights of fancy — to speak of the mythical “AGI” and the supposed moment when everything gets cheaper and also powerful enough to be reliable or effective. I hear all this crap about AI changing everything, but where’s the proof?

Wow. Anthropic managed to turn $30 billion into $5 billion and start one of the single most annoying debates in internet history. No, really: its CFO Krishna Rao stated on March 9, 2026, in a legal filing that it had made “exceeding” $5 billion in revenue and spent “over” $10 billion on inference and training. None of these numbers line up with previous statements about annualized revenue, by the way — I went into this last week — and no amount of contorting around the meaning of “exceeding” takes away from the fact that adding up all the annualized revenues is over $6 billion, which I believe means that Anthropic defines “annualized” in a new and innovative way. In any case, Anthropic turned $30 billion into $5 billion. That’s…bad. That’s just bad business. And I hear no compelling argument as to how this might improve, other than “these companies need more compute, and then something will happen.”

In fact, let’s talk about that for a second. At the end of January, OpenAI CFO Sarah Friar said that “our ability to serve customers—as measured by revenue—directly tracks available compute,” messily suggesting that the more compute you have, the more revenue you have. This is, of course, a big bucket of bollocks. Did OpenAI scale its compute dramatically between hitting $20 billion in annualized revenue (to be clear, I have deep suspicions about these numbers and how OpenAI measures “annualized” revenue) in January 2026 and $25 billion in March 2026? I think that’s highly unlikely.
I also have to ask — where are the compute-limited customers, exactly? If revenue scales with compute, wouldn’t that mean that each increase in compute availability would be allowing somebody to pay OpenAI or Anthropic that couldn’t do so before? I don’t see any reports of customers who can’t pay either company due to a lack of available compute. Are there training runs that can’t be done right now? That doesn’t really make sense either, because training doesn’t automatically lead to more revenue, other than in releasing a new model, I guess? It’s almost as if every talking point in the generative AI industry is the executives in question saying stuff in the hopes that people will just blindly repeat it!

But really, folks, we’ve gotta start asking: where’s the money? Anthropic made $5 billion in revenue in its entire existence and spent $10 billion just on compute. OpenAI claims it made $13.1 billion in revenue in 2025 and “only” lost $8 billion — but those numbers seem unlikely considering my report from November of last year that had OpenAI at $4.3 billion in revenue on $8.67 billion of inference costs through September 2025, and this is accrual accounting, which means these are from the quarters in question. How likely do you think it is that OpenAI booked $8.8 billion in a quarter (Q4 CY2025) and only lost $8 billion in the year after it lost $12 billion (per the Wall Street Journal) in the previous quarter?

Look, I get it! This isn’t a situation where thinking critically is rewarded. Even articles explicitly criticizing the economics of these companies are still filled with weasel wording about “expects to grow” and “anticipates hitting,” or the dreaded phrase “if their bet pays off.” Saying obvious stuff like “every AI company is unprofitable” or “there is no path to profitability” or “nobody is talking about AI revenues” is considered unfair or cynical or contrarian, even though these are very reasonable and logical statements grounded in reality. “But Ed!
What about Uber!”  What about Uber? Uber is a completely different business to Anthropic and OpenAI or any other AI company. It lost about $30 billion in the last decade or so, and turned a weird kind of profitable through a combination of cutting multiple markets and business lines (EG: autonomous cars), all while gouging customers and paying drivers less .  The economics are also completely different. Uber does not pay for its drivers’ gas, nor their cars, nor does it own any vehicles. Its PP&E has been between $1.5 billion and $2.1 billion since it was founded . Uber’s revenue does not increase with acquisitions of PP&E, nor does its business become significantly more expensive based on how far a driver drives, how many passengers they might have in a day, or how many meals they might deliver. Uber is, effectively, a digital marketplace for getting stuff or people moved from one place to another, and its losses are attributed to the constant need to market itself to customers for fear that other rideshare (Lyft) or delivery companies (DoorDash, Seamless) might take its cash. Also: Uber’s primary business model was on a ride-by-ride basis, not a monthly subscription. Users may have been paying less , but they were still thinking about each transaction with Uber in terms that made sense when prices were raised ( though it briefly tried an unlimited ride pass option in 2016 ) .  Charging on a ride-by-ride basis was the smartest move that Uber made, as it meant that when prices went up, users didn’t have to change their habits.  AI companies make money either through selling subscriptions (or some sort of token-based access to a model) or by renting their models out via their APIs. 
One of their biggest mistakes was offering any kind of monthly subscription to their services, because the compute cost of a user is almost impossible to reconcile with any amount they’d pay a month, as the exponential complexity of a task is impossible to predict, both based on user habits and the unreliability of an AI model in how it might try and produce an output.  Let’s give an example. Somebody spending $20 a month on a Claude subscription can spend as much as $163 in compute .  There are two reasons this might be happening: In both cases, Anthropic (and OpenAI, for that matter) is screwed. If we assume Anthropic’s gross margin is 38% ( per The Information , though to be clear I no longer trust any leak from Anthropic, also no, Dario did not say Anthropic had 50% gross margins, it was a hypothetical ), that would mean that $163 of compute costs it $101. Now, not every user is spending that much, but I imagine a lot of users are considering the aggressive ( and deceptive ) media campaign around Claude Code means that a great many are, at the very least, testing the limits of the product. Those on the Max $100 and $200-a-month plans are specifically paying for fewer rate limits, meaning that they are explicitly paying to burn more tokens. The obvious argument that you could make is that Anthropic could simply increase the price of the subscription product, but I need to be clear that for any of this to make sense, it would have to do so by at least 300%, and even then that might not do the job. This would immediately price out most consumers — an $80-a-month subscription would immediately price out just about every consumer, and turn this from a “kind of like the cost of Netflix” purchase into something that has to have obvious, defined results. A $400-a-month or $800-a-month subscription would make a Claude or ChatGPT Pro subscription the size of a car payment. 
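The gross-margin arithmetic above can be sketched in a few lines (the $163 compute figure and the 38% margin are the values quoted in the text; the formula simply backs out the underlying cost):

```python
# If a heavy user burns $163 worth of compute at retail rates, and
# Anthropic's gross margin is 38% (per The Information), the cost
# to actually serve that user is the non-margin share of retail.
retail_compute = 163.00   # $ of compute consumed, at retail pricing
gross_margin = 0.38       # reported gross margin

cost_to_serve = retail_compute * (1 - gross_margin)
subscription = 20.00      # what the user actually pays per month
print(f"cost to serve ≈ ${cost_to_serve:.0f}, vs ${subscription:.0f}/month in revenue")
```

Which is how a $20-a-month subscriber ends up costing roughly $101 to serve.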
For a company with 100 engineers, a subscription to Claude Max 5x would run at around $480,000 a year. And this is assuming that rate limits stay the same, which I doubt they would. In any case, there is no future for any AI company that uses a subscription-based approach, at least not one where they don’t directly pass on the cost of compute. This is a huge problem for both Anthropic and OpenAI, as their scurrilous growth-lust means that they’ve done everything they can to get customers used to paying a single monthly cost that directly obfuscates the cost of doing business.

I need to be very direct about what this means, because it’s very important and rarely if ever discussed. A user of ChatGPT or Claude Code is only thinking of “tokens” or “compute” in the most indirect sense — a vague awareness of the model using something to do something else, totally unmoored from the customer’s use of the product. All they see is the monthly subscription cost ($20, $100, or $200-a-month) and rate limits that vaguely say you have X% of your five-hour allowance left. Users are not educated in (nor are they thinking about) their “token burn” or burden on the company, because software has basically never made them do so in the past. This means it will be very, very difficult to increase subscription costs on users, and near-impossible to convince them to pay the cost of the API. It’s like if Uber, which had charged $20-a-month for unlimited rides, suddenly started charging users their drivers’ gas costs, and gas was at around $250 a gallon. That might not even do the price disparity justice. This theoretical example still involves users being in the back of a car and being driven a distance, and that said driving costs gas. Token burn is an obtuse, irregular process priced per million input and output tokens, with the latter increasing when you use reasoning models, which use output tokens to break down how they might handle a task.
The majority of AI users do not think in these terms, and even technical users that do so have likely been using a monthly subscription which doesn’t make them think about the costs. Think about it — you log onto Claude Code every day and do all your work on it, sometimes bumping into rate limits, then coming back five hours (or however long) later and doing the same thing. Perhaps you’re thinking that a particular task might burn more tokens, or that you should use a model like Claude Sonnet over Claude Opus so that you don’t hit your limits earlier, but you do not, in most cases, even if you know the costs of a model, think about them in a way that’s useful. Let’s say that Anthropic and OpenAI immediately decide to switch everybody to the API. How would anybody actually budget? Is somebody that pays $200 a month for Claude Max going to be comfortable paying $1000 or $1500 or $2500 a month in costs, and have, at that point, really no firm understanding of the cost of a particular action?   First, there’s no way to anticipate how many tokens a prompt will actually burn, which makes any kind of budgeting a non-starter. It’s like going to the supermarket and committing to buy a gallon of milk, not knowing if it’ll cost you $5 or $50.  But also, suppose a prompt doesn’t quite return the result you need, and thus, you’re forced to run it again — perhaps with slightly altered phrasing, or with more exposition to ensure the model has every detail you need. And again, you have no idea how many tokens the model will burn. How does a person budget for that kind of thing?  This is a problem both based on user habits and the unreliability of Large Language Models — such as spending several minutes “thinking” when they get stuck in loops trying to evaluate code or come up with a way to execute a task . User habits are also antithetical to switching from a paid subscription to metered access to models. 
A user might forgive Claude for chasing its own tail for several minutes when not burdened by the cost of it doing so, but if that act cost $2 or $3 or $10, they may hesitate to use the model at all.  I’ll give you another example. You, a relative novice, decide to use Claude Code to build a dinky little personal website. During the process, Claude Code gets lost, messes up a few little things, taking a few minutes in aggregate, and you calmly tell it to fix things and do what you’d like, and after a little back-and-forth you get something you’re happy with. As you try and upload it to Amazon Web Services, you get stuck, and spend ten minutes getting it to explain how you get the website online. At $20 a month, you might find this process delightful , empowering even. You just coded a website (even if it was a clone of one of thousands of different online templates), and you did so using natural language. Wow! What a magical world we live in. You realize as you look at the website that you forgot to add a section. Doing so takes another half an hour. You bump into your rate limits, take a break for five hours, then come back and finish it at the end of the day. The model has told you the entire time that you’re a genius for making this, and the website rocks , and that you built it , even though you didn’t. If you were paying via the API, this excursion could’ve cost you anywhere from $5 to $15. Every single little back-and-forth begins to add up. Every little change. Every little addition. Every attempt that Claude makes to fix something but makes it worse. Every “I don’t get it” you feed it about AWS.  It’s difficult to actually say what it was that made it expensive or not, and doing so adds a level of cognitive burden on top of the constant vigilance you need to make sure the model doesn’t do something unproductive. Even explicit, direct and well-manicured prompts can lead these models on expensive little expeditions.  
Token burn isn’t something that neatly maps to another way that we pay for things outside of cloud storage, and even then, there are very few services that rival the chaotic costs of Large Language Models. Even if people can conceptualize that there are inputs and outputs, the latter of which costs more money, mapping a task to a reliable amount of tokens is actually pretty difficult. Even if these companies were profitable on inference (I do not believe they are), they are dramatically, horrendously unprofitable on subscriptions, and there isn’t a chance in Hell that the majority of those subscriptions convert into token-based API users. When Uber — a completely different business, to be clear — jacked up prices, it did so gradually, and also didn’t ask users to dramatically shift how they think about using the app.

Anthropic and OpenAI have no clean way to jack up prices or cut costs. They can increase subscription fees, but doing so would lead to users paying two to five times what they’re paying today, which would undoubtedly lead to massive churn. They could also reduce rate limits with the intention of pushing people toward the API, but as I’ve discussed, subscription-based customers are neither educated nor prepared to pay for a confusing, metered service that directly counters habits driven by an abundance of token burn. Users are not taught to be considerate of their burn or mindful of their costs when using a subscription-based LLM.

The other problem is that these companies don’t really appear to have a way to cut costs, because inference remains very expensive and training costs are never going away. I hear a lot of wank about “ASICs” and “TPUs” that will magically bring down costs. When? How? Oh, NVIDIA’s latest chip is 10x more efficient or some bullshit? Show me the fucking evidence! Because every time the revenues and costs get reported, the revenues seem lower and the costs seem higher.
And it’s completely fucking insane that we don’t have an answer beyond “things will get cheaper” or “prices will go up.” Despite everybody talking about it endlessly for three god damn years, LLMs lack the kind of obvious, replicable, industrially-necessary outcomes that make a 3x, 4x or 10x price increase tenable.  I also think that Anthropic and OpenAI have deliberately used their subscriptions as a means of conning the media into conceptualizing AI as far more affordable than it actually is. Most users do not have any real idea of how much it costs to use these services, let alone how much it costs to run them. All of that glowing, effusive press around Claude Code was based on outcomes that were both subsidized and obfuscated by Anthropic. I think that these articles would’ve been much less positive if the reporters were even aware of the actual costs. So, let’s do some maths shall we? Assume a business has 100 engineers, and currently pays $200 a month for each engineer to use Claude Max, at a cost of $20,000 a month, or $240,000 a year. Let’s assume on average you pay your engineers $125,000, meaning that your salaries are $12.5 million a year, not considering other costs (this is a toy example). Now imagine that Claude switches to a metered billing system.  Let’s assume that, in actuality, these engineers are burning a mere $10 a day in tokens, which brings costs to $365,000 a year, or an increase of $125,000… and remember, this is a team of engineers that were previously used to a subscription that allowed them to spend upwards of $2700 a month in tokens, or nearly 10 times the $300 a month they’re now spending. Let’s be a little more realistic, and bump that number up to $25. Now you’re spending $912,500 a year in tokens. $30 a day puts you over a million bucks. Oops, busy month, you’re now spending $40 a day. Now you’re spending more than 10% of your salaries on compute costs.  
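The toy example above, run end to end (all figures are the ones stated in the text: 100 engineers, $125,000 average salary, $200/month Max, and a range of daily token burns):

```python
# Metered billing vs. the $200/month Max subscription for a 100-engineer team.
engineers = 100
salaries = engineers * 125_000           # $12.5M/year in salaries
max_plan = engineers * 200 * 12          # $240,000/year on Claude Max

for daily_burn in (10, 25, 30, 40):      # $/engineer/day in tokens
    yearly = engineers * daily_burn * 365
    share = yearly / salaries
    print(f"${daily_burn}/day -> ${yearly:,}/year ({share:.1%} of salaries)")
```

At $10 a day the team costs $365,000 a year; at $40 a day, token spend crosses 10% of total salaries, exactly as the scenario describes.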
Anthropic’s own Claude Code documentation says that the average cost is $6 per-developer-per-day, with “daily costs remaining below $12 for 90% of users.” Good news! If you, as an engineer, can limit your usage to $6 a day, you’re actually saving the company money! But you’re not spending $6 a day. That’s a silly number for anybody coding. One user on Reddit said that they spent $200 to $300 a day on API costs and decided instead to spend $40 to $50 a day on a GPU cluster on Lambda to use the open source model Qwen 3.5 to handle their code, which still works out to $14,600 a year. Another user found that their parallel Claude Code sessions using Claude’s $200-a-month plan (I assume using multiple accounts) worked out to around $12,000 a month in API costs. Another that hit their limits on their Max subscription “only needed another hour or two to finish a project,” and that hour or two resulted in almost $600 in API costs.

Even the boosters are beginning to worry. Last week, Chamath Palihapitiya made a shockingly reasonable point: when ROI, indeed, Chamath. The fact that one of the most prominent voices (for better or worse) in the tech industry is unable to get a straight answer to “where is the return on investment” — somebody directly incentivized to keep the party going — should have everybody a little worried.

Really though, where is the ROI? Who is actually getting a profit out of this? NVIDIA? The companies that make RAM? Because it doesn’t seem to be the companies who are buying the GPUs. It doesn’t seem to be the AI companies. I don’t think it’s true, but if you believe it, you believe code is truly being automated away — to what end? What are the actual documented economic effects we can point at, and what are the actual meaningful changes to the world? Real data. Something from today, please. You are legally banned from saying the words “soon” or “in the future.” No more future tense. It’s not allowed.
All of my stuff has to be in the present — so yours should too. Let’s do a quick-fire round. Boosters, I am begging you — point to one thing TODAY, from TODAY’s models, that even remotely justifies burning nearly a trillion dollars and filling our internet full of slop and creating the moral distance from an action that might have blown up a school and empowering the theft of millions of people’s work and having to hear every fucking day about Sam Altman and Dario Amodei, two terrifyingly boring and annoying oafs with no culture and no whimsy in their wretched little hearts.

Even if you are impressed by what LLMs can do, remember that what you’re impressed by is the result of burning more money than anybody has ever burned on anything, including the Great Financial Crisis’ Troubled Asset Relief Program (a little over $400 billion) and the COVID Paycheck Protection Program (somewhere between $800 billion and $900 billion). Anthropic and OpenAI have raised (assuming OpenAI gets all the money) over $200 billion in funding, on top of nearly $700 billion in capex in 2026 alone across Google, Amazon, Meta, and Microsoft, on top of the $800 billion or so they’ve already spent. I haven’t even included the tens of billions spent by CoreWeave, or the $178.5 billion in US-based data center debt deals from 2025, or the hundreds of billions of venture dollars that went to AI companies worldwide. Yet when you look even an inch below the surface, everything seems kind of shit.

Per my Hater’s Guide To The SaaSpocalypse: every single AI startup without exception does the same thing: turn hundreds of millions of dollars into tens of millions of dollars, or a few billion dollars into a few hundred million dollars. None of them are improving their margins. None of them have a solution.
Every single problem I’ve discussed above about the costs of running Anthropic or OpenAI apply directly to every AI startup, except they have far less venture capital backing and are subject, as Cursor was back in June 2025 , to whatever price increases Anthropic or OpenAI decide, such as adding “priority processing” that’s effectively mandatory to have consistent access to frontier models. Absolutely none of these companies have a plan. The only reason anyone is still humouring them is that the media and venture capital continue to promote the idea that — without explaining how — they will magically find a way of becoming margin positive. When? How? Those are problems for rubes who don’t know we’re living in the future! Let’s hope that venture capital can afford to fund them in perpetuity! They can’t, of course, because venture capital has had dogshit returns since 2018 , and AI startups do not have much intellectual property, as most of them are just wrappers for frontier AI labs who also don’t have any path to profitability. As I covered last week , the story is similar for public companies.  Adobe’s “AI-first” revenue ($375 million ARR) works out to about $60 million a quarter at most for a company that makes $6 billion a quarter. ServiceNow has “$600 million in annual contract value,” an extrapolation of a non-specific period’s revenue that does not actually mean $600 million for a company that makes over $10 billion a year . Salesforce’s Agentforce revenue is $800 million , or roughly $66 million a month for a company that makes over $11 billion a year. Shopify, the company that mandates you prove that AI can’t do a job before asking for resources , does not break out AI revenue. Workday, a company that makes about $2.5 billion a quarter in revenue, said it “generated over $100 million in new ACV from emerging AI products, [and that] overall ARR from these solutions was over $400 million.” $400 million ARR is $33 million a month.  
To be clear, ARR is not a consistent figure, and churn happens all the time, especially for products like LLMs that have questionable outcomes and high prices. Four fucking years of this and we’re still talking about this stuff in riddles, mostly because it’s a terrible business. Then there’s the infrastructure issue. One of the more-recent (and egregious) failures of journalism is the reporting of data center deals. Before we go any further, one very important detail: when you read “active power,” that does not mean actual available compute capacity, which is called “IT load.” Per my premium data center model from a few months ago , you should take any “active power” and divide it by 1.3 to represent “PUE” — the standard for power usage effectiveness that calculates for everything that gets the power to the IT gear, and all the infrastructure that’s necessary to keep things running, like cooling systems. Anywho, Bloomberg just reported that Meta had signed a “$27 billion” compute capacity deal with “$12 billion of capacity available in 2027” with AI compute company Nebius. Based on discussions with numerous experts in AI infrastructure, it works out to about $12.5 million per megawatt of compute, meaning that “$12 billion of dedicated capacity” would be around 960MW of IT Load. And, of course, Nebius just raised $3.75bn in debt on the back of that compute deal .  This is on top of Microsoft’s $17.4 billion deal , and, of course, Meta’s $3 billion deal from last year.  One little problem: as of its February 12 2026 Letter to Shareholders , Nebius has around 170MW of active power .  How the fuck is it going to have that capacity ready, exactly?  
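The active-power-to-IT-load rule of thumb described above is simple division, and it makes the Nebius gap concrete (170MW of active power and the $12.5M-per-megawatt figure are the numbers quoted in the text):

```python
# Divide quoted "active power" by a PUE of ~1.3 to get usable IT load.
PUE = 1.3

def it_load_mw(active_power_mw: float) -> float:
    return active_power_mw / PUE

# Nebius' ~170MW of active power is only ~131MW of usable IT load,
# against a Meta deal implying roughly 960MW ($12B / $12.5M per MW).
print(f"{it_load_mw(170):.0f}MW of IT load available")
print(f"{12_000_000_000 / 12_500_000:.0f}MW implied by the $12B of dedicated capacity")
```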
For some context, CoreWeave — an AI compute company backed by ( and backstopped by ) NVIDIA with an entirely separate company building its capacity (Core Scientific) with backing from Blackstone and seemingly every major financier in the world — managed to go from 420MW of active power (NOT IT LOAD) in Q1 2025 to 850MW in active power in Q4 2025 , with much of that already under construction in Q1 2025.  Nebius only started building its 300MW of New Jersey-based compute in March 2025 , and based on its letter to shareholders, things aren’t going very well at all.  Then there’s Nscale, a company that raised $2 billion from NVIDIA, Lenovo and a bunch of other investors , and this week signed a “1.35GW deal” with Microsoft to fill a data center full of the latest generation of Vera Rubin GPUs.  In September 2025, NVIDIA CEO Jensen Huang said that the UK was going to be an “ AI superpower ” as he plunged hundreds of millions of dollars into Nscale as part of an “ historic commitment to the UK AI sector” between NVIDIA, OpenAI, and Microsoft.  When The Guardian visited the supposed site of Nscale’s UK-based data center in February 2026 — which is meant to be built by the end of the year — it found “...a depot stacked with pylons and scrap metal under a corrugated roof, while flatbed lorries drove in and out stacked with poles.” As part of the investigation, The Guardian found that the supposed billions of dollars in data center commitments made by Nscale and CoreWeave were never checked by the government, and that no mechanism existed to audit them. The response from both CoreWeave and Nscale was that these billions of dollars of investments would mostly be in NVIDIA GPUs, which is where we get to the “why” of these massive compute contracts. You see, when Nebius, or Nscale, or CoreWeave signs a giant deal that it doesn’t have the capacity to provide, it does so specifically to raise debt on the contract to buy NVIDIA GPUs. 
See the below diagram from CoreWeave’s Q1 2025 earnings presentation : If people were actually paying attention, they’d see the immediate problem: a data center takes an incredible amount of time to build, and takes longer depending on the amount of capacity necessary .  It’s a deeply cynical con. Hyperscalers like Microsoft and Meta are paying for these contracts because they don’t reflect as assets on the balance sheet, all while moving the risk onto the AI compute company — and if the AI company misses a deadline, the hyperscaler can walk away.  For example, Nebius’ deal with Microsoft from last year has a clause that says that “...fails to meet agreed delivery dates for a GPU Service and the Company cannot provide alternative capacity, Microsoft has the right to terminate that GPU Service.”  Based on discussions with people with direct knowledge of its infrastructure, Microsoft has already set up Nebius to fail, with the expectation that it would have over 50MW of IT load specifically made up of NVIDIA’s GB200 and GB300 GPUs available by the end of April, with at least another 150MW of IT load (or more) by the end of the year for a company that only has about 130MW of IT Load in its entire global infrastructure, most of which isn’t in Vineland, New Jersey . Hyperscalers are helping no-name companies with little or no history or experience in building data centers borrow billions of dollars in debt which is increasingly funded by people’s retirements and insurance funds lured in by the idea of “consistent yields” from companies that cannot afford to do business without convincing everybody to believe the illogical.  Data centers take forever to build. The “1.2GW” (so 880MW of IT load) Stargate Abilene’s first two buildings were meant to be fully energized by the middle of 2025 . 
Only the first two buildings’ worth of 96,000 GPUs were “delivered” by the middle of December 2025, and while the entire project was meant to be energized by mid-2026, it appears that only two buildings are actually ready to go. Every report on these deals should include a timeline. In the end, I bet Stargate Abilene never gets built, but if it does, I’d be shocked if it’s done before the middle of 2027, which would mean it takes about three years per gigawatt of power, or about a year per 293MW of IT load. I have read absolutely zero fucking stories about data center development that take this into account.

The flippancy with which the media reports on these data centers — both in the structure of the deals and the realities of the construction (I go into detail about this in a premium piece from late last year, but making data centers is hard) — is allowing con artists to get rich and creating the conditions for yet another great financial crisis. Pension funds and state investment boards are reading about these deals, seeing “Microsoft,” and assuming that everything will be fine. Per my Hater’s Guide To Private Equity: all that the pension fund sees is an article on CNBC or Bloomberg and the name of a company like Microsoft or Meta. In turn, they (or the private credit firm managing their money) buy bonds or fund these debt deals because they see them as stable, straightforward, reliable investment yields, because the media and private credit firms are selling them as such. In reality, data center debt deals are incredibly dangerous, as each one is effectively a bet on both the existence of AI demand (so that the debt can be repaid with revenue) and the existence of the company in question as an ongoing concern. Nscale, Nebius and CoreWeave are only a few years old, and the concept of a 1GW data center is not much older.
During the great financial crisis, massive amounts — billions and billions of dollars’ worth — of pension and insurance funds went into Collateralized Debt Obligations (CDOs) that were rated as AAA despite being a rat king of low-grade (and in many cases delinquent) debt.  This time around, data center debt deals are often given junk ratings — such as the B+ rating given to one of CoreWeave’s 2025 debt deals — which might make you think that there’s nothing to worry about, and that investors would naturally steer clear of these investments. The problem is that the markets have AI psychosis, and thus believe anything to do with data centers is a natural winner. Blackstone funded part of its $38 billion investment in Oracle’s data centers — you know, the ones explicitly built for OpenAI, which cannot afford to pay for the compute — using its insurance funds. Per The Information : This is the standard line from anybody in finance about data centers, and is based on little more than wish-casting and fantasy. These are brand new kinds of debt for some of the largest infrastructure projects in history, and as I’ve discussed repeatedly, outside of hyperscalers moving compute off of their balance sheets, there’s only a billion dollars of compute demand . 77% of CoreWeave’s 2025 revenue — and keep in mind that CoreWeave is the largest independent AI compute provider — was from Microsoft and NVIDIA, the latter of which plans to spend $26 billion in the next five years on renting back its GPUs…which suggests that little organic demand exists. 2026 or 2027’s great financial crisis will replace “homes” with “data centers,” and I worry it’ll be calamitous for the pensions and insurance funds that have tied their futures to AI.  Even putting aside my own personal feelings about LLMs…I’m just not sure why we’re doing this anymore.  
Okay, okay, I know why we’re doing it — the software industry is out of hypergrowth ideas and has been in a years-long decline since 2018, though it briefly had a burst of excitement in 2021 when money was cheap and everybody was insane after the lockdowns ended. Nevertheless, AI has become one of the largest cons in history, bought and sold based on stuff it can’t do (but might do, one day, at a non-specific time), constantly ignoring the blatant swindles and acidic economics that are only made possible with regulators and the media and the markets piloted by people that don’t know or want to know what’s actually happening. If you are an AI fan, I need to genuinely ask you to consider whether what you’re impressed by is what the LLMs can do today rather than what they might be able to do tomorrow. If you’re excited based on the potential , you’re not excited about technology, you’re excited about marketing. And I get it. The tech industry hasn’t had anything really exciting in a while. It’s easy to get swept away by hype, especially when everybody is being swept away in exactly the same way. It’s hard to push back when Microsoft, Google, Meta and Amazon are all participating in a financial death cult, and their revenues keep growing — having to understand anything more than the headlines is tough and you’ve got all this shit to do and it’s so much easier to just nod and agree with everybody else. But know that this is an industry that sells itself on fear and lies. Know that LLMs cannot do many of the things that people talk about — they do not blackmail people , no, GPT-4 did not trick a TaskRabbit , and every single time an AI CEO says AI “will” do something you should spit in their fucking face for making shit up, not print it without a second’s thought. It’s time to get specific. What will AI do, and when will it do it? What will the actual software be? How will it work? How much will it cost? How will it make money? How will it become profitable?  
Because right now we’re being sold a lie and I’m sick of it, almost as sick as I am of seeing critics framed as outlier factions spreading conspiracy theories. I’ve proven my point again and again and again. Where is the same effort from the AI boosters? All I see is the occasional desperate attempt to claim that LLMs doing what they’ve always done is somehow remarkable. Oh wow, so you can code a clone of an open source software project , all set up with an LLM that may or may not get the code right. Oh, someone was able to vibe code something that may or may not work and looks exactly the same as every other vibe code project. Congratulations on making a website that’s purple for some reason — you’re puking out a facsimile of an era of websites defined by the colour scheme chosen by Tailwind CSS . I also want to be clear that I am extremely nervous about how many people appear to be fine with not reading code. I am currently (very slowly) learning Python, and every new thing I learn reinforces my overwhelming anxiety that there is a lot of software being written today by people who don’t read the output from LLMs and, in some cases, may not have understood it if they did. While I’m not saying all or even many software engineers might do this, I am alarmed by the idea that it’s becoming more commonplace — and even more alarmed that the reaction appears to be “ah it’s fine, who gives a shit, it works.” Guess what! It doesn’t always work. Amazon Web Services had multiple recent outages caused by use of its Kiro AI coding tool, and while it insists that AI isn’t to blame , it also convened an internal meeting to discuss this specific issue, and The Financial Times reported that Amazon now requires junior and mid-level engineers to get sign-off on AI-assisted changes to code. 
However you may feel about Amazon as a service, its engineers are likely indicative of corporate engineering on some level, which is making me wonder if we’re not going to have some real problems in software development in the next few years as a result. What does the software industry look like if nobody is actually reading their code? How many software engineers are comfortable doing this? I’m sure somebody will read this and get terribly offended, but to be clear, I’m not accusing you of copy-pasting code you can’t understand and being happy if it works, unless that’s exactly what you’re doing. To be explicit, allowing an LLM to write all of your code means that you are no longer developing code, nor are you learning how to develop code, nor are you going to become a better software engineer as a result. This isn’t even an insult or hyperbole. If you are just a person looking at code, you are only as good as the code the model makes, and as Mo Bitar recently discussed , these models are built to galvanize you, glaze you, and tell you that you’re remarkable as you barely glance at globs of overwritten code that, even if it functions, eventually grows into a whole built with no intention or purpose other than what the model generated from your prompt. I’m sure there are software engineers using these models ethically, who read all the code, who have complete mastery over it and use it like a glorified autocomplete. I’m also sure that there are some that are just asking it to do stuff, glancing at the code and shipping it. It’s impossible to measure how many of each camp there are, but hearing Spotify’s CEO say that its top developers are basically not writing code anymore makes me deeply worried, because this shit isn’t replacing software engineering at all — it’s mindlessly removing friction and putting the burden of “good” or “right” on a user that it’s intentionally gassing up. 
Ultimately, this entire era is a test of a person’s ability to understand and appreciate friction. Friction can be a very good thing. When I don’t understand something, I make an effort to do so, and the moment it clicks is magical. In the last three years I’ve had to teach myself a great deal about finance, accountancy, and the greater technology industry, and there have been so many moments where I’ve walked away from the page frustrated, stewing in self-doubt that I’d never understand something. I also have the luxury of time, and sadly, many software engineers face increasingly deranged deadlines set by bosses that don’t understand a single fucking thing, let alone what LLMs are capable of or what responsible software engineering is. The push from above to use these models because they can “write code faster than a human” is a disastrous conflation of “fast” and “good,” all because of flimsy myths peddled by venture capitalists and the media about “LLMs being able to write all code.” The problem is that LLMs can write all code, but that doesn’t mean the code is good, or that somebody can read the code and understand its intention, or that having a lot of code is a good thing both in the present and in the future of any company built using generative code. And in the end, where are the signs that this is working? Where are the vibe coded software products destabilizing incumbents? Where are the actual software engineers being replaced — not that I want this to happen, to be clear — by LLMs, outside of AI-washing stories that have got so egregious that even Sam Altman called it out ? Where is the revenue? Where are the returns? Where are the outcomes? Why are we still doing this? Anthropic is intentionally subsidizing its subscribers’ compute in an attempt to gain market share. 
Anthropic is incapable of creating stable limitations on its models’ compute costs, as Large Language Models cannot be “limited” in a linear sense to “only spend” a certain number of tokens, because it’s impossible to guarantee how many tokens a task might take. While I must be clear that Anthropic can limit Claude subscriptions, as can OpenAI limit ChatGPT, I doubt either can do so with precision. Hyperscalers are seeing incredible revenue growth, which is coming from AI! - why aren’t they telling us their AI revenues, then? Also, every single hyperscaler has hiked prices in the last few years, with Microsoft’s latest increases including a 33% increase on cheap subscriptions for front-line workers . Fun fact! Microsoft was the only hyperscaler to ever talk about actual AI revenues, and last did so in January 2025 when it said it had reached a “ $13 billion run rate ” (so about $1.08 billion a month). It has never done so again. We’re in the early day- shut up. Stop it. We’re nearly four years in. What’re you talking about? The exponential growth in capabilities of AI models- I am calling Jigsaw from “Saw” if you cannot express to me in clear, certain and direct terms what it is that’s actually changed. No benchmarks, either! They had to stop using SWE-Bench because models were trained specifically to solve it . Show me something that an LLM created, all on its own, and it better be fucking great, and fast too. Oh it “sped up coders”? How? To what end? Is the code better? Did they lay people off? Block laid off 4000 people because of AI- Yes hello, Mr. Jigsaw? Yeah it’s Ed, you had me chained against a radiator the other week. No, I’m doing a lot better, I’m glad we talked things out. Anyway, I need your help with something. Everybody is saying that Block laid off 4000 people because of AI, and that proves something! 
All Jack Dorsey said was that “[Block is] already seeing that the intelligence tools [it’s] creating and using…are enabling a new way of working which fundamentally changes what it means to build and run a company.” I know, that doesn’t mean anything, and all Block is doing is AI-washing, which is when a company uses AI as a scapegoat to justify laying people off.  No, no, don’t handcuff anyone to a radiator, I just needed somebody to talk to. Maybe later, okay? Jokes aside, Block — like many other companies — aggressively recruited during the pandemic , with headcount growing by 2.5x between 2019 and 2025 . And now, as the market conditions are looking choppier, it seems like it’s trying to Ozempic away some of its corporate “bloat.” Saying you’re firing people because of AI is a bit less embarrassing than saying “we fucked up.”  [Software company] is still growing, so AI must be helping?- Is that actually true? Have you looked? Because if you haven’t looked, I wrote about this in the Hater’s Guide To The SaaSpocalypse . AI is not actually driving much revenue at all!

fLaMEd fury Yesterday

Damn, I Can Still Read

What’s going on, Internet? Last December I finally got off my ass and committed to reading Jared Savage’s books, Gangland , Gangster’s Paradise , and the recently released Underworld . These books had been on my radar since the release of Gangland, but I was waiting on an ebook version. Then I went all in on audiobooks and decided to wait until they were available in audio format. So, back to December. It was my birthday. My wife sorted me some kid-free time so I dug out my Kobo Libra, charged it up a bit, reconnected to Libby, borrowed Gangland and got stuck in. After hundreds of audiobooks and not much ebook reading outside of comics I thought I was in for a bad time. Much to my amazement I found out rather quickly that I could still read books with words, not sound. I also went through a period where I’d get into bed and snuggle in with a book only to find myself asleep after maybe getting through a single page. This made finishing books an arduous task. When I did switch to audiobooks, they became almost the only way I read. Night time reading defaulted to comic books, which I enjoyed, but these have taken a back seat so far this year. I’ve got three months of X-Men to catch up on. I’ve read 10 books so far this year, four audiobooks and six books on the Kobo, a big change from previous years since I started my audiobook journey. I’ve got at least three more books lined up after the one I’ve just finished. After finishing the amazing 1985 I started another audiobook that just didn’t click, so I quickly abandoned it before falling into sunk cost territory. I’ve picked up a few more podcasts to listen to during the day and have been listening to more music recently. I’m not worried though, I’m sure I’ll pick up the audiobooks again, just waiting for the right ones to make their way into my orbit. The question is, will the backlog of X-Men comics continue to grow or will I be able to find some balance in my physical reading? 
I just need some more of that kid-free time, right? Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP , or send a webmention . Check out the posts archive on the website.

ava's blog Yesterday

i got featured @ noyb!

Every now and then, Noyb (European Center for Digital Rights) highlights some of their volunteers for their GDPRhub project. Now I got my entry :) Check it out on Mastodon and LinkedIn . Fittingly, one of my translated and summarized decisions finally made it into the weekly newsletter last Thursday! Also, they give you some goodies when you reach some of the volunteer milestones they have. I received mine :) Reply via email Published 17 Mar, 2026

David Bushell Yesterday

SMTP on the edge

Disclaimer: this post includes my worst idea yet! Until now my contact form submissions were posted to a Cloudflare worker. The worker encrypted the details with PGP encryption . It then used the Amazon AWS “Simple Email Service” API to send an email to myself. PGP encryption meant that any middleman after the worker, like Amazon, could not snoop. (TLS only encrypts in transit.) The setup was okay but involved too many services. If you thought that was over-engineered, get a load of my next idea. My experiment with a self-hosted SMTP server was short-lived but I did learn to code the SMTP protocol with server-side JavaScript. During that tinkering I had issues upgrading TLS on the SMTP server for receiving email. In my recent AT Protocol PDS adventure I learned that Proton Mail can generate restricted tokens for SMTP client auth. I’ve also been slowly migrating from Cloudflare to Bunny in my spare time. I was reminded that Bunny has Deno edge workers. Lightbulb moment: can I rawdog SMTP in a Bunny worker? This cuts out the AWS middleman. Neither Bunny nor Proton ever see the unencrypted data. True end-to-end encryption for my contact form! I threw together a proof-of-concept. My script opened a raw TCP connection to Proton and sent the SMTP message. The connection was upgraded with STARTTLS to secure it. It then followed a very fragile sequence of SMTP messages to authenticate and send an email. If the unexpected happened it bailed immediately. Surprisingly this worked! I’m not sharing code because I don’t want to be responsible for any misuse. There is nothing in Bunny’s Terms of Service or Acceptable Use policy that explicitly prohibits sending email. Magic Containers do block ports but edge scripting doesn’t. I asked Bunny support who replied: While Edge Scripting doesn’t expose the same explicit port limitation table as Magic Containers, it’s not intended to be used as a general-purpose SMTP client or email relay. 
Outbound traffic is still subject to internal network controls, abuse prevention systems, and our Acceptable Use Policy. Even if SMTP connections may technically work in some cases, sending email directly from Edge Scripts (especially at scale) can trigger automated abuse protections. We actively monitor for spam and unsolicited email patterns, and this type of usage can be restricted without a specific “port block” being publicly documented. If you need to send transactional emails from your application, we strongly recommend using a dedicated email service provider (via API) rather than direct SMTP from Edge Scripting. bunny.net support …that isn’t an outright “no” but it’s obviously a bad idea. To avoid risking an account ban I decided to use the Bunny edge worker to forward the encrypted data to a self-hosted API. That service handles the SMTP. In theory I could decrypt and log locally, but I’d prefer to let Proton Mail manage security. I’m more likely to check my email inbox than a custom GUI anyway. The OpenPGP JavaScript module is a big boy at 388 KB (minified) and 144 KB (compressed). I load this very lazily after an event on my contact form. Last year in a final attempt to save my contact form I added a Cloudflare CAPTCHA to thwart bots. I’ve removed that now because I believe there is sufficient obfuscation and “proof-of-work” to deter bad guys. Binning both Cloudflare and Amazon feels good. I deleted my entire AWS account. My new contact form seems to be working. Please let me know if you’ve tried to contact me in the last two weeks and it errored. If this setup fails, I really will remove the form forever! Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds. PGP encryption in the browser to Bunny edge worker; SMTP directly to Proton.

Martin Fowler Yesterday

Context Anchoring

Conversations with AI are ephemeral, decisions made early lose attention as the conversation continues, and disappear entirely with a new session. Rahul Garg explains how Context Anchoring externalizes the decision context into a living document.


TiNA: Tiered Network Buffer Architecture for Fast Networking in Chiplet-based CPUs

TiNA: Tiered Network Buffer Architecture for Fast Networking in Chiplet-based CPUs Siddharth Agarwal, Tianchen Wang, Jinghan Huang, Saksham Agarwal, and Nam Sung Kim ASPLOS'26 Here we go again , another paper in a top-tier conference on the classic CS problem: how to DMA received packets from NIC to host. It would be interesting to understand why this is such a hot topic these days. This paper deals with the case where the host CPU comprises multiple chiplets. If you get nothing else from this, I hope you will learn something about SNC mode (I had not heard of it before). Recent Intel CPUs can be placed into Sub-NUMA Clustering mode (via a BIOS setting). This causes each chiplet to appear as a separate NUMA node. It is like a single socket CPU is transformed into a 4 socket CPU. The DRAM memory space is divided into four regions (one per chiplet), and the LLC slices within a chiplet only cache data from one memory space. This can be advantageous for some applications, because it can lower average LLC and DRAM access latency (by avoiding inter-chiplet communication). The downside is that the peak LLC capacity available to a single core is reduced. Fig. 3 illustrates these tradeoffs: Source: https://dl.acm.org/doi/10.1145/3760250.3762224 SNC and DDIO Recall that DDIO is a feature of Intel CPUs that allows a NIC to write received packets directly into the LLC, which the host CPU can then read. PCIe lanes are distributed among chiplets. This means that the NIC is directly connected to one chiplet. One way to support DDIO with SNC is to allocate buffers for received packets in the memory region associated with the chiplet that the NIC is connected to. This improves LLC bandwidth (for both the NIC and CPU cores) but decreases the LLC capacity available for network packets. In practice, this means that longer bursts of network packets degrade performance more when SNC is enabled (i.e., leaky DMA is a larger problem in SNC mode). Fig. 
6 has data from a microbenchmark to back this up: Source: https://dl.acm.org/doi/10.1145/3760250.3762224 TiNA The solution proposed by this paper requires a change to the NIC/driver interface. Each ring buffer of received network packets is replaced by N ring buffers (where N is the number of chiplets). Ring buffer i is placed in the memory region associated with chiplet i. The NIC knows about all of these ring buffers and dynamically decides which one to use. The NIC prefers to use the ring buffer associated with the chiplet that it is directly connected to. However, if a burst of traffic causes high utilization of the LLC capacity of that chiplet, then the NIC will fall back to using the other ring buffers. The NIC estimates LLC utilization based on two competing rates: The rate that received network packets are produced by the NIC The rate that received network packets are consumed by the host The first rate is easy for the NIC to compute as it knows how fast it is sending bytes to the host. The second rate is computed by networking software running on the host, and periodically sent to the NIC. The overall approach reminds me of CEIO . The key difference is the set of memory segments available. CEIO uses NIC-local DRAM as the fallback path. One complication of splitting a single ring buffer into multiple is ensuring that the host processes received packets in order. This paper proposes using sequence numbers associated with each packet. Most protocols already use per-packet sequence numbers. For other protocols (e.g., UDP), the NIC adds a sequence number based on the order in which packets were received. When the host reads a packet from a logical ring buffer, it examines the sequence numbers from the packets at the head of each of the physical ring buffers and chooses the packet with the lowest sequence number. Fig. 9 has benchmark results: lower latency than SNC and non-SNC across a range of microbenchmarks. 
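The host-side in-order read described above can be sketched as a toy model. Note the assumptions: the paper's mechanism lives in the NIC driver, and the representation here (each physical ring as a list of (sequence number, payload) pairs) is purely for illustration, not the paper's actual data structures:

```python
def merge_ring_buffers(rings):
    """Drain per-chiplet ring buffers in packet order (TiNA-style sketch).

    Each physical ring is a list of (sequence_number, payload) tuples in
    arrival order. The host repeatedly takes the packet with the lowest
    sequence number among the heads of all physical rings, which yields
    packets in the original receive order."""
    heads = [0] * len(rings)  # read index into each physical ring
    ordered = []
    while True:
        best = None  # index of the ring whose head packet has the lowest seq
        for i, ring in enumerate(rings):
            if heads[i] < len(ring):
                if best is None or ring[heads[i]][0] < rings[best][heads[best]][0]:
                    best = i
        if best is None:  # all rings drained
            return ordered
        ordered.append(rings[best][heads[best]])
        heads[best] += 1

# Packets 0 and 3 landed in the NIC-local ring; 1 and 2 spilled to another
# chiplet's ring during a burst. The merge restores arrival order.
packets = merge_ring_buffers([[(0, "a"), (3, "d")], [(1, "b"), (2, "c")]])
```

A real driver would do this per logical queue with fixed-size rings, wraparound, and sequence-number wrap handling; this sketch only shows the lowest-sequence-number selection across ring heads.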
Source: https://dl.acm.org/doi/10.1145/3760250.3762224 Dangling Pointers It would be nice if SNC allowed more fine-grained configuration. For example, there may be applications where ideal performance is achieved if each CPU core only has access to the L3 slice that is directly connected to it.

ava's blog Yesterday

radically accepting my flawed and dysfunctional body

I remember years ago, especially early on in the pandemic (2020/2021), I was still not diagnosed with my illnesses ( Bechterew's disease and Crohn's disease ). For a decade at least, I had dealt with a variety of symptoms, most of it around joints, my spine, and my digestive tract, and separately from those, also hormonal issues. Food was unpredictable and made me feel sick and caused me a lot of pain, and all the inflammation showed on my skin too. Around that time, gut health information was really booming online (probably still is, but I keep away from that content now and I'm less online). The idea was that cutting out certain stuff, mostly eating this or that diet, or taking these supplements would regenerate your gut health and make all the symptoms go away - the joint pain, the sluggishness, the acid reflux, the rashes, the hormone imbalances, the allergies and intolerances, and so on. If you see your body as a naturally wholesome and healthy body that is just temporarily imbalanced by some exposure and can be brought back into balance, these products and lifestyle changes are basically the magic pill. Just do this and avoid expensive pharmaceutical drugs with side effects! I'm not trying to act like that can never happen; people have successfully reversed or lessened some illnesses and issues by eating differently, working out, losing weight or limiting their exposure to something. But for me, this approach just led to disordered eating habits and holding off on effective treatment in some things for a while. The thing is, lots of people online peddling this stuff are in the business of snake oil. Buy their classes, their book, their supplements to finally be free from all these issues that doctors can't or won't diagnose or only have evil medicines for that have side effects! Your body is good as is, it just needs a nudge in the right direction! It puts so much responsibility on you. 
Yes, we should limit our exposure to pesticides, PFAS etc., but you go insane in the grocery store thinking: " I can't buy this, it's not organic, can't buy this, it's wrapped in plastic, can't buy this, it's canned, can't buy this, it's high inflammatory/against FODMAP diet, can't buy this because it's too processed, can't buy this because it has so much sugar... ". Back then, every grocery store trip had me on the verge of a mental breakdown or actually breaking down. Everything felt contaminated, unsafe, or something my body couldn't tolerate. It felt impossible to " treat my body naturally " or bring it " back into balance ". Even when you do manage for a while, it significantly inhibits your ability to socialize with people because so much of it is about food: going out to eat together, attending festivities, being invited to dinner, being gifted food, traveling. A very restrictive diet can also cause deficiencies or starve you. It's also a bottomless pit: If it doesn't work for you and you don't see results, they say you need to try harder, also cut out this and that, buy this other supplement, and now consider other areas of your life too. Aggressively filter all your water, move away from any kind of busy street to limit the exhaust fume exposure, have your home checked for mold, switch out all your synthetic dyed clothes for unbleached undyed linen, switch out all your cooking utensils and pans to the "non-toxic" varieties, check if you live near some kind of coal plant or electricity lines or so, and if you are in the really weird circles, you will hear about chemtrails and Electromagnetic Hypersensitivity and all that. Yes, mold exposure and harmful substances in water are a problem, but I'm just saying: Doing all this next to everything else in life is a huge undertaking, mentally taxing, making people extremely paranoid and isolated, and bleeding them dry when it's often not even the issue . 
It's taking advantage of vulnerable people who either have no access to healthcare or aren't taken seriously or cannot afford the testing or medication required. It's good when one simple change can genuinely help you - for example, I know what foods not to eat to avoid triggering acid reflux. I love it for you if you figured out that eating gluten was behind it all and are now happy and healthy. But my body was never a naturally healthy and balanced one that got out of whack by some behavior or exposure, and even if it happened because of exposure in utero, or as a child, or just living in our modern environment nowadays, I can't undo or change that. My body, in its natural state, is not normal or healthy, and all that helps is proper medication. It's not temporary, this is just how my body functions. The baseline I was born with isn't the norm, and as experience showed, no amount of gut health stuff or limiting exposure or other lifestyle changes were going to change that. All that helped was finally getting properly diagnosed and receiving treatment . It was easy for me to accept treatment for the above issues because life had become unlivable with my intense flare ups, which affected my daily ability to function all the time, and any possible side effect was worth the risk. I still don't regret any of it, and it works fine for me. Where I struggled to seek and accept help was for my hormone issues, as they only affected me every other month or so and were easier to ignore otherwise. As I talked about in a different post, I received hormone therapy as early as 11 years old because my periods and hormone levels were not normal and I otherwise wouldn't have developed how I am expected to as a cis woman 1 . I needed T-blockers like cypro to have the puberty my body and mind needed 2 . 
I stopped at 19 or 20 because I had started having issues with pain and spotting for a while and thought I could try and see whether after puberty, my situation had resolved and I'd naturally have the hormone levels I needed. It hadn't. So since then, I either took nothing, or tried reining in my PCOS and endometriosis with things like Maca root powder. It did bring down my cycle days from 60-70 down to 30, but other issues still persisted. Lots of menstrual pain, flareups of my other issues, PMDD , and so on. When I still went to therapy years ago, my therapist suggested getting antidepressants to take just for the phase between ovulation and period, so I'd stop feeling the awful effects of PMDD. I declined, because while I had been on antidepressants previously for a while and they helped, I also knew what it was like to start and stop them, and I didn't want to constantly put my body through that; plus, the scary side effects! The same happened with hormone treatment. Even though I had spent years of my life on artificial hormones, I was scared to go back on it because I couldn't rule out that they had played a part in my depression back then (or at least amplified it). I was also scared of thrombosis, meningioma and other issues 3 . I thought it would just naturally fade away, or I could make do without until menopause, or later: My treatment for my Bechterew's and Crohn's will finally bring my body into natural alignment! At first, it looked like it; I suddenly experienced cycles like a normal person. On time, barely or no pain, very light bleeding. But it went back to how it was over months, even after switching from infliximab to adalimumab. So turns out, fighting the inflammation in my body didn't do anything to normalize my hormones. I wrote something about accepting my natural menstrual cycle that retroactively is just a huge cope. 
There it was again, the idea that there is a natural state a body can return to and that everyone's default state is automatically healthy, now warped into the idea that I was just naturally meant to have elevated androgens and all this, and that I should just accept how it is. The idea that natural is automatically good is such an easy fallacy to fall prey to, and natural also meant unmedicated to me. I tried to find so many reasons for why being so destroyed by my cycle every time was actually somehow a good thing or had any advantages. There's no shortage of supposedly empowering and encouraging content online about this as well: people who present having a cycle as something magical and romanticize it as living with the moon tides or living in tune with nature. Just be proud of it and feel like those TikTok witches brewing your own herbal solution and gulping it down with some pumpkin seed oil. Ugh! Recently, I just grew tired of it all. The weeks of feeling sluggish, moody, forgetful and weak; my Crohn's and Bechterew's flaring up with it every time; feeling suicidal and calling in sick due to menstrual pain. The 2-3 weeks until I felt normal again derailed good routines and fitness goals all the time, and it was hard to plan around such an irregular cycle. These times could fall on important dates at work or in my degree (exam season etc.) and jeopardize my reliability and skills. If I wanted to reach the goals I had set for myself, the ones I would thrive in and feel the happiest in, I needed to address this. I owe myself that. No one will ever notice your avoidable suffering and pat you on the back for enduring it when there is another way. You aren't impressing anyone with choosing "natural" over comfortable and happy. All people will see and remember are the times you seemed unhappy, uncomfortable, snappy or missed out on even being there. In that one post about accepting my cycle, I wasn't actually accepting it. 
I see now that actually accepting my sick body also means accepting treatment where possible. Everything else is not acceptance, it's just giving up and ignoring the issue. So recently, I had my yearly checkup at the gynecologist and finally got help. I am very lucky to have a very attentive and knowledgeable gynecologist 4 , and we went through all the options with pros and cons, also in connection with my Crohn's that can affect absorption, and we settled on dienogest daily and skipping my period altogether. Independently of that, I finally accepted that my hair needs additional help as I am prone to telogen effluvium and androgenic alopecia, and if I am regrowing it now, after cutting it off in October 2024 because I'd lost about half my hair, I need to do something. So I am trying out minoxidil on top of going back to scalp massages and all. I know seeking medical help can be daunting, stressful, humiliating, costly, inaccessible, and scary. I almost cancelled that appointment about four times. But I hope it motivates you to seek help for the thing you put off or gave up on. You don't need to suffer, you don't need to self-sabotage or prove it to yourself, and you weren't "meant to be like this". If "natural remedies" or snake oil and obsessive rules don't work for you, allow yourself to accept proper help. Reply via email Published 17 Mar, 2026 This is also why I have very small hands and feet, and remained at an average size. I was expected to become 1,80m tall, now I am just 1,66m, with an EU shoe size of 36/37. I didn't change that much from that age in terms of size. ↩ Yes, they do that for cis children, so stop clutching your pearls about trans children getting the same care! ↩ This is unfortunately what happens when you work with medical data, particularly side effects and adverse events; you know way too much about some meds. 
↩ She's always been great, but it felt like in the year since we last saw each other, she went extra hard in researching how my illnesses can interact with my cycle before I showed up. ↩

Stratechery Yesterday

An Interview with Nvidia CEO Jensen Huang About Accelerated Computing

Good morning, This week’s Stratechery Interview is running early, as I had the chance to speak in person with Nvidia CEO Jensen Huang at the conclusion of his GTC 2026 keynote, which took place yesterday in San Jose. I have spoken to Huang four times previously, in May 2025, March 2023, September 2022, and March 2022. In this interview we talk about a keynote that came across like a bit of a history lesson, and what that says about a company that still feels small even as it’s the most valuable in the world, as well as what has changed in AI over the last year. Then we discuss a number of announcements that might feel like a change in approach (although Huang disagrees), including Nvidia’s burgeoning CPU business and the Groq acquisition. Finally we discuss scarcity in the AI stack and how that affects Nvidia, the China question, and Huang’s frustration with doomers and their influence in Washington. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for clarity. Jensen Huang, welcome back to Stratechery. JH: It’s great to be with you. You literally just walked off the stage, went a little long, I think, but you spent a lot of this keynote, which I quite enjoyed, explaining what Nvidia is, starting with the history of the programmable shader, the launch of CUDA 20 years ago. 
We don’t need to spend too much time recounting this, you did a good job, and Stratechery readers are certainly familiar — sorry, this is a bit of a lead-up here — Stratechery readers are familiar, and I remember this exactly, someone asked me to explain how is it that Nvidia can announce so many things at a single GTC, this is like six, seven years ago, maybe even longer than that, and I explained that the whole thing with CUDA and all the libraries is it’s just sort of doing the same thing again and again, but for specific industries. That’s the story you told today, and it’s kind of a back-to-the-future moment after the last few GTC keynotes have kind of just been pretty AI-centered, CES was pretty AI-centered. Why did you feel the need to tell that story now? To recast CUDA and why is it important? JH: Well, because we’re going into a whole lot of new industries and because AI is going to use tools, and when AI uses tools, those are tools that we created for humans. AI is going to use Excel, AI is going to use Photoshop, AI is going to use logic synthesis tools, Synopsys tools, and Cadence tools. Those tools have to be super-accelerated, they’re going to use databases, which have to be super-accelerated because AIs are fast. And so I think in this era, we need to get all of the world’s software accelerated as fast as possible, and then put it in front of AI so that AI could agentically use it. So is that a bit where we’ve already done this for a bunch of sectors and now we’re going to do it for a bunch more? JH: Yeah, a whole bunch more. For example, data processing. Well, that was sort of a surprise. I didn’t expect you to be opening with an IBM partnership. JH: Yeah, right, that kind of puts it in perspective. I mean, they really started it all. You wrote last week that AI is a five-layer cake: power, chips, infrastructure, models, and applications. 
Is there a concern that in the last four or five years, that you are worried about being squeezed into the chips box, so it’s important to both remind people and also yourselves about you being this vertically integrated company — not just in terms of building systems, but into the entire software stack, you’re not just a chip company. JH: I guess my mind doesn’t start with, “What I’m not”, it starts with, “What do we need to be?”. And back then, we realized that accelerated computing was a full stack problem, you have to understand the application to accelerate it. We realized that we had to understand the application, we had to have the developer ecosystem, we needed to have excellent expertise in algorithm development, because the old algorithms that were developed for CPUs don’t work well for GPUs, so we had to rewrite, refactor algorithms so that they could be accelerated by our GPUs. If we do that, though, you get 50 times speed up, 100 times speed up, 10 times speed up, and so it’s totally worth it. I think since the very beginning, we realized, “Ok, what do we want to do, and what does it take to achieve that?”. Now, today we’re building AI factories, we’re building AI infrastructure all over the world. That’s a lot more than building chips, and building chips is obviously important, it’s the foundation of it. Right, that’s like one full stack of doing the networking and doing the storage, and now you’re into CPUs. JH: Now you’ve got to put it all together into these giant systems — a gigawatt factory is probably $50, $60 billion. Out of that $50, $60 billion, probably about, call it $15, $17 or so, is infrastructure: land, power, and shell. The rest of it is compute and networking and storage and things like that, and so that level of investment, unless you’re helping customers achieve the level of confidence that they’re going to succeed in building it, you just have no hope, nobody’s going to risk $50 billion. 
So I think that that’s the big idea, that we need to help customers not just build chips, but build systems and then after we build systems, not just build systems, but build AI factories. AI factories have a lot of software inside, it’s not just our software, it’s a ton of software for cooling management and electricals and things like that, and redundancies, and a lot of it is over-designed, it’s over-designed because nobody talked to each other. When you have a lot of people who don’t talk to each other integrating systems, you have to, by definition, over-design your part of it. But if we’re working together as one team, we’ll make sure that we can push the limits and get more throughput out of the power that we have or save money for whatever throughput you want to have. Just to go back to that software bit, you mentioned Excel wasn’t designed to be used by AI. You have things like Claude’s new functionality to use Excel, so when you talk about that, you want to invest in these libraries, is that to enable models like that to do better? Or is that something for Microsoft or for enterprises — you want to use this, you don’t want to be beholden to this sort of other player in the world? JH: Well, SQL’s a good example. SQL’s used by people, and we bang on the SQL systems like anybody else, and it is the ground truth of businesses. Well, it’s not just gonna be people banging on our SQL database now, it’s gonna be a whole bunch of agents banging on it. Right, they’re gonna do it way faster. JH: They’re gonna need to do it way faster. And so the first thing we have to do is accelerate SQL, that’s kind of the simple logic of it. That makes sense. In terms of models, you noted that language models are only one category. “Some of the most transformative work is happening in protein AI, chemical AI, physical simulation, robots, and autonomous systems”, and this is from the piece you wrote last week. 
You’ve previously made this point in other keynotes — “Everything is a token”, I think, is a phrase that you’ve used before. Do you see transformers as being the key to everything, or do we need new fundamental breakthroughs to enable these applications? JH: We need all kinds of new models. For example, transformers: their ability to do attention scales quadratically, and so how do you have quite long memory? How can you have a conversation that lasts a very long time and not have the KV cache essentially become, over time, garbage? Or have entire racks of solid-state drives that are holding KV cache. JH: And of course, let’s say that you were able to record all of our conversation, when you go back and reference some conversation, which part of the reference is most important? There needs to be some new architecture that thinks about attention properly and is able to process that very quickly. We came up with a hybrid architecture of a transformer with an SSM, and that is what enables Nemotron 3 to be super intelligent and super efficient at the same time, that’s an example. Another example is coming up with models that are geometry aware, meaning a lot of things in life, in nature, are symmetrical. And so when you’re generating with these models, you don’t want it to generate what is just statistically plausible, it has to also be physically based, and so it has to come out symmetrical. And so cuEquivariance, for example, allows you to do things like that. So we have all these different technologies that are designed — or, for example, when we’re generating tokens in words, it comes out in chunks at a time, little bits, tokens at a time, when you’re generating motion, you need it to be continuous. And so there’s discrete information that you generate and understand, and there’s continuous information that you want to generate and understand. Transformers are not ideal for both. Right, that makes sense. 
One more quote from the piece, you write, “In the past year, AI crossed an important threshold. Models became good enough to be useful at scale. Reasoning improved. Hallucinations dropped. Grounding improved dramatically. For the first time, applications built on AI began generating real economic value”. What specifically was that change? Because I think about the timing, I feel like this upcoming year is definitely about agents, I just wrote about it today — but for last year, was that the reasoning? Was that the big breakthrough? JH: Generative, of course, was a big breakthrough, but it hallucinated a lot and so we had to ground it, and the way to ground it is reasoning, reflection, retrieval, search, so we helped it ground. Without reasoning, you couldn’t do any of that, and so reasoning allowed us to ground the generative AI. And once you ground it, then you could use that system to reason through problems and decompose it, and decompose it into things that you could actually do something about, and so the next generation was tool use. It turns out it probably tells you something that search was a service that nobody paid for, and the reason for that is getting information is very important and very useful but it’s not something you pay for. The bar to reach to get somebody to pay you for something has to be higher than just information. “Where’s a good restaurant?” — that kind of information, I don’t think, is worthy enough to get paid for. Some people pay for it, I pay for it. We now know that we’ve crossed that threshold. Not only is it able to converse with us and generate information for us, it can now, of course, do things for us. Coding is just a perfect example for that. 
If you think about it for a second, you realize this, coding is not really the same modality as language, you have to teach it empty spaces and indentations and symbols, it’s almost like a new modality and you can’t generate code just one token at a time, you have to reflect on the chunk of code. That chunk of code has to be factored properly and has to be optimal and has to obviously compile, it has to be grounded not on probable truth, it has to be grounded on execution. Right, does it run or not? JH: It has to run or not. And so I think the code, learning that modality was a big deal. Once you’re able to now do — we pay engineers several hundred thousand dollars a year to code, and so now they have a coding assistant. They could think about architecture. Instead of describing programs in code, which is very laborious, they can now describe software in specification, which is much more abstract and allows them to be much more productive. And so they describe specification, architecture, they’re able to use their time to solve and innovate, and so our software engineers 100% use coding agents now. Many of them haven’t generated a line of code in a while, but they’re super productive and super busy. Do you think there is a temptation to over-extrapolate from coding, though, precisely because it’s verifiable? You have this agent idea where they can go — it’s not just that they will generate code, then they can actually verify it, see if it works, if it doesn’t, they can go back and do it again, and this can happen all without humans because there’s a clear, “Does it work or not?”. JH: Well, because you can reflect, you could have, let’s say, design a house. Designing a house or designing a kitchen used to be the work of architects, designers, but now you could have carpenters do that. So now you elevated the capability of a carpenter, now you use an agent for that carpenter to go design a house, design a kitchen, come up with some interesting styles. 
The agent doesn’t have some tool to execute. However, you could give an example. You say, “These are the styles I’m looking for, I want it to be aesthetic like that”. Because the agent is able to reflect, is able to compare its quality of code, its quality of result against some reference, it could say, “You know what, it didn’t turn out as well as I hoped, I’m going to go back at it again”, and so it iterates. It doesn’t have to be fully executable, in fact, the more probabilistic, the more aesthetic, the more subjective, if you will, AI actually does better. Right, well that’s why you almost have two extremes. You have generating images where there’s no right answer and then you have coding where there is a right answer, and AI seems to do well at those extremes, and the question is how much it will collapse into the middle there. JH: We’re fairly certain it could do architecture now, we’re fairly certain it could design kitchens and living rooms. Well, to this point, one of the big things with agents coming online is, you’ve talked a lot about accelerated computing, I think you’ve trash-talked the CPUs, as it were, saying maybe one day they’re all gonna be removed, like everything’s gonna be accelerated. Suddenly CPUs are hot again. It turns out they’re pretty useful and important. To the extent you are selling CPUs now, how’s it feel to be a CPU salesman? JH: There’s no question that Moore’s law is over. Accelerated computing is not parallel computing. Go back in time — 30 years ago, there were probably 10, 20, 30 parallel computing companies, only one survived, Nvidia, and the reason why is because we had the good wisdom of recognizing the goal wasn’t to get rid of the CPU, the goal was to accelerate the application. So what I just falsely accused you of was actually true for everybody else. JH: We were never against CPUs, we don’t want to violate Amdahl’s Law. 
Accelerated computing, in fact, inside our systems, we choose the best CPUs, we buy the most expensive CPUs, and the reason for that is because that CPU, if not the best and not the most performant, holds back millions of dollars of chips. When it comes to branch prediction, you used to worry about wasting CPU time, now you’re worried about wasting GPU time. JH: That’s right, you just never can have GPUs be squandered, GPU time be idle. And so we always use the best CPUs to the point where we went and built Grace so that we could have the highest performance single-threaded CPU and move data around a lot faster. And so accelerated computing was never against CPUs, my thesis is still true that Moore’s Law is over, the idea that you would use general purpose computing and just keep adding transistors, that is so dead, and so I think fundamentally we’re not against CPUs. However, these agents are now able to do tool use, and the tools that they want to use are tools created for humans and they’re basically two types. There’s the stuff that we run in data centers and most of it is SQL, most of it is database related, and the other type is personal computers. We’re now going to have AIs that are able to learn unstructured tool use, the first type of tool use is structured. CLIs are tool use, APIs, they’re all structured tool use, the commands are very explicit, the arguments are explicit, the way you talk to that application is very specific. However, there’s a whole bunch of applications that were never designed to have CLIs and APIs, and those tools need AIs to learn multi-modality, unstructured, and it has to go and be able to go surf a website and it has to be able to recognize buttons and pull-down menus and just kind of work its way through it like we do. That tool use is going to want to use PCs, and we have both sides, we have incredibly great data processing systems, and as you know, Nvidia’s PCs are the most performant in the world. 
So what makes an agent-focused CPU different from other CPUs? So you’re going to have a rack of just Vera CPUs. JH: Oh, really good, excellent. So the way that CPUs were designed in the last decade, they were all designed for hyperscale cloud and the way that hyperscale cloud monetizes CPUs is by the CPU core. So you want to design CPUs that have as many cores as possible that are rentable, the performance of it is kind of secondary. You’re dealing with web latency by and large. JH: That’s exactly right, exactly. And so the number of CPU instances is what you’re optimizing for. That’s why you see these CPUs with a couple of hundred, 300, 400 cores coming. Well, they’re not performant and for tool use, where you have this GPU waiting for the tool use— And you’re going over NVLink. JH: That’s right, you want the fastest single-threaded computer you can possibly get. So is it just the speed? Or does the CPU itself need to be increasingly parallel so it doesn’t have misses and things like that? Or so it’s like just all the way down the pipeline is very different? JH: Yeah, the most important thing is single-threaded performance and the I/O has to be really great. Because it’s now in the data center, the number of single-threaded instances running is going to be quite high and therefore, it’s going to bang on the I/O system, it’s going to bang on the memory controller really hard. Vera’s bandwidth-per-CPU core, bandwidth-per-CPU, is three times higher than any CPU that’s ever been designed, and so it’s designed so that it has lots and lots of I/O bandwidth and lots and lots of memory bandwidth, so that it never throttles the CPU. If the CPU gets throttled, then we’re holding back a whole bunch of GPUs. Is this Vera rack, is it still, you talked about it being very tightly linked to the GPU rack, but is it still disaggregated so that the GPUs can be serving multiple different Vera cores? Whereas you have a Vera core on a board with- Okay, got it, that makes sense. 
How does your Intel partnership and the NVLink thing fit into this, if at all? JH: Excellent. Some of the world is happy with Arm, some of the world still needs, particularly, you know, enterprise computing, a whole bunch of stacks that people don’t want to move, and so x86 is really important to that. Has the resiliency of x86 code been surprising to you? JH: No. Nvidia’s PC is still x86, all of our workstations are x86. I did want to congratulate you, as you talked about in the keynote today, you are the token king. So in your article, you also talked about how energy is the first principle of AI infrastructure and the constraint on how much intelligence the system can produce. If that’s the case, if it’s the amount of tokens you can produce and you’re constrained by how much energy is in the data center, why do companies even try to compete with the token king? JH: It’s going to be hard because it’s not reasonable to build a chip and somehow achieve results that are fairly dramatic. Even in the case of Groq, Groq couldn’t deliver the results unless we paired it with Vera Rubin. Well tell me about this, my next question was about Groq. JH: So if you look at the entire envelope of inference, on the one hand, you want to deliver as much throughput as possible, on the other hand, you want to deliver as many smart tokens as possible — the smarter the token, the higher the price you could charge. These two balance, this tension of maximizing throughput on the one hand, maximizing intelligence on the other hand, is really, really tough to work out. I do have to say, last year you had a slide talking about this Pareto Curve, and you talked about, I think it was when you introduced Dynamo, how your GPUs could cover the whole thing, and so you didn’t have to think about it, just buy an Nvidia GPU, and Dynamo will do both. But now you’re here saying, “Well, it doesn’t quite cover the whole thing”. JH: We cover the whole thing still better than any system that can do it. 
Where we could extend that Pareto is particularly on the extremely high token rates and extremely low latency, but it also reduces the throughput. However, because of coding agents, because there are now AI agents that are producing really, really great economics, and because the agents are being attached to humans that are actually making — I mean, they’re extremely valuable. Right, they’re even more expensive than GPUs. JH: And so I want to give my software engineers the highest token rate service, and so if Anthropic has a tier of Anthropic Claude Code that increases coding rate by a factor of 10, I would pay for it, I would absolutely pay for it. So you’re building this product for yourself? JH: I think most great products are kind of because you see a pain point and you feel the pain point and you know that that’s where the market’s going to go. We would love for our coding agents to run 10 times faster, but in order to do that, it’s just very, very difficult to do that in a high throughput system, and so we decided to add the Groq low latency system to it and then we basically co-run, co-process. Right. And is this just separating decode and prefill? JH: We’re going to do even the high processing, high FLOPS part of decode, the attention part of decode. So you’re disaggregating even down to the decode level. JH: That’s right, and that requires really tight coupling and really, really close integration of software. So how are you able to do that? You say you’re shipping later this year, this deal was just announced a couple of months ago. JH: Well, we started working on disaggregated inferencing, Dynamo really put Nvidia’s ideas on the table. The day that I announced Dynamo, everybody should have internalized that, I was already thinking about, “How do we disaggregate inference across a heterogeneous infrastructure more finely?”, and Groq’s architecture is such an extreme version of ours, they had a very hard time. 
Dynamo was a year ago, and Groq just happened sort of over Christmas. Was there an event that sort of made you think this needed to happen? JH: Well remember, I announced Dynamo a year ago, we’ve been working on Dynamo for two years, so we’ve been thinking about disaggregated inference for two, three years, and we started working with Groq maybe before we announced the deal, maybe six months earlier. So we’d been thinking about working with them on unifying Grace Blackwell and Groq fairly early on. So the interaction with them, I really like the team, and we don’t want their cloud service. They had another business that they really believe in and they still believe in, they’re doing really well with it and that wasn’t a part of the business that we wanted, so we decided to acquire the team and license the technology. Then we’ll take the fundamental architecture and we’ll evolve it from here. So it was just a happy coincidence or not a happy coincidence, maybe not a happy coincidence. JH: Strategic serendipity. Because OpenAI, you know, has an instance now with Cerebras that they announced in January. JH: That was done completely independently of us and frankly, I didn’t even know about it, but it wouldn’t have changed anything. I think the Groq architecture is the one I would have chosen anyways, it’s much more sensible to us. Was this the first time where there was sort of an ASIC approach that sort of made you raise your eyebrows like, “Oh, that’s actually fundamentally different”? JH: No, Mellanox. That’s a good example. JH: Yeah, Mellanox. We took a bunch of our computing stack and we put it into the Mellanox stack. NVLink wouldn’t be possible, you know, at the scale we’re talking about without the in-network fabric computing that we did with Mellanox. Taking the software stack, disaggregating it, and putting it where it needs to be, is a specialty of Nvidia. We’re not obsessed about where computing is done, we just want to accelerate the application. 
Remember, Nvidia is an accelerated computing company, not a GPU company. Right. So you talk about power being the constraint. When your customers are thinking about what to buy, we could buy all sorts of traditional GPUs, or we could buy these LPU racks. Is that just, they should be thinking about it in terms of you’re just confident they can drive way more revenue? JH: It really depends on the kind of products they have. Suppose you really don’t have enterprise use cases at the moment, I don’t really think that adding Groq makes much sense, and the reason for that is because most of your customers are free tier customers, and they’re moving towards paying. So it might be two-thirds free tier, one-third paid, in that case, adding Groq to it, you’re adding a lot of expense. You’re taking some power, it’s not worth it. Complexity. And you’re taking away servers, the opportunity cost. JH: With which you could actually be serving the free tier, yeah. However, if you have an Anthropic-like business and you have an OpenAI-like business where Codex is capturing really great economics, but you just wish you could generate more tokens, this is where adding that accelerator can really boost your revenues. Are we actually constrained by power right now in 2026 or by fab capacity or what? Everyone’s saying we don’t have enough supply. What’s the actual limiting factor? JH: I think it’s probably close on everything. You couldn’t double anything, really. Because you’ll hit some other constraints. It does feel like, though, the U.S. has I think done a pretty good job of scrounging up power, maybe more than people expected a couple years ago, it feels like chips are really much more of a limiter right now. JH: Our supply chain is fairly well planned. You know, we were planning for a very, very big year, and we’re planning for a very big year next year. We saw all the soju drinking and fried chickens. JH: (laughing) Yeah, right. 
We’re planning, we plan for, in our supply chain, we have got, you know, a couple of hundred partners in our supply chain and we’ve got long-term partnerships with them. So I feel pretty good about that part of it. I don’t think we have twice as much power as we need, I don’t think we have twice as much chip supply as we need, I don’t think we have twice as much of anything as we need. But I think everything is, everything that I see on the horizon, we will be able to support from a supply chain perspective, and the thing that I wish probably more than anything is that all the land, power, and shell would just get stood up faster. Is it fair to say, is there a bit where Nvidia is actually the biggest beneficiary of scarcity, though, to the extent it exists? Like, if there’s a power scarcity, you’re the most efficient chip, so you’re going to be utilizing that power better. Or if there’s fab capacity scarcity, like you just said, you’ve been out there securing the supply chain, you got it sort of sorted, are you the big winners in that regard? JH: Well, we’re the largest company in this space, and we did a good job planning. And we plan upstream of the supply chain, we plan downstream of the supply chain, and so I think we’ve done a really good job preparing everyone for growth. Right, but is this a bit where, at its core, not having access to the Chinese market maybe is a threat? Like if China ends up with plenty of power and plenty of chips, even though those chips are only 7nm, they have the capacity to build up an ecosystem to potentially rival CUDA in the long run, is that the concern that you have? JH: There’s no question we need to have the American tech stack in China, and I’ve been very consistent about that since the very beginning, recognizing that open source software will come. No country contributes more to open source software than China does, and we also know that 50% of the world’s AI researchers come from China, and we also know that they’re really inventive. 
JH: DeepSeek is not a nominal piece of technology, it’s really, really good. And Kimi is really good, and Qwen is really good and they make unique contributions to architecture, and they make unique contributions to the AI stack so I think we have to take these companies seriously. To the extent that American tech stack is what the world builds on top of, then when that technology diffuses out of China, which it will, because it’s open source, and when it comes out of China, it goes into American industries, it goes into Southeast Asia, it goes into Europe, the American tech stack will be prepared to receive them. I’ve been really consistent that this is probably the single most geopolitical strategic issue for the American tech industry.

Yeah, when we talked last time, the Trump administration had banned the H20. Were you surprised you were able to get the Trump administration to see your point of view? And then were you even more surprised that now you’re stymied by the Chinese government?

JH: I’m not surprised by us being stymied by them and the reason for that is because, of course, China would like to have their tech stack develop. In the time that we’ve left that market, you know how fast the Chinese industry moves, and Huawei achieved a record year for their company’s history. This is a very long-running company, and they had a record year. They had, what, five, six IPOs of chip companies that are addressing the AI industry. I think we need to be more strategic in how we think about American leadership and American geopolitical and technology leadership. AI is not just a model, and that’s a deep misunderstanding — AI, as I said and as you mentioned in the beginning, AI is a five-layer cake and we have to win the infrastructure layer, we have to win the chips layer, we have to win the platform layer, we have to win the model layer and we have to win the application layer.
JH: Some of the things that we do are jeopardizing our ability as a country to lead in each one of those five layers. I think it’s a terrible mistake to think that the way to win is to bundle all of it top-to-bottom and tie every company together into one holistic stack so that we can only win or win at the limits of what any one of the layers can win. We’ve got to let all the layers go out and try to win the market.

Have those other layers maybe benefited from their longer experience in Washington and you sort of showed up a little late to the scene?

JH: Yeah, maybe.

What have you learned? What’s been the biggest thing you’ve learned about Washington?

JH: Well, the thing that I was surprised by is how deep the doomers were integrated into Washington D.C. and how the messages of doomers affected the psychology of the policy makers.

Everyone was scared instead of optimistic.

JH: That’s right, and I think it has two fundamental problems. In this Industrial Revolution, if we don’t allow the technology to diffuse across the United States and we don’t take advantage of it ourselves, what will happen to us is what happened to Europe in the last Industrial Revolution — we left them behind. And they, in a lot of ways, they invented all the technologies of the last Industrial Revolution and we just took advantage of it. I hope that we have the historic wisdom, that we have the technological understanding and not get trapped in science fiction, doomerism, these incredible stories that are being invented to scare the living daylights out of policy makers who don’t understand technology very well and they give them these science fiction embodiments that are just not helpful. One of the situations that is most concerning to me is when you poll the United States, the population, the popularity of AI is decreasing, that’s a real problem.
JH: It’s no different than the popularity of electricity, the popularity of electric motors, the popularity of gasoline engines, in the last Industrial Revolution became less popular. The popularity of the Internet, could you just imagine? Other countries took advantage of it much more quickly than we did and then technology diffused into its industries and society much more quickly and so we just have to be much, much more concerned that we don’t give this technology some kind of a mystical science fiction embodiment that’s just not helpful and scaring people. And so I don’t like it when doomers are out scaring people, I think there’s a difference between genuinely being concerned and warning people versus creating rhetoric that scares people.

I think a characteristic you see all the time is people put on their big thinking hats and try to tease out all these nuances and forget the fact that actual popular communication is done in broad strokes. You don’t get to say, “Oh, you’re a little scared of this, but not this XYZ” — you’re just communicating fear as opposed to communicating optimism.

JH: Yeah, and somehow it makes them sound smarter.

People love to sound smart.

JH: Sometimes it’s maybe, and we now know, it helps them with their fundraising and sometimes it helps them secure regulatory capture. So there’s a lot of different reasons why they do it, and these are incredibly smart people but I would just warn them that most of these things will likely backlash and will likely come back and they’ll be probably disappointed that they did it someday.

I’m gonna tie a few questions together because I know we’re a little short on time. In the self-driving car space, you’re working with multiple automakers, you have your Alpamayo model, while still supplying chips to Tesla.
You had a big bit about OpenClaw today in your presentation — meanwhile, a huge thing driving the Vera chips, for example, we talk about agents, is what’s happening with say, Claude Code and happening with Codex from OpenAI. Am I right to tease out a consistent element here, and your investment in your open source models goes with that, where you’re happy to supply the leading provider, or the inventor in a space with chips, but then you’re going to fast follow what they do for everyone else that is threatened by them? So you simultaneously broaden your customer base, you’re not just dependent on the leaders, but then also the leaders are helping you sell to everyone else because they’re worried about being left behind.

JH: No, nothing like that. We’re at the frontier on so many different domains. In a lot of ways, we are the leader in many of these domains, but we never turn them into products. We’re a technology stack and so we have to be at the frontier, we have to be the world leader of the technology stack, but we’re not a solutions manufacturer, we’re not a service provider. And so that’s number one.

Will that always be the case?

JH: Yeah, always be the case. There’s no reason to, and we’re delighted not to. And so we create all this technology, we make it available to everybody.

Well, it’s funny though, if you go back to like your boards, for example, like the products you ship, more and more of that, there’s what, 30,000 specific SKUs in a rack today or something like that. More and more of those are defined by you, “This is what it’s going to be”, in part to make it easier to assemble, all those sorts of pieces. Is there a bit where that’s gonna happen on the software side too, as you talk about those vertical bits and your open source model?

JH: We create a thing vertically and then we open it horizontally and so everybody could use whatever piece they would like.

As long as they’re running on Nvidia chips?
JH: Whatever piece they would like, they don’t have to use all Nvidia chips, they don’t have to use all Nvidia software. We have to build it vertically, we have to integrate it vertically and optimize it vertically. But afterwards, we give them source, we give them — they just figure out how they want to do it.

Do you think Nvidia can actually produce and keep up in terms of having a frontier model that can win that space or be a necessary provider of that space, given that folks like Meta seem to have fallen off, or the alternative seems to be by and large Chinese models?

JH: Winning that space is not important to us.

Right, well important not in terms of winning, but important in terms of there needs to be an open source frontier model, so if not you, then who?

JH: That’s right, that’s right, somebody has to create open source models and Nvidia has a real capability in doing so. Whenever we create these open source models, we also learn a lot about the computation.

Was that a bit of a problem with Blackwell? I’ve heard mutters that the training runs were maybe a little more difficult than they were sort of previously.

JH: The challenge with Blackwell was 100% NVLink 72, NVLink 72 was backbreaking work. And it was the only time that I thanked the audience for working with us.

Yeah, I noticed when you said that today, it came across as very sincere.

JH: Yeah, because we tortured everybody, but everybody loves it now.

This is the second time we’ve had a chance to talk in person, and my takeaway when I met you previously in Taipei was the extent that Nvidia still feels like a small company. Are you worried about getting stretched too thin, or do you still think you have sort of that CUDA-esque flywheel where, “It looks like we’re doing a lot, we’re just kind of doing the same thing over and over again?”
JH: The reason why Nvidia can move so fast is because we always have a unifying theory for the company, and that’s my job, I need to come up with a unifying theory for what’s important and why things connect together and how they connect together and then create an organization, an organism that’s really, really good at delivering on that unifying theory. And so the unifying theory for Nvidia is actually fairly simple. On the one hand, we have the computing platform, the software platform that’s related to CUDA-X. On the other hand, we’re a computing systems company, we optimize things vertically, we apply extreme co-design across the stack and all the different components of a computer and now that computer is a platform of ours and we integrate that platform into all the clouds and to all the OEMs and then we have another platform that’s now the data center platform, or the AI factory platform. So once you have a unifying theory about what Nvidia builds and how it goes about doing it — and I used the keynote to kind of tell that story even partly to our own employees.

That’s what it felt like. That whole first hour of the keynote felt like you talking to your employees, reminding them of what you do.

JH: It’s important that we’re always constantly reminded of what’s important to us and AI is important to us, but of course CUDA-X and all of the solvers and all of the applications that we can accelerate is really important to us.

Thank you very much.

JH: Thank you. It’s great to see you, Ben. Keep up the good work.

This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery. The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!

Lalit Maganti 2 days ago

syntaqlite: high-fidelity devtools that SQLite deserves

Most SQL tools treat SQLite as a “flavor” of a generic SQL parser. They approximate the language, which means they break on SQLite-exclusive features like virtual tables, miss syntax like UPSERT, and ignore the 22 compile-time flags that change the syntax SQLite accepts.

So I built syntaqlite: an open-source parser, formatter, validator, and LSP built directly on SQLite’s own Lemon-generated grammar. It sees SQL exactly how SQLite sees it, no matter which version of SQLite you’re using or which feature flags you compiled with.

It ships as a CLI, VS Code extension, Claude Code LSP plugin, and C / Rust libraries. There’s also a web playground which you can try now: paste any SQLite SQL and see parsing, formatting, and validation live in the browser, no install needed. Full documentation is available here.

Here’s syntaqlite in action:

- Formatting with the CLI
- Validation with the CLI
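To make the dialect gap concrete, here is one statement that SQLite happily executes but many generic SQL parsers reject: an UPSERT, supported since SQLite 3.24. This is my own minimal illustration using Python’s bundled `sqlite3` module, not an example from the syntaqlite docs:

```python
import sqlite3

# UPSERT is real SQLite syntax (3.24+), not part of generic "SQL":
# an INSERT that resolves a primary-key conflict with an in-place UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO kv VALUES ('a', 1)")
conn.execute(
    "INSERT INTO kv VALUES ('a', 10) "
    "ON CONFLICT(k) DO UPDATE SET v = excluded.v"  # 'excluded' is the rejected row
)
print(conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0])  # prints 10
```

A parser generated from SQLite’s own grammar accepts this by construction; a tool that approximates SQLite as “standard SQL plus tweaks” has to special-case it, and often doesn’t.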

Michael Lynch 2 days ago

Refactoring English: Month 15

Hi, I’m Michael. I’m a software developer and founder of small, indie tech businesses. I’m currently working on a book called Refactoring English: Effective Writing for Software Developers. Every month, I publish a retrospective like this one to share how things are going with my book and my professional life overall.

At the start of each month, I declare what I’d like to accomplish. Here’s how I did against those goals:

Visits and orders are down, but mainly because January was such an outlier due to “The Most Popular Blogs of Hacker News in 2025.” I got another lucky bump from the HN moderators putting “My Eighth Year as a Bootstrapped Founder” on the front page.

I mentioned in January that I added regional pricing for my book. I wasn’t tracking data carefully, but just based on order notifications, it seemed like most of my orders were coming from countries outside the US, so I took a closer look at the data. The first question was: is it really true that the majority of orders use regional pricing now? It’s true. The majority of Refactoring English customers are now outside of the US. The US accounts for only 28% of orders by volume and 40% by revenue.

I was also surprised to see how many customers purchase from countries like India and Brazil, where English is not the primary language, so I checked English vs. non-English primary countries: Surprisingly, the majority of orders for Refactoring English come from countries where English is not the primary language, though English-speaking countries are a small majority revenue-wise.

Next question: Do readers from certain countries purchase at a higher rate than others relative to total website visitors? Wow! One out of every six readers in Kazakhstan purchases the book! I need to start advertising in Kazakhstan. Okay, the extreme Kazakhstan result is based on a single customer, so that’s probably an outlier. And I bet my website analytics undercount visitors from Kazakhstan.
What if I focus on the top countries based on website visitors? The US is my top country for website visitors, but a relatively low share (0.5%) purchase the book. Indian readers purchase at the highest rate, with 2.5% of website visitors purchasing the book. Canadian readers purchase the most by revenue, with every Canadian reader giving me about $0.47 in additional book sales. Clearly, I need to start pandering more to India and Canada in the book. I could change all the Docker examples to cricket examples and look for more opportunities to praise Shopify.

After the US, most website visitors come from China (5.9% of total), but I’ve had zero sales in China. At first, I thought buying ebooks was not so popular in China, but I just checked what regional discount I was offering in China and was surprised to find it was zero. I wasn’t offering a regional discount in China at all. I made two mistakes in my price generation scripts that excluded a huge number of countries:

The local currency thing is silly in retrospect because I can still offer a discount and just accept payment in USD. And I’m not sure how I ended up missing so many Stripe-supported countries. I even missed Kazakhstan, my new favorite country! I was only offering regional discounts in about 39 countries. After my fixes, the list grew to 156. And within 12 hours, I got a new order from Kazakhstan.

With the majority of Refactoring English readers coming from countries where English is a second language, should I adjust the book to better serve non-native speakers? A few readers have asked about English tips for non-native speakers. I’d like to tackle the subject, but I have no experience writing as a non-native speaker. I want everything in the book to be techniques I personally use rather than things I’ve heard secondhand. My best idea is to find editing clients who are non-native speakers and look for patterns in their writing to include in the book. But right now, I’d like to get the v1 finished.
The beauty of an ebook is that you can keep iterating on it and find ways to improve it even after official release.

I’ve been using AI for software development for about a year and a half, but there have been two major inflection points: Since December, I’ve been spending more and more time doing AI-assisted coding. It’s become an ever-increasing part of my workday and non-work time.

I used to have a bad habit of checking email and social media excessively. During the past month, I’ve repeatedly had the experience of noticing that it’s 4pm, but I haven’t checked email or social media. Except it’s because I’ve fallen into an AI vortex and forgot everything else. Every month, I think, “Is this a problem?” And in the past few weeks, I’ve had to face the fact that, yes, it’s a problem.

I generally start each workday by writing a schedule on a little notepad on my desk. I break the day into 30-minute blocks and write down how I’ll spend that block. Historically, I stick to the schedule when I’m disciplined. When I have less willpower, I let fun tasks exceed their budgets by a block or two. With AI-assisted coding, I was getting to the point where I’d make a schedule and then completely ignore it and play with AI all day.

I wouldn’t say that I have an “addiction” to AI in the way people develop addictions to drugs or alcohol, but I am letting AI-assisted coding distract me from work that I recognize is more important, like finishing my book. There are a few factors that make AI especially compelling and easy for me to get sucked into:

I feel like I can integrate any technology, write in any programming language, install any tool. There used to be an annoying level of friction in using any new software, but now I can mostly just hand it to AI and ask it to figure out how to install it or debug it, and it just works.

In the 90s, Bill Gates published a book called Business @ the Speed of Thought.
I’ve never read it, but I keep thinking back to that book title as I use AI. It’s not literally at the speed of thought, but it’s closer than anything I ever imagined. I can have an idea for a feature, give a brief explanation to an AI agent, and see the feature materialize in minutes.

Even before AI, I’d often intend to spend an hour coding and instead spent three. But there were natural limits to how long I could code. A few hours of intense dev work fries my brain, and work becomes unpleasant, unproductive, or both. With AI, you can build for hours without doing any deep thought. And even when something does require thought, AI makes it easier than ever to take on tech debt. When I’m coding myself, I don’t want to do something the ugly way because then I’m the one who has to maintain that hack. But if I’m making AI do everything, I don’t feel the pain of hacky, ugly code.

One of the things that makes gambling addictive is variable rewards. Our brains are more captivated by a system that gives you $10 at random intervals than one that delivers you money on a fixed, predictable schedule. Whether intentional or not, my experience with AI agents varies wildly. Sometimes, I point it at a 2,000-line log file and it diagnoses the issue before I’ve even asked a question. Other times, I give it a simple task, and it spends the next 20 minutes aimlessly roaming my codebase. Because I don’t know if the wait will be 5 seconds or 20 minutes, I sit there staring at the agent for a minute, then compulsively check it every few minutes, then start some other AI task while I’m waiting. And then I’m cycling between multiple agents and don’t even remember what they’re all doing.

One of the most maddening experiences I have with AI is when I’ve set up the AI agent to complete a long task, and I come back hours later to find the AI paused its work a few minutes after I left and asked, “Okay, the next step is to try a full build, but that will take 30-60 minutes.
Would you like me to continue?” Yes! That’s why I left the task to you!

It’s hard to predict exactly what effect AI will have on the software industry, but I feel confident that it will completely upend the ecosystem. We’re in the early stages of a massive shake-up. Depending on how things turn out, there are paths forward for me as a software developer, but I also think there’s at least a 20% chance that we’re in the last year or two of “software developer” being a job that requires any special knowledge or skill. It could be like what happened to elevator operators.

Right now, there are a few factors that make AI-assisted development especially attractive for developers in my position:

- AI is helpful for junior engineers, but senior engineers are the ones who can use it best.
- There are multiple AI companies competing heavily on price and using VC money to subsidize costs. I use flat-rate plans, but I consume the equivalent of about $4k/month in API costs, and even those rates are probably VC-subsidized.

The current situation with AI can’t last. The AI bubble could burst, and I’ll have to start paying the non-subsidized, metered rate. Or AI will continue to improve to the point where I have no advantage over junior engineers or even people with no software experience.

I’ve found a few techniques for getting my AI usage back to a manageable place:

- Don’t start the day with an AI project. If I start with AI and then work on my book, then I’m switching from an exciting, easy task to a hard, unsexy task. If I instead start the day with an hour of writing, I’ve done my hard task for the day and don’t have to move uphill. This is challenging because I often set up long AI tasks overnight, and I’m always curious in the morning to see how they turned out.
- Reduce parallel AI-driven projects. Parallel work feels appealing because I can cycle between agents. In practice, I find it sucks me in too much because there’s a spinning-plates mentality of some agent always needing attention.

Takeaways this month:

- It turns out that most of Refactoring English’s readers come from outside the US.
- I’m using AI-assisted coding too much.

Results against last month’s goals:

- Result: Published “Why Improve Your Writing?” and “Improve Your Grammar Incrementally”
- Result: Scheduled a discussion about design reviews
- Result: Published two new book chapters
- Result: Published “Eversource EV Rebate Program Exposed Massachusetts Customer Data” and complained to the MA Department of Public Utilities

The two mistakes in my price generation scripts:

- I only included countries where Stripe supports the local currency.
- Even with this filter, I accidentally omitted a lot of countries where Stripe supports the local currency.

The two major inflection points in my AI usage:

- In February 2025, I started using an integrated AI agent in my code editor.
- In December 2025, I started running AI agents with full permissions (within isolated environments).

Goals for next month:

- Don’t start the day with an AI coding project. It’s too distracting and too hard to switch to something harder but more important.
- Finish Refactoring English. It won’t be fully polished and edited, but I want to complete all the chapters.

Anton Sten 2 days ago

Onboarding is a transaction

A design post that's not about AI. I know. Rare.

I've been working with two different teams lately, both early-stage, both building something genuinely useful. And both had made the same decision before I arrived: keep onboarding as short as possible. Fewer screens. Fewer questions. Get users to the product fast. I understand the instinct. Churn during onboarding is the thing that keeps founders up at night. Every extra screen feels like a risk. So you cut, and cut, and cut until what's left is a signup flow so frictionless it almost feels rude — like meeting someone and immediately handing them a set of keys.

But here's the thing they were both missing: onboarding is one of the few moments where you have a user's complete attention *and* their clear intent. They just decided they want what you're building. They're motivated. They're present. That is not a moment to rush through.

## Efficient doesn't mean fewer questions

When people talk about efficient onboarding, they usually mean fewer steps. But that's not what efficient actually means. Efficient means getting as much value as possible — for both sides — while keeping the user willing and engaged.

A user who flies through a four-screen signup and lands in a generic empty state is not a success story. You got them in the door, sure. But you know nothing about them, and they're already wondering what to do next. Compare that to a user who spends two more minutes during onboarding, answers a few specific questions, and arrives in an experience that already feels like it was made for them. That's efficient. Not because it was fast, but because it worked.

## The deal people are actually willing to make

At Summer Health, we asked parents for their home address during onboarding. On paper, that sounds like exactly the kind of friction you'd want to cut. A home address? For a telehealth service treating your kids? That's not just friction — that's a trust test. But we didn't just ask for it — we explained why.
If you give us your address, we can route prescriptions to your closest pharmacy. You give us something, we give something back.

We also asked parents, early in the flow, whether they had an urgent question right now. If they said yes, we stopped onboarding entirely and connected them straight to a pediatrician. If they said no, we'd say great — and carry on. That's not a question designed to collect data. It's a signal to the user that we understand why they're here, and that we'll drop everything if they need us to. The onboarding can wait.

Then we asked about medical history and allergies. Heavy questions. The kind that make people hesitate. But we were upfront about why: we're asking now so we already know when something urgent is happening. Nobody wants to answer questions about their child's penicillin allergy while they're panicking at midnight. We ask during onboarding so we never have to ask then.

People don't mind sharing. The problem isn't the questions — it's when it feels like a company collecting data points rather than actually caring about the answer. The moment it feels like a form versus a conversation, people shut down. Onboarding is a transaction. You're asking for information, time, and trust. In return, you owe them a better experience. When that exchange is clear and honest, users lean in. When it's not, they abandon.

## What the questions you skip are telling you

There's a useful test here that I keep coming back to: if you're not sure whether to include a question in onboarding, ask yourself whether you can justify *why* you're asking it — not to yourself, but out loud, to the user. "We're asking for your role so we can show you the features most relevant to how you work." "We're asking about your team size so we don't waste your time on things that don't apply." If you can say it plainly and it sounds reasonable, ask it.
If you find yourself reaching for vague justifications, or worse, deciding you don't actually need the answer for anything specific — that's telling you something. Either the question shouldn't be there, or you haven't yet figured out what you'd do with the answer. Both are worth knowing.

## The moment won't come back

The startup instinct to minimize onboarding comes from a real fear, and I'm not saying ignore it. Drop-off during signup is real, and a bloated onboarding flow with irrelevant questions is genuinely a problem. But so is the missed opportunity. You will never again have this user's attention the way you have it right now. They signed up. They're curious. They want to be here.

Ask them something. Make it worth answering. Tell them why. That's not friction. That's just a conversation.
