Latest Posts (20 found)
Jim Nielsen · 22 days ago

You Might Debate It — If You Could See It

Imagine I’m the design leader at your org and I present the following guidelines I want us to adopt as a team for doing design work:

- Typography: Use expressive, purposeful fonts and avoid default stacks (Inter, Roboto, Arial, system).
- Motion: Use a few meaningful animations (page-load, staggered reveals) instead of generic micro-motions.
- Background: Don’t rely on flat, single-color backgrounds; use gradients, shapes, or subtle patterns to build atmosphere.
- Overall: Avoid boilerplate layouts and interchangeable UI patterns. Vary themes, type families, and visual languages.

How do you think that conversation would go? I can easily imagine a spirited debate where some folks disagree with any or all of my points, arguing that they should be struck as guidelines from our collective ethos of craft. Perhaps some are boring, or too opinionated, or too reliant on trends. There are lots of valid, defensible reasons. I can easily see this discussion being an exercise in frustration, where we debate for hours and get nowhere: “I suppose we can all agree to disagree”.

And yet, thanks to a link to Codex’s front-end tool guidelines in Simon Willison’s article about how coding agents work, I see that these are exactly the kind of guidelines that are tucked away inside an LLM that’s generating output for many teams. It’s like a Trojan Horse of craft: guidelines you might never agree to explicitly are guiding LLM outputs, which means you are agreeing to them implicitly.

It’s a good reminder about the opacity of the instructions baked into generative tools. We would debate an open set of guidelines for hours, but if they’re opaquely baked into a tool without our knowledge, does anybody even care? When you offload your thinking, you might be on-loading someone else’s that you’d never agree to, personally or collectively.


Fragments: March 26

Anthropic carried out a study, conducted by having its model interview some 80,000 users to understand their opinions about AI, what they hope from it, and what they fear. Two things stood out to me.

It’s easy to assume there are AI optimists and AI pessimists, divided into separate camps. But what we actually found were people organized around what they value—financial security, learning, human connection—watching advancing AI capabilities while managing both hope and fear at once.

That makes sense; if asked whether I’m an AI booster or an AI doomer, I answer “yes”. I am fascinated by its impact on my profession, expectant of the benefits it will bring to our world, and worried by the harms that will come from it. Powerful technologies rarely yield simple consequences. The other thing that struck me was that, despite most people mixing the two, the overall balance between optimism and pessimism about AI varied by geography. In general, the less developed the country, the more optimism about AI.

❄ ❄ ❄ ❄ ❄

Julias Shaw describes how to fix a gap in many people’s use of specs to drive LLMs:

Here’s what I keep seeing: the specification-driven development (SDD) conversation has exploded. The internet is overflowing with people saying you should write a spec before prompting. Describe the behavior you want. Define the constraints. Give the agent guardrails. Good advice. I often follow it myself. But almost nobody takes the next step: encoding those specifications into automated tests that actually enforce the contract. And the strange part is, most developers outside the extreme programming crowd don’t realize they need to. They genuinely believe the spec document is the safety net. It isn’t. The spec document is the blueprint. The safety net is the test suite that catches the moment your code drifts away from it.
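To make the point concrete, here is my own minimal sketch (not from Shaw’s article) of turning one hypothetical spec clause into an enforced contract, using plain pytest-style test functions:

```python
# Spec clause (hypothetical): "Usernames are 3-20 characters,
# lowercase letters and digits only."
import re

USERNAME_RE = re.compile(r"^[a-z0-9]{3,20}$")

def is_valid_username(name: str) -> bool:
    """Implements the spec clause above."""
    return bool(USERNAME_RE.fullmatch(name))

# The tests are the safety net: they fail the moment
# the code drifts away from the spec document.
def test_accepts_spec_compliant_names():
    assert is_valid_username("abc")
    assert is_valid_username("user42")

def test_rejects_spec_violations():
    assert not is_valid_username("ab")       # too short
    assert not is_valid_username("UPPER")    # uppercase not allowed
    assert not is_valid_username("x" * 21)   # too long
```

The spec document can say whatever it likes; only the test suite notices when a later edit quietly changes the rule.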
As well as explaining why it’s important to have such a test suite, he provides an astute five-step checklist to turn spec documents into executable tests.

❄ ❄ ❄ ❄ ❄

Lawfare has a long article on potential problems countering covert action by Iran. It’s a long article, and I confess I only skip-read it. It begins by outlining a bunch of plots hatched in the last few years. Then it says:

If these examples seem repetitive, it’s because they are. Iran has proved itself relentless in its efforts to carry out attacks on U.S. soil—and the U.S., for its part, has demonstrated that it is capable of countering those efforts. The above examples show how robustly the U.S. national security apparatus was able to respond, largely through the FBI and the Justice Department…. That is, potentially, until now. The current administration has decimated the national security elements of both agencies through firings and forced resignations. People with decades of experience in building interagency and critical source relationships around the world, handling high-pressure, complicated investigations straddling classified and unclassified spaces, and acting in time to prevent violence and preserve evidence have been pushed out the door. Those who remain not only have to stretch to make up for the personnel deficit but also are being pulled away by White House priorities not tied to the increasing threat of an Iranian response.

The article goes into detail about these cuts, and the threats that may exploit the resulting gaps. It’s the nature of national security people to highlight potential threats and call for more resources and power. But it’s also the nature of enemies to find weak spots and look to cause havoc. I wonder what we’ll think should we read this article again in a few years’ time.


Don’t trust, verify

Software and digital security should rely on verification, rather than trust. I want to strongly encourage more users and consumers of software to verify curl, and ideally to require that you can do at least this level of verification of other software components in your dependency chains.

With every source code commit and every release of software, there are risks. Also entirely independent of those. Some of the things a widely used project can become the victim of include:

- Jia Tan is a skilled and friendly member of the project team but is deliberately merging malicious content disguised as something else.
- An established committer might have been breached unknowingly and now their commits or releases contain tainted bits.
- A rando convinced us to merge what looks like a bugfix but is a small step in a long chain of tiny pieces building up a planted vulnerability or even backdoor.
- Someone blackmails or extorts an existing curl team member into performing changes not otherwise accepted in the project.
- A change by an established and well-meaning project member that adds a feature or fixes a bug mistakenly creates a security vulnerability.
- The website on which tarballs are normally distributed gets hacked and now evil alternative versions of the latest release are provided, spreading malware.
- Credentials of a known curl project member are breached and misinformation gets distributed appearing to be from a known and trusted source. Via email, social media or websites. Could even be this blog!
- Something in this list is backed up by an online deep-fake video where a known project member seemingly repeats something incorrect to aid a malicious actor.
- A tool used in CI, hosted by a cloud provider, is hacked and runs something malicious.
- While the primary curl git repository has a downtime, someone online (impersonating a curl team member?) offers a temporary “curl mirror” that contains tainted code.

In the event any of these would happen, they could of course also happen in combinations and in a rapid sequence.

curl, mostly in the shape of libcurl, runs in tens of billions of devices. Clearly one of the most widely used software components in the world. People ask me how I sleep at night given the vast amount of nasty things that could occur virtually at any point. There is only one way to combat this kind of insomnia: do everything possible and do it openly and transparently. Make it a little better this week than it was last week. Do software engineering right. Provide means for everyone to verify what we do and what we ship. Iterate, iterate, iterate.

If even just a few users verify that they got a curl release signed by the curl release manager, and they verify that the release contents are untainted and only contain bits that originate from the git repository, then we are in a pretty good state. We need enough independent outside users to do this, so that one of them can blow the whistle if anything at any point would look wrong. I can’t tell you who these users are, or in fact if they actually exist, as they are and must be completely independent from me and from the curl project. We do however provide all the means and we make it easy for such users to do this verification.

The few outsiders who verify that nothing was tampered with in the releases can only validate that the releases are made from what exists in git. It is our own job to make sure that what exists in git is the real thing. The secure and safe curl. We must do a lot to make sure that whatever we land in git is okay. Here’s a list of activities we do. All this is done in the open with full transparency and full accountability. Anyone can follow along and verify that we follow this. Require this for all your dependencies.

- We have a consistent code style (invalid style causes errors). This reduces the risk for mistakes and makes it easier to debug existing code.
- We ban and avoid a number of “sensitive” and “hard-to-use” C functions (use of such functions causes errors).
- We have a ceiling for complexity in functions to keep them easy to follow, read and understand (failing to do so causes errors).
- We review all pull requests before merging, both with humans and with bots. We link back commits to their origin pull requests in commit messages.
- We ban use of “binary blobs” in git to not provide means for malicious actors to bundle encrypted payloads (trying to include a blob causes errors).
- We actively avoid base64-encoded chunks as they too could function as ways to obfuscate malicious contents.
- We ban most uses of Unicode in code and documentation to avoid easily mixed-up characters that look like other characters (adding Unicode characters causes errors).
- We document everything to make it clear how things are supposed to work. No surprises. Lots of documentation is tested and verified in addition to spellchecks and consistent wording.
- We have thousands of tests and we add test cases for (ideally) every functionality. Finding “white spots” and adding coverage is a top priority. curl runs on countless operating systems and CPU architectures, and you can build curl in billions of different configuration setups: not every combination is practically possible to test.
- We build curl and run tests in over two hundred CI jobs that are run for every commit and every PR. We do not merge commits that have unexplained test failures.
- We build curl in CI with the most picky compiler options enabled and we never allow compiler warnings to linger. We always use options that convert warnings to errors and fail the builds.
- We run all tests using valgrind and several combinations of sanitizers to find and reduce the risk for memory problems, undefined behavior and similar.
- We run all tests as “torture tests”, where each test case is rerun to have every invoked fallible function call fail once each, to make sure curl never leaks memory or crashes due to this.
- We run fuzzing on curl: non-stop as part of Google’s OSS-Fuzz project, but also briefly as part of the CI setup for every commit and PR.
- We make sure that the CI jobs we have for curl never “write back” to curl. They access the source repository read-only, and even if they would be breached, they cannot infect or taint source code.
- We run code analyzer tools on the CI job config scripts to reduce the risk of us running or using insecure CI jobs.
- We are committed to always fix reported vulnerabilities in the following release. Security problems never linger around once they have been reported.
- We document everything and every detail about all curl vulnerabilities ever reported.
- Our commitment to never breaking ABI or API allows all users to easily upgrade to new releases. This enables users to run recent security-fixed versions instead of legacy insecure versions.
- Our code has been audited several times by external security experts, and the few issues that have been detected in those were immediately addressed.
- Two-factor authentication on GitHub is mandatory for all committers.

We plan for the event when someone actually wants and tries to hurt us and our users really bad. Or when that happens by mistake. A successful attack on curl can in theory reach widely. This is not paranoia. This setup allows us to sleep well at night. This is why users still rely on curl after thirty years in the making. I recently added a verify page to the curl website explaining some of what I write about in this post.
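The outsider verification the post asks for can start very simply: compute a release artifact’s digest yourself and compare it against a value obtained through an independent channel (for example a signed release announcement). A minimal sketch of that check, with the file path and expected digest as placeholders rather than real curl values:

```python
import hashlib
import hmac

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_release(path: str, expected_hex: str) -> bool:
    """Compare against a digest obtained out-of-band, in constant time."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

A digest check only proves the bytes match what was announced; checking the release manager’s signature on the announcement is the other half of the verification.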
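The “torture test” idea above — rerun each test so that every fallible call fails exactly once — can be sketched in a few lines. This is my own illustration of the technique, not curl’s actual harness (which injects failures into real malloc/socket calls):

```python
# Fault injection: make the Nth "fallible" call fail, rerun for each N.
class FaultInjector:
    def __init__(self, fail_at: int):
        self.calls = 0
        self.fail_at = fail_at          # which call (1-based) should fail

    def fallible(self, value):
        """Stand-in for malloc()/socket()/etc.: fails on exactly one call."""
        self.calls += 1
        if self.calls == self.fail_at:
            raise MemoryError("injected failure")
        return value

def code_under_test(inj):
    # Two fallible "allocations"; must stay exception-safe when either fails.
    a = inj.fallible("buf-a")
    b = inj.fallible("buf-b")
    return (a, b)

def torture(test_fn):
    """Rerun test_fn, failing call 1, then call 2, ... until a clean run passes.

    Returns how many distinct failure points were exercised. A real harness
    would also check for leaks/crashes after each injected failure.
    """
    n = 1
    while True:
        inj = FaultInjector(fail_at=n)
        try:
            test_fn(inj)
            return n - 1                # clean run: every earlier call site failed once
        except MemoryError:
            n += 1                      # injected failure was observed; try next site
```

For `code_under_test`, `torture` exercises two failure points before the clean run succeeds; the point is that every fallible call site gets to fail at least once.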


An Interview with Arm CEO Rene Haas About Selling Chips

Good morning,

This week’s Stratechery Interview is with Arm CEO Rene Haas, who I previously spoke to in January 2024, and who recently made a major announcement at Arm’s first-ever standalone keynote: the long-time IP-licensing company is undergoing a dramatic shift in its business model and selling its own chips for the first time. We dive deep into that decision in this interview, including the meta of the keynote, Arm’s history, and how the company has evolved, particularly under Haas’ leadership. Then we get into why CPUs matter for AI, and how Arm’s CPU compares to Nvidia’s, x86, and other custom Arm silicon. At the end we discuss the risks Arm faces, including a maxed-out supply chain, and how the company will need to change to support this new direction. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview:

This interview is lightly edited for clarity.

Rene Haas, welcome back to Stratechery.

RH: Ben Thompson, thank you.

Well, you used to be someone special, I think you were the only CEO I talked to who did nothing other than license IP, now you’re just another fabless chip guy like [Nvidia CEO] Jensen [Huang] or [Qualcomm CEO] Cristiano [Amon].

RH: (laugh) Yeah, you can put me in that category, I guess.

Well, the reason to talk this week is about the momentous announcement you made at the Arm Everywhere keynote — you will be selling your own chip. But before I get to the chip, I’m kind of interested in the meta of the keynote itself. Is this Arm Everywhere concept new, as far as being a keynote? Why have your own event?

RH: You know, we were talking a little bit about this going into the day. I don’t think we’ve ever as a company done anything like this.

Yeah, I didn’t think so either. I was trying to verify just to make sure my memory was correct, but yes, it’s usually at Computex or something like that.

RH: Our product launches have usually been lower-key; we usually tie them to OEM products that are using our IP, that use our partners’ chips. But we just felt like this was such a momentous day for the company, a very different day for the company, that we wanted to do something very, very unique. So it was very intentional. We were chatting about it prior; I don’t think we’ve done anything like this before.

Who was the customer for the keynote specifically? Because you’re making a chip — Meta is your first customer, they knew about this, they don’t need to be told — what was the motivation here? Who are you targeting?

RH: When you prepare for these things, that’s one of the first questions you ask yourself: “Who is this for?”, “Is it for the ecosystem?”, “Is it for customers?”, “Is it for investors?”, “Is it for employees?”, and I think under the umbrella of Arm Everywhere, the answer to those questions was “Yes”, everybody. We felt we needed to, because a lot of questions come up on this, right, Ben, in terms of, “What are we doing?”, “Why are we doing it?”, “What’s this all about?”, and the answer to that question was “Yes”, it was for everyone.

One more question: Why the name “Arm Everywhere”?

RH: We were trying to come up with something that was going to thematically remind people a bit about who Arm was and what we are and what we encompass, but not actually tease out that we were going to be announcing something.

Right, you can’t say “Arm’s New Chip Event”.

RH: (laughing) Yes, exactly, “Come to the new product launch that we’ve not yet announced”. So we just decided that that would be enough of a teaser to get people interested.

Just to note, you said “who Arm was”. What was Arm? You used the past tense there.
RH: Yeah, and I will say, we are still doing IP licensing, you can still buy CSSs [Compute Subsystem platforms], so we are still offering all of the products we did before that day, plus chips, so I’m not yet just another chip CEO. I think I’m still very different from the other folks you talked to.

Actually, back up, give me the whole Rene Haas version of the history of Arm.

RH: Oh, my goodness gracious. The company was born out of a joint venture way back in the day between Acorn Computer and then ultimately Apple and VLSI to design a low-power CPU to power PDAs. The thing that was kind of important was, “I need something that is going to run in a plastic package” — you may remember back then just about everything was in ceramic — “I can’t melt the PDA, and oh, by the way, this thing’s got to run off a battery”. So they chose a RISC architecture, and that’s where the ARM ISA [instruction set architecture] was born and that’s what the first chip was intended to do, and the thing wasn’t very successful.

So fast forward, however: the founders, and then a very, very important guy in Arm’s history, Robin Saxby, put out a goal to make the ARM ISA the global standard for CPUs. And if you go back to the early 1990s, there were a lot of CPUs out there, and also there was not an IP business, there really wasn’t a very good fabless semiconductor model, and there was not a very good set of tools to develop SoCs [systems on a chip]. So in some ways, and this is what I love about the company, it was a bit of a crazy idea because you didn’t really have all the things in place necessary to go off and do that. But back then, there were a lot of companies designing their own CPUs, if you will, and the idea there being that ultimately this would be something that customers could be able to access, acquire, and build, and then ultimately build a standard upon.

What was ultimately the killer design win for the company, and I know you’re a strategist and historian as well around this area, is the classic accidental example: TI was developing the baseband modem for an applications processor for the Nokia GSM phone and they needed a microcontroller, something to kind of manage the overall process, and they stumbled across what we were doing, and we licensed them the IP. That was kind of the first killer license that got the company off the ground and that’s what really got us into mobile. People may think, “You were the heart of the smartphone and you had this premonition to design around iOS”, or, “You worked really closely in the early days of Android”. No, it was accidental: we found ourselves in the Nokia GSM phone, Symbian gets ported to ARM, and then there starts to be at least enough of a buzz around the nascent software. But that’s how the company was born.

I did enjoy that for the keynote you had a bunch of different Arm devices running on the screen in the run-up, and my heart did do a little pitter-patter when the Nokia phones popped on. Another day, to be sure.

RH: Yeah, cool stuff, right? But that’s kind of how the company got off the ground, and as it was a general-purpose CPU, which meant we didn’t really have it designed for, “It’s going to be good at X”, or, “It’s going to be good at Y, it’s going to be good at Z”, it turned out that because it was low power, it was pretty good to run in a mobile application. I think the historic design win where the company took off was obviously the iPhone, and the precursor to the iPhone was the iPod, which was using a chipset from PortalPlayer that used the ARM7, while the Mac OS was all x86. Inside the company it was Tony Fadell’s team arguing, “Let’s use this PortalPlayer architecture”, versus, “Do we go with Intel’s x86 and a derivative Atom”, back in the day, and once the decision was made that “We’re going to port to ARM for iOS”, that’s where the tailwind took off.
So is it definitely making up too much history to go back and say, “The reason Arm was a joint venture to start is because people knew you needed to have an ecosystem and not be owned by any one company”, or whatever it might be? Is that being too cute about things, and the reality is it was just stumbling around, barely surviving, and it just fell backwards into this?

RH: Which, by the way, is how every good startup that’s really been successful works; that’s kind of the formula. You stumble around in the dark, you find something you’re good at, and then you engage with a customer and you find what ultimately is sticky, and that’s really what happened with Arm.

When you consider the changes that you’ve made at Arm, and I want to get your description of the changes that you’ve made, but how many of the challenges that you faced were based on legitimate market fears about, “We’re going to alienate customers”, or whatever it might be, versus maybe more cultural values like, “We serve everyone”, versus almost a fear like, “This is just the market we’ve got, let’s hold on to it”?

RH: I think, Ben, we thought about it much more broadly. When I took over, and you and I met not long after that, there were a couple of things happening in the market in terms of a need to develop SoCs faster, a need to get to market more quickly, and we knew intuitively that no one knew how to combine 128 Arm cores together with a mesh network and have it perform better than we could, because that’s what we had to do to go off and verify the cores. So we knew that doing compute subsystems really mattered. But I came from a bit of a different belief: that if you own the ISA at the end of the day, you are the platform, you are the compute platform, and it is incumbent upon you to think about how to have a closer connection between the hardware and the software. That is just table stakes.

I don’t think it’s anything new, if you think about what Steve Jobs thought about with Apple and everything we’ve seen with Microsoft, with Wintel. I felt with Arm, particularly not long after I started, in 2023 and 2024, this was only getting accelerated with AI, because with AI the models and innovation are moving way, way faster than the hardware can possibly keep up. I just felt for the company in the long term that this was a direction we had to strongly consider, because if you are the ISA and you are the platform, the chip is not the product; the system is.

That’s the thing I was sort of driving at when I was writing about your launch. There’s an aspect where you’ve made these big changes: you’re originally just the ISA, then you’re doing your own cores, not selling them but basically designing the cores, then you’re moving to these system-on-a-chip designs, and now you’re selling your own chips. But it feels like your portion of the overall “What is a computer?” has stayed fairly stable, actually, because “What is a computer?” is just becoming dramatically more expansive.

RH: I think that’s exactly right. Again, if you are a curator of the architecture and you are an owner of the ISA, as good as the performance-per-watt is, as interesting as the microarchitecture is, as cool as it is in terms of how you do branch prediction, the software ecosystem determines your destiny. And the software ecosystem for anyone building a platform needs to have a much closer relationship between hardware and software, simply in terms of just how fast you can bring features to market, how fast you can accelerate the ecosystem, and how you can move with the direction of travel in terms of how things are evolving.
You mentioned the big turning point or biggest design win was the iPhone way back in the day, and the way I’ve thought about Arm versus x86: you could make the case that ARM/RISC has been theoretically more efficient than CISC, and I’ve talked to Pat Gelsinger about how there was a big debate in Intel way back in the 80s about whether to switch from CISC to RISC. He was on the side that won the argument, that by the time we port everything to RISC we could have just built a faster CISC chip that is going to make up all the difference, and that carried the day for a very long time. However, mobile required a total restart: you had to rebuild everything from scratch to deliver the power efficiency. And I guess the question is, you’ve had a similar dynamic for a long time, that Arm in the data center is theoretically better, you care about power efficiency, etc. Is there something now — is this an iPhone-type moment where there’s actually an opportunity for a total reset to get all the software rewritten that needs to be done? Or have companies like Amazon and Qualcomm, with whatever efforts they’ve done, paved the ground so that it’s not so stark of a change?

RH: It’s a combination of both. One of the big advantages we got with Amazon doing Graviton in 2019, and then subsequently the designs we had with Google, with Axion, and Microsoft with Cobalt, is it just really accelerated everything going on with cloud-native, and anything that moves to cloud-native has kind of started with ARM.

What do you mean by cloud-native?

RH: Cloud-native meaning these are applications that are starting from scratch to be ported to ARM. Built on a Linux distro, but not having to carry anything about running super old legacy software or running COBOL or something of that nature on-prem, so that was a huge benefit for us in terms of the go-forward.

Certainly we got a huge injection of growth when Nvidia went from the generation before Hopper, which I think was Volta or Pascal, I may be mixing up their versions, which was an x86 connect, to Grace. So when they went to Grace Hopper, then Grace Blackwell, and now Vera, the AI stack for the head node now starts to look like ARM. That helps a lot in terms of how the data center is organized, so we certainly got a benefit with that. I think for us the penny-drop moment was, and it’s probably the 2018, ’19 timeframe, when Red Hat had production Linux distros for ARM, and that really also accelerated things in terms of the open source community, the uploads and things that made things a lot, a lot easier from the software standpoint.

Give me the timeline of this chip. When did you make the decision to build this chip? You can tell me now, when did this start?

RH: You know, it started with a CSS, right? And we were talking to Meta about the CSS implementation.

Right. And just for listeners, CSS is where you’re basically delivering the design for a whole system-on-a-chip sort of thing.

RH: Compute subsystem, yeah, so it’s the whole system on a chip. And by the way, it’s probably 95% of the IP that sits on a chip.

What doesn’t it include?

RH: It doesn’t include the I/O, the PCIe controllers, the memory controllers, but it’s most of the IP.

And this is what undergirds — is Cobalt really the first real shipping CSS chip? Or does Graviton fall under this as well?

RH: Cobalt’s probably the first incarnation of using that, so Meta was looking at using that, and I think the discussions were taking place in the 2025 timeframe, mid-2025 timeframe. Here’s the key thing, Ben: not that long ago.

Right. Well, that was my sense, it was not that long ago, so I’m glad to hear that confirmed.

RH: Not that long ago.
Because CSS takes you a lot of the way there, so that discussion in around the 2025 timeframe was us going back and forth on, “Are you licensing CSS?”, versus, “Could you build something for us?”, and we had been musing about, “Was this the right thing for us to do from a strategy standpoint?”, and how we thought about it, but ultimately it came down to Meta saying, “We really want you to do this for us, we think this is going to be the best way to accelerate time to market and give us a chip that’s performant and on the schedule that we need”, so somewhere in the 2025-ish timeframe we agreed that, yes, we’ll do this for you.

Why did Meta want you to do it instead of them finishing it off themselves?

RH: I think they just did the ROI, in terms of, “I’ve got a lot of people working on things like MTIA, I’ve got a whole bunch of different projects internally, is it better that you do it versus we do it?”

“How much can we actually differentiate a CPU?”

RH: Yeah, and by the way, that is ultimately what it comes down to at some point in time. And the fact that the first one that came back works, it’s going to be able to go into production, and it’s ready to go: I’m not going to say they were shocked, but we kind of knew that was going to happen, because we knew how to do this stuff and the products were highly performant and tested in the CSS. So it happened fast, is the short answer.

So if we talk about Arm crossing the Rubicon, was it actually not you selling this chip, it was when you did CSS?

RH: One could say that that was a big step. When we started talking about doing CSSs, let me step back, we made a decision to do CSSs—

Explain CSSs and that decision, because I think that’s actually quite interesting.

RH: What is a CSS? It’s a compute subsystem. It takes all of the blocks of IP that we sold individually and puts them together in a fully configured, verified, performant deliverable that we can just hand to the customer, and they can go off and complete the SoC.

Some customers have told us it saves a year, some say a year-and-a-half, and this is really around the test and verification in terms of the flow. One of the examples I gave, it’s a little cheeky, but it kind of worked during the roadshow when we were trying to explain to investors, “What’s IP, what’s a CSS?” I said: go to the Lego store, and you’ve got a bin of Legos, yellow Legos, red Legos, blue Legos; trying to buy all those Legos and build the Statue of Liberty is a pain. Or you can go over to the boxes where it’s the Statue of Liberty and just put those pieces together, and the Statue of Liberty is going to look beautiful. This is what the CSS was.

I just want to jump in on that, because I was actually thinking about this. The Lego block concept is a common one when talking about semiconductors, but I remember being back in business school, and this was 2010, somewhere around then, and one of the case studies we did was actually Lego. The case study was the thought process of Lego deciding whether or not to pursue IP licensing as opposed to sticking with their traditional model, and all these trade-offs about, “We’re going to change our market”, “We’re going to lose what Lego is”, the creativity aspect, “It’s going to become these set pieces”. I just thought about that in this context, where I came down very firmly on the side of, “Of course they should do this IP licensing”, but the counter was this sort of traditionalist argument, which is kind of true: Legos today are kind of like toys for adults to a certain extent, where you build it once, reading directions. You think back to when I was a kid and you had all the Legos and it was just your creativity and your imagination, and I’m like, “Maybe this analogy with Arm is actually more apt than it seems”.
There’s a very romantic notion of IP licensing, you go out and make new things, “We got this for you”, versus, “No we’re just giving you the whole chip”, or in this case of CSS you, to your point, you could go get The Statue of Liberty, don’t even bother building it yourself. RH: And I think I came across this in the early days. In the 1990s, I was working with ASIC design at Compaq Computer, and they were doing all their ASICs for Northbridge , Southbridge , VGA controllers, and this is when the whole chipset industry took off. And I remember one of the senior guys at Compaq explaining why you’re doing this, he said, “I’m all about differentiation, but there needs to be a difference”. And to some extent, that’s a little bit of this, right? You can spend all the time building it, but if it’s all built and you spent all this time and it’s not functionally different nor performant different, but you spent time — well, if you’re playing around with Legos and you got all day, that’s fine — but if you’re running a business and you’re trying to get products out quickly, then time is everything, and that’s really what CSS did. It kind of established to folks that, “My gosh, I can save a lot of time on the work I was doing that was not highly differentiated”, and in fact, in some case, it was undifferentiated because we could get to a solution faster in such a way that it was much more performant than what folks might be trying to get to the last mile. So when we started talking about this to investors back in 2023 during the roadshow, their first question was, “Aren’t you going to be competing with your customers?”, and, “Isn’t this what your customers do?”, and, “Aren’t they going to be annoyed by it?”, and my answer was, “If it provides them benefit, they’ll buy it, if it does not present a benefit, they won’t buy it”, that’s it. 
And what we found is a lot of people are taking it, even in mobile, where what we were told was, “No, no, these are the black belts and they’re going to grind out the last mile and you can’t really add a lot of value” — we’ve done a bunch in the mobile space, too. So with Meta, was the deal like, “Okay, we’ll do the whole thing for you, but then we get to sell it to everyone?”, and they’re like, “That’s fine, we don’t care, it doesn’t matter”? RH: Yes, exactly. We said, “If we’re going to do this, how do you feel about us selling it to other customers?”, and they said, “We’re fine with that”. When did you realize that the CPU was going to be critical to AI? RH: Oh, I think we always thought it was. I had a cheeky little slide in the keynote about the demise of the CPU, and I had to spend a lot of time. I mean, I don’t know, I might have talked to someone recently who I swear was pretty adamant that a lot of CPUs should be replaced with GPUs, and now they’re selling CPUs, too. RH: I had to talk to investors and media to explain to them why a CPU was even needed. They were a little bit like, “Can’t the GPU run by itself?”, like a kite that doesn’t need anything to hang on to. First off, on table stakes, obviously you need it in the data center, but particularly as AI moves into smaller form factors, physical AI, edge, you obviously have to have a CPU because you’re running display, you have I/O, you have human interface. It’s how do you add accelerated AI onto the CPU? So yeah, I think we kind of always knew it was going to be there, and there was going to be continued demand for it. Right, but there’s a difference. Everyone on the edge is going to have a CPU, so we can layer on some AI capabilities; it doesn’t have the power envelope or the cost structure to support a dedicated GPU — that’s fair, that’s all correct.
It’s also correct that, to your point, a GPU needs a CPU to manage its scheduling and its I/O and all those sorts of things, but what I’m asking about specifically is that we’re going to have these agentic workflows, where what the agent does is CPU tasks, and so it’s not just that we will continue to need CPUs, we might actually need astronomically more CPUs. Was that part of your thesis all along? RH: I think we have instinctively thought that to be the case. And what drives that? The sheer generation of tokens, tokens by the pound, tokens by the dump truck, if you will. The more tokens that the accelerators are generating, whether that’s done by agentic input, human input, whatever the input is, the more tokens that are generated, those tokens have to be distributed. And the distribution of those tokens, how they are managed, how they are orchestrated, how they are scheduled, that is purely a CPU task. So we kind of intuitively felt that over time, as these data centers go from hundreds of megawatts to gigawatts, you are going to need, at a minimum, CPUs that have more cores, period. There was this belief that 64 cores might be enough and maybe 128 cores would be the limit; Graviton 5 is 192 cores, the Arm AGI CPU is 136. We were already starting to see core counts go up, and we started thinking about, “What’s driving all these core counts going up, is it agentic AI?”. A proxy for it was just sheer tokens being generated in a larger fashion that needed to be distributed in a fast way, and what was layered onto that was things like Codex, where latency matters, performance matters, delivering the token at speed matters. So I think all of that was bringing us to a place where we thought, “Yeah, you know what?”, we’re seeing this core count thing really starting to go up, we were seeing that about a year ago, Ben. So am I surprised that the CPU demand is exploding the way it is? Not really.
Agentic AI, just the acceleration of how these agents have been launched, certainly is another tailwind kicker. Which happens to line up with your mid-2025 decision that, “Maybe we should sell CPUs”. RH: Yeah, it all kind of lines up. We were seeing that, you know what, we think that this is going to be a potentially really, really large market where not only core count matters, but number of cores matters, efficiency matters because we could imagine a world where each one of these cores is running an agent or a hypervisor and the number of cores can really, really matter in the system, which laid claim to what we were thinking about in terms of, “Okay, we can see a path here in terms of where things are going”. So CSSs with greater than 128 cores in the implementation? Absolutely. Do I think, could I see 256? Absolutely. Could I see 512? Possibly. I think then it comes down to the memory subsystem, how you keep them fed, etc., but yeah, so short answer, about a year ago we started seeing this. Do you think that core count is going to be most important or is it going to be performance-per-core? RH: I think core count is going to be quite important because I think, again, I have a belief that each one of these cores will want to potentially run their own agent, launch a hypervisor job, launch a job that can be run independently, launch it, get the work done, go to sleep. The performance of the core is going to matter, no doubt about it, but I think the efficiency of that core is probably going to matter just as much as the performance is. Well, the reason I ask is because you talked a lot in this presentation about the efficiency advantage, where the company born from a battery or whatever your phrase was, and that certainly, I think, rings true, particularly in isolation. But in a large data center, if the biggest cost is the GPUs, then isn’t it more important to keep the GPUs fed? 
Which basically to say, is a chip’s capability to feed GPUs actually more important on a systemic level than necessarily the chip’s efficiency on its own? RH: I’m going to plead the fifth and say yes to both. You’ve got to pick one! RH: Well, what’s important? I think the design choice that Nvidia made with Vera was very important, Vera is designed to feed Rubin, it has a very specific interface, NVLink Fusion or NVLink chip-to-chip, provides a blazing fast interface, and has the right number of cores in terms of to keep that GPU fed optimally. But at the same time, is it the right configuration in a general-purpose application where you want to run an air-cooled rack in the same data hall? If you think about a data hall where you might have a Vera Rubin liquid-cooled rack sitting right next to a liquid-cooled Vera rack, but somewhere else inside the data center, you’ve got room for multiple air-cooled racks. That space that you may have not used in the past for CPU, you want to because of the problem statement that I just gave. So I actually think it’s a “both” world, which is why when people ask me, “Oh my gosh, aren’t you competing with Nvidia Vera, and aren’t people going to get confused?” — not particularly, I think there’s ample space for both. So you feel like Nvidia might be selling standalone Vera racks but that’s not necessarily what Vera was designed for, that’s what you’re designed for, and you think that’s where you’re going to be different. RH: Yes, and I mean, if you look at what’s been announced so far from Nvidia, they announced a giant 256-CPU liquid-cooled rack and the first implementation that we’re doing with Meta is a much smaller air-cooled rack. So very, very different right off the get-go. But you will have a liquid-cooled option? RH: If customers want that, we can do that too. I think that differentiation makes sense. Well, speaking of differentiation, why ARM versus x86? Why is there an opportunity here? RH: Performance-per-watt, period. 
Graviton sort of started it, and they’ve been very public about their 40% to 50%, Cobalt stated the same with Microsoft, Axion, Google stated the same, Nvidia has stated the same. Just on table stakes, 2x performance-per-watt is pretty undeniable. And that, I think, is where it starts, as probably the primary value proposition. What is x86 still better at? You can’t say legacy software, other than legacy software. RH: Go back to our earlier part of our conversation, right? The ISA, what is the value of the ISA? It is the software that it runs, right? It is the software that it runs. So if you were to look at where does x86 have a stronghold, x86 is very good at legacy on-prem software. OK, fine, we’ll give you legacy on-prem software, and I think part of the thesis here, to your point, is that a lot of this agentic work is on Linux, it’s using containers, it’s all relatively new, and it all by and large works well on Arm already, but you did have a bit in the presentation where you interviewed a guy from Meta that was about porting software. How much work still needs to be done there? RH: There’s a delta between the porting work and the optimization work. Graviton, what Amazon will tell you, is that greater than 50% of their new deployments, and accelerating, is ARM-based. And, yes, am I the CEO of Arm and do I have a biased opinion? Of course. But I find it hard to, on a clean sheet design, if you were starting from scratch and the software porting was done and you had either cloud-native or the application space was established or as a head node, I don’t know why you’d start with x86. What about, why are you doing ARM? We did ARM versus x86, I’m sort of working my way down the chain here — actually, I did it backwards, we stuck in Vera already — but why you versus custom silicon generally? You talked about Amazon. Why do you need to do the whole thing? RH: So let’s think about an Amazon, for example. Amazon does Graviton, would I like Amazon to buy the Arm AGI CPU? Yes.
Am I going to be heartbroken if they never buy one? No, I’m perfectly fine if they stay building what they’re building. Are they ever going to buy one? No. RH: I hope they do! But if they don’t, it’s not going to be the end of the world. SAP — SAP runs a lot of software on Amazon, they run SAP HANA on Amazon, they also have a desire to do stuff on-prem, and if they’re doing something on-prem in a smaller space and they’re looking to leverage that work, they’d love to have something that is ARM-based. Prior to us doing this product, there was no option at all, right? So that’s a very, very good example. Similar with a Cloudflare. Is Cloudflare going to do their own implementation? Likely not. Do they run on other people’s clouds? Sure, they do. Do they have an application that could be on-prem running on ARM? Absolutely. So we think that, and I don’t want to prefetch this, Ben, but we had a lot of questions from folks like, “Amazon won’t buy from you”, “Google won’t buy from you”, “Microsoft won’t buy from you”, because you’re competing with them. And we say, well, Google builds TPUs, yet they buy a lot of Nvidia GPUs, so it’s not so binary. That’s true. They’ll buy what their customers ask them to buy. RH: 100%. And if we solve a problem with an implementation that theirs does not, they’ll buy it, and if we don’t, they won’t. Just, you know, between you and me, is the only custom silicon that is truly potentially competitive Qualcomm’s, and you’re just not too worried about making them mad? RH: This is off the record here? (laughing) I didn’t say off the record. RH: Qualcomm, it’s funny, I had a question at the investor conference about competing with Nvidia. And I said, you know, a month ago, no one would have asked about any Arm person competing with anybody. So it’s wonderful to have these kinds of conversations, the market is underserved and there aren’t choices.
There isn’t a product from Qualcomm, there isn’t a product from MediaTek, there isn’t a product from Infineon, there just isn’t. Is that sort of your case? If there were a bunch of options in the market, would you still be entering? RH: We entered this because Meta asked us to, and because Meta asked us to, we did. So if I was to answer your question, would we have entered if those other four or five hypothetical guys were there? I don’t know that Meta would have asked us. The Arm AGI CPU is being built on TSMC’s 3-nm node, which is kind of impossible to get allocation for. How’d you get allocation? If you started this in 2025, how’d you pull that off? RH: We’re working through a back-end ASIC partner that helps secure the allocation for us. Oh, interesting. Are you concerned about that in the long run? Like this business blows up and actually you just can’t make enough chips? RH: I’m probably less worried about that at the moment than I am about memory. I think that the demand for the chip is actually very, very high, Ben, and through our partner, we’re able to secure upside through TSMC, so that has not been a problem. But memory is quite challenging, and I think if there’s any limit to how big this business can get, and I would say that what we provided to investors as a financial forecast is based upon the capacity we’ve secured on both memory and logic, but if there was more memory, could we sell more? Yes. This is sort of the sweet spot though of making predictions, everyone gets to say, “Wow, how are your predictions so accurate?”, it’s like, “Well, it’s because I knew exactly how much I would be able to make”. RH: Yeah, if there was more memory we’d be even more aggressive on the numbers. How did you make the memory decisions that you did, in terms of memory bandwidth and all those sorts of pieces, particularly given the short timeline in which you made this?
That wasn’t necessarily part of the CSS spec before, so how were you thinking about that? RH: The things we kind of looked at was, we sort of started with LP versus standard DRAM . Because Vera’s doing LP and you decided to do standard. RH: We’re doing standard DRAM, yeah. We thought we’d be a little bit better on the cost side that could help and at the same time, a little bit better on the capacity side. So it really kind of drove down to, we’re going to solve for capacity because we thought that that might matter in a more generalized application space to give the broader width of use, which then brought us to standard DDR versus LP. I think the reason we talked last time was in the context of you making a deal with Intel to get Arm working on 18A, and this was going to be a multi-generational partnership. What happened to that? Is that still around? RH: It’s still around. We did a lot of work on 18A because we felt that it was going to be really, really important if someone wanted to build on Intel 18A, that the Arm IP was available. So we did our part relative to if someone wants to go build an ARM-based SoC on Intel process, but that unfortunately hasn’t come to pass just yet. It’s interesting you mentioned that you’re actually not worried about TSMC capacity but you are worried about memory — I didn’t fully think through that being another headwind for Intel where they could really use TSMC having insufficient capacity to help them, but if memory is the first constraint then no one’s even getting there. RH: First off, obviously HBM [ high bandwidth memory ] being such a capacity hog, and then people moving from LP into HBM at the memory guys, then compounding on it, all of the explosion of the CPU demand drives up memory demand. So it all kind of adds on to itself, which makes the memory problem pretty acute. What exactly is in the bill of materials that you’re selling? 
You showed racks but you mentioned a partnership with Super Micro for example — if I buy a chip from Arm, what exactly am I buying? You’ve mentioned memory obviously, so what else is in that? And what are you getting from partners? RH: Yeah, so we’ll send you a voucher code after the show, and you can place your orders. Just the SoCs. If you need to secure the memory, that’s on you, we’re not securing memory at this point in time. We did a lot of work with Super Micro, with Lenovo, with ASRock. So there’s a full 1U, 2U server blade reference architecture, so the full BOM relative to all the passives and everything you need from an interconnect standpoint is all there. There’s a full BOM, which, as we mentioned in the session, the rack physically itself complies with OCP standards, and then we’ve done all the work in terms of the reference design. So we can provide the full BOM of the reference platform, including memory, but what we are selling is only the SoC. Very nerdy question here, but how are you going to report this from an accounting perspective? Just right off the top, chips have a very different margin profile, is this going to all be broken out? How are you thinking about that? RH: We’ll probably do that. Today we break down licensing and royalty of the IP business; we’ll probably break out chips as a separate revenue stream. To go back to it, you did call this event Arm Everywhere, will you ever sell a smartphone chip? RH: I don’t know, that’s a really hard question. I think we’re going to look at areas where we think we could add significant value to a market that’s underserved, and that market’s pretty well served. It’s very well served, and this agentic AI, potentially a new market, fresh software stack, makes sense to me. What risks are you worried about with this? You come across as very confident, “This is very obviously what we should do”, how does this go wrong?
RH: Most of my career has been spent actually in companies that have chips as their end business as opposed to IP. I’ve been at Arm 12 years, 13 years, I’ve been the CEO for about four-and-a-half. I did a couple of years, two, three years at a company called Tensilica that was doing, or actually the longer, five years, but most of my career was either NEC Semiconductor, Texas Instruments, Nvidia. Chip business is not easy, right? You introduce a whole different new set of characteristics. You have to introduce this term called “inventory” to your company. RH: RMAs, inventory, customer field failures, just a whole cadre of things that’s very new for our company, there certainly is execution risk that we’ve added that has not existed before. We had a 35-year machine being built that is incredibly good at delivering world-class IP to customers — doing chips is a whole different deal. I don’t want to minimize that, but at the same time, I don’t want to communicate that that’s something that we haven’t thought about deeply over the years and we’ve got a lot of people who have done that work inside the company. A lot of my senior executive team, ex-Broadcom, ex-Marvell, ex-Nvidia, we’ve got a lot of people inside the engineering organization who have come from that world, we’ve built up an operations team to go off and support that. So while there is risk, we’ve been taking a lot of steps inside the company to be adding the resources. We’ve been increasing our OpEx quite a bit in the quarters leading up to this, about 25% year-on-year, investors were asking a ton of questions about, “When are we going to see why you’re adding all those people?”, and Arm Everywhere explained that. We also told investors that that’s now going to taper off because we’ve got, we think what we need to go off and execute on all this. But I think that’s the biggest thing, Ben. And the upside is just absolute revenue dollars, I guess absolute profit dollars. 
RH: I think there’s a financial upside, certainly, in terms of financial dollars. But I think back to the platform: I think by being closer to the hardware and the software and the systems, we can develop even better products around IP, CSS, etc., because I think when you are the compute platform, it is incumbent upon you to have as close a relationship as you can with the software that’s developed on your platform. What’s the state of the business in China these days, by the way? RH: China still represents probably 15% of our revenue, we still have a joint venture in China, and the majority of our business is royalties; royalties is much bigger than licensing in China. We still have a lot of design wins coming in the mobile space from people doing their own SoCs, like a Xiaomi. The hyperscaler market is strong between Alibaba, ByteDance, Tencent, and then most of the robotics and EV guys are doing stuff based on ARM, whether it’s XPeng, BYD, Horizon Robotics. So our business is pretty healthy in China. You do have the Immortalis and Mali GPUs. Are those good at AI? RH: Yes, they can be very good. We’ve added a lot of things to our GPUs around what we call neural graphics, so this is adding essentially a convolution and vector engine that can help with AI. Right now the focus has been really more around AI in a graphics application, whether it’s around things like DLSS and other areas, but we’ve got a lot of ingredients in those GPUs. So we should stay tuned, sounds very interesting. You did have one moment in the presentation that was a little weird: you were trying to say that this AI thing is definitely a real thing, but you’re like, “Well it might be a financial bubble, but the AI is real”. Are you worried about all this money that is going into this? You’re making a play for a piece of it, but is there some consternation in that regard?
RH: No, what I was trying to indicate was when people talk about bubbles, typically it’s either valuation bubbles or investment bubbles. The valuation bubbles, those come and go over time. The investment bubble, I’m not as worried about in the sense of, “Is there going to be real ROI on the investment being made?”, I actually worry more about the, “Can you get all the stuff required to build out all of the scale?” — we just talked about memory, there’s TSMC capacity. I think the memory will be solved, they will ultimately not be able to help themselves, they will build more capacity; I’m worried about leading edge. TSMC will help themselves if they don’t have any challengers. RH: Turbines, right? You’ve got companies like GE Vernova or Mitsubishi, this is not their world of building factories well ahead to go serve an extra 5 to 10 gigawatts of power. So I think TSMC is super disciplined, and they’ve been world class at that throughout their history. Will the memory guys be able to help themselves? The numbers are now so large that even the SanDisks of the world and storage, everything has kind of gotten bananas, and that is a concern: if just one of those key components of the supply chain blinks and decides not to invest to provide the capacity, then things kind of slow down. But the numbers, Ben, the numbers we’re talking about are numbers we’ve never seen before: $200 billion CapEx from an Amazon or $200 billion CapEx from a Google. And then you have companies like Anthropic talking about $6 billion revenue increases over a three-to-four month period, which are the size of some software companies. So we are at some very stratospheric levels in terms of spend. Would I be surprised if there was a pause in something just as people calibrate? Yeah, I wouldn’t be surprised at all. But if I think about the 5 to 10-year trajectory, there’s no way you can say this is a bubble.
If you said, “I think machines that can think as well as humans and make us more productive, that’s kind of a fad”, I don’t actually think that’s going to happen, it’s almost nonsensical. Just to sort of go full circle: you’ve been on the edge, and now this new product that gets the Arm Everywhere moniker is about being in the data center — is the edge dead? Or if not dead, are we in a fundamental shift where the most important compute is going to be in data centers, or is there a bit where AI is real but it actually does leave the data center and go to the edge, and that’s a bigger challenge? RH: I think until something is invented that is different than the transformer, and we talk about some very different model as to how AI is trained and inferred, then we’re looking at a lot of compute in the data center and some level of compute on the edge. I think if you just suspend animation for a second and we say, you know what, the transformer is it, and that’s what the world looks like for the next 5 to 10 years, the edge is not going to be dead. The edge is going to have to run some level of native compute for whatever the thing has to do, and it’s going to run some AI acceleration, of course. But is everything going to happen in your pocket? No. I mean, that’s not going to happen. I’ve come down to that side too. I think in the fullness of time, at least for now, the thin client model, it looks like it’s going to be it. I guess that seems to be your case as well, because you had a big event and it is for a data center CPU. Arm is Everywhere, but not everyone can buy it. RH: And power efficiency was a nice-to-have in the data center, but I would say it wasn’t existential. It is now, though.
And I say that’s another big change because, again, one of the examples I gave, if you’re 4x-ing or 5x-ing or 6x-ing the CPUs in a given data center and you don’t want to give up one ounce of GPU accelerator power, then you’re going to squeeze everywhere you can, and that, I think, is a thing that’s in our favor. Where’s Arm in 10 years? RH: I would like it to be thought of as one of the most important semiconductor companies on the planet. We’re not there yet, but that’s how I would like the company to be thought about. Rene Haas, congratulations, great to talk. RH: Thank you, Ben.

DHH Yesterday

Basecamp becomes agent accessible

In the past 18 months, we've experimented with a ton of AI-infused features at 37signals. Fizzy had all sorts of attempts. As did Basecamp. But as Microsoft and many others have realized, it's not that easy to make something that's actually good and would be welcomed by users. So we didn't ship. In the meantime, agents have emerged as the killer app for AI. Not only are LLMs much smarter when they can check their thinking using tools, but the file system also gives them the memory implant they needed to learn between prompts. And now they can actually do stuff! So while we keep cooking on actually-useful native AI features in Basecamp, we're launching a fully agent-accessible version today. We've revamped our API, created a brand-new CLI, and wrapped it all in a skill to teach agents how best to use it all. It works remarkably well, and it's really fast too. Not only can your agent look through everything in Basecamp and summarize whatever you need, but it can also set up to-do lists, post message updates, chat with humans and clankers alike, upload reference files, and arrange a project schedule. Anything you can do in Basecamp, agents can now do too. This becomes extra powerful when you combine Basecamp with all the other tools you might be using that are also agent accessible. For software development, you can use the MCP from Sentry to trawl through major sources of bugs, then have the agent summarize that in a message for Basecamp. Or you have it download, analyze, and highlight key customer complaints by giving it access to your help desk system. All this was possible in the past with APIs, hand-written integrations, and human data scientists. But it was cumbersome, slow, and expensive, so most people just didn't. A vanishingly small portion of Basecamp customers have ever directly interacted with our API. But agents? I think adoption is going to be swift. Not because everyone is going to run OpenCode, Claude Code, or Gemini CLI.
But because agents are going to be incorporated into ChatGPT, Gemini, Grok, and all the other mainstream interfaces that were collectively embarrassed by OpenClaw's meteoric ascent and popularity very quickly. There's a huge demand out there for a personal agent that can act as your private executive assistant. This is where the puck is going, and we're skating to meet it with agent accessibility across the board. Basecamp is first, Fizzy is next, and we'll hit HEY before long too. Revamped APIs, comprehensive CLIs, and the skills to use them, whatever your harness or claws look like.


Porting Go's io package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. So I started with the standard library packages that had the fewest dependencies, and one of them was the io package. This post is about how that went. io is one of the core Go packages. It introduces the concepts of readers and writers, which are also common in other programming languages. In Go, a reader is anything that can read some raw data (bytes) from a source into a slice. A writer is anything that can take some raw data from a slice and write it to a destination. The io package defines many other interfaces and combinations of them. It also provides several functions, the most well-known being Copy, which copies all data from a source (represented by a reader) to a destination (represented by a writer). C, of course, doesn't have interfaces. But before I get into that, I had to make several other design decisions. In general, a slice is a linear container that holds N elements of type T. Typically, a slice is a view of some underlying data. In Go, a slice consists of a pointer to a block of allocated memory, a length (the number of elements in the slice), and a capacity (the total number of elements that can fit in the backing memory before the runtime needs to re-allocate). The interfaces in the io package work with fixed-length slices (readers and writers should never append to a slice), and they only use byte slices. The simplest way to represent this in C would be a bare pointer-and-length pair, but since I needed a general-purpose slice type, I decided to do it the Go way instead, with a bounds-checking helper to access slice elements. So far, so good.
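The slice layout described above can be sketched in C roughly like this (the names `slice` and `slice_at` are my own placeholders, since the post's actual definitions are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Go-style slice: pointer to backing memory, length, and capacity.
typedef struct {
    void *data;
    size_t len;  // number of elements currently in view
    size_t cap;  // total elements the backing memory can hold
} slice;

// Bounds-checked access to byte i of a byte slice.
static inline uint8_t *slice_at(slice s, size_t i) {
    assert(i < s.len && "slice index out of range");
    return (uint8_t *)s.data + i;
}
```

With this shape, a reader or writer just receives a slice over a caller-owned buffer and touches at most `len` bytes of it.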
Let's look at the Read method again: it returns two values, a byte count and an error. C functions can only return one value, so I needed to figure out how to handle this. The classic approach would be to pass output parameters by pointer, but that doesn't compose well and looks nothing like Go. Instead, I went with a result struct: a union that can store any primitive type, as well as strings, slices, and pointers, combined with an error. Our Read method (let's assume it's just a regular function for now) translates to a function returning such a result, and the caller unpacks the value and the error from it.

For the error type itself, I went with a simple pointer to an immutable string, plus a constructor macro. I wanted to avoid heap allocations as much as possible, so I decided not to support dynamic errors. Only sentinel errors are used, and they're defined at the file level. Errors are compared by pointer identity, not by string content, just like sentinel errors in Go. An error value is just a pointer. This keeps error handling cheap and straightforward.

This was the big one. In Go, an interface is a type that specifies a set of methods. Any concrete type that implements those methods satisfies the interface, with no explicit declaration needed. In C, there's no such mechanism. For interfaces, I decided to use "fat" structs with function pointers. That way, Go's reader interface becomes a struct in C: a pointer holds the concrete value, and each method becomes a function pointer that takes that pointer as its first argument. This is less efficient than using a static method table, especially if the interface has a lot of methods, but it's simpler, so I decided it was good enough for the first version. Now functions can work with interfaces without knowing the specific implementation, and calling a method on the interface just goes through the function pointer. Go's interface is more than just a value wrapper with a method table.
A real Go interface value also stores type information about the value it holds. Since the runtime knows the exact type inside the interface, it can try to "upgrade" one interface (for example, a plain reader) to another using a type assertion. The last thing I wanted to do was reinvent Go's dynamic type system in C, so dropping this feature was an easy decision.

There's another kind of type assertion, though: when we unwrap the interface to get the value of a specific type. And this kind of assertion is quite possible in C. All we have to do is compare function pointers. If two different types happened to share the same method implementation, this would break; in practice, each concrete type has its own methods, so the function pointer serves as a reliable type tag.

After I decided on the interface approach, porting the actual types was pretty easy. For example, LimitReader wraps a reader and stops with EOF after reading N bytes. The logic is straightforward: if there are no bytes left, return EOF. Otherwise, if the buffer is bigger than the remaining size, shorten it. Then call the underlying reader and decrease the remaining size. The ported C code is a bit more verbose, but nothing special: the multiple return values, the interface call through a function pointer, and the slice handling are all implemented as described in previous sections.

Copy is where everything comes together. In Go, io.Copy allocates its buffer on the heap with make. I could take a similar approach in C: have copy take an allocator and use it to create the buffer. But since this is just a temporary buffer that only exists during the function call, I decided stack allocation was a better choice: the buffer is allocated on the stack with a bounds-checking macro that wraps C's alloca. It moves the stack pointer and gives you a chunk of memory that's automatically freed when the function returns.
People often avoid using alloca because it can cause a stack overflow, but a bounds-checking wrapper fixes this issue. Another common concern with alloca is that it's not block-scoped: the memory stays allocated until the function exits. However, since we only allocate once, this isn't a problem. In the simplified C version of copy, you can see all the parts from this post working together: a function accepting interfaces, slices passed to interface methods, a result type wrapping multiple return values, error sentinels compared by identity, and a stack-allocated buffer used for the copy.

Porting Go's io package to C meant solving a few problems: representing slices, handling multiple return values, modeling errors, and implementing interfaces using function pointers. None of this needed anything fancy: just structs, unions, functions, and some macros. The resulting C code is more verbose than Go, but it's structurally similar and easy enough to read, and this approach should work well for other Go packages too.

The io package isn't very useful on its own: it mainly defines interfaces and doesn't provide concrete implementations. So the next two packages to port suggested themselves; I'll talk about those in the next post. In the meantime, if you'd like to write Go that translates to C, with no runtime and manual memory management, I invite you to try Solod. The io package is included, of course.

HeyDingus Yesterday

ADK Climb Club is now web-friendly!

Just finished up a project that I’ve been meaning to get to for a year: bringing ADK Climb Club to the open web. We’ve had a landing page for a while, but all the info about our meetups was going out via Instagram and WhatsApp. Not everyone wants to use those apps, though, and I heard from them! So, I buckled down, imported all the old posts, and hooked up my auto-crossposter. Now, everything that we post to Instagram shows up on our website as a native, web-friendly blog post. And I enabled (free) email subscriptions (thanks Micro.blog!), so folks can get an email each time that we share information about a meetup. Although Instagram is still our “primary” platform (that’s where our biggest audience is and where we pick up new members), I feel much better about the club being more accessible on the open web, and that people can stay in the loop with posts pushed out to them without having to sign up for a Meta app. If you’re a climber (or are climbing curious) and near Lake Placid, NY on a Wednesday night, you should come check us out! HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see — the blog posts, shortcuts, wallpapers, scripts, or anything — please consider leaving a tip, checking out my store, or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social, or by good ol' email.

Andre Garzia Yesterday

Apple Just Lost Me

# Apple Just Lost Me

Apple has just lost me as a user. It will take me a while before I can fully migrate away from their devices, and I suspect I might need to keep a Mac around for my work, but I will move all my personal computing to Linux and Android again. I've been an Apple user since MacOS 8. I had both a Newton MessagePad 2000 and an eMate 300. I got the original blue toilet-seat iBook G3. I was there for the developer road show introducing MacOS X. I have paid for my developer account since then. Recently, I had a Macbook Air, iPhone 17, and iPad Mini. I'm gonna throw all of them away — not literally ofc — because of the recent slop this company has been shipping. It is death through a thousand papercuts. To summarise for yous: there are three main issues for me, and the last one, which happened today, is what pushed me over the threshold.

### Gatekeeper

I absolutely hate Apple's quarantine and gatekeeping of software. As a developer, I should just be able to ship software to those interested in my apps. Be aware that I don't give a flying fuck about mobile development; I'm talking about desktop apps here. I gave in to the Apple racketeering scheme and got myself a developer account from the very start. I had to *fax my card details to them*, that is how long I have had my account. Even though my software is packaged and notarised as per their requirements, they still show my users a dialog box confirming they want to run my app, something they do not do for apps installed through their walled garden. This is just friction to punish developers outside their store. I am very tired of it.

### macOS 26

That has been an absolute fiasco. Liquid Glass is completely broken from a design point of view. I have no idea how that got out of the door, and now, multiple updates in, it is still just as bad. Not only does it look ugly — and that is subjective, of course — but it is visually broken. Interfaces built with AppKit or SwiftUI that rendered perfectly are now overlapping controls and clipping stuff.
They have no consistency at all in terms of icons, placement, corners... I am not a designer, I don't even care about design much, but when a bad design spreads like ink in a glass of water, poisoning my workflows, that is when I notice it.

### Age verification

My iPhone updated last night and, per UK laws, it introduced age verification. The way Apple decided to implement this is through credit card checking. First it attempted to check my Apple Wallet; it failed even though I have five cards in it and am able to use the App Store fine. Then it moved on to wanting me to manually add a card to verify myself. It failed with all my five cards. Four were debit cards, and one was a credit card from another country, cause you know, I am an immigrant who still has accounts in my original birth place. So it failed age verification and locked me out of many features. Bear in mind, I am 45 years old. I have had an Apple account for 25 years; the age of my personal account alone should already verify my age. Credit cards are not documents. Many people don't have them. Apple doesn't provide any other way to verify your age because they are a stupid American company with American values in which you're just as human as your credit score. Age verification is a scam, but checking it with a credit card is even worse.

## Next steps for me

I was already done with Apple for some months now, but after what happened today, I am angry af and will speed up my plans. I'm tired of devices that are not actually mine, of workflows that won't work without a blessing from a higher corporate authority. I'm gonna move back to Linux and Android.

> Yeah, I know Google is gonna fuck Android soon the same way, but at least with Android you tend to have more options.

For my computing needs, I purchased a [MNT Pocket Reform](https://www.crowdsupply.com/mnt/pocket-reform). It will take them a while to assemble and send it to me, but once I have it, my macbook will become a work laptop only.
All the software I make already ships for Linux. I am considering getting a [Fairphone Gen 6](https://www.fairphone.com/the-fairphone-gen-6). I'm not sure if I will go with stock Android or their Murena /e/OS version; it depends on how the degoogled version handles my banking apps. I might need to go with stock Android. After those two, I plan to assemble a little *homelab* using a TinyMiniMicro form factor PC running Linux and, if I have the budget, a UGREEN NAS. On those machines, I want to have something to handle my photo backup and a shared drive. I will probably use either Tailscale or some Cloudflare bullshit to connect them to each other. This is it: moving back towards taking control of my computing again.

Stratechery Yesterday

Arm Launches Own CPU, Arm’s Motivation, Constraints and Systems

Arm is selling its own chips, not just licensing IP. It's a big change compared to Arm's history, but not surprising given how computing is evolving.


One hundred weirdo emails

I hope I don’t have to spell it out, but I will do it anyway: in these cases I don’t know anything about their products and I cannot help them. Quite often I first need to search around only to figure out what the product the person is asking me about even is or does. Over the years I have collected such emails that end up in my inbox. Out of those that I have received, I have cherry-picked my favorites: the best, the weirdest, the most offensive, and the most confused ones, and I put them up online. A few of them also triggered separate blog posts of their own in the past. They help us remember that the world is complicated and hard to understand. Today, my online collection reached the magical number: 100 emails. The first one in the stash was received in 2009 and the latest arrived just the other day. I expect I’ll keep adding occasional new ones going forward as well.

How this happens:

- My email address is spelled out in the curl license
- The curl license appears in many products
- Some people have problems with their products and need someone to email
- A few of these discover my email in their product
- Occasionally, the person in need of help emails me about their product
- I collect some of those and make them public

(think) Yesterday

Neocaml 0.6: Opam, Dune, and More

When I released neocaml 0.1 last month, I thought I was more or less done with the (main) features for the foreseeable future. The original scope was deliberately small — a couple of Tree-sitter-powered OCaml major modes (for implementation and interface files), a REPL integration, and not much else. I was quite happy with how things turned out and figured the next steps would be mostly polish and bug fixes. Versions 0.2-0.5 brought polish and bug fixes, but fundamentally the feature set stayed the same. I was even more convinced a grand 1.0 release was just around the corner. I was wrong.

Of course, OCaml files don’t exist in isolation. They live alongside Opam files that describe packages and Dune files that configure builds. And as I was poking around the Tree-sitter ecosystem, I discovered that there were already grammars for both Opam and Dune files. Given how simple both formats are (Opam is mostly key-value pairs, Dune is s-expressions), adding support for them turned out to be fairly straightforward. So here we are with neocaml 0.6, which is quite a bit bigger than I originally expected.

Note: One thing worth mentioning — all the new modes are completely isolated from the core OCaml modes. They’re separate files with no hard dependency on the core, loaded only when you open the relevant file types. I didn’t want to force them upon anyone — for me it’s convenient to get Opam and Dune support out-of-the-box (given how ubiquitous they are in the OCaml ecosystem), but I totally get it if someone doesn’t care about this.

Let me walk you through what’s new.

The new Opam mode activates automatically for Opam files. It provides:

- Tree-sitter-based font-lock (field names, strings, operators, version constraints, filter expressions, etc.)
- Indentation (lists, sections, option braces)
- Imenu for navigating variables and sections
- A flymake backend that runs on the current buffer, giving you inline diagnostics for missing fields, deprecated constructs, and syntax errors

The flymake backend registers automatically when the opam binary is found in your PATH, but you need to enable it yourself. Flycheck users get support out of the box via Flycheck’s built-in checker — no extra configuration needed. This bridges some of the gap with Tuareg, which also bundles an Opam major mode. The Tree-sitter-based approach gives us more accurate highlighting, and the flymake integration is a nice bonus on top.

The new Dune mode handles dune, dune-project, and dune-workspace files — all three use the same s-expression syntax and share a single Tree-sitter grammar. You get:

- Font-lock for stanza names, field names, action keywords, strings, module names, library names, operators, and brackets
- Indentation with 1-space offset
- Imenu for stanza navigation
- Defun navigation and support

This removes the need to install the separate dune package (the standalone mode maintained by the Dune developers) from MELPA. If you prefer to keep using it, that’s fine too — neocaml’s README has instructions for overriding the relevant entries.

Beyond editing Dune files, I wanted a simple way to run Dune commands from any neocaml buffer. The new minor mode provides keybindings and a “Dune” menu for common operations. All commands run through Emacs’s compilation interface, so you get error navigation, clickable source locations, and the full compilation experience for free. With a prefix argument, build, test, and fmt run in watch mode, automatically rebuilding when files change. The REPL command is special — it launches through neocaml’s REPL integration, so you get the full REPL experience (send region, send definition, etc.) with your project’s libraries preloaded. This minor mode is completely independent from the major modes — it doesn’t care which major mode you’re using, and you can enable it in OCaml buffers as well.

Both the Opam and Dune Tree-sitter grammars are relatively young and will need some more work for optimal results. I’ve been filing issues and contributing patches upstream to improve them — for instance, the Dune grammar currently flattens field-value pairs in a way that makes indentation less precise than it could be, and neither grammar supports variable interpolation yet. These are very solvable problems and I expect the grammars to improve over time.

At this point I think I’m (finally!) out of ideas for new functionality. This time I mean it! Neocaml now covers pretty much everything I ever wanted, especially when paired with the awesome ocaml-eglot. Down the road there might be support for OCamllex (.mll) or Menhir (.mly) files, but only if adding them doesn’t bring significant complexity — both are mixed languages with embedded OCaml code, which makes them fundamentally harder to support well than the simple Opam and Dune formats.

I hope OCaml programmers will find the new functionality useful. If you’re using neocaml, I’d love to hear how it’s working for you — bug reports, feature requests, and general feedback are all welcome on GitHub. You can find the full list of changes in the changelog. As usual — update from MELPA, kick the tires, and let me know what you think. That’s all I have for you today! Keep hacking!

André Arko Yesterday

How to Install a Gem

This post was originally given as a talk at SF Ruby Meetup. The slides are also available. Hello, and welcome to How To Install A Gem. My name is André Arko, and I go by @indirect on all the internet services. You might know me from being 1/3 of the team that shipped Bundler 1.0, or perhaps the 10+ years I spent trying to keep RubyGems.org up and running for everyone to use. More recently, I’ve been working on new projects: rv, a CLI to install Ruby versions and gems at unprecedented speeds, and gem.coop, a community gem server designed from the ground up so Bundler and rv can install gems faster and more securely than ever before.

So, with that introduction out of the way, let’s get started: do you know how to install a gem? Okay, that’s great! You can come up and give this talk instead of me. I’ll just sit over here while you write the rest of this post. Slightly more seriously, do you know how RubyGems converts the name that you give it into a URL to download a .gem file? It’s called the “compact index”, and we’ll see how it works very soon. Next, who in the audience knows how to unpack a .gem file? Do you know what format .gem files use, and what’s inside them? We’ll look at gem structure and gemspec files as well. Then, do you know where to put the files from inside the gem? Where do all of these files and directories get put on disk so we can use them later? Does anyone know off the top of their head? Once those files have been unpacked into the correct places, the last thing we need to know is how to require them. How do these unpacked files on disk get found by Ruby, so you can require them and have that actually work?

This exercise was mostly to show that using gems every day actually skips over most of the way they work underneath. So let’s look at what a gem is, and examine how they work. By the end of this talk, you’ll know what’s inside a gem, how RubyGems figures out what to download, and where and how that download gets installed so you can use it.
And if you already know everything we just talked about, please feel free to go straight to rv.dev and start sending us pull requests!

First, we’re going to look at how the name of a gem becomes a URL for a .gem file. Let’s use Rails as our example. Historically, there have been at least five or six different ways to look up information about a gem based on its name, but today there is one canonical way: the compact index. It’s so simple that you can do it yourself using curl. Fetch the gem’s info file, and you’ll be able to read the exact output that every tool uses to look up the versions of a gem that exist. Each line in the file describes one version of the gem, so let’s look at one line, break it down, and tackle each part one at a time.

First, the version number. That’s the version of Rails that this line is about, so we now know for sure that this version exists. Next, a list of dependencies. This version of the rails gem declares dependencies on a bunch of other gems. Each dependency has a version requirement attached, and for almost every gem it is one exact version, and only that version. For one of them, Rails is a little bit more flexible, and allows any version from a minimum and up. The final section contains a checksum, a ruby requirement, and a rubygems requirement. The checksum is a sha256 hash of the .gem file that contains the gem, so after we download the gem we can check to make sure we have the right file by comparing that checksum. This version of Rails also declares a minimum required Ruby version and a minimum required RubyGems version. It’s up to the client to do something with that information, but hopefully you’ll see an error if you are using a Ruby or RubyGems that’s too old.

Great! So now we know the important information: this Rails version is real, and strong, and is our friend. We can download it, and check the checksum against the checksum we were given in the info file line.
Let’s do that now. Notice that the checksum of the downloaded file exactly matches the checksum we previously saw in our line from the info file. That lets us know that we got the right file, and there were no network or disk errors. Now that we have the gem, we can investigate: what exactly is inside a gem?

At this point, we’re going to pivot from the rails gem to the railties gem. There’s a good reason for that, and the reason is… the rails gem doesn’t actually have any files in it. So it’s a bad example. In order to show off what a gem looks like when it has files in it, we’ll use railties instead. So, we have our .gem file downloaded with curl. What do we do now? The first piece of secret knowledge that we need: gems are tarballs. That means we can open them up with regular old tar. Let’s try it.

So what’s inside the .gem tarball is… another tarball. And also two gzipped files. Let’s look at those files first. As you might expect from its name, the checksums.yaml.gz file is a gzipped YAML file, containing checksums for the other two files. It’s maybe a bit silly to have multiple layers of checksumming here, but it does confirm that the outer layer of tarball and zip was removed without any errors. Okay, so what’s inside metadata.gz? The answer is… Ruby, sort of. It’s a YAML-serialized instance of the Gem::Specification class. We can see exactly what was put into this object at the time the gem was built. After snipping out the YAML that lists the dependencies (which we already looked at, because they are included in the info file), what’s left is some relatively simple information about the gem. Author, author’s email, description, homepage, license, various URLs. For the purposes of installing and using the gem, we care about exactly six pieces of information from the specification. We’re going to combine those items with the files in the remaining data.tar.gz file to get our unpacked and installed gem. Now that we know what’s in the gem specification, let’s look at what’s inside the data tarball.
It matches up very closely with the long list of entries in the files array in the gemspec. So now we have a bunch of files. Where are we going to put them? Enter: the magic of RubyGems. The scheme that RubyGems has come up with is largely shaped by the constraints of how Ruby finds files to require, which we’re going to look at soon. For now, it is enough for us to know that RubyGems keeps track of a list of directories, a lot like the way PATH works for your shell to find commands to run. You can ask RubyGems to print its current directories, and from that list we can see that RubyGems organizes its own files into a few directories. To install a gem, we’re going to need to put the files we have into each of those directories, with specific paths and filenames. Just to recap, the files we need to place somewhere are:

- railties-8.1.3.gem (the .gem file itself)
- metadata.gz (the YAML Gem::Specification object from inside the gem)
- the unpacked data.tar.gz files (the contents of the gem)

So let’s move the files into the directories we see RubyGems offers. First, cache the .gem file so RubyGems doesn’t need to download it again later. Then, add the gem specification so that RubyGems will be able to find it. There’s a small twist here, which is that the specifications directory doesn’t contain YAML files, it contains Ruby files. So we also need to convert the YAML file back into a Ruby object, and then write out the Ruby code to create that object into a file that RubyGems can load later. Next, we need to put the files that make up the contents of the gem into the gems directory. One more thing we need to do: set up the executables provided by the gem. You can check out the binstub files that RubyGems generates, but for our purposes we just need to tell RubyGems what gem and executable it needs to run, so we can do that. And with that, we’ve installed the gem! You can run the file that we just created to prove it.

As we wrap up here, there are three aspects of gems that we haven’t touched on at all: docs, extensions, and plugins. We don’t have time to talk about them today in this meetup talk slot. Hopefully a future (longer) version of this talk will have space to include all of those things, because they are all super interesting, I promise. In the meantime, I will have to direct you to the docs for RDoc to learn more about docs, and to the source code of RubyGems itself if you want to learn more about gem extensions and plugins.

There’s one last thing to figure out before we wrap up: how does require find a gem for us to be able to use it? To explain that, we’ll have to drop down to some basic Ruby, and then look at the ways that RubyGems monkeypatches Ruby’s basic require to make it possible to have gems with versions. The first thing to know about require is that it works exactly like PATH does in your shell. There’s a global Ruby variable named $LOAD_PATH, and it’s an array of paths on disk. When you try to require something, Ruby goes and looks inside each of those paths to see if the thing you asked for is there. You can test this out for yourself in just a few seconds! Let’s try it. The Ruby CLI flag -I lets you add directories to the $LOAD_PATH variable, and then the require function looks inside those directories to find a file with the name that you gave to require. No magic, just a list to check against for files on disk.

Now that you understand how the $LOAD_PATH variable makes require work, how does RubyGems work? You can’t just put ten different versions of a file into the $LOAD_PATH and expect require to still work. RubyGems handles multiple versions of the same file by monkeypatching require. Let’s look at what happens when we require a file located inside the railties gem that we just installed. RubyGems starts by looking at all of the gem specifications, including the one we saved earlier. In each specification, it combines the name and version with the spec’s require paths to come up with a path on disk for our just-installed gem. RubyGems knows that directory contains the file we asked for, so it is a candidate to be “activated”, which is what RubyGems calls it when a gem is added to your $LOAD_PATH.

As long as internal bookkeeping shows that no other versions of the gem have already been added to the $LOAD_PATH, we’re good! RubyGems adds this specific directory to the $LOAD_PATH, and delegates to the original implementation of require. Require finds the file, reads it, and evaluates it. With that, we’ve done it! We have found, downloaded, unpacked, and installed a gem so that Ruby is able to run a command and load ruby files, without ever touching the gem command. If you’re interested in contributing to an open source project that works a lot with gems, we would love to work with you on rv, where we are working to create the fastest Ruby and gem manager in the world. And of course, if your company could use faster, easier, or more secure gems for developers, for CI, and for production deployments, we can help. We’d love to talk to you and you can find our contact information at spinel.coop.


Troubleshooting Your Claude MCP Configuration

These days I add MCP support for pretty much every software product I build, including most recently IterOps and SecurityBot.dev. Creating the MCP server is very easy because I build all of my SaaS products using Laravel, and Laravel offers native MCP support. What's less clear is how to configure the MCP client to talk to the MCP server. This is because many MCP client configurations use npx to call the MCP server URL. This is easy enough; however, if you're running NVM to handle Node version discrepancies across multiple projects, then you might need to explicitly define the npx path inside the config file. If you're using Laravel Herd and the MCP client is crashing once Claude loads, it might be because you're using Herd's locally generated SSL certificates. The mcp-remote package doesn't like this and will complain about the certificate not being signed. You can tell mcp-remote to ignore this by setting an environment variable.
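For illustration only (the exact config file and paths depend on your setup), a server entry that pins an NVM-managed npx and relaxes certificate checks for a local Herd domain might look like this sketch. The server name, URL, and npx path are placeholders, and NODE_TLS_REJECT_UNAUTHORIZED=0 disables Node's TLS verification entirely, so it is only appropriate for local development:

```json
{
  "mcpServers": {
    "my-laravel-app": {
      "command": "/Users/you/.nvm/versions/node/v20.11.0/bin/npx",
      "args": ["-y", "mcp-remote", "https://my-app.test/mcp"],
      "env": {
        "NODE_TLS_REJECT_UNAUTHORIZED": "0"
      }
    }
  }
}
```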


Classical Software

One of the things that keeps coming up in my conversations about AI and software is the difference between software with an agent or LLM in the loop and what we've always thought of as "software". It's really hard to talk about because, as far as I can tell, there's not yet a distinct name for what we used to just call "software." So I'm picking one. Agentic software is comparatively expensive to operate and sometimes has opinions, making it a lot harder to reason about. Classical software without any AI in the loop is cheaper and easier to reason about. It's much more likely to be deterministic. It's not capable of some of the neat tricks that agentic software is. And there are a lot of times when you want that. I'd go so far as to say that if a given piece of software can be reasonably built to operate correctly without any runtime AI, it should be. This is something I run into a lot as I build tools on top of a coding agent using skills. Part of what makes skills so powerful is that they are subject to judgement calls by the agent using them. But that's also what can make them kind of a disaster. If a new model suddenly decides to interpret the text through a slightly different lens, suddenly your reliable process becomes less than reliable. This is a thing I've seen with Opus 4.6 and Superpowers 4. Sometimes, the Opus agent coordinating an implementation run would decide that instead of letting a subagent do code review, it should do the code review itself because the code was "straightforward." That led to the coordinating agent blowing out its context window. In that particular instance, the quick fix was adding additional context to the "subagent driven development" skill the coordinator was using to explain why we told it to use disposable subagents for code reviews. 
Moving delegation decisions out of the agentic loop that's role-playing as an orchestrator and into a classical program that can't deviate from the prescribed process would absolutely fix that class of problem, even if it made a bunch of other stuff trickier. The reason I'm writing about this isn't so much about when to pick agentic software and when to pick classical software as about explaining what I mean when I say "classical software." Classical Software is software that's expected to be deterministic, written in a programming language, and executed by a computer. (Before you email me about how computers actually work, I'm deliberately ignoring things like floating point math, explicit random number generation, and race conditions. You know what I meant.)


Kaktovik Numerals

Kaktovik numerals are a surprisingly good counting system, allowing many arithmetic operations to be done visually and effortlessly. It does take some getting used to, though. Thus this page!
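The structure that makes this work is worth seeing in code. Here's a quick sketch of my own (not from the linked page): Kaktovik numerals are base 20, and each digit glyph is built from up to three strokes worth five plus up to four strokes worth one.

```python
def kaktovik_digits(n: int) -> list[tuple[int, int]]:
    """Return the base-20 digits of n, most significant first,
    as (fives, ones) pairs, where each digit = 5 * fives + ones."""
    if n == 0:
        return [(0, 0)]  # zero has its own glyph
    digits = []
    while n > 0:
        n, d = divmod(n, 20)
        digits.append((d // 5, d % 5))
    return list(reversed(digits))

# 1936 = 4*400 + 16*20 + 16, and the digit 16 is drawn as 3 fives + 1 one
print(kaktovik_digits(1936))  # [(0, 4), (3, 1), (3, 1)]
```

Counting strokes rather than memorising twenty arbitrary symbols is what makes visual arithmetic possible.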

Giles's blog Yesterday

Writing an LLM from scratch, part 32g -- Interventions: weight tying

In Sebastian Raschka's book "Build a Large Language Model (from Scratch)", he writes that weight tying, while it reduces the parameter count of a model, in his experience makes it worse. As such, apparently people don't use it in modern LLMs. Intuitively, that makes sense -- I'll explain why in this post. But as I'm trying various interventions to see if I can get my model -- based on Raschka's code, but trained for a fraction of the time that the original GPT-2 model was -- to perform as well as the original in terms of the loss it gets on a test set, I thought it would be worth seeing if it really is a negative for this particular tiny model of 163M parameters. After all, the original weights use weight tying, and I did find that QKV bias appeared to help -- and that's another old-school technique that they used, which has since dropped out of fashion. Might this one help too? Worth a try! Let's give it a go. I'll start with a quick refresher on what weight tying is, and how it works. This is really targeted at people who've been reading along with this series -- if it's all new to you, you might find my post on Maths for LLMs a useful catch-up guide first. In our LLM code, right at the start, we use an embedding layer to take our input token IDs, and turn them into embeddings -- each token becomes a vector in a high-dimensional space (768 in our case), which we see as representing in some manner the "meaning" of the token. A useful way to think about that is that we could start with a one-hot vector for the token -- that is, with our 50,257-token vocabulary, it would be 50,257 items long, with a one in the position corresponding to the token's ID and zeros in every other position. We'll treat that as being a vector in a "vocab space".
The process of converting the token into an embedding turns out to be equivalent to multiplying that vocab space representation by an embedding matrix -- one with one row per possible token, the values in that row being the values for the appropriate embedding. [1] Because matrix multiplications can be seen as projections between different spaces, we can see that as a projection from our vocab space to the embedding space. Once we've projected our sequence of tokens into a sequence of embeddings, we do all of the steps required for the LLM -- we add in positional information, run it through the Transformer layers, normalise it, and then we have a new sequence of embeddings. The embedding at position n in that output sequence, if our model is working well, should be something that represents an appropriate next-token prediction for the portion of the input sequence from zero to position n. What we want as our final output is to map that back to the vocab space. We want logits: a list of numbers that (after being run through softmax) will represent the probability that our next token is a particular one. Just as we mapped from vocab space to embedding space with (conceptually) a matrix multiplication at the start of the process, we can map back with another one. More specifically, if we treat the embedding matrix as having the same number of rows as there are input tokens (which we'll call d_vocab) and columns as there are embedding dimensions (d_emb), then the original vocab-space-to-embedding-space matrix will have this shape: d_vocab × d_emb. So it's projecting from a d_vocab-dimensional space to a d_emb-dimensional one. Similarly, our matrix to do the projection at the end is just a matrix with the numbers of rows and columns swapped around -- d_emb × d_vocab -- to do a projection in the other direction. The trick with weight tying is to see that these two projections can potentially be just the opposite of each other.
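Before moving on, that lookup-as-matrix-multiplication equivalence is easy to check directly. A toy-sized sketch (the real model uses d_vocab = 50257 and d_emb = 768):

```python
import torch

d_vocab, d_emb = 11, 4  # toy sizes for illustration
emb = torch.nn.Embedding(d_vocab, d_emb)

token_id = 7
one_hot = torch.zeros(d_vocab)
one_hot[token_id] = 1.0

# Multiplying the one-hot vocab-space vector by the embedding matrix
# gives exactly the same vector as the embedding lookup.
assert torch.allclose(one_hot @ emb.weight, emb(torch.tensor(token_id)))
```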
If we assume that the embedding space on the way in to the LLM is essentially the same as the embedding space on the way out, then we can use one projection to go into it from vocab space, and the opposite to go back. The "opposite" in this case is the transpose -- that is, if we use W_emb for our embedding matrix and W_out for the output one, we have: W_out = W_emb^T. That means we can re-use all of the embedding parameters for the output projection matrix, and fewer parameters means not only a smaller model, but hopefully faster training. Sounds like a win! But of course, there's no such thing as a free lunch. By constraining the output head to be the transpose of the input one, we're essentially enforcing that assumption above: we're saying that the embedding space on the way out must be the same as the embedding space on the way in. That limits what the LLM can do -- if it were able to use different embedding spaces at each end, it would have more flexibility, which might help it learn to model things better. That's the theory: what does it mean in practice? Let's take a quick look at the GPT-2 code -- just the top level class. For our embedding layer, we use PyTorch's nn.Embedding class, and for the output head we use nn.Linear. Now, nn.Embedding provides us with access to the underlying matrix with a weight field: weight (Tensor) -- the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from 𝒩(0, 1). So, that's exactly the d_vocab × d_emb matrix that we'd expect -- it's the input dimension as the rows, and the output dimension as the columns. If we look at nn.Linear, we see something very similar: weight (torch.Tensor) -- the learnable weights of the module of shape (out_features, in_features). The values are initialized from 𝒰(−√k, √k), where k = 1/in_features. That's actually the other way around, output dimension as the rows and input as the columns. If you're wondering why, remember that we transpose the weights matrix for a neural network before using it.
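Here's a toy sketch of the tying itself (the module names tok_emb and out_head are my own, loosely Raschka-style, not the book's exact code). Because nn.Linear already stores its weight as (out_features, in_features) -- that is, (d_vocab, d_emb) -- no transpose is needed in code:

```python
import torch.nn as nn

class TinyGPT(nn.Module):
    def __init__(self, d_vocab=11, d_emb=4, tie_weights=True):
        super().__init__()
        self.tok_emb = nn.Embedding(d_vocab, d_emb)          # (d_vocab, d_emb)
        self.out_head = nn.Linear(d_emb, d_vocab, bias=False)  # stored as (d_vocab, d_emb)
        if tie_weights:
            # Reuse the embedding table as the output projection: same
            # Parameter object, so both layers train the same numbers.
            self.out_head.weight = self.tok_emb.weight

model = TinyGPT()
assert model.out_head.weight is model.tok_emb.weight

# parameters() deduplicates shared tensors, so the tied model really is smaller:
tied = sum(p.numel() for p in model.parameters())
untied = sum(p.numel() for p in TinyGPT(tie_weights=False).parameters())
assert untied - tied == 11 * 4  # saving = d_vocab * d_emb parameters
```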
But that's actually really convenient in our situation, because if we want to use the same weights for both, they're already "compatible"! And that means that adding weight tying to our code above is as simple as adding two lines at the end: For the model code, it literally is just that! There is a tiny inefficiency in that PyTorch is going to spend a bit of time initialising the weights in the output head to appropriately-sized random values, only to have them all replaced -- but that actually works in our favour, because it means that we'll use up the same amount of the random number stream when creating the LLM in both the weight-tying and non-weight-tying cases, which is a bit better for reproducibility. There is one other change needed, though. I ran a test train with that code, and checkpointing failed like this: Safetensors doesn't like it when you reuse weights like we're doing here. The good news is that the help page the error links to is exactly about this problem with weight tying, and the suggested fix -- to replace ...and similarly for loading -- appears to work fine. Saving and loading checkpoints works, and it's compatible with the old checkpoint files too. So that's good news :-) So, that's how we code it. How much actual saving do we get in terms of the parameter count by doing this? A quick-and-easy way to count the parameters is just to create an instance of the model and see: So, we've gone from a 163M-parameter model to a 124M-parameter one. That's certainly quite some saving -- 38,597,376 fewer parameters, which is a reduction of almost a quarter. We can also sanity check the size of that saving -- our output head was, as we know, a d_emb × d_vocab matrix, so it should have 50257 × 768 parameters -- which is, indeed, 38,597,376. Excellent. Now, there's one thing we should consider here. We're training on a Chinchilla-optimal number of tokens, 20x our parameter count. Is that what we want to keep stable?
Or is the total number of training tokens the important bit, so we wind up technically overtraining? My instinct is that the total training tokens is the important thing. Chinchilla optimality is a training heuristic rather than a true aspect of the model, so sticking with it would mean that we're training a model with fewer parameters on less data. It seems very unlikely that would do anything other than produce a worse model! So: we'll keep the same number of training tokens, and just introduce weight tying. How does it train? I kicked it off on the usual 8x A100 40 GiB machine, and after a little while I checked the loss chart. It looked like this: Yikes! It started off with a loss of about 460. Normally, we start with a loss of about 11. The normal loss makes a lot of sense. If you consider it in terms of perplexity, that value of 11 comes out at e^11 ≈ 59,874 -- that is, the model is giving pretty much equal probabilities to every one of the 50,257 possible tokens. A loss of 460 means that the model is making incorrect predictions and is very certain about them. How could that be? Well, let's look at the documentation again. For nn.Embedding: weight (Tensor) -- the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from 𝒩(0, 1). And for nn.Linear: weight (torch.Tensor) -- the learnable weights of the module of shape (out_features, in_features). The values are initialized from 𝒰(−√k, √k), where k = 1/in_features. They're initialised completely differently. Embeddings are set to values in a normal distribution (that is, a Gaussian bell curve) with a mean of 0 and a standard deviation of 1. But linear layers are set to random values in a uniform distribution (that is, a completely flat one) within a range based on the number of input features. In particular, those numbers for the linear layer are really small!
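The difference between the two initialisation schemes is easy to see directly -- a quick sketch using the post's dimensions, with each layer's initialiser as described in the PyTorch docs:

```python
import math
import torch.nn as nn

emb = nn.Embedding(50257, 768)           # weights drawn from N(0, 1)
lin = nn.Linear(768, 50257, bias=False)  # weights drawn from U(-sqrt(k), sqrt(k))

k = 1 / 768
print(round(math.sqrt(k), 4))             # 0.0361 -- the linear layer's range
print(round(emb.weight.std().item(), 2))  # ≈ 1.0 -- enormous by comparison

# Every linear weight stays inside the tiny range; the embedding doesn't.
assert lin.weight.abs().max().item() < math.sqrt(k)
assert emb.weight.abs().max().item() > 1.0
```

Tie the output head to weights drawn from N(0, 1) and the logits come out roughly 28x larger than a freshly initialised nn.Linear would produce, which is exactly the overconfidence the loss chart showed.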
Our output head has in_features set to 768, so that means that √k would be √(1/768) ≈ 0.0360. So instead of getting that kind of "ideal" linear layer initialisation within the range (−0.0360, 0.0360), we're getting numbers which roughly 2/3 of the time will be in the range (−1, 1), and the rest of the time will be even further from zero -- we could be getting -3 or +4, or potentially even crazier numbers! That means that the output logits (coming from a linear layer with higher weights) will be larger, which in turn will push softmax to come up with higher probabilities. I considered changing things to initialise the weights differently, but given that the loss had fallen to 8 or so by the second checkpoint, I decided to just let the run complete. Here's the final loss chart, with the Y axis fixed to run from 0 to 12: That's a nice smooth curve, at least! The output is: Timing-wise, that's about 180 seconds faster than our baseline model training run, only a 1.5% speedup -- clearly the lower number of parameters doesn't actually save us much time. Loss-wise, the final train loss on the baseline model was 3.743, so that's not particularly promising. Still, the proof is, as ever, in the evals. Smoke test first: Borderline coherent, but maybe worse than normal? Let's see what our test set loss looks like. That's bad -- let's see it in our comparison table: Our worst model so far :-( Weight tying certainly didn't help our train. It is worth noting that the GPT-2 small weights -- which do use it -- got 3.500 on the same test set as we're using for that table, so it is possible to get a better model with weight tying. But there was clearly something different about their train, and my suspicion, as I've said before, is that it was trained for many more epochs (I estimated 40), slowly grinding that loss down. But what I'm trying to do in this mini-series of interventions is find tricks that will allow us to approach the original weights' loss without a very long training run.
And for the purposes of that, I think we can safely say that weight-tying is not one of those. Next time around, our last intervention test! What happens if we switch off the use of automatic mixed precision (AMP)? That is something I added right back at the start as a performance enhancement; it means that PyTorch can do certain calculations in 16-bit rather than 32-bit if it thinks there's no harm in doing so. Might we get better loss by training without it? [1] In reality we don't multiply a one-hot vector by a matrix, as that would be extremely inefficient -- PyTorch just does a lookup into the embedding matrix. If we get token ID 1234, then it just reads out the contents of row 1234, and that's our embedding. But for the purposes of this post, it's best to see that as more of an (extremely effective) performance tweak rather than what's happening conceptually. ↩

Jim Nielsen Yesterday

Code as a Tool of Process

Steve Krouse wrote a piece that has me nodding along: Programming, like writing, is an activity, where one iteratively sharpens what they're doing as they do it. (You wouldn't believe how many drafts I've written of this essay.) There’s an incredible amount of learning and improvement, i.e. sharpening, to be had through the process of iteratively building something. As you bring each aspect of a feature into reality, it consistently confronts you with questions like, “But how will this here work?” And “Did you think of that there?” If you jump over the process of iteratively building each part and just ask AI to generate a solution, you miss the opportunity of understanding the intricacies of each part which amounts to the summation of the whole. I think there are a lot of details that never bubble to the surface when you generate code from English as it’s simply not precise enough for computers. Writing code is a process that confronts you with questions about the details. If you gloss over the details, things are going to work unexpectedly and users will discover the ambiguity in your thinking rather than you (see also: “bugs”). Writing code is a tool of process. As you go, it sharpens your thinking and helps you discover and then formulate the correctness of your program. If you stop writing code and start generating it, you lose a process which helped sharpen and refine your thinking. That’s why code generation can seem so fast: it allows you to skip over the slow, painful process of sharpening without making it obvious what you’re losing along the way. You can’t understand the trade-offs you’re making if you’re not explicitly confronted with making them. To help me try to explain my thinking (and understand it myself), allow me a metaphor. Imagine mining for gold. There are gold nuggets in the hills. And we used to discover them by using pick axes and shovels. Then dynamite came along. Now we just blow the hillside away.
Nuggets are fragmented into smaller pieces. Quite frankly, we didn’t even know if there were big nuggets or small flecks in the hillside because we just blasted everything before we found anything. After blasting, we take the dirt and process it until all we have left is a bunch of gold — most likely in the form of dust. So we turn to people, our users, and say “Here’s your gold dust!” But what if they don’t want dust? What if they want nuggets? Our tools and their processes don’t allow us to find and discover that anymore. Dynamite is the wrong tool for that kind of work. It’s great in other contexts. If you just want a bunch of dust and you’re gonna melt it all down, maybe that works fine. But for finding intact, golden nuggets? Probably not. It’s not just the tool that helps you, it’s the process the tool requires. Picks and shovels facilitate a certain kind of process. Dynamite another. Code generation is an incredible tool, but it comes with a process too. Does that process help or hurt you achieve your goals? It’s important to be cognizant of the trade-offs we make as we choose tools and their corresponding processes for working because it’s trade-offs all the way down. Reply via: Email · Mastodon · Bluesky


The AI Industry Is Lying To You

Hi! If you like this piece and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large. I just put out a massive Hater’s Guide To The SaaSpocalypse, as well as the Hater’s Guide to Adobe. It helps support free newsletters like these! The entire AI bubble is built on a vague sense of inevitability — that if everybody just believes hard enough that none of this can ever, ever go wrong, then at some point all of the very obvious problems will just go away. Sadly, one cannot beat physics. Last week, economist Paul Kedrosky put out an excellent piece centered around a chart that showed new data center capacity additions (as in additions to the pipeline, not brought online) halved in the fourth quarter of 2025 (per data from Wood Mackenzie): Wood Mackenzie’s report framed it in harsh terms: As I said above, this refers only to capacity that’s been announced rather than stuff that’s actually been brought online, and Kedrosky missed arguably the craziest chart — that of the 241GW of disclosed data center capacity, only 33% of it is actually under active development: The report also adds that the majority of committed power (58%) is for “wires-only utilities,” which means the utility provider is only responsible for getting power to the facility, not generating the power itself, which is a big problem when you’re building entire campuses made up of power-hungry AI servers.
WoodMac also adds that PJM, one of the largest utility providers in America, “...remains in trouble, with utility large load commitments three times as large as the accredited capacity in PJM’s risked generation queue,” which is a complex way of saying “it doesn’t have enough power.”  This means that fifty eight god damn percent of data centers need to work out their own power somehow. WoodMac also adds there is around $948 billion in capex being spent in totality on US-based data centers, but capex growth decelerated for the first time since 2023 . Kedrosky adds: Let’s simplify: The term you’re looking for there is data center absorption, which is (to quote Data Center Dynamics) “...the net growth in occupied, revenue-producing IT load,” which grew in America’s primary markets from 1.8GW in new capacity in 2024 to 2.5GW of new capacity in 2025 according to CBRE .   The problem is, this number doesn’t actually express newly-turned-on data centers. Somebody expanding a project to take on another 50MW still counts as “new absorption.”  Things get more confusing when you add in other reports. Avison Young’s reports about data center absorption found 700MW of new capacity in Q1 2025 , 1.173GW in Q2 , a little over 1.5GW in Q3 and 2.033GW in Q4 (I cannot find its Q3 report anywhere), for a total of 5.44GW, entirely in “colocation,” meaning buildings built to be leased to others. Yet there’s another problem with that methodology: these are facilities that have been “delivered” or have a “committed tenant.” “Delivered” could mean “the facility has been turned over to the client, but it’s literally a powered shell (a warehouse) waiting for installation,” or it could mean “the client is up and running.” A “committed tenant” could mean anything from “we’ve signed a contract and we’re raising funds” (such as is the case with Nebius raising money off of a Meta contract to build data centers at some point in the future ). 
We can get a little closer by using the definitions from DataCenterHawk (from which Avison Young gets its data), which defines absorption as follows: That’s great! Except Avison Young has chosen to define absorption in an entirely different way — that a data center (in whatever state of construction it’s in) has been leased, or “delivered,” which means “a fully ready-to-go data center” or “an empty warehouse with power in it.” CBRE, on the other hand, defines absorption as “net growth in occupied, revenue-producing IT load,” and is inclusive of hyperscaler data centers. Its report also includes smaller markets like Charlotte, Seattle and Minneapolis, adding a further 216MW in absorption of actual new, existing, revenue-generating capacity. So that’s about 2.716GW of actual, new data centers brought online. It doesn’t include areas like Southern Virginia or Columbus, Ohio — two massive hotspots from Avison Young’s report — and I cannot find a single bit of actual evidence of significant revenue-generating, turned-on, real data center capacity being stood up at scale. DataCenterMap shows 134 data centers in Columbus, but as of August 2025, the Columbus area had around 506MW in total according to the Columbus Dispatch, though Cushman and Wakefield claimed in February 2026 that it had 1.8GW. Things get even more confusing when you read that Cushman and Wakefield estimates that around 4GW of new colocation supply was “delivered” in 2025, a term it does not define in its actual report, and for whatever reason lacks absorption numbers. Its H1 2025 report, however, includes absorption numbers that add up to around 1.95GW of capacity…without defining absorption, leaving us in exactly the same problem we have with Avison Young.
Nevertheless, based on these data points, I’m comfortable estimating that North American data center absorption — as the IT load of data centers actually turned on and in operation — was at around 3GW for 2025 , which would work out to about 3.9GW of total power. And that number is a fucking disaster. Earlier in the year, TD Cowen’s Jerome Darling told me that GPUs and their associated hardware cost about $30 million a megawatt. 3GW of IT load (as in the GPUs and their associated gear’s power draw) works out to around $90 billion of NVIDIA GPUs and the associated hardware, which would be covered under NVIDIA’s “data center” revenue segment: America makes up about 69.2% of NVIDIA’s revenue, or around $149.6 billion in FY2026 (which runs, annoyingly, from February 2025 to January 2026). NVIDIA’s overall data center segment revenue was $195.7 billion, which puts America’s data center purchases at around $135 billion, leaving around $44 billion of GPUs and associated technology uninstalled. With the acceleration of NVIDIA’s GPU sales, it now takes about 6 months to install and operationalize a single quarter’s worth of sales. Because these are Blackwell (and I imagine some of the new next generation Vera Rubin) GPUs, they are more than likely going to new builds thanks to their greater power and cooling requirements, and while some could in theory be going to old builds retrofitted to fit them, NVIDIA’s increasingly-centralized (as in focused on a few very large customers) revenue heavily suggests the presence of large resellers like Dell or Supermicro (which I’ll get to in a bit) or the Taiwanese ODMs like Foxconn and Quanta who manufacture massive amounts of servers for hyperscaler buildouts.  I should also add that it’s commonplace for hyperscalers to buy the GPUs for their colocation partners to install, which is why Nebius and Nscale and other partners never raise more than a few billion dollars to cover construction costs.  
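As a sanity check, here's the back-of-envelope arithmetic above in one place (every input is an estimate quoted in this piece — the TD Cowen cost figure, the ~3GW absorption estimate, and NVIDIA's reported FY2026 revenue — not independent data):

```python
# $30M per MW of IT load (TD Cowen), ~3GW of IT load absorbed in the US in 2025
cost_per_mw_musd = 30
installed_it_load_mw = 3000

installed_spend_busd = cost_per_mw_musd * installed_it_load_mw / 1000
print(installed_spend_busd)  # 90.0 -> ~$90B of GPUs actually installed

# NVIDIA data center revenue of $195.7B in FY2026, ~69.2% US share
us_purchases_busd = round(0.692 * 195.7)
print(us_purchases_busd)  # 135 -> estimated US data center purchases

# The gap: GPUs bought but (by this estimate) not yet installed
print(us_purchases_busd - installed_spend_busd)  # ~45, i.e. roughly $44-45B
```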
It’s becoming very obvious that data center construction is dramatically slower than NVIDIA’s GPU sales, which continue to accelerate dramatically every single quarter. Even if you think AI is the biggest most hugest and most special boy: what’s the fucking point of buying these things two to four years in advance? Jensen Huang is announcing a new GPU every year! By the time they actually get all the Blackwells in, Vera Rubin will be two years old! And by the time we install those Vera Rubins, some other new GPU will be beating it! Before we go any further, I want to be clear how difficult it is to answer the question “how long does a data center take to build?” You can’t really say “[time] per megawatt” because things become ever-more complicated with every 100MW or so. As I’ll get into, it’s taken Stargate Abilene two years to hit 200MW of power. Not IT load. Power. Anyway, the question of “how much data center capacity came online?” is pretty annoying too. Sightline’s research — which estimated that “almost 6GW of [global data center power] capacity came online last year” — found that while 16GW of capacity was slated to come online in 2026 across 140 projects, only 5GW is currently under construction, and somehow doesn’t say that “maybe everybody is lying about timelines.” Sightline believes that half of 2026’s supposed data center pipeline may never materialize, with 11GW of capacity in the “announced” stage with “...no visible construction progress despite typical build timelines of 12-18 months.” “Under construction” also can mean anything from “a single steel beam” to “nearly finished.” These numbers also are based on 5GW of capacity, meaning about 3.84GW of IT load, or about $111.5 billion in GPUs and associated gear, or roughly 57.5% of NVIDIA’s FY2026 revenue that’s actually getting built.
Sightline (and basically everyone else) argues that there’s a power bottleneck holding back data center development, and Camus explains that the biggest problem is a lack of transmission capacity (the amount of power that can be moved) and power generation (creating the power itself):  Camus adds that America also isn’t really prepared to add this much power at once: Nevertheless, I also think there’s another more-obvious reason: it takes way longer to build a data center than anybody is letting on, as evidenced by the fact that we only added 3GW or so of actual capacity in America in 2025. NVIDIA is selling GPUs years into the future, and its ability to grow, or even just maintain its current revenues, depends wholly on its ability to convince people that this is somehow rational. Let me give you an example. OpenAI and Oracle’s Stargate Abilene data center project was first announced in July 2024 as a 200MW data center . In October 2024, the joint venture between Crusoe, Blue Owl and Primary Digital Infrastructure raised $3.4 billion , with the 200MW of capacity due to be delivered “in 2025.” A mid-2025 presentation from land developer Lancium said it would have “1.2GW online by YE2025.” In a May 2025 announcement , Crusoe, Blue Owl, and Primary Digital Infrastructure announced the creation of a $15 billion joint vehicle, and said that Abilene would now be 8 buildings, with the first two buildings being energized by the “first half of 2025,” and that the rest would be “energized by mid-2026.” Each building would have 50,000 GPUs, and the total IT load is meant to be 880MW or so, with a total power draw of 1.2GW.  I’m not interested in discussing OpenAI not taking the supposedly-planned extensions to Abilene because it never existed and was never going to happen .  In December 2025, Oracle stated that it had “delivered” 96,000 GPUs , and in February, Oracle was still only referring to two buildings , likely because that’s all that’s been finished. 
My sources in Abilene tell me that Building Three is nearly done, but…this thing is meant to be turned on in mid-2026. Developer Mortensen claims the entire project will be completed by October 2026, which it obviously, blatantly won’t. I hate to speak in conspiratorial terms, but this feels like a blatant coverup with the active participation of the press. CNBC reported in September 2025 that “the first data center in $500 billion Stargate project is open in Texas,” referring to a data center with an eighth of its IT load operational as “online” and “up and running,” with Crusoe adding two weeks later that it was “live,” “up and running” and “continuing to progress rapidly,” all so that readers and viewers would think “wow, Stargate Abilene is up and running” despite it being months if not years behind schedule. At its current rate of construction, Stargate Abilene will be fully built sometime in late 2027. Oracle’s Port Washington Data Center, as of March 6 2026, consisted of a single steel beam. Stargate Shackelford Texas broke ground on December 15 2025, and as of December 2025, construction barely appears to have begun in Stargate New Mexico. Meta’s 1GW data center campus in Indiana only started construction in February 2026. And, despite Microsoft trying to mislead everybody that its Wisconsin data center had “arrived” and “been built,” looking even an inch deeper suggests very little has actually come online — and, considering the first data center was $3.3 billion (remember: $14 million a megawatt just for construction), I imagine Microsoft has successfully brought online about 235MW of power for Fairwater. What Microsoft wants you to think is that it brought online gigawatts of power (always referred to in the future tense), because Microsoft, like everybody else, is building data centers at a glacial pace, because construction takes forever, even if you have the power, which nobody does!
The concept of a hundred-megawatt data center is barely a few years old, and I cannot actually find a built, in-service gigawatt data center of any kind, just vague promises about theoretical Stargate campuses built for OpenAI, a company that cannot afford to pay its bills.  Everybody keeps yammering on about “what if data centers don’t have power” when they should be thinking about whether data centers are actually getting built. Microsoft proudly boasted in September 2025 about its intent to build “the UK’s largest supercomputer” in Loughton, England with Nscale, and as of March 2026, it’s literally a scaffolding yard full of pylons and scrap metal . Stargate Abilene has been stuck at two buildings for upwards of six months.  Here’s what’s actually happening: data center deals are being funded by eager private credit gargoyles that don’t know shit about fuck. These deals are announced, usually by overly-eager reporters that don’t bother to check whether the previous data centers ever got built, as massive “multi-gigawatt deals,” and then nobody follows up to check whether anything actually happened.  All that anybody needs to fund one of these projects is an eager-enough financier and a connection to NVIDIA. All Nebius had to do to raise $3.75 billion in debt was to sign a deal with Meta for data center capacity that doesn’t exist and will likely take three to four years to build (it’s never happening). Nebius has yet to finish its Vineland, New Jersey data center for Microsoft , which was meant to be “ at 100MW ” by the end of 2025, but appears to have only had 50MW (the first phase) available as of February 2026 .  I’m just gonna come out and say it: I think a lot of these data center deals are trash, will never get built, and thus will never get paid. 
The tech industry has taken advantage of an understandable lack of knowledge about construction or power timelines in the media to pump out endless stories about “data center capacity in progress” as a means of obfuscating an ever-growing scandal: hundreds of billions of dollars of NVIDIA GPUs got sold to go in projects that may never be built. These things aren’t getting built, or if they’re getting built, it’s taking way, way longer than expected, which means that interest on that debt is piling up. The longer it takes, the less rational it becomes to buy further NVIDIA GPUs — after all, if data centers are taking anywhere from 18 months to three years to build, why would you be buying more of them? Where are you going to put them, Jensen? This also seriously brings into question the appetite that private credit and other financiers have for funding these projects, because much of the economic potential comes from the idea that these projects get built and have stable tenants. Furthermore, if the supply of AI compute is a bottleneck, this suggests that when (or if) that bottleneck is ever cleared, there will suddenly be a massive supply glut, lowering the overall value of the data centers in progress…which are, by the way, all filled with Blackwell GPUs, which will be two or three years old by the time the data centers are finally turned on. That’s before you get to the fact that the ruinous debt behind AI data centers makes them all remarkably unprofitable, or that their customers are AI startups that lose hundreds of millions or billions of dollars a year, or that NVIDIA is the largest company on the stock market, and said valuation is a result of a data center construction boom that appears to be decelerating — and that, even if it weren’t, is operating at a glacial pace compared to NVIDIA’s sales. Not to sound unprofessional or nothing, but what the fuck is going on?
We have 241GW of “planned” capacity in America, of which only 79.5GW is “under active development,” but when you dig deeper, only 5GW of capacity is actually under construction?

The entire AI bubble is a god damn mirage. Every single “multi-gigawatt” data center you hear about is a pipe dream, little more than a few contracts and some guys with their hands on their hips saying “brother we’re gonna be so fuckin’ rich!” as they siphon money from private credit — and, by extension, you, because where does private credit get its capital from? That’s right. A lot comes from pension funds and insurance companies.

Here’s the reality: data centers take forever. Every hyperscaler and neocloud talking about “contracted compute” or “planned capacity” may as well be telling you about their planned dinners with The Grinch and Godot. The insanity of the AI buildout will be seen as one of the largest wastes of capital of all time (to paraphrase JustDario), and I anticipate that the majority of the data center deals you’re reading about simply never get built.

The fact that there’s so much data about data center construction and so little data about completed construction suggests that those preparing the reports are in on the con. I give credit to CBRE, Sightline and Wood Mackenzie for having the courage to even lightly push back on the narrative, even if they do so by obfuscating terms like “capacity” or “power” in ways that reporters and other analysts are sure to misinterpret.

Hundreds of billions of dollars have been sunk into buying GPUs, in some cases years in advance, to put into data centers that are being built at a rate that means that NVIDIA’s 2025 and 2026 revenues will take until 2028 to 2029 to actually operationalize, and that’s making the big assumption that any of it actually gets built. I think it’s also fair to ask where the money is actually going.
2025’s $178.5 billion in US-based data center deals doesn’t appear to be resulting in any immediate (or even future) benefit to anybody involved. I also wonder whether the demand actually exists to make any of this worthwhile, or what people are actually paying for this compute.

If we assume 3GW of IT load capacity was brought online in America, that should (theoretically) mean tens of billions of dollars of revenue thanks to the “insatiable demand for AI” — except nobody appears to be showing massive amounts of revenue from these data centers.

Applied Digital only had $144 million in revenue in FY2025 (and lost $231 million making it). CoreWeave, which claimed to have “850MW of active power (or around 653MW of IT load)” at the end of 2025 (up from 420MW in Q1 FY2025, or 323MW of IT load), made $5.13 billion of revenue (and lost $1.2 billion before tax) in FY2025.

Nebius? $228 million, for a loss of $122.9 million on 170MW of active power (or around 130MW of IT load). Iren lost $155.4 million on $184.7 million last quarter, and that’s with a release of deferred tax liabilities of $182.5 million. Equinix made about $9.2 billion in revenue in its last fiscal year, and while it made a profit, it’s unclear how much of that came from its large and already-existent data center portfolio, though it’s likely a lot considering Equinix is boasting about its “multi-megawatt” data center plans with no discussion of its actual capacity.

And, of course, Google, Amazon, and Microsoft refuse to break out their AI revenues. Based on my reporting from last year, OpenAI spent about $8.67 billion on Azure through September 2025, and Anthropic around $2.66 billion in the same period on Amazon Web Services. As the two largest consumers of AI compute, this heavily suggests that the actual demand for AI services is pretty weak, and mostly taken up by a few companies (or hyperscalers running their own services).
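To make the mismatch between capacity and revenue concrete, here is a rough back-of-envelope sketch in Python using only the figures quoted above; the per-MW numbers it derives are illustrative arithmetic, not official company disclosures.

```python
# Back-of-envelope: implied annual revenue per MW of IT load,
# using the FY2025 figures quoted above (all values approximate).
companies = {
    # name: (annual revenue in $M, IT load in MW)
    "CoreWeave": (5130, 653),
    "Nebius": (228, 130),
}

implied = {}
for name, (revenue_m, it_load_mw) in companies.items():
    # $M of revenue per MW of IT load per year
    implied[name] = revenue_m / it_load_mw
    print(f"{name}: ~${implied[name]:.1f}M revenue per MW of IT load")
```

Even the better case here works out to single-digit millions of dollars per MW per year; at that rate, 3GW of newly online IT load would imply revenue in the low tens of billions, which is roughly the money nobody appears to be reporting.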
At some point reality will set in and spending on NVIDIA GPUs will have to decline. It’s truly insane how much has been invested so many years in the future, and it’s remarkable that nobody else seems this concerned. Simple questions like “where are the GPUs going?” and “how many actual GPUs have been installed?” are left unanswered as article after article gets written about massive, multi-billion dollar compute deals for data centers that won’t be built before, at this rate, 2030.

And I’d argue it’s convenient to blame this solely on power issues, when the reality is clearly based on construction timelines that never made any sense to begin with. If it was just a power issue, more data centers would be near or at the finish line, waiting for power to be turned on. Instead, well-known projects like Stargate Abilene are built at a glacial pace as eager reporters claim that a quarter of the buildings being functional nearly a year after they were meant to be turned on is some sort of achievement.

Then there’s the very, very obvious scandal that NVIDIA, the largest company on the stock market, is making hundreds of billions of dollars of revenue on chips that aren’t being installed. It’s fucking strange, and I simply do not understand how it keeps beating and raising expectations every quarter given that the majority of its customers are likely going to be working through their current purchases into the next decade. Assuming that Vera Rubin actually ships in 2026, it’s reasonable to believe that people will be installing these things well into 2028, if not further, and that’s assuming everything doesn’t collapse by then. Why would you bother? What’s the point, especially if you’re sitting on a pile of Blackwell GPUs?

Why are we doing any of this?
Last week also featured a truly bonkers story about Supermicro, a reseller of GPUs used by CoreWeave and Crusoe, where co-founder Wally Liaw and several other co-conspirators were arrested for selling hundreds of millions of dollars of NVIDIA GPUs to China, with the intent to sell billions more.

Liaw, one of Supermicro’s co-founders, previously resigned in a 2018 accounting scandal where Supermicro couldn’t file its annual reports, only to be (per Hindenburg Research’s excellent report) rehired in 2021 as a consultant, and restored to the board in 2023, per a filed 8-K.

Mere days before his arrest, Liaw was parading around NVIDIA’s GTC conference, pouring unnamed liquids into ice luges and standing two people away from NVIDIA CEO Jensen Huang. Liaw was also seen congratulating the CEO of Lambda on its new CFO appointment on LinkedIn, as well as shaking hands (along with Supermicro CEO Charles Liang, who has not been arrested or indicted) with Crusoe (the company building OpenAI’s Abilene data center) CEO Chase Lochmiller.

Supermicro isn’t named in the indictment for reasons I imagine are perfectly normal and not related to keeping the AI party going. Nevertheless, Liaw and his co-conspirators are accused of shipping hundreds of millions of dollars’ worth of NVIDIA GPUs to China through a web of counterparties and brokers, with over $510 million of them shipped between April and mid-May 2025. While the indictment isn’t specific as to the breakdown, it confirms that some Blackwell GPUs made it to China, and I’d wager quite a few.

The mainstream media has already stopped thinking about this story, despite Supermicro being a huge reseller of NVIDIA gear, contributing billions of dollars of revenue, with at least $500 million of that apparently going to China. The fact that Supermicro wasn’t specifically named in the case is enough to erase the entire tale from their minds, along with any curiosity about how NVIDIA, and specifically Jensen Huang, could not have known.
This also isn’t even close to the only time this has happened. Late last year, Bloomberg reported on Singapore-based Megaspeed — a (to quote Bloomberg) “once-obscure spinoff of a Chinese gaming enterprise [that] evolved into the single largest Southeast Asian buyer of NVIDIA chips” — and highlighted odd signs that suggest it might be operating as a front for China.

As a neocloud, Megaspeed rents out AI compute capacity like CoreWeave, and while NVIDIA (and Megaspeed) both deny any of their GPUs are going to China, Megaspeed, to quote Bloomberg, has “something of a Chinese corporate twin.” Bloomberg reported that Megaspeed imported goods “worth more than a thousand times its cash balance in 2023,” with two-thirds of its imports being NVIDIA products. The investigation got weirder when Bloomberg tried to track down specific circuit boards that NVIDIA had told the US government were at specific sites.

Things get weirder throughout the article, with a Chinese company called “Shanghai Shuoyao” having a near-identical website and investor deck (as mentioned) to Megaspeed, with several of the “computing clusters under construction” actually being in China. Things get a lot weirder as Bloomberg digs in, including a woman called “Huang” who may or may not be both the CEO of Megaspeed and of an associated company called “Shanghai Hexi,” which is also owned by the Yangtze River Delta project, and who was also photographed sitting next to Jensen Huang at an event in Taipei in 2024.

While all of this is extremely weird and suspicious, I must be clear there is no declarative answer as to what’s going on, other than that NVIDIA GPUs are absolutely making it to China, somehow. I also think that it would be really tough for Jensen Huang to not know about it, or for billions of dollars of GPUs to be somewhere without NVIDIA’s knowledge.
Anyway, Supermicro CEO Charles Liang has yet to comment on Wally Liaw or his alleged co-conspirators, other than a statement from the company that says that their acts were “a contravention of the Company’s policies and compliance controls.”

Jensen Huang does not appear to have been asked if he knew anything about this — not Megaspeed, not Supermicro, or really any challenging question of any kind for the last few years of his life. Huang did, however, say back in May 2025 that there was “no evidence of any AI chip diversion,” and that the countries in question “monitor themselves very carefully.”

For legal reasons I am going to speak very carefully: I cannot say that Jensen is wrong, or lying, but I think it’s incredible, remarkable even, that he had no idea that any of this was going on. Really? Hundreds of millions if not billions of dollars of GPUs are making it to China — as reported by The Information in December 2025 — and Jensen Huang had no idea? I find that highly unlikely, though I obviously can’t say for sure. In the event that NVIDIA had knowledge — which I am not saying it did, of course — this is a huge scandal that, for the most part, nobody has bothered to keep an eye on outside of a few brave souls at The Information and Bloomberg who give a shit about the truth. Has anybody bothered to ask Jensen about this? People talk to him on camera all the time.

I’ll also add that I am shocked that so many people are just shrugging and moving on from Supermicro, which is a major supplier of two of the major neoclouds (Crusoe and CoreWeave) and one of the minors (Lambda, to which it also rents cloud capacity). The idea that a company had no idea that several percentage points of its revenue were flowing directly to China via one of its co-founders is an utter joke. I hope we eventually find out the truth. Nevertheless, this kind of underhanded bullshit is a sign of desperation on the part of just about everybody involved.
So, I want to explain something very clearly for you, because it’s important you understand how fucked up shit has become: hyperscalers are forcing everybody in their companies to use AI tools as much as possible, tying compensation and performance reviews to token burn, and actively encouraging non-technical people to vibe-code features that actually reach production.

In practice, this means that everybody is being expected to dick around with AI tools all day, with the expectation that you burn massive amounts of tokens and, in the case of designers working in some companies, actively code features without ever having written a line of code. (How do I know the last part? Because a trusted source told me, and I’ll leave it at that.)

One might be forgiven for thinking this means that AI has taken a leap in efficacy, but the actual outcomes are a labyrinth of half-functional internal dashboards that measure random user data or convert files, spending hours to save minutes of time at some theoretical point. While non-technical workers aren’t necessarily allowed to ship directly to production, their horrifying pseudo-software, coded without any real understanding of anything, is expected to be “fixed” by actual software engineers who are also expected to do their jobs. These tools also allow near-incompetent Business Idiot software engineers to do far more damage than they might have in the past.

LLM use is relatively unrestrained (and actively incentivized) in at least one hyperscaler, with just about anybody allowed to spin up their own OpenClaw “AI agent” (read: a series of LLMs that allegedly can do stuff with your inbox or Slack for no clear benefit, other than their ability to delete all of your emails).
In Meta’s case, this ended up causing a severe security breach. According to The Information, Meta systems storing large amounts of company and user-related data were accessible to engineers who didn’t have permission to see them, and the breach was marked a sec-1 incident, the second highest level of severity on an internal scale that Meta uses to rank security incidents. The incident follows multiple problems caused at Amazon by its Kiro and Q LLMs, as reported by Business Insider’s Eugene Kim.

Despite the furious (and exhausting) marketing campaign around “the power of AI code,” I believe that these events are just the beginning of the true consequences of AI coding tools: the slow destruction of the tech industry’s software stack.

LLMs allow even the most incompetent dullard to do an impression of a software engineer, by which I mean you can tell it “make me software that does this” or “look at this code and fix it” and said LLM will spend the entire time saying “you got this” and “that’s a great solution.” The problem is that while LLMs can write “all” code, that doesn’t mean the code is good, or that somebody can read the code and understand its intention (as these models do not think), or that having a lot of code is a good thing both in the present and in the future of any company built using generative code.

LLM-based code is often verbose, and rarely aligns with in-house coding guidelines and standards, guaranteeing that it’ll take far longer to chew through, which naturally means that those burdened with reviewing it will either skim-read it or feed it into another LLM to work out what the hell to do. Worse still, LLM use is also entirely directionless. Why is anybody at Meta using an OpenClaw? What is the actual thing that OpenClaw does, other than burn an absolute fuck-ton of tokens?
Think about this very, very simply for a second: you have given every engineer in the company the explicit remit to write all their code using LLMs, and incentivized them to do so by making sure their LLM use is tracked. You have now massively increased both the operating costs of the company (through token burn costs) and the volume of code being created.

To be explicit, allowing an LLM to write all of your code means that you are no longer developing code, nor are you learning how to develop code, nor are you going to become a better software engineer as a result. This means that, across almost every major tech company, software engineers are being incentivized to stop learning how to write software or solve software architecture issues.

If you are just a person looking at code, you are only as good as the code the model makes, and as Mo Bitar recently discussed, these models are built to galvanize you, glaze you, and tell you that you’re remarkable as you barely glance at globs of overwritten code that, even if it functions, eventually grows into a whole built with no intention or purpose other than what the model generated from your prompt.

Things only get worse when you add in the fact that hyperscalers like Meta and Amazon love to lay off thousands of people at a time, which makes it even harder to work out why something was built the way it was built, which is harder still when an LLM that lacks any thoughts or intentions builds it. Entire chunks of multi-trillion dollar market cap companies are being written with these things, prompted by engineers (and non-engineers!) who may or may not be at the company in a month or a year to explain what prompts they used.

We’re already seeing the consequences! Amazon lost hundreds of thousands of orders! Meta had a major security breach!
The foundations of these companies are being rotted away through millions of lines of slop-code that, at best, occasionally gets the nod from somebody who has “software engineer” on their resume, and these people keep being fired too, lowering the likelihood that somebody who knows what’s going on, or why something is built a certain way, will be able to stop something bad from happening.

Remember: Google, Amazon, Microsoft, and Meta all hold vast troves of personal information, intimate conversations, serious legal documents, financial information, in some cases even social security numbers, and all four of them, along with a worrying chunk of the tech industry, are actively encouraging their software engineers to stop giving a fuck about software.

Oh, you’re so much faster with AI code? What does that actually mean? What have you built? Do you understand how it works? Did you look at the code before it shipped, or did you assume that it was fine because it didn’t break?

This is creating a kind of biblical plague within software engineering — an entire tech industry built on reams of unmanageable and unintentional code pushed by executives and managers that don’t do any real work. LLMs allow the incompetent to feign competence and the unproductive to produce work-adjacent materials borne of a loathing for labor and craftsmanship, and lean into the worst habits of the dullards that rule Silicon Valley. All the Valley knows is growth, and “more” is regularly conflated with “valuable.”

The New York Times’ Kevin Roose — in a shocking attempt at journalism — recently wrote a piece celebrating the competition within Silicon Valley to burn more and more tokens using AI models. Roose explains that both Meta and OpenAI have internal leaderboards that show how many tokens you’ve used, with one software engineer in Stockholm spending “more than his salary in tokens,” though Roose adds that his company pays for them.
Roose describes a truly sick culture, one where OpenAI gives awards to those who spend a lot of money on their tokens, adding that he spoke with several tech workers who were spending thousands of dollars a day on tokens “for what amount to bragging rights.” Roose also added one more insane detail: that one person found a loophole in Claude’s $20-a-month plan using a piece of software made by Figma that allowed them to burn $70,000 in tokens.

Despite all of this burn, Roose struggled to find anybody who was able to explain what they were doing beyond “maintaining large, complex pieces of software using coding agents running in parallel,” but managed to actually find one particularly useful bit of information — that all of this might be performative. I do give Roose one point for wondering if “...any of these tokenmaxxers [were] producing anything good, or whether they [were] merely spinning their wheels churning out useless code in an attempt to look busy.” Good job Kevin.

That being said, I find this story horrifying, and veering dangerously close to the actions of drug addicts and cult followers. Throughout this story in one of the world’s largest newspapers, Roose fails to find a single “tokenmaxxer” making something that they can actually describe, which has largely been my experience of evaluating anyone who talks nonstop about the power of “agentic coding.”

These people are sick, and are participating in a vile, poisonous culture based on needless expenses and endless consumption. Companies incentivizing the amount of tokens you burn are actively creating a culture that trades excess for productivity, incentivizing destructive tendencies built around constantly having to find stuff to do rather than doing things with intention. They are guaranteeing that their software will be poorly written and maintained, all in the pursuit of “doing more AI” for no reason other than that everybody else appears to be doing so.
Anybody who actually works knows that the most productive-seeming people are often also the most useless, as they’re doing things to seem productive rather than producing anything of note. A great example of this is a recent Business Insider interview with a person who got laid off from Amazon after learning “AI” and “vibe coding,” and how surprised they were that these supposed skills didn’t make them safer from layoffs.

To be clear, this person is a victim. They were pressured by Amazon to take up useless skills and build useless things in an expensive and inefficient way, and ended up losing their job despite taking up tools they didn’t like under duress. This person was, at one point, actively part of building an internal Amazon site using AI, and had to “learn to vibe code with a lot of trial and error” and the help of a colleague. Was this a good use of their time? Was this a good use of their colleague’s time? No!

In fact, across all of these goddamn AI coding hype-beast Twitter accounts and endless proclamations about the incredible power of AI agents, I can find very few accounts of something happening other than someone saying “yeah I’m more productive I guess.”

I am certain that at some point in the near future a major big tech service is going to break in a way that isn’t immediately fixable as a result of thousands of people building software with AI coding tools, a problem compounded by the dual brain-drain forces of layoffs and a culture that actively empowers people to look busy rather than actually produce useful things. What else would you expect? You’re giving people a number that they can increase to seem better at their job. What do you think they’re going to do, try and be efficient? Or use these things as much as humanly possible, even if there really isn’t a reason to?

I haven’t even gotten to how expensive all of this must be, in part because it’s hard to fully comprehend.
But what I do know is that big tech is setting itself up for crisis after crisis, especially when Anthropic and OpenAI stop subsidizing their models to the tune of allowing people to spend $2500 or more on a $200-a-month subscription. What happens to the people who are dependent on these models? What happens to the people who forgot how to do their jobs because they decided to let AI write all of their code? Will they even be able to do their jobs anymore?

Large Language Models are creating Silicon Valley Habsburgs — workers that are intellectually trapped at whatever point they started leaning on these models, models that were subsidized to the point that their bosses encouraged them to use them as much as humanly possible. While they might be able to claw their way back into the workforce, a software engineer that’s only really used LLMs for anything longer than a few months will have to relearn the basic habits of their job, and find that their skills were limited to whatever the last training run of whatever model they last used was.

I’m sure there are software engineers using these models ethically, who read all the code, who retain complete ownership over it and use it as a means of handling very specific units of work. I’m also sure that there are some that are just asking it to do stuff, glancing at the code and shipping it. It’s impossible to measure how many of each camp there are, but hearing Spotify’s CEO say that its top developers are basically not writing code anymore makes me deeply worried, because this shit isn’t replacing software engineering at all — it’s mindlessly removing friction and putting the burden of “good” or “right” on a user that it’s intentionally gassing up.

Ultimately, this entire era is a test of a person’s ability to understand and appreciate friction. Friction can be a very good thing. When I don’t understand something, I make an effort to do so, and the moment it clicks is magical.
In the last three years I’ve had to teach myself a great deal about finance, accountancy, and the greater technology industry, and there have been so many moments where I’ve walked away from the page frustrated, stewed in self-doubt that I’d never understand something. I also have the luxury of time, and sadly, many software engineers face increasingly deranged deadlines set by bosses that don’t understand a single fucking thing, let alone what LLMs are capable of or what responsible software engineering is.

The push from above to use these models because they can “write code faster than a human” is a disastrous conflation of “fast” and “good,” all because of flimsy myths peddled by venture capitalists and the media about “LLMs being able to write all code.” Generative code is a digital ecological disaster, one that will take years to repair thanks to company remits to write as much code as fast as possible. Every single person responsible must be held accountable, especially for the calamities to come as lazily-managed software companies see the consequences of building their software on sand.

In the end, everything about AI is built on lies. Hundreds of gigawatts of data centers in development equate to 5GW of actual data centers in construction. Hundreds of billions of dollars of GPU sales are mostly sitting waiting for somewhere to go. Anthropic’s constant flow of “annualized” revenues ended up equating to literally $5 billion in revenue in four years, on $25 billion or more in salaries and compute. Despite all of those data centers supposedly being built, nobody appears to be making a profit on renting out AI compute. AI’s supposed ability to “write all code” really means that every major software company is filling their codebases with slop while massively increasing their operating expenses.
Software engineers aren’t being replaced — they’re being laid off because the software that’s meant to replace them is too expensive, while in practice not replacing anybody at all. Looking even an inch beneath the surface of this industry makes it blatantly obvious that we’re witnessing one of the greatest corporate failures in history. The smug, condescending army of AI boosters exists to make you look away from the harsh truth — AI makes very little revenue, lacks tangible productivity benefits, and seems to, at scale, actively harm the productivity and efficacy of the workers that are being forced to use it.

Every executive forcing their workers to use AI is a ghoul and a dullard, one that doesn’t understand what actual work looks like, likely because they’re a lazy, self-involved prick. Every person I talk to at a big tech firm is depressed, nagged endlessly to “get on board with AI,” to ship more, to do more, all without any real definition of what “more” means or what it contributes to the greater whole, all while constantly worrying about being laid off thanks to the truly noxious cultures that are growing around these services.

AI is actively poisonous to the future of the tech industry. It’s expensive, unproductive, actively damaging to the learning and efficacy of its users, depriving them of the opportunities to learn and grow, stunting them to the point that they know less and do less because all they do is prompt. Those that celebrate it are ignorant or craven, captured or crooked, or desperate to be the person to herald the next era, even if that era sucks, even if that era is inherently illogical, even if that era is fucking impossible when you think about it for more than two seconds.

And in the end, AI is a test of your introspection. Can you tell when you truly understand something? Can you tell why you believe in something, other than that somebody told you you should, or made you feel bad for believing otherwise?
Do you actually want to know stuff, or just have the ability to call up information when necessary? How much joy do you get out of becoming a better person? If you can’t answer that question with certainty, maybe you should just use an LLM, as you don’t really give a shit about anything. And in the end, you’re exactly the mark built for an AI industry that can’t sell itself without spinning lies about what it can (or theoretically could) do.

Only 33% of announced US data centers are actually being built, with the rest in vague levels of “planning.” That’s about 79.53GW of power, or 61GW of IT load. “Active development” also refers to anything that is (and I quote) “...under development or construction,” meaning “we’ve got the land and we’re still working out what to do with it.”

This is pretty obvious when you do the maths. 61GW of IT load would be hundreds of thousands of NVIDIA GB200 NVL72 racks — over a trillion dollars of GPUs at $3 million per 72-GPU rack — and based on the fact there were only $178.5 billion in data center debt deals last year, I don’t think many of these are actually being built right now. Even if they were, there’s not enough power for them to turn on.

NVIDIA claims it will sell $1 trillion of GPUs between 2025 and 2027, and as I calculated previously, it sells about 1.6GW (in IT load terms, as in how much power just the GPUs draw) of GPUs every quarter, which would require at least 1.95GW of power just to run, when you include all the associated gear and the challenges of physically getting power. None of this data talks about data centers actually coming online.
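The maths behind that "over a trillion dollars" line can be sketched quickly. The per-rack power draw below is my assumption (GB200 NVL72 racks are commonly described as drawing on the order of 120kW), not a figure from the piece; the $3 million per rack is quoted above.

```python
# Sanity-checking the claim above: 61GW of IT load expressed as
# GB200 NVL72 racks at $3M per rack (price quoted in the piece).
# The ~120kW-per-rack draw is an assumption, not a quoted figure.
IT_LOAD_W = 61e9            # 61GW of IT load
RACK_POWER_W = 120e3        # assumed draw of one GB200 NVL72 rack
RACK_PRICE_USD = 3e6        # $3 million per 72-GPU rack

racks = IT_LOAD_W / RACK_POWER_W
total_cost = racks * RACK_PRICE_USD

print(f"~{racks:,.0f} racks, ~${total_cost / 1e12:.2f} trillion in GPUs")
```

Under that assumption, 61GW of IT load works out to roughly half a million racks and around $1.5 trillion in GPUs, consistent with "hundreds of thousands of racks" and "over a trillion dollars."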

Jeff Geerling 2 days ago

Using FireWire on a Raspberry Pi

After learning Apple killed off FireWire (IEEE 1394) support in macOS 26 Tahoe, I started looking at alternatives for old FireWire equipment like hard drives, DV cameras, and A/V gear. I own an old Canon GL1 camera, with a 'DV' port. I could plug that into an old Mac (like the dual G4 MDD above) with FireWire—or even a modern Mac running macOS < 26, with some dongles—and transfer digital video footage between the camera and an application like Final Cut Pro.
