Latest Posts (20 found)
Stratechery Yesterday

2026.22: Luceing Their Mind

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week’s Stratechery video is on The Inference Shift. Why Everyone Hates Luce. To say that the Jony Ive-designed Ferrari Luce, the iconic carmaker’s first electric vehicle, has faced a chilly reception is an understatement. I actually think it looks great — for an electric car. On Dithering, John and I discuss why the real problem is that it’s branded Ferrari, and on Sharp Tech I get even more philosophical: electric cars are focused first and foremost on efficiency, and not only is that different than performance, Ferrari’s calling card, but it is also representative of the parts of modern society — including tech — that leave everyone feeling increasingly alienated (and why, surprisingly, AI might help). — Ben Thompson How to Monetize AI Answers. The ad business is, for me at least, endlessly fascinating, and not just because it is the most important business model in consumer tech: I think digital ads, particularly Meta-style ads that introduce you to things you never knew you wanted, are a societal good. The other reason to care about ads, however, is that their economic importance means they are where the impacts of new technology are often felt first. This week’s Interview with Eric Seufert covers all this: how LLMs are changing digital ads, the changes both Google and OpenAI have made in terms of monetizing AI, and, more philosophically, why believing in ads might make one more optimistic about humanity in an AI-denominated future. — BT Social Mobility in China, and Lack Thereof.
Late last week China’s State Council announced a reform that will ease so-called “hukou restrictions” and allow migrant workers from all over the country to access social services in the cities where they work, which had long been forbidden. It’s a major reform that furthers Xi’s goal to unify the national market, and should improve the lives of millions of workers, but it also comes with plenty of questions as it’s implemented. We discussed all of it on a great episode of Sharp China this week , as well as reports that top Chinese talent in AI has been banned from leaving the country, continued capital control, and ongoing tensions with Japan and the U.S. that call to mind an ominous passage from Mao Zedong.  — AS Nvidia Earnings, The AI Stack, Nvidia’s New Reporting — Nvidia is changing its reporting to delineate between hyperscaler sales — where Nvidia is fighting commoditization — and everyone else, where Nvidia runs the whole stack. The SpaceX IPO and Data Centers in Space — There isn’t a financial model that justifies the SpaceX IPO, but data centers in space are plausible, and that might be enough. An Interview with Eric Seufert About Models and Ads, and AI’s Upside for Humanity — An Interview with Eric Seufert about building models for generative AI, why Meta’s foundational models are so important, and why understanding advertising leads to optimism about humanity’s future. How Spencer Pratt Happens — Spencer Pratt’s success in L.A. reflects his own surprising political talent, and an increasingly broken Democratic machine in California and beyond. 
Acquired the Podcast The Ferrari Luce How Things Fell Apart for Germany’s Nixdorf Computer Japan’s Rare Earths Island Social Mobility and Hukou Reform; US Halts Taiwan Arms Sales?; Ongoing Pressure on Japan; An American Xinhua Journalist Arrested The Knicks are in the NBA Finals, A Moment of SGA Truth, Around the League with Giannis, Bulls, and the Basketball Gods SpaceX Hype and the Elon Bargain, Nvidia and the Neoclouds, Q&A on Dropbox, Google, Ferrari Luce Backlash

Stratechery 2 days ago

An Interview with Eric Seufert About Models and Ads, and AI’s Upside for Humanity

An Interview with Eric Seufert about building models for generative AI, why Meta's foundational models are so important, and why understanding advertising leads to optimism about humanity's future.

Stratechery 3 days ago

The SpaceX IPO and Data Centers in Space

It’s hardly the biggest problem in the world — or perhaps the height of privilege to consider it a problem at all — but one of the most annoying consumer experiences is booking an Uber Black and realizing you got assigned a Tesla Model Y (Uber finally stopped allowing new Model Y’s onto Black last year). Buckle up for an uncomfortable back seat, basic plastic finishes, and, all-too-often, potential car sickness from a driver who hasn’t completely mastered the Tesla’s aggressive regenerative braking. Still, the fact that the Model Y ever made it to the Black level is a testament to the brand Elon Musk built. Back in 2016, when 300,000 people dropped $1,000 each in a matter of hours to reserve an as-yet-unreleased Model 3, I explained that the phenomenon was because It’s a Tesla: The real payoff of Musk’s “Master Plan” is the fact that Tesla means something: yes, it stands for sustainability and caring for the environment, but more important is that Tesla also means amazing performance and Silicon Valley cool. To be sure, Tesla’s focus on the high end has helped them move down the cost curve, but it was Musk’s insistence on making “An electric car without compromises” that ultimately led to 276,000 people reserving a Model 3, many without even seeing the car: after all, it’s a Tesla. This is the same brand halo that landed what is, if we’re honest, a pretty basic car on the Uber Black list. What actually makes these cars compelling is the extent to which they are computers on wheels: I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own.
Tesla appears to be doubling down on this point of differentiation: the company stopped production of the Models S and X earlier this year, focusing production resources on the CyberCab and robots; if you want your car to drive itself, you’ll get the same model as everyone else. It reminds me of Andy Warhol’s famous quote : What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke, Liz Taylor drinks Coke, and just think, you can drink Coke, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it. That “tradition” is scale, and America is indeed better at it than any other country in the world; and, amongst Americans, no one pursues and seeks to leverage scale quite like Musk. From a press release from American Airlines: American Airlines today announced a sweeping modernization of its narrowbody inflight customer experience with the installation of Starlink, the fastest Wi-Fi in the sky, on more than 500 narrowbody aircraft beginning in Q1 2027. Starlink is widely regarded as the world’s most advanced satellite constellation using a low Earth orbit to deliver broadband Internet capable of supporting inflight streaming, online gaming, collaborative meeting tools and more. With thousands of satellites in low Earth orbit, Starlink can deliver multigigabit connectivity to aircraft using its Aero Terminal, which can support up to 1 Gbps per antenna. “As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden. 
“The addition of Starlink solidifies American as a leading airline in keeping passengers connected in flight.” As part of American’s commitment to an elevated onboard experience, Starlink will enable seamless streaming, browsing and real-time communication capabilities across American’s domestic and short-haul international routes. I linked to the press release just for the amusement of American Airlines, which has in recent years built its strategy around offering anything-but-premium on routes you need, billing its Starlink deal as a commitment to “an elevated onboard experience.” That may have been the argument for United’s Starlink deal when it was announced in 2024, but by this point it’s table stakes, which is surely exactly how Musk wants it. Starlink is the consumer-facing business of SpaceX, generating $8.7 billion in revenue last year and $4.4 billion in profit; while it’s not totally clear exactly how SpaceX accounts for launch costs, obviously Starlink benefits greatly from the fact that it has access to SpaceX’s launch capacity. That launch capacity has resulted in over ten thousand active satellites in low Earth orbit, delivering low-latency, high-speed Internet anywhere in the world — including in the air. That’s the carrot for airlines; the stick is the prospect of everyone else having the same service, and customers making flight decisions based on the quality of Internet access available. There is a similarity to Tesla in this way. Musk companies at their best don’t win the game; they change the rules through scale, such that billionaires buy economy cars because they actually drive themselves (with supervision), and airlines transform the consumer experience on their own dime. Musk makes all-in bets — whether that be in terms of launch capacity or in autonomous driving — not by making rational short-term business decisions, but by starting with the desired end state and working backwards.
Tech has a long history of silly charts — there is an entire category known as Bezos charts — and the SpaceX S-1 has one that made me laugh. It came in the discussion of SpaceX’s total addressable market: We believe we have identified the largest actionable total addressable market (“TAM”) in human history. We estimate that our quantifiable TAM is $28.5 trillion, consisting of $370 billion in Space from space-enabled solutions; $1.6 trillion in Connectivity across $870 billion in Starlink Broadband and $740 billion in Starlink Mobile as well as additional opportunities in enterprise and government; $26.5 trillion in AI across $2.4 trillion in AI infrastructure, $760 billion in consumer subscriptions, $600 billion in digital advertising, and $22.7 trillion in enterprise applications. For illustrative purposes of sizing our addressable market opportunity, we exclude China and Russia from our global estimates. This image is approximately to scale vertically, but certainly not horizontally: I could use the help in really wrapping my mind around the $26.5 trillion AI opportunity, given it’s more than 13 times the space and connectivity opportunity combined! In all seriousness, the numbers are obviously absurd, but then again, everything about this IPO is absurd. SpaceX is seeking a $2 trillion valuation on a mere $18.67 billion in revenue with $4.9 billion in losses last year, and growth actually slowed from 35% to 33%. That slowdown happened despite the addition of xAI (and thus also X), which tipped the company from a small profit to that massive loss, thanks to $5.1 billion in AI R&D expense. That R&D, keep in mind, went towards building a model that is in 5th place, and whose entire founding team recently left the company. But sure, $26.5 trillion AI opportunity! This is not to say that SpaceX won’t get its desired valuation. 
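As a quick sanity check on the figures quoted above, the following Python sketch re-adds the TAM components from the S-1 excerpt and computes the revenue multiple implied by the reported valuation target. The dollar figures are the ones cited in this piece; the constant and function names are illustrative only, and because the source figures are rounded, the sums land near the headline totals rather than exactly on them.

```python
# Figures quoted from the S-1 excerpt, in trillions of US dollars.
SPACE_TAM = 0.370
CONNECTIVITY_TAM = 1.6
AI_TAM = 26.5
AI_SEGMENTS = {
    "AI infrastructure": 2.4,
    "consumer subscriptions": 0.760,
    "digital advertising": 0.600,
    "enterprise applications": 22.7,
}

# Figures quoted in the article, in billions of US dollars.
TARGET_VALUATION = 2000.0
LAST_YEAR_REVENUE = 18.67


def summarize():
    """Return the derived ratios and sums as a plain dictionary."""
    return {
        # Sum of the four AI sub-segments (headline figure: 26.5).
        "ai_segment_sum": sum(AI_SEGMENTS.values()),
        # Sum of the three top-level buckets (headline figure: 28.5).
        "top_level_sum": SPACE_TAM + CONNECTIVITY_TAM + AI_TAM,
        # How many times larger AI is than space plus connectivity.
        "ai_vs_space_and_connectivity": AI_TAM / (SPACE_TAM + CONNECTIVITY_TAM),
        # Valuation divided by trailing revenue.
        "revenue_multiple": TARGET_VALUATION / LAST_YEAR_REVENUE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Run as a script, this shows the AI bucket at a little over 13 times the other two combined, matching the "more than 13 times" remark, and a valuation of roughly 107 times last year's revenue.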
Tesla’s valuation never made any sense right up until the Models 3 and Y actually worked out, causing Tesla’s share price to soar (and even then it was hard to ever build a financial model that justified the new share price). Musk’s ability to make his own reality starts with investors; from 2021’s Mistakes and Memes and comparing Apple and Tesla: This comparison works as far as it goes, but it doesn’t tell the entire story: after all, Apple’s brand was derived from decades building products, which had made it the most profitable company in the world. Tesla, meanwhile, always seemed to be weeks from going bankrupt, at least until it issued ever more stock, strengthening the conviction of Tesla skeptics and shorts. That, though, was the crazy thing: you would think that issuing stock would lead to Tesla’s stock price slumping; after all, existing shares were being diluted. Time after time, though, Tesla announcements about stock issuances would lead to the stock going up. It didn’t make any sense, at least if you thought about the stock as representing a company. It turned out, though, that TSLA was itself a meme, one about a car company, but also sustainability, and most of all, about Elon Musk himself. Issuing more stock was not diluting existing shareholders; it was extending the opportunity to propagate the TSLA meme to that many more people, and while Musk’s haters multiplied, so did his fans. The Internet, after all, is about abundance, not scarcity. The end result is that instead of infrastructure leading to a movement, a movement, via the stock market, funded the building out of infrastructure. I explained in that Article why I generally did not cover Tesla’s financial results, and the reasoning extends to why I don’t expect to cover SpaceX’s: Musk is the master of memes, and is himself a meme. 
He offers a dream — Mars, fully autonomous vehicles, an addressable market of $28.5 trillion — and positions his companies and their stock as access to that dream, and through the alchemy of capital markets, transforms shared delusion into mass market reality. Musk’s track record matters in this regard. Building an electric car company was possible, as was full self-driving (supervised); at the same time there were ever increasing government mandates and programs around decreasing emissions that acted as the stick to Tesla’s carrot. Similarly, landing rockets was possible, and the new market creation downstream from correspondingly lower launch costs was comprehensible. That Musk succeeded in both instances gives him the benefit of the doubt. The question that matters, then, is not if the numbers make sense right now (they absolutely do not); what matters is if the dream is even possible, and if there are actual reasons to think it might happen. I think that data centers in space meet these conditions. The first question about data centers in space is if they are even possible, and I think the answer is clearly yes. The key thing to consider is that there is no requirement that these data centers look anything like data centers on earth. On earth we build massive buildings full of GPUs with massive infrastructure for cooling those GPUs and massive power plants (or a connection to a grid which connects to massive power plants) to power those GPUs. The idea of transporting these massive structures to space sounds implausible, and it is! However, there is no reason that space data centers would look like data centers on earth. What makes far more sense is to think about an individual satellite as something akin to a rack. 
Right now the largest Starlink satellite in orbit is the V2 Mini Direct-to-Cell, which measures 7.4 meters by 2.7 meters by 0.3 meters (estimated); an NVL72 rack from Nvidia, meanwhile, measures 2.2 meters by 1.1 meters by 0.6 meters, so we’re already in the right size range. The V2 Mini Direct-to-Cell consumes (and dissipates) up to an estimated 25kW of power; the NVL72 up to 135kW, and it can fit a 1 trillion parameter model quantized to FP4. The big shortcoming for a rack-satellite is power and its dissipation, but going from 25kW to 135kW is certainly within the realm of possibility — and given that you don’t need much of the cooling and power distribution usage on earth, something closer to 100kW might deliver similar performance. There are other issues to address, including the problem of radiation screwing with calculations, reliability, etc., although those two concerns could be addressed in part by using larger chips (which are less efficient, but also use less power); these rack-satellites will also be disposable, like Starlink satellites, ameliorating reliability issues. The key factor, however, is that a fleet of racks, interconnected with lasers (as Starlink’s already are), each with their own solar panels and radiator arrays for cooling (deploying 200+ square meters of radiators per rack will be a huge challenge), is possible. The next question about data centers in space is if there is a use case for them — the carrot — and I already made the argument that there is in The Inference Shift. Specifically, there are three types of workloads developing around LLMs: training, answer inference, and agentic inference. From the section making the case for “agentic inference”: Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop.
If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. If delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away: At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute. It’s agentic inference that makes the most sense for racks in space, and conveniently enough, that is also the market that is likely to be the largest in the long run. The third question about data centers in space is if there is a stick. Specifically, while I think that racks-in-space are both a lot more viable than people think, and a lot more relevant to agentic inference than current modes of compute, it is at the end of the day cheaper and easier to build on earth, all things being equal. All things are not equal, however: right now we are at the very beginning of the AI buildout and already one of the biggest constraints is not just power (expected), but zoning (unexpected). 
I wrote in an Update last week : That leads to an interesting contrast to globalization: when companies were closing down American factories and laying off workers and moving operations to China, none of the affected towns or workers had a say. They just suddenly no longer had a job, and a huge number of cities across the Rust Belt no longer had a reason to exist. People simply had to move, or worse, retreat to things like alcohol or drugs. AI, however, is the opposite: building data centers requires permission, which is to say that people actually have a say. Again, I am not at all saying that these people are well informed about data centers, or about the economic impact on their communities, much less the economic impact of AI generally; what I am noting is that people who didn’t have a say in globalization are suddenly finding they do have a say about AI, and it’s not a surprise they are expressing their disapproval by blocking data centers. In that Update I made the case that data center builders — and by extension the companies that use them — should straight up pay people for permission to build data centers in their communities. At a minimum, however, that increases the costs of terrestrial data centers. What seems very plausible in the long run is that the demand for compute ends up being so large that there eventually is nowhere left to build, making the vast expanses of space not just an alternative but in fact the only choice. If all of this happens — and there are a lot of “if”s here! — then suddenly that $2 trillion valuation starts looking reasonable. SpaceX is already monetizing xAI’s first data center, Colossus 1, to the tune of $15 billion/year for 300MW of capacity; that’s 3,000 racks-in-space. Anthropic, meanwhile, will probably make 3x the revenue on that capacity; it remains to be seen if xAI can get back in the state-of-the-art game, but if so then the amount of revenue it can generate per rack-in-space will be commensurately higher. 
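The rack-in-space arithmetic can be reproduced with a short Python sketch. The 300MW, $15 billion/year, and roughly 100kW-per-rack figures come from this piece; the radiator estimate is a rough illustration using the Stefan-Boltzmann law, and its inputs (a 300 K radiator, emissivity of 0.9, one radiating side, no absorbed sunlight or Earth heat) are assumptions made here for illustration, not figures from the article or from SpaceX.

```python
# Stefan-Boltzmann constant in W / (m^2 * K^4).
STEFAN_BOLTZMANN = 5.670374419e-8


def racks_for_capacity(capacity_mw, rack_kw):
    """Number of racks needed to reach a given capacity."""
    return capacity_mw * 1000.0 / rack_kw


def revenue_per_rack(annual_revenue_usd, racks):
    """Annual revenue attributed to each rack."""
    return annual_revenue_usd / racks


def radiator_area_m2(heat_kw, temp_k=300.0, emissivity=0.9, sides=1):
    """Idealized radiator area needed to reject heat_kw by radiation alone.

    Assumed defaults (hypothetical, not from the article): 300 K surface,
    emissivity 0.9, one radiating side, no incoming heat from the Sun or Earth.
    """
    flux_w_per_m2 = emissivity * STEFAN_BOLTZMANN * temp_k ** 4 * sides
    return heat_kw * 1000.0 / flux_w_per_m2


if __name__ == "__main__":
    racks = racks_for_capacity(capacity_mw=300, rack_kw=100)
    print(f"racks: {racks:.0f}")
    print(f"revenue per rack: ${revenue_per_rack(15e9, racks):,.0f}/year")
    print(f"radiator area at 100kW: {radiator_area_m2(100):.0f} m^2")
```

Under these assumptions 300MW works out to 3,000 racks at about $5 million of revenue per rack per year, and a 100kW rack needs on the order of 240 square meters of radiator, which is in line with the "200+ square meters" mentioned earlier; a warmer or two-sided radiator would shrink that number.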
Even without xAI, however, SpaceX has the potential to be a monopoly provider of marginal compute capacity. There are, needless to say, a massive number of assumptions baked into this argument, including assuming a huge number of engineering challenges are solved, Starship actually works, SpaceX gets sufficient supply of the right kinds of chips, compute demand is massively larger, agentic inference unbundles current architectures, and data center opponents are successful. The risk attached to all of these assumptions should discount the valuation you put on this business, which is to say I still think this IPO is nuts. At the same time, I’m glad it exists, for multiple reasons. The first one is the most obvious one: Musk, for all of his faults, has already pushed humanity forward on multiple vectors, including electric cars, self-driving, reusable rockets, satellite Internet, etc., and I’m excited to see him try and do more. The second is that I am in fact concerned about our ability to muster enough compute to fully realize the gains from AI, and am very worried about a replay of nuclear power, where our failure to build denied us the opportunity to even imagine what could be invented in a world of unlimited energy; the fact Musk is proposing an alternative path to unlimited compute is a relief. The third is that I appreciate the extent to which this IPO is a return to what an IPO should be: the opportunity for people to contribute capital to actually build the business, and to benefit if it works out. As I noted, I can’t make a financial model that necessarily justifies this valuation, particularly based on current financials, but neither can a VC investing in the Series A of a company. SpaceX has already invented a lot, and its early investors are going to make a lot of money with this IPO; at the same time, there is still so much more to invent that there remains a lot of upside — and, to be very clear, a lot of risk. 
It’s a testament to SpaceX’s ambitions that retail investors get to play VC. And hey, you get Mars upside for free!
Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate. Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here) will be very useful. Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.

Stratechery 4 days ago

Nvidia Earnings, The AI Stack, Nvidia’s New Reporting

Nvidia is changing its reporting to delineate between hyperscaler sales — where Nvidia is fighting commoditization — and everyone else, where Nvidia runs the whole stack.

Stratechery 1 week ago

2026.21: The Data Center Veto

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week’s Stratechery video is on The Inference Shift. Data Center Discontent. The impact of AI is, at least for now, being felt digitally: that is where AI is useful, and the more digital a job, the more it is threatened by LLMs. AI, however, depends on data centers in the physical world, and building data centers needs permission. This gives normal people the sort of veto power over AI they didn’t have in the face of globalization; I make the case in Monday’s Update and on Sharp Tech that understanding this dynamic is more important than trying to correct misinformation, which is a symptom, not a cause, of data center opposition. — Ben Thompson Agent Economics. What will the internet look like when ad-supported models are rendered obsolete by shifting user behavior and the rise of agentic web traffic? Ben considered this question last summer with The Agentic Web and Original Sin, and I was surprised to learn this week that Parag Agarwal, former CEO of Twitter, is now focused on devising solutions for exactly this reality. This week’s Stratechery Interview with Agarwal dives deep into the economics of content on the Internet, why ads make sense for humans, why incentivizing content for agents will be different, and how Agarwal and Parallel are trying to solve that problem. I learned a ton from this interview, and I bet you will, too — and don’t worry, we did get a few bonus questions on the ride at Twitter. — Andrew Sharp Never Count Out the Slime Mold.
Wednesday’s Daily Update on Google I/O reminded me of an iconic leaked memo about the ungovernable and poorly coordinated mold in Mountain View, as the company seems to be throwing 10 different types of AI spaghetti at the wall to see what sticks. Then again, Google is now a nearly $5 trillion company and its transformer architecture supercharged the AI era. That second part is why, when Ben highlights a DeepMind approach to building AGI that’s distinct from the approaches at OpenAI and Anthropic, I’m compelled to both pay attention, and remember: for all of Google’s faults and misses, they do in fact have plenty of historic hits.  — AS Data Center Discontent, Understanding the Opposition, Fixing the Problem — There are understandable reasons for people to oppose data centers; the only solution that will work is simply paying them off. Google I/O, World Models, I/O Spaghetti — Google I/O put AI everywhere, for better and for worse. Meanwhile, is DeepMind aligned with Google’s business objectives? An Interview with Parallel Founder Parag Agarwal About Valuing Content on the Agentic Web — An interview with Parallel founder Parag Agarwal about valuing content and incentivizing its creation in a world of agents (plus questions about Twitter). Data Center Unpopularity Google Being Google The Little Vertical Laser That Everyone Uses Intel’s 30 Years in Costa Rica Constructing US-China Stability; Trump’s Taiwan Comments and More Summit Takeaways; Putin in China Wemby, Harper and an Instant Classic from the Spurs in Game 1 vs. OKC A Note on the Future of GOAT and An Emergency Top Five Much Ado About Data Centers, What Tech Gets Wrong About Its Critics, Q&A on SpaceX, Chinese AI, Elon Musk

Stratechery 1 week ago

An Interview with Parallel Founder Parag Agarwal About Valuing Content on the Agentic Web

An interview with Parallel founder Parag Agarwal about valuing content and incentivizing its creation in a world of agents (plus questions about Twitter).

Stratechery 1 week ago

Google I/O, World Models, I/O Spaghetti

Google I/O put AI everywhere, for better and for worse. Meanwhile, is DeepMind aligned with Google's business objectives?

Stratechery 1 week ago

Data Center Discontent, Understanding the Opposition, Fixing the Problem

There are understandable reasons for people to oppose data centers; the only solution that will work is simply paying them off.

Stratechery 2 weeks ago

2026.20: Shifting Alliances in a Changing World

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone . Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings . On that note, here were a few of our favorites this week. This week’s Stratechery video is on Amazon’s Durability . A New Kind of Computing . AI compute has been divided into two categories: training, and inference. However, in The Inference Shift (and on this week’s Sharp Tech ), I make the case that there are two kinds of inference: the one we know today is “answer inference”, where humans are in the loop, and speed matters; the inference that will matter most in the future, at least in terms of market size, will be “agentic inference”, where humans aren’t involved at all. That will lead to very different trade-offs in architectures, and is good news for both China and space (but maybe not Nvidia). — Ben Thompson All About Elon.  A week on from the news that Anthropic has secured compute from xAI, Tuesday’s Daily Update examined the deal from both sides . On one hand, Anthropic’s side of the deal is a reminder that markets actually work quite well, much to the relief of Claude users all over the world. On the other, the logic of the deal for xAI raises an interesting question about whether Musk will listen to what the market has told him, as well as the future of space data centers and who exactly SpaceX will be serving. Finally, if you can’t get enough Elon, on Sharp Text this week I wrote about his ongoing lawsuit with OpenAI, and why I find the case both boring and insulting , even as it’s clear that win or lose, Musk has already succeeded.  — Andrew Sharp 360 Degrees of US-China Relations. With a U.S. 
President visiting Beijing for the first time in nine years, this week’s episode of Sharp China asked 10 Questions about the US-China summit and what might be achieved. Trump has already left Beijing as you read this, and as predicted on the podcast, the deliverables from his visit were underwhelming (at least so far). Nevertheless, Wednesday’s conversation doubles as a great window into the state of the relationship generally, including why “upper hand” analysis tends to be overblown, why both sides are incentivized to play for time and stability, and the ways in which China’s posture has changed since the 90s and 2000s. Also: Jensen Huang standing on a runway in Alaska , and fun memories of a US-China fistfight in the Great Hall, back in 2017.  — AS The Inference Shift — Agentic inference is going to be different than the inference we use today, and it will change compute infrastructure because speed won’t matter when humans aren’t involved. SpaceX and Anthropic, xAI’s Two Companies, Elon Musk and SpaceXAI’s Future — The Anthropic xAI deal is shocking but not surprising: Musk should double down on serving other companies. The Deployment Company, Back to the 70s, Apple and Intel — OpenAI is forming a new company to deploy AI, and the other labs aren’t far behind, reinforcing the thesis that AI’s impact will require top-down implementation. Then, Apple has economic reasons to work with Intel. An Interview with Ben Thompson at the MoffettNathanson Media, Internet & Communications Conference — An interview with me about the implications of the compute shortage on Aggregation Theory, consumer AI, and more. Elon’s OpenAI Lawsuit Is Boring and Insulting, and It’s Already a Success — On the OpenAI trial and Elon Musk telling the world the same few Sam Altman stories that everyone knows and loves . 
Apple’s Supply Squeeze
Apple’s AI Land Grab
General Motors Dreamt of Robots
10 Questions and Modest Expectations With Trump in China to Meet Xi Jinping
The Wizards Win the NBA Lottery, Post-Lottery Reactions and Questions, A Wemby Ejection and a Wolves Resurrection
Cavs and Spurs Approach the Conference Finals, The Knicks Look Better Than Ever, Morey Out and What’s Next for LeBron
Inference in the Agentic Future, xAI Is Two Companies in One, Q&A on Elon’s Lawsuit, Intel, Apple

Stratechery 2 weeks ago

An Interview with Ben Thompson at the MoffettNathanson Media, Internet & Communications Conference

An interview with me about the implications of the compute shortage on Aggregation Theory, consumer AI, and more.

Stratechery 2 weeks ago

The Deployment Company, Back to the 70s, Apple and Intel

Good morning, President Trump is on the way to China, and Sharp China is your go-to podcast for understanding what happens next. Add it to your podcast player now in anticipation of the next few episodes breaking down the trip. On to the Update: From Reuters: OpenAI said on Monday it is setting up a new company with more than $4 billion in initial investment to help organizations build and deploy artificial intelligence systems, and will acquire an AI consulting firm, Tomoro, to quickly scale up the unit. After its early models saw strong resonance with consumers, OpenAI has been working aggressively to sign corporate contracts and establish a large presence in the business world where its AI will see large-scale deployment. The venture, which will be majority owned and controlled by OpenAI, also comes as rival Anthropic enjoys strong success in its enterprise AI push with its Claude family of models seeing rapid adoption among businesses. The new firm, called OpenAI Deployment Company, will help the ChatGPT maker embed engineers specializing in frontier AI deployment into organizations that will then work closely with various teams to identify where AI can make the biggest impact, OpenAI said. Its acquisition of Tomoro, a consulting firm that helps enterprises deploy AI, will bring around 150 experienced AI engineers and “deployment specialists” to the new unit from day one. Tomoro was formed in 2023 in alliance with OpenAI, and counts companies such as Mattel, Red Bull, Tesco and Virgin Atlantic as its clients, according to its website. That was on Monday; on Tuesday, from The Information: Google plans to hire hundreds of engineers to help customers start using its business-focused AI products, according to a person familiar with the situation. Google’s new “forward deployed engineers” will form a new team within Google Cloud, the unit’s chief, Thomas Kurian, said on LinkedIn on Tuesday, without disclosing the size of the effort. 
Matt Renner, Google Cloud’s chief revenue officer, said in a separate post that the move would help Google “show up for our customers with more technical resources (vs just an ocean of salespeople).” The announcement is one of several in the industry in recent weeks as tech companies are deploying armies of humans—often described as “forward deployed engineers”—and partnerships with consulting companies to get customers using AI-driven technology intended to automate work. On Monday, OpenAI launched the “OpenAI Deployment Company” in partnership with consulting and investment firms. Last week, Anthropic announced the creation of a joint venture with private equity firms to sell its AI to the PE firms’ customers. It is, needless to say, tempting to drop some snark about AGI apparently not being good enough to deploy AI, but instead I’m going to go with “as predicted”. In 2024’s Enterprise Philosophy and the First Wave of AI , I made the case that the proper analogy for AI in the enterprise was not SaaS, but rather the first wave of computing in the 1970s. Agents aren’t copilots; they are replacements. They do work in place of humans — think call centers and the like, to start — and they have all of the advantages of software: always available, and scalable up-and-down with demand…Benioff isn’t talking about making employees more productive, but rather companies; the verb that applies to employees is “augmented”, which sounds much nicer than “replaced”; the ultimate goal is stated as well: business results. That right there is tech’s third philosophy: improving the bottom line for large enterprises. Notice how well this framing applies to the mainframe wave of computing: accounting and ERP software made companies more productive and drove positive business results; the employees that were “augmented” were managers who got far more accurate reports much more quickly, while the employees who used to do that work were replaced. 
Critically, the decision about whether or not to make this change did not depend on rank-and-file employees changing how they worked, but for executives to decide to take the plunge. Specifically, I don’t think that the Deployment Company is going in to help employees use chatbots; that’s even more clearly the case with the PE firms that both OpenAI and Anthropic are doing deals with. I expect there to be an ever-increasing number of deals where PE buys software firms with reliable cash flows and conducts significant layoffs, forcing AI to pick up the slack, solving stock-based compensation issues in the process. I don’t know if the mandate for the Deployment Company is going to be quite so harsh, but I assume this is a company that is hired by the executive suite to fundamentally rethink business processes in a way that hasn’t been done since the mainframe: Most historically-driven AI analogies usually come from the Internet, and understandably so: that was both an epochal change and also much fresher in our collective memories. My core contention here, however, is that AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today’s models. The GUI is, arguably, still TBD. To the extent that is right, then, the biggest opportunity is in top-down enterprise implementations. The enterprise philosophy is older than the two consumer philosophies I wrote about previously: its motivation is not the user, but the buyer, who wants to increase revenue and cut costs, and will be brutally rational about how to achieve that (including running expected value calculations on agents making mistakes). That will be the only way to justify the compute necessary to scale out agentic capabilities, and to do the years of work necessary to get data in a state where humans can be replaced. The bottom line benefits — the essence of enterprise philosophy — will compel just that. 
What I wonder is how much of the work ends up reworking data; that, as I noted in that article, is why I was bullish on Palantir: That leaves the data piece, and while Benioff bragged about all of the data that Salesforce had, it doesn’t have everything, and what it does have is scattered across the phalanx of applications and storage layers that make up the Salesforce Platform. Indeed, Microsoft faces the same problem: while their Copilot vision includes APIs for 3rd-party “agents” — in this case, data from other companies — the reality is that an effective Agent — i.e. a worker replacement — needs access to everything in a way that it can reason over. The ability of large language models to handle unstructured data is revolutionary, but the fact remains that better data still results in better output; explicit step-by-step reasoning data, for example, is a big part of how o1 works. To that end, the company I am most intrigued by, for what I think will be the first wave of AI, is Palantir… That integration looks like this illustration from the company’s webpage for Foundry, what they call “The Ontology-Powered Operating System for the Modern Enterprise”: What is notable about this illustration is just how deeply Palantir needs to get into an enterprise’s operations to achieve its goals. This isn’t a consumery-SaaS application that your team leader puts on their credit card; it is SOFTWARE of the sort that Salesforce sought to move beyond. Google’s Kurian, by the way, did dismiss any sort of Palantir comparison in a Stratechery Interview last month: This all makes perfect sense, particularly this bit about the Knowledge Catalog definitely fits how I’ve been thinking. I wrote about this a few years ago about this importance of this whole layer and understanding it, it’s a bit of a big lift to get this in place. You have some sort of analog, say, with like a Palantir that’s putting in like their ontology thing. 
They have FDEs out on the site, multi-month projects doing this. You have OpenAI talking about Frontier, their agent layer, and they’re partnering with all the tech consultancies to build this out. Is this going to entail a lot of boots on the ground to get this graph working and functional in a way that your agents can operate effectively across it? TK: We’re not competing with Palantir, we’re not building a semantic dictionary or an ontology. What we’re doing is, today I’ll give you the closest analogy. TK: Today when you use a model, let’s say you use Gemini, and you ask a question, Gemini goes through reasoning, and then it shows you a citation. A citation is, “How did I answer the question and what’s the source I derived from?” Now imagine that citation was a query that needed to go to a folder in, for example, a storage system because there’s some documents there and a database because, for example, in a part number, just think about there’s a part number document that lists all the part numbers and sits in a drive and then that part number you need to fetch out to say it’s the modem that the guy is coming to repair, and that’s mapped to a table in a database. 
So what the graph does, we use Gemini, so we don’t need humans, we use Gemini to say, “Hey, go and read all these documents in these drives and extract the information from it and then match that to the database table that has the reference to the part number”, and so then when Gemini turns around and says, “I got this query about how much inventory of modems they are”, the first thing it does is it says, “Okay, go to the Knowledge Catalog and it says modem is part number one, two, three, four, five”, and then it says, “By the way the table in the database that has the inventory information about this part number is this table, here’s a SQL”, it then makes the quality of what we generate higher and then when it answers the question it shows back — back to your, “Trust my data”, it shows a grounding citation saying, “That’s where we got it from.” Well, so much for not needing humans! I joke, mostly — Kurian was referring to not needing a Palantir-like ontology, not necessarily dismissing the need for FDEs — but it sure is interesting how AI is creating the need for new kinds of jobs. It’s almost as if the world is more dynamic, and pure intelligence, unadulterated by what already exists and the burden of reflexivity, is more static, than the most pessimistic prognosticators may have anticipated. More prosaically, OpenAI and Anthropic need the revenue, enterprises need the imagination, and Google needs to stay in the game. From the Wall Street Journal : Apple and Intel have reached a preliminary agreement for Intel to manufacture some of the chips that power Apple devices, according to people familiar with the matter. Intensive talks between the two companies have been ongoing for more than a year, and they hammered out a formal deal in recent months, these people said. Bloomberg News previously reported the talks. It’s still unclear which Apple products Intel would make chips for, these people said. 
Apple ships more than 200 million iPhones a year as well as millions of iPads and Mac computers. Ming-Chi Kuo reported on X late last year that Intel would make Apple’s most basic M processor on its 18A process; he didn’t specify which generation. Regardless, while the Wall Street Journal cites Trump administration pressure, and an earlier Bloomberg article Apple’s concentration risk on TSMC and Taiwan, the most obvious reason for a deal — assuming it exists — is economic. Specifically, Apple has for two quarters running said it can’t satisfy demand because it can’t get enough capacity at TSMC. CEO Tim Cook referenced this point multiple times on the last earnings call , but I think this was the most important articulation: The constraint in the March quarter and the June quarter, the primary constraint is the availability of the advanced nodes our SoCs are produced on, not memory. And so I don’t want to predict for supply and demand to match because if I look at it realistically, I think on the Mac mini and the Mac Studio, I believe it will take several months to reach supply-demand balance. And so we’re not at the point where we’re saying this is going to end anytime soon. And it’s not because of a problem per se other than we just undercalled the demand. And there are lead times to this, as you well understand, and it takes a while to correct that. And the primary constraint from a product point of view, or the majority of it for this quarter, for the June quarter will be on the Mac. And it’s Mac mini, Mac Studio and the MacBook Neo. It’s all of those. Cook talked about lead times last quarter as well, and the important thing to note is that while it does take five months or so to make new chips, assuming Apple realized it needed more iPhone 17 Pro chips right away, those new A19 Pro lines only started producing chips partway through last quarter (which is why iPhone 17 Pro sales weren’t as high as they could be). 
Critically, however, what seems likely is that Apple took capacity away from the Mac to make more iPhone chips, and now doesn’t have enough chips for the Mini and Studio either. The long-and-short of it is this: Apple doesn’t have flexible access to TSMC capacity anymore, because so much of that capacity is going to AI in particular, and it’s costing Apple meaningful money across multiple product lines. This was always the thing that would bring companies to Intel; I wrote in TSMC Risk : Becoming a meaningful customer of Samsung or Intel is very risky: it takes years to get a chip working on a new process, which hardly seems worth it if that process might not be as good, and if the company offering the process definitely isn’t as customer service-centric as TSMC. I understand why everyone sticks with TSMC. The reality that hyperscalers and fabless chip companies need to wake up to, however, is that avoiding the risk of working with someone other than TSMC incurs new risks that are both harder to see and also much more substantial. Except again, we can see the harms already: foregone revenue today as demand outstrips supply. Today’s shortages, however, may prove to be peanuts: if AI has the potential these companies claim it does, future foregone revenue at the end of the decade is going to cost exponentially more — surely a lot more than whatever expense is necessary to make Samsung and/or Intel into viable competitors for TSMC. This, incidentally, is how the geographic risk issue will be fixed, if it ever is. It’s hard to get companies to pay for insurance for geopolitical risks that may never materialize. What is much more likely is that TSMC’s customers realize that their biggest risk isn’t that TSMC gets blown up by China, but that TSMC’s monopoly and reasonable reluctance to risk a rate of investment that matches the rest of the industry means that the rest of the industry fails to fully capture the value of AI. We’re already here (reportedly). 
TSMC’s failure to invest aggressively enough over the last several years will, in the end, give Intel the single most important thing it needs to become a viable competitor: the customer who did more than any other to make TSMC into the leader in the first place.

Stratechery 2 weeks ago

SpaceX and Anthropic, xAI’s Two Companies, Elon Musk and SpaceXAI’s Future

The Anthropic xAI deal is shocking but not surprising: Musk should double down on serving other companies.

Stratechery 2 weeks ago

The Inference Shift

If you were looking for the ideal time to IPO, being a chip company in May 2026 is hard to beat. Reuters reported over the weekend: Cerebras Systems is set to raise the size and price of its initial public offering as soon as Monday, as demand for the artificial intelligence chipmaker’s shares continues to climb, two people familiar with the matter told Reuters on Sunday. The company is considering a new IPO price range of $150-$160 a share, up from $115-$125 a share, and raising the number of shares marketed to 30 million from 28 million, said the sources, who asked not to be identified because the information isn’t public yet. The fundamental driver of the ongoing surge in semiconductor stocks is, of course, AI, particularly the realization that agents are going to need a lot of compute. What Cerebras represents, however, is something broader: while the compute story for AI has been largely about GPUs, particularly from Nvidia, the future is going to look increasingly heterogeneous. The story of how Graphics Processing Units became the center of AI is a well-trodden one, but in brief: The number one use case for GPUs has been training, which stresses the third point in particular. While the calculations within each training step are massively parallel, the steps themselves are serial: every GPU has to share its results with every other GPU before the next step can begin. This is why a trillion-parameter model needs to fit in the aggregate memory of tens of thousands of GPUs that can communicate as one system. Nvidia dominates both problem spaces, first by securing HBM ahead of the rest of the industry, and second thanks to its investments in networking. Of course training isn’t the only AI workload: the other is inference. 
Inference has three main parts: The two decode steps alternate for every layer of the model (they’re interleaved, not in sequence), which is to say that decode is serial and memory-bandwidth bound. For every token generated, two distinct memory pools must be read: the KV cache, which stores context and grows with each token, and the model weights themselves. Both must be read in full to produce a single output token. GPUs handle all three needs: high compute for prefill, abundant HBM for KV cache and model weights, and chip-to-chip networking to pool memory across multiple chips when a single GPU isn’t enough. In other words, what works for training works for inference — look no further than the deal SpaceX made with Anthropic. From Anthropic’s blog : We’ve signed an agreement with SpaceX to use all of the compute capacity at their Colossus 1 data center. This gives us access to more than 300 megawatts of new capacity (over 220,000 NVIDIA GPUs) within the month. This additional capacity will directly improve capacity for Claude Pro and Claude Max subscribers. SpaceX retains Colossus 2 — presumably for both training of future models and inference of existing ones — and can afford to do both in the same data center precisely because xAI’s models aren’t getting much usage; more pertinently to this piece, they can do both in the same data center because both training and inference can be done on GPUs. Indeed, the GPUs Anthropic is contracting for at Colossus 1 were originally used for training as well; the fact that GPUs are so flexible is a big advantage. Cerebras makes something completely different. While a silicon wafer has a diameter of 300mm, the “reticle limit” — the maximum area that a lithography tool can expose on that wafer — is around 26mm x 33mm. This is the effective size limit for chips; going beyond that entails linking two separate chips together over a chip-to-chip interposer, which is exactly what Nvidia has done with the B200. 
Cerebras, on the other hand, has invented a way to lay down wiring across the so-called “scribe lines” that are the boundary between reticle exposures, making the entire wafer into a single chip with no need for relatively slow chip-to-chip linkages. The net result is a chip with a lot of compute and a lot of SRAM that is blisteringly fast to access. To put it in numbers, the WSE-3 (Cerebras’ latest chip) has 44GB of on-chip SRAM at 21 PB/s of bandwidth; an H100 has 80GB of HBM at 3.35 TB/s. In other words, the WSE-3 has just over half the memory of an H100, but 6,000 times the memory bandwidth. The reason to compare the WSE-3 to an H100 is that the H100 is the chip most used for inference — and inference is clearly what Cerebras is most well-suited for. You can use Cerebras chips for training, but the chip-to-chip networking story isn’t very compelling, which is to say that all of that compute and on-chip memory is mostly just sitting around; what is much more interesting is the idea of getting a stream of tokens at dramatically faster speed than you can from a GPU. Note, however, that the limitation in terms of training also potentially applies in terms of inference: as long as everything fits in on-chip memory Cerebras’ speed is an incredible experience; the moment you need more memory, whether that be for a larger model or, more likely, a larger KV cache, then Cerebras doesn’t make much sense, particularly given the price. That whole-wafer-as-chip technique means high yields are a massive challenge, which hugely drives up costs. At the same time, I do think there will be a market for Cerebras-style chips: right now the company is highlighting the usefulness of speed for coding — reasoning means a lot of tokens, which means that dramatically scaling up tokens-per-second equals faster thinking — but I think this is a temporary use case, for reasons I’ll explain in a bit. 
What does matter is how long humans are waiting for an answer, and as products like AI wearables become more of a thing, the speed of interaction, particularly for voice — which will be a function of token generation speed — will have a tangible effect on the user experience. I have previously made the case, including in Agents Over Bubbles , that we have gone through three inflection points in the LLM era: All of this falls under the banner of “inference”, but I think it will be increasingly clear that there is a difference between providing an answer — what I will call “answer inference” — and doing a task — what I will call “agentic inference.” Cerebras’ target market is “answer inference”; in the long run, I think the architecture for “agentic inference” will look a lot different, not just from Cerebras’ approach, but from the GPU approach as well. I mentioned above that fast inference for coding is a temporary use case. Specifically, coding with LLMs requires a human in the loop. It’s the human that defines what is to be coded, checks the work, commits the pull request, etc.; it’s not hard to envision a future, however, where all of this is completely handled by machines. This will apply to agentic work broadly: the true power of agents will not be that they do work for humans, but rather that they do work without human involvement at all. This, by extension, will mean that the likely best approach to solving agentic inference will look a lot different than answer inference. The most important aspect for answer inference is token speed; the most important aspect for agentic inference, however, is memory. Agents need context, state, and history. Some of that will live as active KV cache; some will live in host memory or SSDs; much of it will live in databases, logs, embeddings, and object stores. The important point is that agentic inference will be less about GPUs answering a question and more about the memory hierarchy wrapped around a model. 
Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. Meanwhile, if delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away: At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute. To date the invocation of “scaling with compute” has implicitly meant Nvidia bullishness. However, much of Nvidia’s relative advantage to date has been a function of latency: Nvidia chips have fast compute, but keeping that compute busy has required big investments in ever-expanding HBM memory and networking. If latency isn’t the key constraint, however, then Nvidia’s approach seems less worth paying a premium for. 
Nvidia does recognize this shift: the company launched an inference framework called Dynamo that helps disaggregate different parts of inference, and is shipping products like standalone memory and CPU racks to enable increasingly large KV caches and faster tool use, the better to keep their expensive GPUs busy. Ultimately, however, it’s easy to see cost and simplicity being increasingly attractive to hyperscalers for agentic inference that isn’t remotely GPU-bound. China, meanwhile, for all of its lack of leading edge compute, has everything it needs for agentic inference: fast-enough (but not leading-edge) GPUs, fast-enough (but not leading-edge) CPUs, DRAM, hard drives, etc. The challenge, of course, is compute for training; it’s also possible that answer inference is more important for national security, at least when it comes to military applications. The other interesting angle is space: slower chips actually make space data centers more viable for a number of reasons. First, if memory can be offloaded, chips can be made much simpler and run much cooler. Second, older nodes, by virtue of being physically larger, will better withstand space radiation. Third, older nodes require less power, which means there will be less heat to dissipate via radiation. Fourth, not being on the bleeding edge will mean higher reliability, an important consideration given that satellites won’t be repairable. Nvidia CEO Jensen Huang regularly says that “Moore’s Law is Dead”; what he means is that the future of computing speed-ups will be a function of systems innovation, which is exactly what Nvidia has done. Maybe the most profound implication of agents that act without humans in the loop, however, will be that Moore’s Law doesn’t matter, and that the way we get more compute is by realizing that the compute we have is already good enough. 
Just as drawing pixels on a computer screen was a parallel process, which meant there was a direct connection between the number of processing units and graphics speed, making AI-related calculations was a parallel process, which meant there was a direct connection between the number of processing units and calculation speed.
Nvidia enabled this dual-usage by making its graphics processors programmable, and created an entire software ecosystem called CUDA to make this programming accessible.
The big difference between graphics and AI has been the size of the problem being solved — models are a lot bigger than video game textures — which has led to a dramatic expansion in high-bandwidth memory (HBM) per GPU, and dramatic innovations in terms of chip-to-chip networking to allow multiple chips to work together as one addressable system. Nvidia has been the leader in both.

Prefill encodes everything the LLM needs to know into an understandable state; this is highly parallelizable and compute matters.
The first part of decode entails reading the KV cache — which stores context, including the output of the prefill step — to make an attention calculation. This is a serial step where bandwidth matters, but the memory requirements are variable and increasingly large.
The second part of decode is the feed-forward computation over the model weights; this is also a serial step where bandwidth matters, and the memory requirements are defined by the size of the model.

ChatGPT demonstrated the utility of token prediction.
o1 introduced the idea of reasoning, where more tokens meant better answers.
Opus 4.5 and Claude Code introduced the first usable agents, which could actually accomplish tasks, using a combination of reasoning models and a harness that utilized tools, verified work, etc.

Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate.
Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here) will be very useful.
Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.
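
The decode description in this piece (every output token requires reading the model weights and the KV cache in full) implies a rough ceiling on single-stream token speed: memory bandwidth divided by the bytes read per token. The sketch below applies that to the H100 and WSE-3 figures quoted above. The workload sizes, the `Chip` type, and the function names are hypothetical, chosen only for illustration; the estimate ignores batching, compute limits, and multi-chip setups, and assumes 1 PB = 1,000 TB, so treat the results as an upper bound rather than a benchmark.

```python
# Back-of-the-envelope decode ceiling from memory bandwidth alone.
# Hardware figures are the ones quoted in the article; the workload sizes
# below are hypothetical values chosen for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    memory_gb: float       # capacity of the chip's fast memory pool
    bandwidth_tb_s: float  # bandwidth of that pool in TB/s


H100 = Chip("H100", memory_gb=80, bandwidth_tb_s=3.35)
WSE3 = Chip("WSE-3", memory_gb=44, bandwidth_tb_s=21_000)  # 21 PB/s


def fits(chip: Chip, weights_gb: float, kv_cache_gb: float) -> bool:
    """True if weights plus KV cache fit in the chip's own memory."""
    return weights_gb + kv_cache_gb <= chip.memory_gb


def max_tokens_per_sec(chip: Chip, weights_gb: float, kv_cache_gb: float) -> float:
    """Ceiling on tokens/sec if each token reads weights and KV cache once."""
    gb_read_per_token = weights_gb + kv_cache_gb
    return chip.bandwidth_tb_s * 1000 / gb_read_per_token


# Ratios behind the "just over half the memory, 6,000 times the bandwidth" claim.
memory_ratio = WSE3.memory_gb / H100.memory_gb
bandwidth_ratio = WSE3.bandwidth_tb_s / H100.bandwidth_tb_s

WEIGHTS_GB = 30.0   # hypothetical model size
KV_CACHE_GB = 10.0  # hypothetical context size

if __name__ == "__main__":
    print(f"memory ratio: {memory_ratio:.2f}, bandwidth ratio: {bandwidth_ratio:.0f}")
    for chip in (H100, WSE3):
        if fits(chip, WEIGHTS_GB, KV_CACHE_GB):
            ceiling = max_tokens_per_sec(chip, WEIGHTS_GB, KV_CACHE_GB)
            print(f"{chip.name}: at most {ceiling:,.0f} tokens/sec")
        else:
            print(f"{chip.name}: workload does not fit in on-chip memory")
```

Raising the hypothetical `KV_CACHE_GB` past what the WSE-3 holds flips `fits` to false for that chip, which is the capacity cliff the article describes: the speed advantage only applies while everything stays in on-chip memory.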

Stratechery 3 weeks ago

2026.19: Earning & Spending

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week’s Sharp Tech video is on Messaging AI in 2026. What We Learned from Big Tech’s First Quarter. Apple, Amazon, Meta, Google and Microsoft all reported earnings last week, and as four of the five megacaps continue to pour massive sums into AI (first quarter CapEx was more than three times that of the Manhattan Project), there are no signs of that pace slowing. Ben broke it all down across several days, including divergent market reactions to great Google numbers and Meta numbers that were arguably even better, as well as the stories for Microsoft and Apple after Q1. Sandwiched between those Daily Updates, Tuesday’s Article zoomed out to connect Amazon’s infrastructure spending history with its AI strategy going forward. All of it was a great way to parse numbers that continue to boggle the mind, and strategy that actually looks a lot more rational than the numbers sound. — Andrew Sharp A Conversation with Joanna Stern. How does one write a book about a tech story that seems to change every other week? Joanna Stern accepted that challenge, and explained how it went in this week’s Stratechery Interview. The resulting conversation is a delightful glimpse into the process for one of the most creative tech writers alive and the making of a book that Ben loved. Stern shares her thoughts on using an LLM to make a career change, as well as how AI is changing medicine (and mammograms), and the limits of LLMs that are still very real. 
To the latter point, if you’d like to learn more about how ChatGPT misdiagnosed a preying mantis pregnancy, start with this week’s interview, and then  you can buy the book here.   — AS What’s Next for the Celtics? Like many others across the media, I picked the Boston Celtics to make the NBA Finals in June. Alas, they barely made it out of April and were eliminated in the first round to Joel Embiid and the 76ers. The GOAT podcast recapped that disaster first on Monday with a salute to the Sixers (now bittersweet after two losses to the Knicks), and on Thursday’s episode, a longer look at the mess in Boston and a variety of thorny choices from here. Get caught up on all of that and the rest of Playoffs, and if you need an additional hoops fix, this week’s Sharp Text is a salute to the maddening charms of the Minnesota Timberwolves .  — AS Google Earnings, Meta Earnings — Wall Street loved Google’s earnings, and hated Meta’s, even though the latter’s core business was more impressive. The difference is that Google is monetizing its investments now (and it might be all Anthropic). Amazon’s Durability — Amazon looked behind in AI in the training era, but is well place in the inference era, thanks to its continued investment in the long-term. Microsoft Earnings, Apple Earnings — Microsoft unveils its new agentic business model, and Apple confronts shortages in memory and chips even as the Mac benefits from AI. An Interview with Joanna Stern About Living With AI — An interview with Joanna Stern about her new book about living with AI, and starting her own media company. The Wolves Are Why We Do This — A salute to the playoff Timberwolves. Plus: Notes on Vogue history, NBA upheaval, and the “geo” in geopolitics. 
Google and Meta Earnings Anthropic and xAI Sweden Made DC Great Again The Sixers Get Their Moment in the Sun, A Nightmare for Celtics Fans, Thoughts on the Way Into the Second Round A Championship Response from the Spurs, What’s Next for the Celtics?, Pre-Lottery Thoughts and Emotions

Stratechery 3 weeks ago

An Interview with Joanna Stern About Living With AI

An interview with Joanna Stern about her new book about living with AI, and starting her own media company.

Stratechery 3 weeks ago

Microsoft Earnings, Apple Earnings

Microsoft unveils its new agentic business model, and Apple confronts shortages in memory and chips even as the Mac benefits from AI.

Stratechery 3 weeks ago

Amazon’s Durability

When it comes to the AI soap opera — there is news every day, and the company on top and the bottom seems to shift by the quarter if not the month — the news that I find most intriguing and instructive this week is about physical goods and logistics. From Bloomberg: Amazon.com Inc. unveiled a suite of logistics services that will let businesses buy its existing freight and distribution offerings as a package, sending shares of rival delivery companies such as FedEx Corp. and United Parcel Service Inc. lower. The world’s largest online retailer on Monday announced Amazon Supply Chain Services (ASCS), offering other companies access to its “full portfolio” of supply-chain and distribution offerings. The service largely consolidates a package of existing products — air and ocean freight, trucking and last-mile delivery — into a new suite it says companies like Procter & Gamble Co. and 3M Co. are already using. This is a very satisfying announcement for Stratechery, given it’s the culmination of a prediction I made a decade ago in The Amazon Tax. Amazon at that point had two primary businesses — Amazon.com and AWS — and I made the case in that Article that they were actually very similar: in both cases Amazon built “primitives” that had Amazon itself as their first, best customer, justifying and driving initial development, but in both cases the ultimate play was to sell those primitives to other companies. It was already clear at the time that logistics would follow the same path: It seems increasingly clear that Amazon intends to repeat the model when it comes to logistics: after experimenting with six planes last year the company recently leased 20 more to flesh out its private logistics network; this is on top of registering its China subsidiary as an ocean freight forwarder… So how might this play out? Well, start with the fact that Amazon itself would be this logistics network’s first-and-best customer, just as was the case with AWS. 
This justifies the massive expenditure necessary to build out a logistics network that competes with UPS, FedEx, et al, and most outlets are framing these moves as a way for Amazon to rein in shipping costs and improve reliability, especially around the holidays. However, I think it is a mistake to think that Amazon will stop there: just as they have with AWS and e-commerce distribution I expect the company to offer its logistics network to third parties, which will increase the returns to scale, and, by extension, deepen Amazon’s eventual moat. Now, ten years later, we are here, with the official unveiling of Amazon Supply Chain Services , and I think the time frame is an important one: Amazon, more than any other company, actually operates with decade-long timeframes, consistently making real-world investments at massive scale that (1) convert their marginal costs into capital costs and (2) gain leverage on those capital costs by selling them to other businesses. This is, by the way, still a story about AI. Three years ago SemiAnalysis wrote an Article entitled Amazon’s Cloud Crisis: How AWS Will Lose The Future Of Computing , and I found it very compelling. First, though, some history (much of which is covered in SemiAnalysis’ article). Amazon not only invented cloud computing, but also realized it would be a commodity market. While most people in tech think about building sustainable differentiation that allows you to charge higher prices, thus producing profit, commodity markets work differently: there, sustainable profits come from having structurally cheaper costs. Amazon developed exactly that, first through having the largest scale — giving the company both buying power and also the most leverage on their development costs — and second through genuine innovation. AWS built a specialized system called Nitro, built on their own chips, that offloaded server management, including network management, storage management, hypervisor management, etc. 
from the expensive Intel and AMD servers that the company sold access to; this let Amazon run that many more virtual machines on a single server, significantly increasing utilization, i.e. delivering a structural cost advantage. Amazon doubled down on their custom chip efforts with Graviton, their ARM processors. Graviton chips, particularly the first few generations, were inferior to Intel or AMD chips, but that didn’t mean they were useless. By that time AWS had expanded from simply being an Infrastructure-as-a-Service (IaaS) provider to being a Platform-as-a-Service (PaaS) provider as well. IaaS means you provide raw compute, storage, etc., on which customers can run things like operating systems or databases; PaaS means you provide that basic functionality as a service. Amazon Relational Database Service (RDS), for example, is a fully managed database that customers can access via a set of APIs without having to worry about actually managing the full database themselves, worrying about scaling, duplication, etc. This, by extension, means that customers don’t need to know and don’t need to care about the compute infrastructure that undergirds services like RDS — which has long been Graviton! PaaS lets Amazon double-dip in terms of profitability: first, AWS could sell PaaS products at a higher margin than IaaS products, and second, the company could leverage its own cheaper silicon to serve those products, reducing their costs. Over time Graviton has become more competitive in performance — while still being cheaper — giving Amazon a lower-cost compute instance to sell to end users, but even without 3rd-party take-up the investment in building its own silicon has paid off over time. Fast forward to AI, and SemiAnalysis’ concern was that all of these optimizations left AWS ill-prepared for AI. 
One big problem was networking: Rather than implement the best networking from Nvidia and/or Broadcom, Amazon is using its own Nitro and Elastic Fabric Adaptor (EFA) networking. This works well for many workloads, plus it delivers a cost, performance, and security advantage. There are business, cultural, and security reasons why Amazon will not implement other networking. The cultural one is important. Nitro and networking SoC’s generally have been Amazon’s biggest cost advantage for years. It’s ingrained into their DNA. Even EFA delivers on this too, but they don’t see how new workloads are evolving and that a new tier is needed due to the lack of foresight in their internal workload and infrastructure teams. Amazon is making a deliberate choice of not adopting that we believe will bite them in the future. Another was Amazon’s insistence on building its own chips, which were not only inferior to the best Nvidia chips in terms of performance, but might also lead to them getting fewer Nvidia chips going forward: At least some other clouds will implement out-of-node NVLink. That’s where the discussion of prioritization now comes in. AI GPUs face tremendous shortages, for at least a full year. This is one of the most pivotal times for AI, and it may mark the haves and the have-nots. Nvidia is a complete monopoly right now. Why would Nvidia prioritize Amazon for these GPUs, when they know Amazon will move to their in-house chips as quickly as they can, for as many compute workloads as they can? Why would Nvidia ship tons of GPUs to the cloud that is not using any of their networking, thereby reducing their share of wallet? Instead, Nvidia prioritizes the me-too clouds. Amazon does get meaningful volume, but nowhere close to where demand is. Amazon’s H100 GPU shipments relative to public cloud shipments is a significantly lower than their share of the public cloud. 
Those other clouds also can’t satisfy demand, but they get a bigger percentage of the GPUs they ask Nvidia for, and as such, firms looking for GPUs for training or inference will move to those clouds. Nvidia is the kingmaker right now, and they are capitalizing on it. They have to spread the balance of power out to prevent compute share from clustering towards Amazon.
These concerns were well-founded in the 2023 time-period when that Article was written: that was a time when AI, thanks to ChatGPT, had hit the mainstream, but the largest share of compute still went to training. Training required all of the things that Amazon lacked, particularly the ability to network large numbers of Nvidia GPUs together into one coherent system. In such a system the most important capability was horizontal networking between chips, so that you could update weights during training, a step that needed to happen serially. It was absolutely the case that cloud providers like Microsoft or Oracle or the neoclouds, which implemented full Nvidia solutions, instead of the standalone HGX racks that AWS favored, were much better suited to training large language models. That is still the case, by the way. What has changed is that training is no longer the biggest AI compute market; inference is, thanks not only to increased AI adoption, but also because of fundamental changes in terms of how AI works. From an Update about Nvidia:
The first inflection point was the emergence of LLMs — call this the ChatGPT moment. In this first paradigm tokens were generated by GPUs and presented as the answer to a question. The second inflection point was the emergence of reasoning models — call this the o1 moment. In this paradigm there are a very large number of tokens that are generated to figure out the answer before the answer is actually generated; this was an exponential increase in the addressable market for tokens. The third inflection point was the emergence of functional agents — call this the Opus 4.5 moment. In this paradigm those reasoning models are not triggered by humans asking a question, but by an agent solving a problem. This increases the market in two directions: first, humans can run multiple agents, and secondly, agents can leverage reasoning models multiple times to accomplish a task. This isn’t just an exponential increase in the addressable market for tokens, it’s two exponential increases squared.
Both the shift to inference and the shift in the nature of inference have been positives for AWS’ approach. First, while inference still requires significant memory, the requirement is significantly less than that required for training. It’s actually viable to store a model’s parameters in a single server; you don’t need to network together thousands of chips. Second, while reasoning and agentic workloads require significantly more tokens, and thus a massively larger KV cache, the increase is actually so large that even the most optimized Nvidia inference systems are being built with dedicated memory servers. This sort of architecture is much more compatible with Amazon’s networking approach than the thousands-of-chips-networked-together approach is. Third, agents are heavily CPU dependent, which has two important implications. First, fully utilizing accelerators is a function of having sufficient general compute; second, achieving maximum utilization of heterogeneous compute means unbundling CPUs and GPUs and routing workloads between resources, which is exactly the sort of disaggregated-resource abstraction that Amazon has been building with Nitro.
The utilization point is an important one. Nvidia CEO Jensen Huang made his case for Nvidia chips over custom ASICs at length at GTC 2025. Huang’s argument was that AI factories — to use his term — were ultimately constrained by power; that meant that the most important metric for profitability was not the cost of chips but rather tokens-per-watt. In other words, if you can’t increase watts, it’s worth spending more on chips to increase tokens on those watts. There are, however, three reasons why this argument may not hold, particularly for a company like Amazon. First, if you have the money to buy that many Nvidia chips, you also have the money to spend on getting more power — which is exactly what AWS has been focused on. This very much fits AWS’ modus operandi, which is to invest more upstream (in this case in power) with the goal of spending less downstream (paying Nvidia huge margins for their chips). Second, in the long term, electricity is more of a commodity than logic is. That means it is a market where innovation and competition are more likely to break a bottleneck, which is another way to say that investing in one’s own silicon is the area most likely to deliver a return on investment. Third, the nature of inference workloads — particularly agentic ones — is such that perfect accelerator utilization is going to be a much harder problem to solve than when it comes to training.
These points are moot, however, if you don’t have your own logic chip that is at least competitive, and here Amazon’s long-term outlook is paying off. Amazon bought Annapurna Labs, which makes their chips, in 2015, and launched their first AI-focused chip in 2019. No, it wasn’t very good, but critically, that was seven years ago: now Trainium 3 is decent and the trajectory is even better. AWS is positioned to have a sustainable cost advantage for inference going forward. Moreover, they are already replaying the Graviton playbook. Trainium chips help undergird Bedrock, its AI platform, which is to say that users are using Trainium chips even if they didn’t explicitly choose to do so. AWS CEO Matt Garman made this point explicitly in a Stratechery Interview:
I think just with GPUs, by the way, you’re going to interact with a lot of these accelerator chips through abstractions. So the vast majority of customers don’t interact with GPUs either, except through maybe like in their laptop or something like that, for graphics. But when you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs, if you’re talking to Claude, you’re through GPUs or Trainium or TPUs, you’re not talking to any of those chips, you’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models. And so whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly, and that’s gonna be true going forward just because these systems are so complex, they’re very large. If you’re going to go train a model, not that many people have enough money to go train a model, not that many people have the expertise to actually manage it. They’re very complicated systems, and the OpenAI team is incredible in their ability to squeeze value out of a very large compute cluster. But not that many people have the team that can do that, independent of what the chip happens to be, and so I think that that’s going to be true for all accelerator chips, honestly.
The frontier models are an important factor in this, and that is an angle that I didn’t see coming. Nvidia CEO Jensen Huang explained in a recent interview with Dwarkesh Patel why Nvidia didn’t invest in Anthropic early on:
At the time, I didn’t deeply internalize how difficult it would be to build a foundation AI lab like OpenAI and Anthropic, and the fact that they needed huge investments from the supplier themselves. We just weren’t in a position to make the multi-billion dollar investment into Anthropic so that they could use our compute. But Google and AWS were. They put in huge investments in the beginning so that Anthropic, in return, used their compute. We just weren’t in a position to do that at the time. I would say my mistake is I didn’t deeply internalize that they really had no other options, that a VC would never put in $5-10 billion of investment into an AI lab with the hopes of it turning out to be Anthropic. So that was my miss. But even if I understood it, I don’t think we would’ve been in a position to do that at the time. But I’m not going to make that same mistake again.
Amazon had both the money and the chips to invest into Anthropic precisely because they had built such a cash machine with AWS in the first place. That’s the thing with big investments in infrastructure: they take years to build, but the benefit of that investment compounds over time. Anthropic, meanwhile, thanks to those investments from Amazon and Google, can not only run across a variety of chips, but for a long time was the only frontier model available on all of the leading clouds, an important selling point for enterprises. Microsoft, in the end, needed to let go of Azure’s exclusive access to OpenAI’s API in part because that exclusivity was hurting the prospects of their mammoth stake in OpenAI.
You can also make the case that Amazon is the best choice for frontier model access in a world of limited compute: Microsoft’s core business is software, which is to say that the company faces massive pressure to invest in their own AI capabilities, even at the cost of de-prioritizing cloud customers. That’s exactly what happened at Microsoft earlier this year, when the company missed Azure growth projections because they devoted more compute to their internal workloads. It was an understandable decision: cloud demand is eternal, but the risk from AI for existing software businesses is existential. This also applies to Google: the company’s core business is also digital, and while search has fended off the threat from chatbots that many expected, the fundamental challenge is still one to be managed, not extinguished. Amazon’s core businesses, meanwhile, are very much rooted in the physical world: selling and shipping physical goods, and building data centers. Both are amenable to Amazon devoting the majority of its chips to customers’ workloads.
If this week marks the resolution of one of Amazon’s long bets, you can see the outline of future resolutions in present day announcements. One prominent example is Amazon Leo, the company’s satellite service that seems, at first glance, duplicative of SpaceX’s Starlink, which has the advantage of already existing at scale. Remember Amazon’s formula, however, which CEO Andy Jassy stated explicitly with regards to Leo on the company’s most recent earnings call:
Today, if you ask what stops us from growing the business, we have to get the constellation into space. We have over 20 launches planned this year. We have over 30 launches planned in 2027. But I think the business has a chance to be a very large many billion-dollar revenue business. And I think it has some characteristics that are reminiscent of AWS in that it’s capital-intensive upfront where you’re committing a lot of capital and cash in the early years for assets that you get to leverage over a long period of time. And so I like the free cash flow and return on invested capital characteristics of that business in the medium to long term.
The fact that it is extremely capital-intensive is not the only thing about Leo that makes it like AWS: a critical factor is that Amazon is the first-best customer to give the service scale, and here it’s worth going back to logistics. I noted above that Amazon delivery still has marginal costs, and that is because humans have to make the delivery. Amazon, however, has already pointed to the future, a full 13 years ago when the company first started talking publicly about drone delivery. It’s been a long slog, to be sure, but it’s increasingly plausible to imagine a future where delivery costs are a matter of depreciation on drone assets, and what would such a future require? How about reliable widespread satellite coverage for communicating with and guiding those drones? And, if Amazon doesn’t want to be dependent on Jensen Huang for chips, do you think they want to be dependent on Elon Musk for drone connectivity? Of course other businesses — like Apple — will be able to pay to use Amazon’s satellite infrastructure, just like they can now pay to use Amazon’s delivery service, or pay to use AWS, or pay to sell on Amazon.com. The world may change, in increasingly drastic ways, but Amazon’s approach, by virtue of its focus on long-term investments in the physical world, appears to be as sturdy as ever.
More generally, I increasingly suspect that long-term vulnerability to AI — or, to put it more positively, long-term incentives to invest in AI — are very strongly correlated with the degree to which a company interacts with the physical world, and secondarily, the degree to which companies feel secure in their control of distribution: Apple and Amazon feel comfortable not having leading edge models, just access to them, because their business is rooted in the physical. Microsoft has invested heavily in data centers, but doesn’t own their own model, perhaps because they feel their control of distribution to enterprises will protect their core business (or because they had too much of a dependency on OpenAI). Google and Meta are investing at a similar scale to Amazon, and are also heavily invested in their own models. Both are Aggregators, which is to say they have to continually earn attention from consumers, given that competition is only a click away; having good AI is existential to them. This is, in the end, another advantage to making the sort of long-term bets Amazon specializes in: the threats are so distant that you have plenty of time to make new investments that address any weaknesses that develop in the meantime — or, as is the case of AI, wait for the market to tilt in your favor.
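The tokens-per-watt argument, and the first objection to it (money spent on premium chips could buy more power instead), can be illustrated with a toy calculation. This is a sketch under stated assumptions only: every number below is hypothetical and in arbitrary units, `Accelerator` and the two helpers are illustrative names, and revenue per megawatt stands in for tokens-per-watt at a fixed token price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator economics, per megawatt of deployed power."""
    name: str
    revenue_per_mw: float    # token revenue per MW (proxy for tokens-per-watt)
    chip_cost_per_mw: float  # amortized chip cost per MW


POWER_COST_PER_MW = 1.0  # assumed cost of securing and running one MW


def profit(acc: Accelerator, megawatts: float) -> float:
    """Profit from running `megawatts` of the given accelerator."""
    return megawatts * (acc.revenue_per_mw - acc.chip_cost_per_mw - POWER_COST_PER_MW)


def megawatts_for_budget(acc: Accelerator, budget: float) -> float:
    """Power that a fixed budget buys when chips and power are both purchasable."""
    return budget / (acc.chip_cost_per_mw + POWER_COST_PER_MW)


premium = Accelerator("premium GPU", revenue_per_mw=10.0, chip_cost_per_mw=4.0)
custom = Accelerator("custom ASIC", revenue_per_mw=7.0, chip_cost_per_mw=1.5)

# Case 1: power is the binding constraint (100 MW, no more available).
fixed_power = {a.name: profit(a, 100.0) for a in (premium, custom)}

# Case 2: money is the binding constraint (budget of 500, power can be added).
fixed_budget = {
    a.name: profit(a, megawatts_for_budget(a, 500.0)) for a in (premium, custom)
}

print("fixed power: ", fixed_power)
print("fixed budget:", fixed_budget)
```

With these made-up figures the premium chip wins when power is capped, while the cheaper chip wins once the same budget can be spread across more power. The result depends entirely on the assumed margins, so the sketch shows only that the conclusion changes with which constraint binds.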

Stratechery 3 weeks ago

Google Earnings, Meta Earnings

Wall Street loved Google's earnings, and hated Meta's, even though the latter's core business was more impressive. The difference is that Google is monetizing its investments now (and it might be all Anthropic).

Stratechery 4 weeks ago

2026.18: Long-term, Peripheral & Myopic Visions

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone . Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings . On that note, here were a few of our favorites this week. This week’s Stratechery video is on Tim Cook’s Impeccable Timing . Amazon and AI . When it comes to AI, every quarter seems to bring a new winner and loser. For my part, the company that I find increasingly compelling is Amazon . Things didn’t look promising a couple of years ago, when training was the most important infrastructure use case, but Amazon — whether through vision or good fortune — was positioning itself well for a world defined by inference (given that their inference chip is called “Trainium”, I’m going with a little bit of column A and a little bit of column B). Now the company is adding OpenAI’s models to its offerings, and collaborating with the frontier lab on an entirely new kind of enterprise product: Bedrock Managed Agents, the subject of a Stratechery Interview with AWS CEO Matt Garman and OpenAI CEO Sam Altman . — Ben Thompson The Future of AR Devices.  Amidst a never-ending conversation about AI, software and infrastructure spending, it was refreshing this week to dream about the possibilities for the future of hardware. Ben’s Daily Update on Monday traced his experience with the Meta Display glasses and culminated with an epiphany on what the future of AR should look like. We dove deeper on Sharp Tech with an extended conversation about why the Display glasses are superior to Meta’s Orion prototype, notes on what future VR headsets should emphasize, and whether phones (or books?) should be characterized as AR devices.  — Andrew Sharp Beijing’s Myopia in AI and Elsewhere. 
On Sharp China this week Bill and I unpacked the implications of a terrific mess in Singapore , as China’s National Development and Reform Commission has moved to block Meta’s $2 billion acquisition of Manus, a formerly Chinese AI company that had reincorporated in Singapore and had already received payment and integrated its products and employees into Meta’s operations. Then, on Sharp Text this morning, I wrote about Beijing’s geopolitical behavior in 2026 , what Western media tends to get wrong, and — with the Manus decision being a good example — why the CCP’s geopolitical and domestic strategies are generally reactive, not proactive, and often counterproductive. — AS AI Hardware, Meta Display, Redefining VR and AR — I finally tried the Meta Ray-Ban Display, and it completely changed how I think about AR and VR. An Interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman About Bedrock Managed Agents — An interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman about their new partnership, plus my thoughts on OpenAI and Microsoft’s new deal. Intel Earnings, Intel’s Differentiation?, Whither Terafab — Intel’s earnings were very impressive, but the chief driver was a structural shift in demand for CPUs for AI. Plus, what is going on with Terafab? Amazon Earnings, Trainium and Commodity Markets, Additional Amazon Notes — Amazon’s earnings suggest that the shift away from training towards inference and agents means their bet on Trainium is paying off. Plus, additional notes on ads, agents, and sports rights. Beijing Is Not Playing the Long Game — Every single week, someone in the Western media will tell you that China is playing “the long game.” Don’t believe them. 
Meta Ray-Ban Display OpenAI, Musk & Microsoft Fanuc and the Numerical Control Revolution Beijing Kills Meta’s Manus Deal; April Politburo Takeaways; Foreign Forces Afflicting the Youth; US Countermeasures Mounting NAW and CJ and CA CAWWWWW, DEFCON 2 for Jokic and the Nuggets, Notes on OKC, Toronto, and VJ Edgecombe Playoff Stock Watch: Scottie Barnes Awareness, Pistons Repricing, Jokic Market Corrections, and Lots More AWS History and Trainium’s AI Future, OpenAI Makes a Deal With Microsoft, Meta and the Future of Wearable Devices
