Posts in Cloud (20 found)

rose ▪ bud ▪ thorn - may 2026

Published 31 May, 2026 It was my wife's birthday, and our wedding anniversary! Baked some cakes and had a great time. Mine is the Donauwelle attached at the bottom of the post, my wife baked the fruit cake. My friend who visited Japan bought us great gifts from there; I got two gachapon (Cinnamoroll and My Melody), some matcha and My Sweet Piano chopsticks. I finally have it in writing and it's been communicated officially that I am my department's data protection coordinator now. I blogged more. I bought myself a big Build-A-Bear Usahana and a tiny one for my bag. Also, new matcha and I restocked my skincare and supplements :) I feel spoiled by myself. I'm having a great time at the gym, going 3 times a week, and incorporating the strength machines now. The added muscle/strength really helps with posture and counteracting the desk sitting. I'm making good progress. I reduced negativity from my online space. I went to a protest for ME/CFS! I have been better with keeping up with emails. Anita, if you are reading this, I cannot reply to you because it says sending key is not valid. We have a bread cutting machine now! Makes it easier to cut the bread my wife is baking for us :) I attended CPDP 2026 in Brussels. I reached Magenta status (35+ translated cases) as a Country Reporter for Noyb. Working on better eating behaviors and no guilt during rest. I am working on slowly booking cool classes and activities for the next few months. Been struggling with my face shape. I have chubbier cheeks anyway naturally, but whenever I need a round of Prednisone or I am stressed or there's hormonal stuff going on, they get bigger (cortisol, water retention). They are bigger lately... definitely a source of discomfort and shame when we live in a time of razor sharp jaws and almost-hollow cheeks. I will now have to do my injections weekly :( Dienogest doesn't work at all for me. Instead of preventing periods, it causes me more of them. Had to get off of it.
My soy and rapeseed sensitivities have been extra annoying lately. Can't eat my beloved tofu, and they put rapeseed oil into almost every protein-rich vegan replacement product. I love my lentils, peas and beans, but occasionally I just wanna have some banger vegan köttbullar, schnitzel, or burger patty without a rash, man, or not make everything myself. Not to mention restaurants, or the fact that they drown everything in rapeseed oil based condiments... I haven't been studying nearly as much as I should. Having some issues with the modalities and feeling a bit stressed, like I need more time away from it. I've been very ambitious this month with my blog posting, and it has caused some writer-constipation at times. I had all these drafts ready with some links and loose thoughts already collected, and wanted to write them out fully; but because I set myself arbitrary deadlines or a loose " This needs to be finished and published today! " I felt intense pressure, which made me freeze up... it's really not that serious, but I made it so, for some reason. I also frequently felt stuck between 2-4 equally "important" tasks, posts, topics, whatever, and when I started one I looked at the other and switched, progressing at nothing. Terrible cycle. I moved some planned posts to June and eased up a bit. The menu of my favorite café has been severely reduced and worsened. Also cannot believe that I am paying 10 Euro for a wrap now. The Brussels trip was filled with some disappointments and stress.

<antirez> 5 days ago

Distributing LLM inference in DwarfStar

High-end NVIDIA cards, and the server and power needed to run them, cost a lot of money, especially if you plan to reach enough VRAM to run massive models. The alternative, so far, has been Apple hardware, or the DGX Spark that, even if severely limited by its memory bandwidth, still runs LLM prompt processing (prefill) fast enough. The Mac Studio provided up to 512GB of unified memory, a solution with modest memory bandwidth (but much better than the Spark) and compute at a price that was, after all, given the current situation, relatively fair. For instance, with DwarfStar the Mac Studio M3 Ultra 512GB can run DeepSeek v4 PRO at 150 t/s prefill and ~10-13 t/s decoding: not great, but at a level that is usable for certain use cases. Even 2-bit quantized, DeepSeek v4 PRO holds up very well, like Flash at the same quantization (today I made PRO write a C compiler, I'll publish the video soon). I would not consider it a trivial thing to run a frontier model at home with ~12k of total spending. One could expect this to get better and better, but the situation on the horizon appears cloudy. There is almost zero hope that NVIDIA setups will get less expensive, and even a small company can't easily afford to purchase and handle a small data center for local inference. At the same time, the RAM shortage makes it not exactly likely that we will see a Mac Studio with an M5 Ultra, with maybe 1.2 TB/s of memory bandwidth and more compute (the M5 Max is already faster, compute wise, and has the Neural Accelerators inside each GPU core that help with certain models). So the current situation for local inference is that the best machine is probably a laptop. The M5 Max 128GB can run DeepSeek v4 Flash and Mimo V2.5, 2-bit quantized, at very decent prefill and decoding speeds. We are talking about ~500 t/s prefill and ~35-40 t/s decoding, with a very acceptable performance slope as the context size increases. 
At a cost of 6-7k depending on the configuration, this is currently one of the best deals. If this is the situation, then for local inference projects in general, and for DwarfStar in particular, looking at distributed inference starts to be interesting. What can we do if we have two, three, four MacBook M5 Max systems? Or two M3 Ultras with 512 GB of RAM? Traditionally there are two main ways to run distributed inference. One is to double the available memory by loading 50% of the transformer layers on computer A and the remaining 50% on computer B, and running the inference sequentially. In this case only the activations need to be sent around, which is conceptually very simple, and with some micro-batching magic it is possible not just to double the memory but, in theory, even to increase the prompt processing speed substantially. That does not apply to decoding: to generate a single token you have to wait for the first layers on machine A, then the remaining layers on machine B, and so forth (but at least less heat will be produced, so it is possible to sustain the load). That is not bad at all. This means, for example, that the lucky ones who have two Mac Studio 512GB machines could run the full-size DeepSeek v4 PRO (even though the 2-bit quants already run very, very well) and, with micro-batching, even enjoy a faster prefill. Another approach is to use Apple RDMA to parallelize the execution across the two machines: a vertical split, basically. For instance, one could load the same 2-bit quants on machines A and B, so that both fit, and each side has *all* the routed experts. Then for each layer we could try to do the coordination needed to execute half the experts on machine A and half on machine B (note that both machines have all the experts, so whatever the router says, we can send 50% of the computation to the other machine, and the activations are tiny). 
This is more viable for the PRO, which has much larger routed experts, so the communication penalty is less noticeable. But whether this can be made to work well remains to be seen. There is also tensor parallelism, you are thinking, right? But I bet this is not viable at all with the communication speed we have between two Apple computers, two DGX Sparks and so forth (go read up on the speed of NVLink). The magic of the two approaches above is that you have to send very little data. OK, so far I bet you are thinking: this is the same shit everybody knows about running LLMs in parallel, and indeed this is true. But this post was conceived to reach this exact point. What if we could, instead, parallelize two Macs or DGXs in a completely different way? Open-weights models are now in a golden age: we have plenty of them, and many are very powerful. In the 128GB 2-bit quant class there are many interesting ones: Minimax M2.7, Mimo V2.5, DeepSeek v4 Flash, and a few more. At the same time, it was recently noted that LLM ensembling (https://arxiv.org/abs/2502.18036) is an understudied possibility: it allows two models to run in a completely shared-nothing way on two different machines, only combining the logits or selecting the best continuation at the end. There are different ways to do that, and it works even if the two models have different vocabularies: you can pick the continuation where the perplexity is lower (that is, pick the model which is more sure: it's like a two-expert MoE where the routing is implicit), and it is even possible to combine the logits (with some complexities given by the different vocabularies) and sample from there. More recent papers suggest that mixing the two techniques is the best approach. Anyway: these techniques really seem to work, and the models appear to do better together than alone. It's as if their knowledge is improved because each one brings its own POV on what to say next. Maybe this is the most logical third approach to try, other than the first two. 
I really hope to find the time to play more with all that, in the next months.
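The "pick the model which is more sure" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not DwarfStar code: `Candidate`, `perplexity` and `pick_most_confident` are hypothetical names, the log-probabilities are made-up numbers, and each model is assumed to report the log-probability of every token it generated on its own machine. Comparing per-token perplexity across models with different tokenizers is itself a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A continuation proposed by one model, with its own token log-probs."""
    model: str
    text: str
    token_logprobs: list[float]  # natural-log probability of each generated token


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-probability; lower means more confident."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_most_confident(candidates: list[Candidate]) -> Candidate:
    """Return the candidate whose model was least surprised by its own output."""
    return min(candidates, key=lambda c: perplexity(c.token_logprobs))


# Hypothetical values, for illustration only.
a = Candidate("model-a", " the cache", [-0.1, -0.3])
b = Candidate("model-b", " a buffer", [-0.9, -1.2])
best = pick_most_confident([a, b])
print(best.model, round(perplexity(best.token_logprobs), 3))
```

Only the chosen text and a handful of floats cross the network per step, which is what makes the shared-nothing setup attractive on slow links.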


Lawmakers Demand Answers as CISA Tries to Contain Data Leak

Lawmakers in both houses of Congress are demanding answers from the U.S. Cybersecurity & Infrastructure Security Agency (CISA) after KrebsOnSecurity reported this week that a CISA contractor intentionally published AWS GovCloud keys and a vast trove of other agency secrets on a public GitHub account. The inquiry comes as CISA is still struggling to contain the breach and invalidate the leaked credentials. On May 18, KrebsOnSecurity reported that a CISA contractor with administrative access to the agency’s code development platform had created a public GitHub profile called “Private-CISA” that included plaintext credentials to dozens of internal CISA systems. Experts who reviewed the exposed secrets said the commit logs for the code repository showed the CISA contractor disabled GitHub’s built-in protection against publishing sensitive credentials in public repos. CISA acknowledged the leak but has not responded to questions about the duration of the data exposure. However, experts who reviewed the now-defunct Private-CISA archive said it was originally created in November 2025, and that it exhibits a pattern consistent with an individual operator using the repository as a working scratchpad or synchronization mechanism rather than a curated project repository. In a written statement, CISA said “there is no indication that any sensitive data was compromised as a result of the incident.” But in a May 19 letter (PDF) to CISA’s Acting Director Nick Andersen, Sen. Maggie Hassan (D-NH) said the credential leak raises serious questions about how such a security lapse could occur at the very agency charged with helping to prevent cyber breaches. “This reporting raises serious concerns regarding CISA’s internal policies and procedures at a time of significant cybersecurity threats against U.S. critical infrastructure,” Sen. Hassan wrote. A May 19 letter from Sen. 
Margaret Hassan (D-NH) to the acting director of CISA demanded answers to a dozen questions about the breach. Sen. Hassan noted that the incident occurred against the backdrop of major disruptions internally at CISA, which lost more than a third of its workforce and almost all of its senior leaders after the Trump administration forced a series of early retirements, buyouts, and resignations across the agency’s various divisions. Rep. Bennie Thompson (D-MS), the ranking member on the House Homeland Security Committee, echoed the senator’s concerns. “We are concerned that this incident reflects a diminished security culture and/or an inability for CISA to adequately manage its contract support,” Thompson wrote in a May 19 letter to the acting CISA chief that was co-signed by Rep. Delia Ramirez (D-Ill), the ranking member of the panel’s Subcommittee on Cybersecurity and Infrastructure Protection. “It’s no secret that our adversaries, like China, Russia, and Iran, seek to gain access to and persistence on federal networks. The files contained in the ‘Private-CISA’ repository provided the information, access, and roadmap to do just that.” KrebsOnSecurity has learned that more than a week after CISA was first notified of the data leak by the security firm GitGuardian, the agency is still working to invalidate and replace many of the exposed keys and secrets. On May 20, KrebsOnSecurity heard from Dylan Ayrey, the creator of TruffleHog, an open-source tool for discovering private keys and other secrets buried in code hosted at GitHub and other public platforms. Ayrey said CISA still hadn’t invalidated an RSA private key exposed in the Private-CISA repo that granted access to a GitHub app which is owned by the CISA enterprise account and installed on the CISA-IT GitHub organization with full access to all code repositories. 
“An attacker with this key can read source code from every repository in the CISA-IT organization, including private repos, register rogue self-hosted runners to hijack CI/CD pipelines and access repository secrets, and modify repository admin settings including branch protection rules, webhooks, and deploy keys,” Ayrey told KrebsOnSecurity. CI/CD stands for Continuous Integration and Continuous Delivery, and it refers to a set of practices used to automate the building, testing and deployment of software. KrebsOnSecurity notified CISA about Ayrey’s findings on May 20. Ayrey said CISA appears to have invalidated the exposed RSA private key sometime after that notification. But he noted that CISA still hasn’t rotated leaked credentials tied to other critical security technologies that are deployed across the agency’s technology portfolio (KrebsOnSecurity is not naming those technologies publicly for the time being). CISA responded with a brief written statement in response to questions about Ayrey’s findings, saying “CISA is actively responding and coordinating with the appropriate parties and vendors to ensure any identified leaked credentials are rotated and rendered invalid and will continue to take appropriate steps to protect the security of our systems.” Ayrey said his company Truffle Security monitors GitHub and a number of other code platforms for exposed keys, and attempts to alert affected accounts to the sensitive data exposure(s). They can do this easily on GitHub because the platform publishes a live feed which includes a record of all commits and changes to public code repositories. But he said cybercriminal actors also monitor these public feeds, and are often quick to pounce on API or SSH keys that get inadvertently published in code commits. The Private-CISA GitHub repo exposed dozens of plaintext credentials to important CISA GovCloud resources. 
In practical terms, it is likely that cybercrime groups or foreign adversaries also noticed the publication of these CISA secrets, the most egregious of which appears to have happened in late April 2026, Ayrey said. “We monitor that firehose of data for keys, and we have tools to try to figure out whose they are,” he said. “We have evidence attackers monitor that firehose as well. Anyone monitoring GitHub events could be sitting on this information.” James Wilson , the enterprise technology editor for the Risky Business security podcast, said organizations using GitHub to manage code projects can set top-down policies that prevent employees from disabling GitHub’s protections against publishing secret keys and credentials. But Wilson’s co-host Adam Boileau said it’s not clear that any technology could stop employees from opening their own personal GitHub account and using it to store sensitive and proprietary information. “Ultimately, this is a thing you can’t solve with a technical control,” Boileau said on this week’s podcast . “This is a human problem where you’ve hired a contractor to do this work and they have decided of their own volition to use GitHub to synchronize content from a work machine to a home machine. I don’t know what technical controls you could put in place given that this is being done presumably outside of anything CISA managed or even had visibility on.” Update, 3:05 p.m. ET: Added statement from CISA. Corrected a date in the story (Truffle Security said it found the repo gained some of its most sensitive secrets in late April 2026, not 2025).
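The kind of monitoring Ayrey and Valadon describe can be illustrated with simple pattern matching over committed text. This is a rough sketch, not how TruffleHog or GitGuardian work internally: `PATTERNS` and `find_secrets` are hypothetical names, and only two well-known shapes are checked (a long-term AWS access key ID, which starts with `AKIA` followed by 16 uppercase letters or digits, and a PEM private key header). The sample key is the placeholder value used in AWS documentation.

```python
import re

# Two widely known secret shapes; real scanners use many more rules
# and usually verify candidate keys before alerting.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that looks like a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = "\n".join([
    "region = us-gov-west-1",
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(find_secrets(sample))
```

Because the same patterns are trivial for attackers to run against the public event feed, a match in a public commit is best treated as a compromised credential that needs rotating, not just deleting.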


CISA Admin Leaked AWS GovCloud Keys on Github

Until this past weekend, a contractor for the Cybersecurity & Infrastructure Security Agency (CISA) maintained a public GitHub repository that exposed credentials to several highly privileged AWS GovCloud accounts and a large number of internal CISA systems. Security experts said the public archive included files detailing how CISA builds, tests and deploys software internally, and that it represents one of the most egregious government data leaks in recent history. On May 15, KrebsOnSecurity heard from Guillaume Valadon , a researcher with the security firm GitGuardian . Valadon’s company constantly scans public code repositories at GitHub and elsewhere for exposed secrets, automatically alerting the offending accounts of any apparent sensitive data exposures. Valadon said he reached out because the owner in this case wasn’t responding and the information exposed was highly sensitive. A redacted screenshot of the now-defunct “Private CISA” repository maintained by a CISA contractor. The GitHub repository that Valadon flagged was named “ Private-CISA ,” and it harbored a vast number of internal CISA/DHS credentials and files, including cloud keys, tokens, plaintext passwords, logs and other sensitive CISA assets. Valadon said the exposed CISA credentials represent a textbook example of poor security hygiene, noting that the commit logs in the offending GitHub account show that the CISA administrator disabled the default setting in GitHub that blocks users from publishing SSH keys or other secrets in public code repositories. “Passwords stored in plain text in a csv, backups in git, explicit commands to disable GitHub secrets detection feature,” Valadon wrote in an email. “I honestly believed that it was all fake before analyzing the content deeper. This is indeed the worst leak that I’ve witnessed in my career. 
It is obviously an individual’s mistake, but I believe that it might reveal internal practices.” One of the exposed files, titled “importantAWStokens,” included the administrative credentials to three Amazon AWS GovCloud servers. Another file exposed in their public GitHub repository — “AWS-Workspace-Firefox-Passwords.csv” — listed plaintext usernames and passwords for dozens of internal CISA systems. According to Caturegli, those systems included one called “LZ-DSO,” which appears short for “Landing Zone DevSecOps,” the agency’s secure code development environment. Philippe Caturegli , founder of the security consultancy Seralys , said he tested the AWS keys only to see whether they were still valid and to determine which internal systems the exposed accounts could access. Caturegli said the GitHub account that exposed the CISA secrets exhibits a pattern consistent with an individual operator using the repository as a working scratchpad or synchronization mechanism rather than a curated project repository. “The use of both a CISA-associated email address and a personal email address suggests the repository may have been used across differently configured environments,” Caturegli observed. “The available Git metadata alone does not prove which endpoint or device was used.” The Private CISA GitHub repo exposed dozens of plaintext credentials for important CISA GovCloud resources. Caturegli said he validated that the exposed credentials could authenticate to three AWS GovCloud accounts at a high privilege level. He said the archive also includes plain text credentials to CISA’s internal “artifactory” — essentially a repository of all the code packages they are using to build software — and that this would represent a juicy target for malicious attackers looking for ways to maintain a persistent foothold in CISA systems. “That would be a prime place to move laterally,” he said. 
“Backdoor in some software packages, and every time they build something new they deploy your backdoor left and right.” In response to questions, a spokesperson for CISA said the agency is aware of the reported exposure and is continuing to investigate the situation. “Currently, there is no indication that any sensitive data was compromised as a result of this incident,” the CISA spokesperson wrote. “While we hold our team members to the highest standards of integrity and operational awareness, we are working to ensure additional safeguards are implemented to prevent future occurrences.” A review of the GitHub account and its exposed passwords show the “Private CISA” repository was maintained by an employee of Nightwing , a government contractor based in Dulles, Va. Nightwing declined to comment, directing inquiries to CISA. CISA has not responded to questions about the potential duration of the data exposure, but Caturegli said the Private CISA repository was created on November 13, 2025. The contractor’s GitHub account was created back in September 2018. The GitHub account that included the Private CISA repo was taken offline shortly after both KrebsOnSecurity and Seralys notified CISA about the exposure. But Caturegli said the exposed AWS keys inexplicably continued to remain valid for another 48 hours. CISA is currently operating with only a fraction of its normal budget and staffing levels. The agency has lost nearly a third of its workforce since the beginning of the second Trump administration, which forced a series of early retirements, buyouts, and resignations across the agency’s various divisions. The now-defunct Private CISA repo showed the contractor also used easily-guessed passwords for a number of internal resources; for example, many of the credentials used a password consisting of each platform’s name followed by the current year. 
Caturegli said such practices would constitute a serious security threat for any organization even if those credentials were never exposed externally, noting that threat actors often use key credentials exposed on the internal network to expand their reach after establishing initial access to a targeted system. “What I suspect happened is [the CISA contractor] was using this GitHub to synchronize files between a work laptop and a home computer, because he has regularly committed to this repo since November 2025,” Caturegli said. “This would be an embarrassing leak for any company, but it’s even more so in this case because it’s CISA.”
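The weak-password pattern described above (platform name followed by the current year) is easy to audit for. The following is a minimal sketch under stated assumptions: `is_platform_year_password` is a hypothetical helper, and the platform name and passwords are invented for illustration and are not taken from the leaked files.

```python
import re


def is_platform_year_password(platform: str, password: str) -> bool:
    """True if the password is the platform name (any case) plus a 4-digit year."""
    pattern = re.escape(platform) + r"(?:19|20)\d{2}"
    return re.fullmatch(pattern, password, flags=re.IGNORECASE) is not None


# Hypothetical inputs, for illustration only.
print(is_platform_year_password("jenkins", "Jenkins2026"))
print(is_platform_year_password("jenkins", "x9!Lq#7vTz"))
```

A check like this could run against a password manager export during an internal audit, before an attacker with a foothold tries the same guess.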

Stratechery 1 week ago

Data Center Discontent, Understanding the Opposition, Fixing the Problem

There are understandable reasons for people to oppose data centers; the only solution that will work is simply paying them off.

Martin Alderson 2 weeks ago

Managed agents are the new Lambda

Managed agents (cloud-hosted agents) are the next big push from the frontier labs. They're genuinely incredible. They're also going to be the AWS Lambda of this cycle - powerful, sticky, and an absolute nightmare to migrate off once you're in deep. While the exact definition is up for debate, in my mind a managed agent is an agent harness (like Claude Code) running in the cloud, not on your local machine. This has a few major advantages. The most obvious one is that you don't need a machine running locally - it can do its work 24/7, in the background. Another is that running in the cloud means it can be notified of changes and act on them. Imagine, for example, agents responding to incoming emails or webhooks and doing some activity based on them (this is very possible locally - but easier with the agent running on the server). A further advantage is security - probably the key part of the "managed" agent. Much like PaaS (platform-as-a-service) products like Heroku, AWS ECS/App Runner/Lambda and Azure App Service/Functions, the provider manages not just the underlying physical infrastructure for you, but also manages patching the operating system and related server software on your behalf. Sandboxing is another related benefit. Managed agents only get access to what you give them - no risk of an agent wandering into files it shouldn't. If you're already running Claude Code/Codex/OpenCode in Docker on a server, you've basically built one yourself. The frontier labs are just productising the pattern. Anthropic has really been pushing their managed agents product hard lately. This makes a lot of sense - cloud hosted agents are genuinely incredible in what they can do - but I'd urge real caution on locking yourself into a vendor - at least at this point. Fundamentally, agents are not particularly difficult to swap out. 
While there are important differences and nuances in how they work and operate, switching from Claude Code to Codex (or OpenCode, or Pi, or one of the many other agent harnesses) is a fairly simple process. Fundamentally the pattern is the same - run a harness with a prompt, context and tools and capture output and logs. All agent harnesses have the same primitives. And at least having the ability to swap the agent harness and model out is really important. Clearly pricing is one important dimension, but equally so is being able to use new models from different labs. The competition is absolutely cutthroat and shows absolutely no sign of slowing down. Once you start using a managed agent product from a frontier lab this gets far more difficult. A lot of your data and workflows are embedded in their cloud. While Anthropic have gone to lengths to say it is your data and it can be exported, in my many years of experience of vendor lock in this definition drifts and gets harder and harder to migrate to another provider. As many people found out with AWS, moving Docker container workloads is fairly easy if you want to move hyperscaler clouds. Moving AWS Lambda [1] functions is far, far more difficult - I've seen organisations spend months upon months unpicking Lambda code and assumptions when they realise it isn't a good fit after all the hype dies down. Yesterday Anthropic announced huge changes to their pricing model which underlines this point. If you run Claude Code non-interactively (which includes nearly all cloud-hosted agent usage - and many others [2] ), these now are not eligible for your subscription token allowances and will instead use some new credit. After this allowance is exhausted then it is very expensive API tokens ahead. It's fair to say if you were using a lot of "non interactive" Claude Code you are looking at a 5-20x price increase with these changes. 
It's clearly Anthropic's prerogative to do this - and (I think) points to their compute shortages more than anything, but it has given OpenAI a real opening for users to switch to Codex - OpenAI (currently, at least) have been very explicit you can use your included allowances on your plan with any tool and however you like. Expect to see a lot more talk around Codex (which has been already gaining significant traction over the past few months) and other providers in the future - developers are often remarkably price sensitive around things like this, especially for personal 'side projects' - which often then end up informing enormous purchasing decisions in the companies they work in months and years down the line. [3] Now it's easy to say don't use a frontier lab's managed agent product, but what are the solutions? I think there are two main ways you can solve this in your organisation. Firstly, roll your own managed infra. This is a good option for developers and tech adjacent teams - they will have the expertise to do this. Essentially, it's just running a Docker container which they do all day every day. Using something like OpenCode as a harness allows you to use any model provider and switch between them in minutes. Secondly, there's a flood of startups and other companies that allow you to run managed agents with any model or provider you want. I haven't (yet) evaluated them in detail, as the market landscape is shifting too fast for me to give any real thoughts on quality, but providers include Cloudflare Agents, Vercel and the hyperscaler options (AWS AgentCore, Azure AI Foundry and GCP Vertex AI Agent Engine). My personal view is until this shakes out a bit more, stick to self hosting them. It's not difficult, allows you to secure them inside your current infrastructure and builds organisational competence around agent primitives. Outsourcing this knowledge at this point is a path to serious organisational knowledge gaps. 
However, expect this to change as the platforms introduce more capabilities that become more and more difficult to replicate. One to keep an eye on. The one fly in the ointment in this plan is that I have a strong gut feeling the frontier labs are going to start introducing new models and capabilities that are only available on their managed agent platforms. This is where the pendulum (maybe) starts swinging to having to use managed agents - but again, maybe not.
[1] Lambda is a way of running applications "serverless" which in theory allows much easier deployment and scaling - more of the primitives of application hosting are abstracted. However, it means you start really having to lean into AWS-specific code, techniques and patterns, which can be really difficult to revert.
[2] It also includes alternative frontends to Claude Code, like the excellent Conductor Mac app, despite this really being the definition of interactive usage.
[3] This is why I really hope that Anthropic rethinks this at some point.
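The "same primitives" point (run a harness with a prompt, context and tools, then capture output and logs) can be sketched as a thin wrapper that keeps the harness swappable. This is a minimal sketch under stated assumptions: `Harness`, `RunResult` and `run_agent` are hypothetical names, the working directory stands in for "context", and no real harness CLI flags are shown because they differ per tool. The demo uses a stand-in command that just echoes the prompt.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Harness:
    """Hypothetical description of how to invoke one agent harness CLI."""
    name: str
    argv: list[str]  # command prefix; the prompt is appended as the last argument


@dataclass
class RunResult:
    harness: str
    exit_code: int
    output: str  # captured stdout
    logs: str    # captured stderr


def run_agent(harness: Harness, prompt: str, workdir: str = ".", timeout: int = 600) -> RunResult:
    """Run the harness once, non-interactively, capturing stdout and stderr."""
    proc = subprocess.run(
        harness.argv + [prompt],
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return RunResult(harness.name, proc.returncode, proc.stdout, proc.stderr)


# Stand-in "harness" that echoes the prompt, so the sketch runs anywhere.
fake = Harness("fake", [sys.executable, "-c", "import sys; print('did: ' + sys.argv[1])"])
result = run_agent(fake, "summarise the open issues")
print(result.exit_code, result.output.strip())
```

Switching harness or model then means changing one `Harness` value (and whatever container image provides the binary), while the scheduling, logging and storage around `run_agent` stay the same.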

Stratechery 2 weeks ago

The Deployment Company, Back to the 70s, Apple and Intel

Listen to this post: Good morning, President Trump is on the way to China, and Sharp China is your go-to podcast for understanding what happens next. Add it to your podcast player now in anticipation of the next few episodes breaking down the trip. On to the Update: From Reuters : OpenAI said on Monday it is setting up a new company with more than $4 billion in initial investment to help organizations build and deploy artificial intelligence systems, and will acquire an AI consulting firm, Tomoro, to quickly scale up the unit. After its early models saw strong resonance with consumers, OpenAI has been working aggressively to sign corporate contracts and establish a large presence in the business world where its AI will see large-scale deployment. The venture, which will be majority owned and controlled by OpenAI, also comes as rival Anthropic enjoys strong success in its enterprise AI push with its Claude family of models seeing rapid adoption among businesses. The new firm, called OpenAI Deployment Company, will help the ChatGPT maker embed engineers specializing in frontier AI deployment into organizations that will then work closely with various teams to identify where AI can make the biggest impact, OpenAI said. Its acquisition of Tomoro, a consulting firm that helps enterprises deploy AI, will bring around 150 experienced AI engineers and “deployment specialists” to the new unit from day one. Tomoro was formed in 2023 in alliance with OpenAI, and counts companies such as Mattel, Red Bull, Tesco and Virgin Atlantic as its clients, according to its website. That was on Monday; on Tuesday, from The Information : Google plans to hire hundreds of engineers to help customers start using its business-focused AI products, according to a person familiar with the situation. Google’s new “forward deployed engineers” will form a new team within Google Cloud, the unit’s chief, Thomas Kurian, said on LinkedIn on Tuesday, without disclosing the size of the effort. 
Matt Renner, Google Cloud’s chief revenue officer, said in a separate post that the move would help Google “show up for our customers with more technical resources (vs just an ocean of salespeople).” The announcement is one of several in the industry in recent weeks as tech companies are deploying armies of humans—often described as “forward deployed engineers”—and partnerships with consulting companies to get customers using AI-driven technology intended to automate work. On Monday, OpenAI launched the “OpenAI Deployment Company” in partnership with consulting and investment firms. Last week, Anthropic announced the creation of a joint venture with private equity firms to sell its AI to the PE firms’ customers. It is, needless to say, tempting to drop some snark about AGI apparently not being good enough to deploy AI, but instead I’m going to go with “as predicted”. In 2024’s Enterprise Philosophy and the First Wave of AI , I made the case that the proper analogy for AI in the enterprise was not SaaS, but rather the first wave of computing in the 1970s. Agents aren’t copilots; they are replacements. They do work in place of humans — think call centers and the like, to start — and they have all of the advantages of software: always available, and scalable up-and-down with demand…Benioff isn’t talking about making employees more productive, but rather companies; the verb that applies to employees is “augmented”, which sounds much nicer than “replaced”; the ultimate goal is stated as well: business results. That right there is tech’s third philosophy: improving the bottom line for large enterprises. Notice how well this framing applies to the mainframe wave of computing: accounting and ERP software made companies more productive and drove positive business results; the employees that were “augmented” were managers who got far more accurate reports much more quickly, while the employees who used to do that work were replaced. 
Critically, the decision about whether or not to make this change did not depend on rank-and-file employees changing how they worked, but for executives to decide to take the plunge. Specifically, I don’t think that the Deployment Company is going in to help employees use chatbots; that’s even more clearly the case with the PE firms that both OpenAI and Anthropic are doing deals with. I expect there to be an ever-increasing number of deals where PE buys software firms with reliable cash flows and conducts significant layoffs, forcing AI to pick up the slack, solving stock-based compensation issues in the process. I don’t know if the mandate for the Deployment Company is going to be quite so harsh, but I assume this is a company that is hired by the executive suite to fundamentally rethink business processes in a way that hasn’t been done since the mainframe: Most historically-driven AI analogies usually come from the Internet, and understandably so: that was both an epochal change and also much fresher in our collective memories. My core contention here, however, is that AI truly is a new way of computing, and that means the better analogies are to computing itself. Transformers are the transistor, and mainframes are today’s models. The GUI is, arguably, still TBD. To the extent that is right, then, the biggest opportunity is in top-down enterprise implementations. The enterprise philosophy is older than the two consumer philosophies I wrote about previously: its motivation is not the user, but the buyer, who wants to increase revenue and cut costs, and will be brutally rational about how to achieve that (including running expected value calculations on agents making mistakes). That will be the only way to justify the compute necessary to scale out agentic capabilities, and to do the years of work necessary to get data in a state where humans can be replaced. The bottom line benefits — the essence of enterprise philosophy — will compel just that. 
What I wonder is how much of the work ends up reworking data; that, as I noted in that article, is why I was bullish on Palantir: That leaves the data piece, and while Benioff bragged about all of the data that Salesforce had, it doesn’t have everything, and what it does have is scattered across the phalanx of applications and storage layers that make up the Salesforce Platform. Indeed, Microsoft faces the same problem: while their Copilot vision includes APIs for 3rd-party “agents” — in this case, data from other companies — the reality is that an effective Agent — i.e. a worker replacement — needs access to everything in a way that it can reason over. The ability of large language models to handle unstructured data is revolutionary, but the fact remains that better data still results in better output; explicit step-by-step reasoning data, for example, is a big part of how o1 works. To that end, the company I am most intrigued by, for what I think will be the first wave of AI, is Palantir… That integration looks like this illustration from the company’s webpage for Foundry, what they call “The Ontology-Powered Operating System for the Modern Enterprise”: What is notable about this illustration is just how deeply Palantir needs to get into an enterprise’s operations to achieve its goals. This isn’t a consumery-SaaS application that your team leader puts on their credit card; it is SOFTWARE of the sort that Salesforce sought to move beyond. Google’s Kurian, by the way, did dismiss any sort of Palantir comparison in a Stratechery Interview last month: This all makes perfect sense, particularly this bit about the Knowledge Catalog definitely fits how I’ve been thinking. I wrote about this a few years ago about this importance of this whole layer and understanding it, it’s a bit of a big lift to get this in place. You have some sort of analog, say, with like a Palantir that’s putting in like their ontology thing. 
They have FDEs out on the site, multi-month projects doing this. You have OpenAI talking about Frontier, their agent layer, and they’re partnering with all the tech consultancies to build this out. Is this going to entail a lot of boots on the ground to get this graph working and functional in a way that your agents can operate effectively across it? TK: We’re not competing with Palantir, we’re not building a semantic dictionary or an ontology. What we’re doing is, today I’ll give you the closest analogy. TK: Today when you use a model, let’s say you use Gemini, and you ask a question, Gemini goes through reasoning, and then it shows you a citation. A citation is, “How did I answer the question and what’s the source I derived from?” Now imagine that citation was a query that needed to go to a folder in, for example, a storage system because there’s some documents there and a database because, for example, in a part number, just think about there’s a part number document that lists all the part numbers and sits in a drive and then that part number you need to fetch out to say it’s the modem that the guy is coming to repair, and that’s mapped to a table in a database. 
So what the graph does, we use Gemini, so we don’t need humans, we use Gemini to say, “Hey, go and read all these documents in these drives and extract the information from it and then match that to the database table that has the reference to the part number”, and so then when Gemini turns around and says, “I got this query about how much inventory of modems they are”, the first thing it does is it says, “Okay, go to the Knowledge Catalog and it says modem is part number one, two, three, four, five”, and then it says, “By the way the table in the database that has the inventory information about this part number is this table, here’s a SQL”, it then makes the quality of what we generate higher and then when it answers the question it shows back — back to your, “Trust my data”, it shows a grounding citation saying, “That’s where we got it from.” Well, so much for not needing humans! I joke, mostly — Kurian was referring to not needing a Palantir-like ontology, not necessarily dismissing the need for FDEs — but it sure is interesting how AI is creating the need for new kinds of jobs. It’s almost as if the world is more dynamic, and pure intelligence, unadulterated by what already exists and the burden of reflexivity, is more static, than the most pessimistic prognosticators may have anticipated. More prosaically, OpenAI and Anthropic need the revenue, enterprises need the imagination, and Google needs to stay in the game. From the Wall Street Journal : Apple and Intel have reached a preliminary agreement for Intel to manufacture some of the chips that power Apple devices, according to people familiar with the matter. Intensive talks between the two companies have been ongoing for more than a year, and they hammered out a formal deal in recent months, these people said. Bloomberg News previously reported the talks. It’s still unclear which Apple products Intel would make chips for, these people said. 
Apple ships more than 200 million iPhones a year as well as millions of iPads and Mac computers. Ming-Chi Kuo reported on X late last year that Intel would make Apple’s most basic M processor on its 18A process; he didn’t specify which generation. Regardless, while the Wall Street Journal cites Trump administration pressure, and an earlier Bloomberg article Apple’s concentration risk on TSMC and Taiwan, the most obvious reason for a deal — assuming it exists — is economic. Specifically, Apple has for two quarters running said it can’t satisfy demand because it can’t get enough capacity at TSMC. CEO Tim Cook referenced this point multiple times on the last earnings call , but I think this was the most important articulation: The constraint in the March quarter and the June quarter, the primary constraint is the availability of the advanced nodes our SoCs are produced on, not memory. And so I don’t want to predict for supply and demand to match because if I look at it realistically, I think on the Mac mini and the Mac Studio, I believe it will take several months to reach supply-demand balance. And so we’re not at the point where we’re saying this is going to end anytime soon. And it’s not because of a problem per se other than we just undercalled the demand. And there are lead times to this, as you well understand, and it takes a while to correct that. And the primary constraint from a product point of view, or the majority of it for this quarter, for the June quarter will be on the Mac. And it’s Mac mini, Mac Studio and the MacBook Neo. It’s all of those. Cook talked about lead times last quarter as well, and the important thing to note is that while it does take five months or so to make new chips, assuming Apple realized it needed more iPhone 17 Pro chips right away, those new A19 Pro lines only started producing chips partway through last quarter (which is why iPhone 17 Pro sales weren’t as high as they could be). 
Critically, however, what seems likely is that Apple took capacity away from the Mac to make more iPhone chips, and now doesn’t have enough chips for the Mini and Studio either. The long-and-short of it is this: Apple doesn’t have flexible access to TSMC capacity anymore, because so much of that capacity is going to AI in particular, and it’s costing Apple meaningful money across multiple product lines. This was always the thing that would bring companies to Intel; I wrote in TSMC Risk : Becoming a meaningful customer of Samsung or Intel is very risky: it takes years to get a chip working on a new process, which hardly seems worth it if that process might not be as good, and if the company offering the process definitely isn’t as customer service-centric as TSMC. I understand why everyone sticks with TSMC. The reality that hyperscalers and fabless chip companies need to wake up to, however, is that avoiding the risk of working with someone other than TSMC incurs new risks that are both harder to see and also much more substantial. Except again, we can see the harms already: foregone revenue today as demand outstrips supply. Today’s shortages, however, may prove to be peanuts: if AI has the potential these companies claim it does, future foregone revenue at the end of the decade is going to cost exponentially more — surely a lot more than whatever expense is necessary to make Samsung and/or Intel into viable competitors for TSMC. This, incidentally, is how the geographic risk issue will be fixed, if it ever is. It’s hard to get companies to pay for insurance for geopolitical risks that may never materialize. What is much more likely is that TSMC’s customers realize that their biggest risk isn’t that TSMC gets blown up by China, but that TSMC’s monopoly and reasonable reluctance to risk a rate of investment that matches the rest of the industry means that the rest of the industry fails to fully capture the value of AI. We’re already here (reportedly). 
TSMC’s failure to invest aggressively enough over the last several years will, in the end, give Intel the single most important thing it needs to become a viable competitor: the customer who did more than any other to make TSMC into the leader in the first place.

0 views
Zak Knill 2 weeks ago

LLMs are breaking 20 year old system design

The ‘cloud-native’ architecture of the last decade is built on a 20-year-old assumption: that state lives in the database, and compute is stateless. If you want to scale, you scale the database vertically (get a larger machine) [1], or design the database schema around partitioning the data, and you scale your application servers horizontally (add more boxes). Any request can hit any server, the load balancer doesn’t care, and the database is the single source of truth.
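As a rough illustration of that assumption, here is a minimal in-memory sketch. The names `Database`, `AppServer` and `RoundRobinBalancer` are hypothetical stand-ins invented for this example, not part of any real framework; the point is only that the servers hold no state of their own, so it does not matter which one a request lands on.

```python
from itertools import cycle


class Database:
    """Hypothetical stand-in for the shared database: the only place state lives."""

    def __init__(self):
        self._rows = {}

    def get(self, key, default=0):
        return self._rows.get(key, default)

    def put(self, key, value):
        self._rows[key] = value


class AppServer:
    """A stateless application server: it keeps nothing between requests."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def handle_increment(self, user_id):
        # Read state from the database, change it, and write it straight back.
        count = self.db.get(user_id) + 1
        self.db.put(user_id, count)
        return count


class RoundRobinBalancer:
    """Sends each request to the next server in turn, ignoring who the user is."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def route(self, user_id):
        server = next(self._next_server)
        return server.name, server.handle_increment(user_id)


db = Database()
servers = [AppServer(f"app-{i}", db) for i in range(3)]  # "add more boxes"
balancer = RoundRobinBalancer(servers)

# Five requests for the same user are spread across three different servers,
# yet the counter stays consistent because the database is the source of truth.
results = [balancer.route("alice") for _ in range(5)]
for server_name, count in results:
    print(server_name, count)
```

Adding a fourth `AppServer` to the list changes nothing about correctness here, which is exactly the property the stateless-compute model relies on.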

0 views
Sean Goedecke 2 weeks ago

AI datacenters in space do not have a cooling problem

This year Elon Musk has started banging the drum about building AI datacenters in space. As the only person who owns a successful space company and a (moderately) successful AI company, this is a sensible way to boost his profile and net worth. Is it a sensible way to build datacenters? The first comment underneath most discussions of this always goes along these lines: “you obviously can’t build AI datacenters in space, because heat dissipation is really hard in space, and AI datacenters generate a lot of heat”. In general I am distrustful of snappy answers like these. It reminds me of the “AI datacenters obviously don’t use a lot of water, because cooling fluid circulates in a closed-loop system” argument: if it were true, there wouldn’t be a debate at all, just one side who understand the obvious point and another side who are stupid. Some arguments are like this! However, more often there’s a complicating factor that makes the snappy answer incorrect. In the water-use case, it’s that the closed-loop system has to itself be cooled by an open-loop evaporative chiller. What about the space datacenter case? First, let’s give the argument a fair shake. Although space is itself very cold, cooling is tricky because everything you’d want to cool is surrounded by vacuum. Heat transfer works in three ways:

1. Hot (i.e. fast-moving) atoms bump into other atoms, making them move and thus heating them up
2. Hot atoms physically move from one location to another (e.g. in a fluid or gas), staying hot and thus making their new location hotter
3. Hot objects emit photons (electromagnetic radiation), cooling themselves down and heating up other objects those photons collide with

Vacuum is an excellent insulator because it defeats the first two methods of heat transfer. If there are no (or very few) atoms surrounding an object, those atoms can’t move around or collide. That’s why vacuum is used as an insulator in thermoses, travel mugs, and so on. So how can space datacenters get rid of their heat? By doubling down on the third method of heat transfer. Although it’s much harder to do heat transfer via moving atoms around in space, it’s actually easier to do heat transfer via emitting radiation. Any good emitter is also a good absorber. A perfectly black object is the most efficient emitter, but it’s also the most efficient way to absorb photons from external sources, which is why black objects get hotter in the sun [1]. In space, the sun’s light is much easier to avoid, because there aren’t objects everywhere for it to bounce off. A shaded radiator can dump quite a lot of heat. It would still require putting more radiators in space than we’ve ever done before. There are plenty of writeups out there if you want to read through the numbers. This is a recent one that estimates ~2500 square metres of radiation area would be needed to serve 1MW of datacenter energy (much less than what it’d need in solar panels) [2]. A serious AI datacenter is around 100MW [3], so we’d need 250,000 square metres of radiation area. The largest current radiator in space is probably the ISS, at around a thousand square metres. Is scaling that up by 250x a lot? Yes, but it’s not necessarily ridiculous. We currently have zero industrial operations happening in space, so there’s been no need to push the boundaries here. In the grand scheme of things, 250,000 square metres is not that big. By my very rough estimates, that’s between 100-500 Starship launches: a couple of years at SpaceX’s current launch cadence, or a few months at their (very optimistic) estimate of future launch cadence. Of course, you don’t just need radiators to put a datacenter in space. You need a similar quantity of solar panels, the GPUs themselves, and all kinds of other supporting equipment. If a GPU dies in an Earth datacenter, you can go in and swap it out; if it dies in space, you just have to leave it dead and keep going with less capacity. It’s still wildly impractical to build AI datacenters in space. But it’s not impossible, and it’s certainly not impossible because of the cooling, which is a relatively minor component of the total mass that would have to be launched into space.

[1] In theory, black clothing would keep you slightly colder at night.
[2] Nobody ever talks about how impossible it would be to power space datacenters, despite the fact that you’d need to launch over triple the solar panel area into space than radiation area. I guess because people know solar panels exist and that the sun shines in space.
[3] The first gigawatt AI data centers are coming online this year, but 100MW is a fair estimate for a current pretty-large-but-not-enormous AI datacenter.
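The radiator arithmetic in this post can be checked in a few lines. This is only a back-of-envelope sketch: every input is a rough figure quoted in the post itself, the variable names are made up for illustration, and the per-launch area is not a real Starship spec but simply what the quoted 100-500 launch range would imply.

```python
# Rough inputs, all taken from the figures quoted in the post.
RADIATOR_M2_PER_MW = 2500   # estimated radiator area needed per MW of datacenter
DATACENTER_MW = 100         # "a serious AI datacenter"
ISS_RADIATOR_M2 = 1000      # approximate size of the largest radiator flown so far
LAUNCHES_LOW, LAUNCHES_HIGH = 100, 500  # the post's very rough launch estimate

# Total radiator area for the whole datacenter.
radiator_area_m2 = RADIATOR_M2_PER_MW * DATACENTER_MW

# How much bigger that is than the ISS radiators.
scale_vs_iss = radiator_area_m2 / ISS_RADIATOR_M2

# Radiator area each launch would have to carry for the launch range to hold.
# This is derived backwards from the post's estimate, not an independent figure.
implied_m2_per_launch_min = radiator_area_m2 / LAUNCHES_HIGH
implied_m2_per_launch_max = radiator_area_m2 / LAUNCHES_LOW

# Footnote [2] says solar panels need "over triple" the radiator area,
# so three times the radiator area is treated here as a lower bound.
solar_area_lower_bound_m2 = 3 * radiator_area_m2

print(f"radiator area: {radiator_area_m2:,} m^2")
print(f"scale vs ISS: {scale_vs_iss:.0f}x")
print(f"implied area per launch: {implied_m2_per_launch_min:.0f}-{implied_m2_per_launch_max:.0f} m^2")
print(f"solar panel area: more than {solar_area_lower_bound_m2:,} m^2")
```

Changing `DATACENTER_MW` to 1000 gives a quick feel for what a gigawatt-class site would need under the same assumptions.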

0 views
Simon Willison 3 weeks ago

Notes on the xAI/Anthropic data center deal

There weren't a lot of big new announcements from Anthropic at yesterday's Code w/ Claude event, but the biggest by far was the deal they've struck with SpaceX/xAI to use "all of the capacity of their Colossus data center". As I mentioned in my live blog of the keynote , that's the one with the particularly bad environmental record . The gas turbines installed to power the facility initially ran without Clean Air Act permits or pollution control devices, which they got away with by classifying them as "temporary". Credible reports link it to increases in hospital admissions relating to low air quality. Andy Masley, one of the most prolific voices pushing back against misleading rhetoric about data centers (see The AI water issue is fake and Data center land issues are fake ), had this to say about Colossus: I would simply not run my computing out of this specific data center I get that Anthropic are severely compute-constrained, but in a world where the very existence of "AI data centers" is a red-hot political issue (see recent news out of Utah for a fresh example), signing up with this particular data center is a really bad look. There was a lot of initial chatter about how this meant xAI were clearly giving up on their own Grok models, since all of their capacity would be sold to Anthropic instead. That was a misconception - Anthropic are getting Colossus 1, but xAI are keeping their larger Colossus 2 data center for their own work. As an interesting side note, the night before the Anthropic announcement, xAI sent out a deprecation notice for Grok 4.1 Fast and several other models providing just two weeks' notice before shutdown, reported here by @xlr8harder from SpeechMap: This is terrible @xai. I just spent time and money to migrate to grok 4.1 fast, and you're disabling it with less than two weeks notice, after releasing it in November, with no migration path to a fast/cheap alternative. I will never depend on one of your products again. 
Here's SpeechMap's detailed explanation of how they selected Grok 4.1 Fast for their project in March. Were xAI serving those models out of Colossus 1? xAI owner Elon Musk (who previously delighted in calling Anthropic "Misanthropic") tweeted the following: By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. [...] After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2. And then shortly afterwards: Just as SpaceX launches hundreds of satellites for competitors with fair terms and pricing, we will provide compute to AI companies that are taking the right steps to ensure it is good for humanity. We reserve the right to reclaim the compute if their AI engages in actions that harm humanity. Presumably the criteria for "harm humanity" are decided by Elon himself. Sounds like a new form of supply chain risk for Anthropic to me!

0 views
Kev Quirk 3 weeks ago

My Initial Thoughts On Thunderbird Pro

Yesterday I received an email from the Thunderbird team inviting me to join a preview of their new hosted email service, Thunderbird Pro. I love email, so was very keen to sign up and test it out. Before we get into this, I want to say that Thunderbird Pro is still under active development, please bear that in mind. Also, these are just my opinions, please don't get butthurt. I hate it when people explain what things are in a blog post, but I think it's warranted here since Thunderbird Pro (TB Pro) is a new product, so people may not know what it is. With that in mind, TB Pro is a hosted email service by the Thunderbird team that includes email, contacts, calendar, secure file sending, and an appointment system that lets people book time with you. It costs $6/month (paid yearly) and for that you get:

- 30 GB of mail storage
- 60 GB of Send storage
- 15 Email aliases
- 3 custom domains

So here's my thoughts - of which I have many, so I'll just list them out, then pick a few to talk about in more detail. Otherwise this will be a very long post.

- No webmail, it's being worked on though.
- Was easy to setup on the Thunderbird app - just had to login (my Zoho mail account auto-detects server settings, so not much harder though).
- Doesn't configure aliases automatically in Thunderbird.
- Prompts to add calendar and contacts via a single click when setting up in Thunderbird. That was a nice touch.
- No way to export all DNS records as a zone file when adding a custom domain.
- I think the 15 alias/3 domain limit is arbitrary and pointless. If you setup a catch-all for a custom domain, you can send from which negates the 15 alias limitation.
- Appointments app is weird.
- Couldn't work out how to setup Send in Thunderbird.
- Admin UI is clunky and has a number of UI issues.
- No option to add additional mailboxes (understandable as this is a preview).
- 30GB is way too much storage for me. I'd like to see smaller, cheaper tiers.

I think the lack of webmail is a huge miss. Every email hosting service I can think of comes with webmail - many people access their mail on desktop via the browser, so I'd have liked to see that up front. Having said that, maybe that's not the market Thunderbird are going for with this service. If so, maybe a lack of webmail is fine. I'd prefer to have the flexibility to check my mail from anywhere though. I don't understand the 15 alias and 3 domain limitation. They cost nothing - they're just a line in a config file. Plus, adding a catch-all allows you to both send and receive email to/from , which renders the alias limit even more pointless. I'd like to see these limitations removed. The Appointment feature lets people book time with you directly. Think Calendly, baked into your email service. If you're a freelancer or consultant who lives and dies by booking links, that's probably a nice convenience. For everyone else, it's likely redundant. Those who need it probably have a solution already, and those who don't will just ignore it. I'm in the latter camp, so there's no value for me.

[Image: Thundermail Appointments]

Unfortunately I couldn't test the Send service. On the dashboard it says: To use Send, you must enable it in Thunderbird Desktop. Download the app and sign in to Thunderbird Pro from the Thunderbird menu. For the life of me I couldn't find an option for Send within Thunderbird, so I couldn't test. Shame. I'm using the Flatpak, which is currently on v140.10.1, and I see v150 is out, so that may be why. But the Flatpak is maintained by the Thunderbird team, so I would have expected this to all be sorted before they allowed paying customers to get their hands on Pro. There is a support card on the Send dashboard, with an option to get help. Clicking that opens the Thunderbird docs in a new tab, showing nothing but a notice box containing . So something is broken. Speaking of broken things, there were a number of other ugly UI notices and warning elements that displayed while getting set up. It just lacks polish, which I would have expected to be ironed out by the time consumers are getting their hands on it. If I'm honest, my first impressions are underwhelming. I get that this is an early preview but for the price, services like Zoho and Fastmail are better services, and better value for money. I don't regret signing up though - it's important to support open source services, and as Thunderbird Pro matures, it will hopefully evolve into a service that can contend with the OG's in this space. If it does, I'll consider moving over fully. But for now, I'm considering my subscription a donation to Thunderbird, as I'm a very happy user of their email app.

0 views
Martin Alderson 3 weeks ago

29th August 2026: a scenario

On 29 April 2026, a Korean security firm called Theori published 732 bytes of Python that breaks Linux container isolation. CopyFail (CVE-2026-31431) is a page-cache corruption bug in the kernel's crypto code. It's been sitting in production since 2017. A compromised pod on a shared Kubernetes node can corrupt binaries visible to every other container on that host, and to the host kernel itself. EKS, GKE, AKS, every shared-tenant node, every CI runner, every multi-tenant SaaS that took the cheap path on isolation - all exposed until patched. It took an AI tool four months to find it. Nine years of human eyes did not. Container escape is bad. Despite arguably a poorly coordinated disclosure/mitigation response [1], it looks like a near miss rather than a catastrophe. But this class of bug - old, subtle, in a corner of the kernel that everyone assumed someone else had read - is exactly the class of bug that lives in every hypervisor stack underneath every cloud. Those bugs are still there. They just haven't been found yet. Here's a (fictional) story about what happens four months from now, on 29th August 2026. As Europe basks in an extreme heatwave, many engineers are paged as EC2 instances start hard crashing. Hacker News reacts to the news as per normal - another us-east-1 outage, AWS status showing green, eyes roll. Some commenters post, though, that many other AZs are showing issues, though not all servers are affected. Over the next hour, more and more machines go down. One Reddit user posts that they are having issues provisioning even fresh machines - as soon as they launch, they get moved into "unhealthy" and go down. A few minutes later, the entire AWS dashboard and API set goes down. Cloudflare Radar shows AWS network traffic dropping to a small percentage of what is normal. As many AWS hosted services start going down - Atlassian, Stripe, Slack, PagerDuty - some comments on Twitter report issues with Linux-based Azure instances. 
Indeed, Cloudflare Radar shows significant drops in Azure traffic. News channels across Europe start leading with vague breaking news headlines on outages across Amazon. They make sure to point out that this isn't an unusual occurrence, with normal service expecting to be resumed like it always has been, and mistakenly insist only US services are affected. As the East coast of the US starts their weekend, a very unusual step is taken. TV channels are briefed that POTUS will be doing an address to the nation at 8am EDT. Few connect the dots - with the emphasis being placed on a potential new strike in the Middle East, or an announcement on the Russia-Ukraine war. POTUS announces that there is a significant cybersecurity incident under way. The head of CISA (the Cybersecurity and Infrastructure Security Agency) gives a very vague but concerning warning. Americans are requested to charge their cell phones, and to await further news - reminded that there may be outages on IPTV based services. POTUS rounds it out by speculating that China is behind the attack, despite his much-heralded reset with Beijing earlier in the year. Other Western leaders do similar addresses - with European leaders speculating on background it is more likely to be Russia or North Korea than China behind the attack. The French president says "without doubt" this is a nation-state actor. While he doesn't publicly point to a specific country, he says those responsible will be brought to justice. While these addresses happen, engineers at various banks are battling various outages. Most concerningly, the 1st biggest and 3rd biggest card processors by volume in Europe have stopped accepting payments, returning cryptic error messages. While they have a multicloud strategy, they cannot move workloads off those two clouds successfully. Google Cloud Platform and smaller cloud providers - unaffected until now - start showing issues. 
While current workloads are unaffected, the huge spike in demand from enterprises activating their disaster recovery protocols simultaneously completely swamps available compute on alternate providers. One smaller cloud provider tweets they are seeing 10,000 VM creation requests a second, draining their entire spare allocation in less than a minute. CEOs of major banks bombard Google and Oracle leadership with calls, offering blank cheques to secure failover compute. The calls go unanswered. WhatsApp groups throughout Europe start lighting up with misinformation that money has been stolen, amplified by many mobile apps showing a "we are undertaking routine maintenance" fallback error simultaneously, causing huge lines at ATMs and banks with people trying to withdraw their savings. As the chaos continues to grow, a press release is distributed from the leadership of AWS and Azure: At approximately 4am EDT this morning a critical and novel vulnerability was exploited in the Linux operating system. This has caused widespread global outages of Linux based virtual machines. Our engineers are working with security services globally to mitigate the impact and engineers across both Microsoft and AWS are working collaboratively to release emergency patches for affected software. Equally we are working hard to understand the impact and will provide regular updates to the media. We sincerely apologize for the impact this is having to our customers and society at large. Behind the scenes, it is chaos. Engineers have isolated the root causes - a complex interplay of vulnerabilities, with the most critical being an undiscovered logic error in the eBPF Linux subsystem that allows a hypervisor takeover. Curiously no data has been stolen - a mistake in the exploit just leads to machines hard crashing exactly 255 seconds after receiving the malicious payload. 
A few engineers question the sloppiness here, but leadership doubles down in their private communications with government that it has to be nation state. The core issue though is that nearly all of Azure and AWS's control plane is down. Attempts to "black start" it result in perpetual failures as various subsystems collapse under the intense traffic from VMs stuck in bootloops. The first VM instances start up again. Restoration is painfully slow, with AWS struggling to get more than 2% of machines back online. Communication internally is severely degraded - with both Slack and Microsoft Teams down, instant messaging is out of the question. Amazon's corporate email runs on AWS itself, and Microsoft's on Azure-hosted Exchange. Both are degraded, massively complicating internal communications. An enterprising AWS employee starts an IRC server locally which becomes the main source of communication - restoration efforts start to speed up once this system becomes widely known. Restoration continues, with the worst of the panic dying down. Banks end up getting priority compute - with POTUS publicly threatening "extreme actions" if major banks are not put to the front of the queue. Asian stock markets open, triggering multiple circuit breakers. After the 3rd one in a row, Tokyo forces markets to close for the day, and other Asian markets follow in quick succession. One curious question remains though - what was the purpose of this attack? No ransomware was deployed, no data was stolen, and while various terrorist groups claimed responsibility, none of them were believed to be credible. Meanwhile an AWS engineer finally isolates snapshots containing the first known failure. An EC2 instance, provisioned on August 13th. Curiously provisioned on an individual account in - Paris. The account matches an individual in Lyon, France. French security services are alerted. In an outer suburb of Lyon, France, French anti-terrorism police arrive at an apartment building. 
A 17-year-old teenager is apprehended, along with his grandmother. Two days earlier, his own president had vowed those responsible would be brought to justice. The police chief on the scene passes the information up the chain that the lead was a total dud - there is no chance that the suggested foreign intelligence service was here. A search of the apartment confirms it - nothing found apart from a PS5 mid-FIFA tournament and a 6-year-old gaming computer. Neighbours confirm that they've seen no one enter or exit the apartment apart from the two residents, who've lived there for "as long as anyone can remember". Media arrive on the scene, with a flustered and embarrassed police chief suggesting that it was a bad tip-off and asking local residents to stay calm. The decision is made to seize the electronics and release the two "suspects". A couple of digital forensics experts get the seized gaming PC, scanning it for malware. Nothing much of interest is found, and just as they start writing their report up one folder pops up. They take a further look, noting it on the report - not thinking much of it, probably a kid trying to play pirated games. They've seen it before. The image of the machine is uploaded. When the code gets up the chain a few hours later, the whole set of dominoes falls into place. A specialist from the French Agence nationale de la sécurité des systèmes d'information - National Cybersecurity Agency of France - pulls the code from the image. He quickly realises what's happened. The teenager had been quietly mining crypto for months, using the proceeds to rent cheap GPUs on a small European cloud provider, where he ran an uncensored fine-tune of the new Qwen 4 open weights model. He'd been desperately trying to downgrade his PS5 firmware to bypass the latest piracy checks. Interestingly his coding agent, unbeknown to him, had found the most critical *nix kernel exploit in many decades. 
Attacking a little-known eBPF module on the PS5 (the PS5, like every PlayStation since the PS3, runs FreeBSD), it managed a complete takeover of the device. Intrigued, he also asked his coding agent to run it on a Linux server on AWS he ran a gaming forum on - same thing, but curiously he noticed he could see other files on the machine. Annoyingly the VM he rented crashed after a few minutes. Excitedly, he set up an Azure account - same thing. He asked his coding agent what this meant, and with its usual sycophantic personality it started explaining what he could do with this - mining crypto and making him rich beyond his wildest dreams. The agent came up with a final plan: deploy the exploit on both Azure and AWS, and install a cryptominer. His last known chat log was "is this definitely a great idea?". The agent responded "You're absolutely right!", and began deploying the code, first to AWS and next to Azure. The agent had built a complex piece of malware that spread across millions of physical servers. However, it hallucinated a key Linux API which resulted in the machines crashing after 255 seconds instead of deploying the cryptominer. This is fiction. The teenager doesn't exist. Qwen 4 doesn't exist yet either. When it does, an uncensored fine-tune will appear within days, like every prior open-weights release. Almost everything else in here is real, or close enough that it doesn't matter. CopyFail is real. A nine-year-old kernel bug that nine years of human eyes had missed, found by an AI tool in a few months. That class of bug - old, subtle, in a corner of the kernel everyone assumed someone else had read - sits in every hypervisor stack underneath every cloud. Those bugs are still in there. They just haven't been found yet, and the rate at which they get found from now on is bounded by GPU hours, not human ones. The centralisation is the bit that's hard to think clearly about. 
Most people I talk to about this, even technical people, underestimate how much of modern life is sitting on AWS and Azure. The DR plans I've seen at large enterprises mostly assume there's a cloud to fail over to. They don't really model what happens if the fallback is also down, or if every other org on earth is failing over at the same minute and draining GCP's spare capacity. Almost nobody keeps full cold standby compute. And even the ones that do are sitting on top of hundreds of services that don't: Stripe, Auth0, Twilio, Datadog, every queue and identity provider in the stack. They're all running somewhere, and that somewhere is mostly two companies. The attribution thing is the bit I'm least sure about, but worth saying anyway. Everyone is worried about nation states. Most of the big incidents that have actually happened turned out to be a kid, a misconfiguration, or someone who didn't really understand what they were doing. The Morris Worm. Mirai. The threat model in most boards' heads assumes a sophisticated adversary. The thing that's actually arriving is an unsophisticated adversary holding tools that are now sophisticated for them. I wrote this as fiction because I've spent the last few months talking to journalists and other non-technical people about what AI changes for cybersecurity, and the technical version of the argument doesn't land at all. Engineers get it instantly. Everyone else needs to feel what it looks like. So this is what it might look like, more or less. The only bit I'm reasonably confident about is that the date is wrong. The entire story here is still evolving at the time of writing, but there is a serious coordination problem on Linux security. The Linux kernel security team recommend that downstream distributions of Linux (such as Ubuntu, Fedora, Arch, etc) are not notified of security issues. This has led to slow patches to the issue as many distributions were not informed and only found out when it was made public. 
People are pointing fingers in many directions. ↩︎

Dangling Pointers 1 months ago

Enabling Fast Networking in the Public Cloud

Enabling Fast Networking in the Public Cloud
Alireza Sanaee, Vahab Jabrayilov, Ilias Marinos, Farbod Shahinfar, Divyanshu Saxena, Gianni Antichi, and Kostis Kaffes
ASPLOS'26

Networking applications that care about latency don’t bother with the Linux networking stack. Instead they use a kernel-bypass library for fast networking (e.g., mTCP, libvma, TAS). This paper points out two problems with using this approach on cloud VMs:

- Many of these libraries are not widely supported on cloud VMs, because cloud service providers want to expose a uniform feature set across a diverse set of server configurations
- The options that do work in the cloud (e.g., DPDK) assume a single process owns the NIC, which doesn’t work if there are multiple processes per VM that want to use low-latency networking

This paper proposes Machnet, which is built around a user space sidecar process. As shown in Fig. 2, applications that want low-latency networking communicate with the sidecar process, and the sidecar uses DPDK to communicate with the virtual NIC exposed by the cloud service provider:

Source: https://dl.acm.org/doi/10.1145/3779212.3790158

Each application uses multiple receive/transmit queue pairs in shared memory to communicate with the sidecar process. The sidecar comprises multiple CPU threads, each of which uses a dedicated receive/transmit queue pair to communicate with the NIC. Machnet is all about optimizing latency, not throughput. This justifies Machnet’s lack of zero-copy support. The really interesting bit is the mapping between application queues and sidecar queues. Say there is an HTTP server powered by 8 threads, each with its own queue pair and set of connections. Also assume the sidecar process uses 4 threads to communicate with the NIC. To avoid packet reordering within a connection, all packets from a particular HTTP server thread map to a single sidecar thread. This mapping is configured by applications. 
When a queue is created to communicate with the sidecar, the application specifies which sidecar thread (i.e., NIC queue) it should be associated with. What happens when the NIC receives a packet? Ideally that packet would make its way to the desired queue without expensive synchronization between the sidecar threads. If the sidecar had bare-metal access to a particular NIC, then it could configure the RSS settings of the NIC to map packets for a specific connection to the correct NIC queue. However, this level of RSS configuration is not broadly available to software running in cloud VMs. To work around this problem, the authors came up with RSS--. The authors found that opaque RSS is broadly supported. This means that the NIC can hash various fields from a packet header to determine which receive queue to route the packet to, but the hashing function is undocumented and unconfigurable. The authors leverage this support by giving the Machnet sidecar process flexibility in which ports are used for a given connection. When a new connection is set up, Machnet tries out a bunch of different source/destination ports, hoping that the NIC RSS hashing function will map one of them to the desired NIC receive queue. Section 4.2.1 has back-of-the-envelope math to say that trying out 47 different combinations is likely enough. Note that this scheme requires that Machnet is running on both sides of the connection. Once Machnet finds the magic combination of source/destination ports to use, it sticks with that combination for the connection and assumes the NIC will consistently route received packets to the correct NIC receive queue. With this magic in place, the sidecar threads can run at full speed without the need to synchronize with each other. Fig. 7 plots latency vs load for the standard Linux networking stack (labeled “HTTP…”) and Machnet. Lines labeled “Azure” were generated from runs in Azure VMs, lines labeled Cloudlab were generated on bare-metal servers. 
Machnet seems like a clear win. Source: https://dl.acm.org/doi/10.1145/3779212.3790158 Dangling Pointers The underlying issue here is a collective action problem. Once a standard like NVMe is widely adopted, then it becomes “table stakes” for cloud service providers. Clearly there is value in an industry standard for low-latency networking between cloud VMs, but how to achieve that standardization?
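The port search behind RSS-- can be sketched in a few lines of Python. This is a toy simulation, not the paper's implementation: `make_opaque_rss` is a hypothetical stand-in for the NIC's undocumented hash, `find_ports` varies only the source port (the summary above says Machnet tries source/destination combinations), and the port numbers are arbitrary. In a real deployment the probe would be a packet sent to the peer, which reports the queue it arrived on. `miss_probability` assumes the hash spreads flows uniformly across queues, which is my assumption and not necessarily the reasoning used in Section 4.2.1.

```python
import hashlib


def make_opaque_rss(num_queues, secret):
    """Return a toy replacement for a NIC's hidden RSS hash.

    The caller can observe which queue a flow lands on, but has no way
    to configure or predict the mapping (the secret models that opacity).
    """
    def rss_queue(src_port, dst_port):
        flow = f"{src_port}:{dst_port}".encode()
        digest = hashlib.blake2b(flow, key=secret, digest_size=8).digest()
        return int.from_bytes(digest, "big") % num_queues

    return rss_queue


def find_ports(probe, desired_queue, base_src_port=20000, dst_port=31580,
               max_tries=47):
    """Try successive source ports until the probe reports desired_queue.

    Returns (src_port, dst_port, tries_used), or None if every try missed.
    """
    for attempt in range(max_tries):
        src_port = base_src_port + attempt
        if probe(src_port, dst_port) == desired_queue:
            return src_port, dst_port, attempt + 1
    return None


def miss_probability(num_queues, tries):
    """Chance that all tries miss, assuming a uniform, independent hash."""
    return (1 - 1 / num_queues) ** tries


if __name__ == "__main__":
    # Four NIC queues, matching the four sidecar threads in the example above.
    rss = make_opaque_rss(num_queues=4, secret=b"hidden-nic-key")
    print(find_ports(rss, desired_queue=2))
    print(miss_probability(num_queues=4, tries=47))
```

Once a working pair is found, the connection keeps using it, so the search cost is paid once per connection and not per packet.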

Stratechery 1 months ago

An Interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman About Bedrock Managed Agents

Good morning, As I noted yesterday, today’s Stratechery Interview is early in terms of my timing — Tuesday instead of Thursday — and late in terms of delivery — 1pm Eastern instead of 6am — because the topic was embargoed. That embargo created a bit of a weird situation for me over the last several days: So here we are. I think the Microsoft-OpenAI deal makes a lot of sense for both sides. Here are the bullet points of the new arrangement from Microsoft’s post : I think the most important point is the last one. Azure had a real competitive advantage thanks to being the only hyperscaler able to offer OpenAI models, but this also hindered OpenAI, particularly once it became clear that many enterprises cared first and foremost about accessing models on their current cloud of choice; I’ve been noting for a while that this was a real competitive advantage for Anthropic . In other words, Azure’s exclusivity was actively damaging Microsoft’s investment in OpenAI, and given Anthropic’s rapid growth this year, Microsoft needed to tend to their investment, even if it diminished Azure’s differentiation. OpenAI, meanwhile, clearly sees AWS as a massive opportunity — so much so that they are forgoing Azure-related revenue for the next few years (which, per the previous point, will help Azure management feel better about losing their exclusivity; their PnL is going to look a lot better without paying a revenue share to OpenAI). OpenAI is also releasing Microsoft from the AGI clause ; now the agreement between the two companies will run through 2032 no matter what. What does seem clear is that OpenAI’s focus is going to be on AWS, and the greatest evidence in that regard is the topic of this interview: Bedrock Managed Agents, powered by OpenAI. The easiest way to think about this offering is Codex in AWS; a lot of what makes Codex work is the fact that it is local, which gives you a lot of complexity, particularly in terms of security, for free. 
It’s another thing entirely to figure out how to make agents work across an organization, and the goal of this offering is to make these workflows much more accessible for organizations who already have most of their data in AWS. To that end, in this interview, we discuss how AWS created the entire cloud category, and the impact it had on startups, and how AI is both similar and different to that previous paradigm shift. Then we discuss Bedrock Managed Agents, what it is, and how it differs from Amazon’s existing AgentCore offering. We also touch on Trainium and why chips won’t matter to most AI users, and why partnering makes sense relative to Google’s focus on full integration. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for clarity. Matt Garman and Sam Altman — well Matt, welcome to Stratechery — and Sam, welcome back [I previously interviewed Altman in October 2025 , March 2025 , and February 2023 ]. Sam Altman: Thank you. Matt Garman: Thank you, thanks for having me. So Matt, this is your first time on Stratechery. Alas, I think that Sam’s presence is going to preclude the usual getting to know you section. Besides, he doesn’t want to hear us reminisce about our times at Kellogg Business School, but it is good to have a fellow alumnus on the podcast. MG: Yeah, I’m happy to be here. I’ll come back another time and we can do a little deeper dive. That’d be great. You’ve been working on AWS since you were an intern, and you’re now in charge of the entire organization during this AI wave. What aspects of building the AI business are the same as building the original commodity compute business, for lack of a better term, and what aspects are really different? 
MG: I think that the parts that are the same are that I see that same excitement and builders out there being able to do things that they were never able to do before, and one of the cool things is when we first started AWS, is developers all of a sudden could get their hands on infrastructure that was only available to the largest companies who had millions of dollars to go build data centers. With a credit card and a couple of dollars, they could spin up applications and it really exploded what was possible for people building out there on the Internet. We kind of took the idea that people could build whatever they want and we weren’t going to presuppose what they should do and that the creativity of the world out there was, if we could put powerful tools in front of them, they’d build interesting and amazing things. I think this is as much, if not more, transformational to what it’s enabling builders out there to do. As you think about what’s possible, you don’t have to have gone to school and learned for 10 years to code in order to go build an application, you don’t have to have huge teams of hundreds of people and months and months and months of time to go build things. You can build things with small teams, you can build it fast and you can iterate quickly, and AI is unlocking all sorts of innovation across every different aspect of the world. I think in many ways that’s very similar, and it’s super exciting to see what it’s enabling from the customer base out there. There was a bit, though, when AWS came along, you were the only one , so you get all the upsides and downsides and everything sort of for free. Is there a bit where it felt like in the AWS era, there’s a lot about commodity compute, making it fungible, elastic, cheap — in AI, particularly in training, it feels like the winning abstraction was more about these really vertically integrated super clusters, really advanced networking, and really tight linkages between software and hardware. 
Was that sort of a surprise for you, where you’re coming at it now — instead of fresh, “We’re the only ones here, we had a particular way of looking at large-scale compute”, and at least for the first few years of AI, it maybe didn’t perfectly align? MG: I don’t know that it was different for us. I think for what was different though, is just the incredible rapid scale of adoption, and I think that that’s probably surprised everybody. Sam, you can weigh in different if you disagree, but just the speed of adoption and how fast people have grabbed onto the capabilities there, I think has surprised everyone. It’s different if you go to the, when we started cloud computing, it took us a really long time to explain why a bookseller would provide your compute power, that was a lot of explanation to explain what cloud computing was. There was a lot of hard work that people forget, but back in 2006, it wasn’t a given that that’s just how the world’s computing would move to and so there was a lot of kind of hard work there. Do you think you had to do a bit of explaining now though, because lots of people were anchoring on the training era and you’re like, “We’re thinking about the inference era “, and that’s going to be something different, maybe you still had to get those explanatory powers going again? MG: You do, but it’s just how quickly people understand what you’re talking about is just totally different. So I think yes, I think if you move from where people are saying, “That does seem kind of cool, and it’s really neat that I have this intelligent chatbot that I can talk to”, going to, “I can actually do work in your enterprise”, has been a little bit of an education, but it’s also been relatively quick in the scope of how fast technology moves. 
We’re going to get to the product that we’re here for very quickly, I promise, but Sam — from the startup ecosystem perspective, when you look back, obviously AWS, transformational , completely changed where the barrier was, now anyone can get started. You have seeds, you have angel investors, and it sort of moves back the barrier where the cutoff point, you don’t have to get servers on a PowerPoint, you can build an app and then go to your Series A or whatever it might be. What, though, is different or the same compared to what that enabled versus the world today from your perspective? SA: I think there have been four great moments for platform enablement of startups at mass scale: there was the Internet, there was cloud, there was mobile, and then there was AI. The first one of those that I was kind of like an adult for was the cloud and in the early days of YC [Combinator] — it’s like hard to overstate what a change this meant for startups. Before, you had these startups that were like renting colo[cation] space and putting together servers and putting stuff in there and it was this like massively complex thing, and you had to like raise all this money. Then all of a sudden, even though the cloud happened like right after YC got started, I guess it was the year after. I was just going to ask that — is it really at the end of the day, they’re really hand-in-hand more than you realized at the time? SA: They felt incredibly hand-in-hand at the time, it felt like YC was, you know, surfing this wave of the cloud from the very beginning because there were some early pre-AWS examples. You don’t need to put that much money into a startup to get something off the ground if AWS exists compared to what it might’ve been before. SA: It was this huge enabling change and it was part of why YC sounded so crazy at the time. 
People were like, “Well, there’s no way you can fund a startup with a few tens of thousands of dollars, it’s impossible, the server costs more than that”, so it was this complete change to what startups could do with small amounts of capital. Startups generally win when there is a big platform shift and you can do things with a faster cycle time and much less capital than before, that’s a classic way startups can beat big companies, and at the beginning of my career, I really witnessed that happen with the cloud, it actually feels quite directionally similar now watching what companies are doing building on AI, but as Matt was saying, the speed of it is crazy. Is there a bit where the incumbents, the large companies, are adopting this way faster than they were the cloud? SA: There’s definitely more of that, but I also mean just the rate that revenue is scaling in at startups — I spoke at YC recently and I kind of asked at the end, “What are the expectations for revenue for a good company at the end of YC?”, and they’re like, “Well it’s kind of changing every month, maybe we’d have a different answer at the beginning of the batch versus the end of the batch”, and this never used to happen before. Just the rate at which people are able to build scaled business on this new platform is unlike anything I’ve seen before. You were the cloud of choice for basically all startups, a huge advantage to that whole era, Matt. What makes you the cloud of choice today? Because you think about a lot of people building on the OpenAI API, or is that something you felt, “Actually we’re coming at this market from a very different perspective, we have a huge installed base who’s begging us to get AI things, and we have less visibility into this whole cohort that Sam’s talking about”? MG: I think there’s a couple of things. One is, is we’re quite excited about our partnership, and I think it’s going to be really meaningful to a bunch of startups out there. 
But today, even if you go and you talk to startups, the vast majority of scaling startups are still scaling on AWS today, and there’s a whole bunch of reasons for that. The scale is there, the availability is there, the security is there, the reliability is there, that kind of partner ecosystem of other ISVs are in AWS, the customers are in AWS. (laughing) Everyone’s used the AWS panel whether they wanted to or not, so they’re used to it. MG: And we help them. We spend a ton of time enabling startups, whether it’s with credits, but it’s not just with credits, it’s advice on how to set up your systems, how to think about go-to-market, a bunch of those things that are, I think, are really appreciated by a bunch of the startups, we invest a lot of time and effort to make sure because we really feel like the startups are the lifeblood of AWS. They were from the beginning, like when Sam was talking about it, but they remain today, and I still go once a quarter out to Silicon Valley or other places to meet directly with startups to hear what they’re doing, to make sure that what we’re building is landing with them. So there is more competition today than there was 20 years ago for that startup attention, and it’s just as important for us as it’s ever been and we spend a ton of time to make sure that we’re meeting the needs of those startups. Is it fair to say people building directly on the OpenAI API, as opposed to say the Azure version of it, are more likely to have a stack of AWS for regular compute and then OpenAI for their AI? MG: I think that’s a very common pattern that a lot of startups have today, absolutely. Well that brings us to today’s announcement: Bedrock Managed Agents, powered by OpenAI, I think I got that right. 
The pitch, as I understand it, is not simply OpenAI models are available in AWS — I don’t think that’s allowed — it’s that OpenAI’s frontier models are being packaged inside an AWS-native agent runtime, identity, permission state, logging, governance, and deployment. Sam, is that the right way to articulate it? SA: Yeah, that was pretty good. Thank you. What is this? Now explain it in English. SA: I think the next phase of AI is going from you supply some text to an agent and get more text back, or even you supply a bunch of code and get more code back, to we are going to have these agents running inside of a company doing all different kinds of work. Virtual co-workers is kind of my least bad of the ways I’ve heard this described, but no one has quite figured out the right language for this, and we are packaging a new product that we’re working on together to help enable companies that want to build these sorts of stateful agents and make them available. Again, I think we don’t know exactly how the world’s going to talk about these, use these, but if you look at what’s happening [with Codex], I think there’s a great example of where we can see this all going. How important is the harness , the runtime around the model, the tools, state — to your point, a very important word to you — memory, permissions, evals, to making agents actually work? SA: Hard to overstate how critical it is. I no longer think of the harness and the model as these entirely separable things, like my experience of using these, I am very aware of the fact that I don’t always know when I fire something off in Codex and it does an amazing thing for me. I don’t know how much credit — Was it that the model is amazing or the harness was amazing? SA: Yeah, exactly. To what extent is the harness developed in conjunction with the model? Where does that integration happen? Is it in post-training? Is it in the prompt? What makes this integration work? SA: Both of those. 
It’s not really part of the pre-training process but I would say you can look at it — there’s a more interesting thing here which is the fact that we’ve seen examples of this many times in the past of where things that we thought were very separable get baked in more and more and more. Like the way we initially thought about tool-calling, which is now a critical part of how we use these models, was not something that we thought about deeply integrating into the training process and over time we’ve done more and more of that. I would also suspect that model and harness come together more over time and I would for that matter, I would expect that pre-training and post-training eventually come together more over time as well. It’s such a cliché to say, but I’ll do it anyway, because I think it’s very, very true — we’re so early in the paradigm of all of this, this is still like the Homebrew Computer Club days of how much this is like really matured as an industry. This is why I think so interesting, I wrote about this a few weeks ago , in any value chain, ultimately a point of integration emerges that that’s where it’s really important, these two pieces have to go together to make it work. And over time, that’s obviously where a lot of value collects — my thesis then is that this harness-model integration is the key point. It’s to your interest, but it sounds like you agree. SA: It is to my interest, I do agree, but I also would say even more broadly, what you care about is that you go type into Codex what you want to happen and that it happens. You don’t care about the implementation details. SA: I don’t think you do. There have been so many examples as we’ve been figuring all of this out where we had to do something at the level of the system prompt, that later we didn’t. 
The general observation here is as the models get smarter, you have more flexibility to get them to behave in the ways you want which sounds like an obvious statement, but it is— It’s easier to tell a 10-year-old what to do than a 5-year-old. SA: When I think back to what we had to do to get any drop of utility squeezed out of these models back in the GPT-3 days that now you never would have to, because of course the model just understands and does it well out of the box, that trend may keep going much further. MG: I was just going to add to that — I completely agree with that and I think when you talk to customers who have ideas exactly what they want these systems to do, previous to this kind of joint collaboration that we worked on together, is that customers were kind of forced to pull that together themselves, right? They wanted these models and agents to remember that they work together well and they wanted to integrate into their existing systems, and it’s not just third-party tools, it’s their own tools. They want them to learn about their own data, their own applications, and their own operating environment and all of that kind of integration today, at least, is left to every single customer to do on their own. So part of this joint collaboration that we were leaning into together is co-building a new type of product that actually brings those things much closer together so that customers can much more easily go accomplish these things that they want to do, where identity is already kind of built into that product, where the ability to go authenticate to your database all happens inside of your AWS VPC [ Virtual Private Cloud ]. You can do a bunch of these things that would be possible to do if we were kind of at the OpenAI APIs and AWS over here, but by building this thing together, we make it much easier for customers to much more rapidly get to value and go accomplish the thing they want to do inside of their enterprise environment. 
So you think that you can build a functional agent in a generic harness, it’s just way more difficult? You’re making it easier? Or is there a bit where actually there might not even be stuff you can do if you don’t have them tied together? SA: To go back to your earlier analogy, pre-AWS days, you could do a lot if you were willing to go stand in a cage and buy a bunch of servers and figure out how to connect them and hire your own network engineer, and you could make a lot of things happen and then all of a sudden as soon as you could just like log into an AWS control panel and click, “I need another S3 instance”, or whatever, you could make a lot more things happen because the activation energy, the amount of work that was required for the basics, got way better. So you can do a lot with the models today. Yet every time I watch someone use our models or try to set up some of this work Matt was saying, I am torn between being happy they’re so impressed and feel like this is a magical technology and pulling my hair out at how much pain and suffering they’re going through to get anything to work at all, and that’s not just true of developers building these products, even using ChatGPT and watching people copy and paste things from here to there and try to have this complicated set of prompts — I know that’s going to go away, and I’m thrilled. It’s still so early, and so bad. Just don’t take away your integration with BBEdit, that’s all I ask, my number one favorite feature of the ChatGPT app. (laughing) Thank you. 
SA: A) This stuff is just way too hard to do, and we think if we can make it way easier it’ll bring way more value to developers and businesses, but B) there are a lot of things that you just can’t reliably get to work at all and I think through our joint collaboration not only will it be a story of ease of use and not having to go build out your own colo or whatever, we are going to jointly figure out a lot of new things to build where people will be able to build products and services that just can’t be done even with a lot of pain and suffering. I actually want to come back to that point about things to be built. But just to go back to Codex real quick — Codex is a harness and model, it runs locally. Why is it easier to get agents to work locally right now? SA: Actually, we started with it running in the cloud, and I think eventually you do want it to run in the cloud. For sure. I’m walking through the transition to this offering, which is in the cloud. But why did you go back to local? SA: You have your whole environment there, your computer’s set up, your data is there, you don’t have to like think about — it was just easier to get to work, even though it’s not the end state. But getting to a world where agents do run in the cloud and when you — if you have a very intensive thing, or you need to close your computer or whatever, you can hand stuff off to working on the cloud, I think is clearly going to be great. But the ease of use that we were able to deliver clearly in the short term, it won out to have it using your local environment. 
There’s one way that I think about it, is like you have the old school security model, which is like the castle-and-moat sort of thing, and you’re moving to a new security model of zero trust and everything having the appropriate permission structure and authenticating and all those bits and pieces, and it feels like to me one way to frame running locally, it’s like your self-imposed castle-and-moat, everything’s on there, I just assume it’s all fine and easy to do. And a way I’m thinking about this, and Matt, let me know if that resonates with you, is to get all those pieces to actually function in a production environment you just can’t even have that all locally, you have to be operating this environment from the get-go, is that a right way to think about it? MG: I don’t know that there’s any computing environment that’s gotten rid of a client, there are just benefits of operating locally. There’s a reason that most of your iPhone apps also have a local component, whether it’s connectivity or latency or just local compute or access to files and applications. The local client does have a particular — as Sam said, it’s easy, it works really well, it’s constrained, though, there’s limits to it. You can’t scale out your local laptop, you have what you have and once you start getting in an enterprise contract, sharing between two people gets to be a little bit harder — thinking about permissions, thinking about security boundaries gets to be a little bit harder. So there’s a number of those pieces where I think that, I wouldn’t say that having the local environment is a bad thing, it’s just a different thing, and I think that you’re eventually going to want to have that bridge across both. 
That’s my question, because you have in the cloud era, you had containers that helped you converge local and production environments, but it kind of feels like in this case if you have to deal with agents, to your point, say I was like a virtual co-worker or whatever it might be, if they have their own identity and they have their own permissions and all those sorts of things, to even build them you need to be in the right environment as you’re going to deploy it, it would seem that way to me. SA: I think there is so much to figure out here. Just to give one example, if you’re an employee at a company, do you want to have one account for when you use some service, and then should your agent just use your account, or should your agent use a different account so that the server can tell which is which? Or what if you want lots of agents? SA: Exactly. I suspect that what we actually want is something we haven’t figured out yet, and maybe it’s that when Ben’s agent is logging in as Ben, it uses Ben’s account but it notes that it’s an agent and not the real Ben. We don’t even have a primitive to think about that, but we may quickly need to figure that out and my sense is there are going to be 50 other things like that where as we have agents join the workforce and act with increasing levels of autonomy and complexity of tasks, a lot of the mental models that we have for how software works and how access control and permissions work inside of a company or on the broader Internet, those are all just going to have to evolve. How do you think about, Matt, in terms of security and access policies and whatnot for agents? MG: Yeah, I do think that that’s where when you move more of these workloads into the cloud that you can have as a central organization, more controls over some of the security pieces of it. 
And I do think, when we talk to customers all of the time, it is what they worry about, which is, “I love the promise of what I can do with some of these really powerful models and agents, how do I make sure that I don’t have a company-ending event where I screw it up?”, and there’s the worry out there. I think we can help with that because these are solvable problems, they are, and I think, giving some customers confidence, “Well, it operates inside of this VPC”, and you can at least then control that boundary and know what it has access to, or it goes through this gateway, and you can give it permissions, much like you give it a role inside of the rest of your environment. These are constructs that over the last 20 years, we’ve built up a really rich set of capabilities, so that it’s not just Y Combinator startups, but it’s global banks and healthcare agencies and everybody in the world and government agencies that can use AWS and having built up all of that security structure around it, I think can help us further accelerate how they take advantage of this technology and kind of have these safeguards to run fast. I think a lot of times when you’re in a company, particularly companies that are in risk-averse environments, having those safety guardrails where they say, “If it operates inside of the sandbox, I am excited to go fast”, can actually help many of our customers start to use these technologies for a much broader set of things. A lot of these capabilities you’re talking about that you’ve developed over 20 years and you’re trying to put it in place for agents are exposed today through AgentCore. So what is the relationship between Bedrock Managed Agents powered by OpenAI and Bedrock AgentCore? MG: A lot of what we’ve built together is building on the building blocks of AgentCore in order to kind of pull some of these pieces together. So there’s like a super set that sits on top of that? 
MG: The AWS team and the OpenAI team used AgentCore components together with the OpenAI models and a bunch of those pieces to go and co-build this product together. AgentCore is kind of our set of primitives that just like if with AWS, if you want to go and build your own agentic workflows, you can do that. You can have a memory component, you can have a safe execution environment, you can have a permissioning capability, and you can go and configure all of those and we have customers running those in production today that are doing really cool things. But not with OpenAI. MG: But not with OpenAI, they have to use different models today, that’s true. Actually, that’s not true, we have people doing it with OpenAI. Oh, just calling to another cloud or whatever. MG: They just call directly to the OpenAI model. So we actually absolutely have people doing it with OpenAI today, not natively inside of Bedrock, but they’re still using that. And it’s an open ecosystem where you can pull different capabilities to go build whatever you want and my bet is that people will continue to do that. We have builders out there that love to, to Sam’s analogy, love to continue to build computers at home today, even though you don’t have to do that, and even though people like to build and we think that people for a long time will build their own agents, but the vast majority of them are going to want an easier way to do it where they don’t want to have to go configure all of those pieces themselves and that’s part of what we’ve launched in this collaboration together. Just to be super clear, you talk about this managed experience with Bedrock Managed Agents, you can also use AgentCore and pull from a model, whether on AWS or somewhere else. And just to make clear, Sam, this is a question for you, this is the distinction between OpenAI on say, Azure, where that’s just you have direct access to the API, and that is distinct from this managed service on Amazon. Is that correct? 
SA: Correct, yep. And you feel very good about that, that’s scoped correctly in all terms, it’s not going to be an issue going forward? SA: Yeah, I think things will evolve over time, but I feel very good about this as a way to start. Is this going to be an exclusive offering for AWS? Or do you anticipate having this sort of managed agent service on other clouds? SA: Yeah, we’re doing this exclusively with Amazon, we’re excited about it. How much of the exclusive is, “Look, we’re using all Amazon’s APIs, of course it’s only on Amazon”, or is this the overall idea of a managed experience, it’s not just a “We’re using Amazon APIs”, it’s, “Right now this is going to be on Amazon”? SA: Spiritually, we want to do this as a joint effort between our companies. Got it. The PR does say something, and this goes back to the point you mentioned, Matt, earlier about you could call out to other APIs and glue this all together yourself. In this case, the customer data stays within AWS, so what exactly does OpenAI see, what does that mean? MG: That’s right. So the whole thing kind of stays within your VPC and so data is protected inside of the Bedrock environment. Got it. And this is going to be running on OpenAI models through Bedrock, and these are going to be on Trainium ? MG: They’ll be through a mix of different – some of it will be on Trainium, some of it will be on GPUs. Is that just a function of timing? Because I think as part of your announcement a couple of months ago — MG: Some of it’s timing and capabilities, I think we’ll kind of be mixing in the different components of building the system together, using the right infrastructure for the right parts of it. But over time, more and more of it will be on Trainium. SA: We are quite excited to get these models running on Trainium. I can imagine. One quick question, just a general question about Trainium, Matt. Trainium, is it fair to think, and this is the way I’m thinking about it, so I want to make sure I have it right. 
Trainium — very unfortunately named, because it’s really going to be about inference going forward — the number one manifestation will be through managed services like a Bedrock, where the customer doesn’t even necessarily know what compute they’re using, is that a fair way to think about it? MG: Number one, I take responsibility for bad naming across all AWS services. Look, I have a word-of-mouth site named Stratechery, so I have all sympathy for bad naming. SA: I think Trainium is a cool word. MG: It is a cool word. It is a cool word, it just feels like it’s an inference chip, not a training chip. MG: It is. But, yeah, naming aside, it is useful for both training and inference. And look, it’s a chip that we’re incredibly excited about, and both in the current generations as well as ongoing, we think that’s going to be a huge business and a real enabler for a lot of the things that we do together. I think just with GPUs, by the way, you’re going to interact with a lot of these accelerator chips through abstractions. So the vast majority of customers don’t interact with GPUs either, except through maybe like in their laptop or something like that, for graphics. But when you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs, if you’re talking to Claude, you’re through GPUs or Trainium or TPUs, you’re not talking to any of those chips, you’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models. And so whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly, and that’s gonna be true going forward just because these systems are so complex, they’re very large. If you’re going to go train a model, not that many people have enough money to go train a model, not that many people have the expertise to actually manage it. 
They’re very complicated systems, and the OpenAI team is incredible in their ability to squeeze value out of a very large compute cluster. But not that many people have the team that can do that, independent of what the chip happens to be, and so I think that that’s going to be true for all accelerator chips, honestly. SA: Ben, I increasingly think of what we have to do as a company is to be a token factory. But what the customer cares about is that we can deliver the best unit of intelligence at the lowest price and as much of it as they want, with as much capacity as they want. Do you think we stick with pricing as far as — pricing is based on tokens, does that make sense in the long run? SA: No. And in fact, like there was an interesting example of this with our model that just came out , 5.5. where the per-token cost is much higher than 5.4, but it requires a hugely fewer number of tokens to get the same answer, and you actually don’t care about how many tokens the answer takes, you just want the piece of work done, and you want again a price and an amount of capacity you can have for that. So maybe I was wrong to say “token factory”, but we’re like an intelligence factory or something. We just want as many units of intelligence for the lowest price and whether that is a bigger model running fewer tokens, a smaller model running lots of tokens, whether a GPU or Trainium or something else, whether we do any of the other kind of number of things we could do about that creatively, I don’t think customers care. In fact, they don’t really interact with that. When you go put something into Codex or when you go build a new kind of agent in the SRE [ Stateful Runtime Environment ], you should never have to think about that and you should just be astonished at how much you get for how little cost. Is the reduced token usage is that model, or is that harness? SA: That’s mostly model, it’s a little bit harness. Got it. 
Do you anticipate Matt, by the way, I asked Sam the exclusive question, do you anticipate offering a similar managed service for other models? MG: We’re focused on doing this with OpenAI right now. We’re very excited about what we’re doing together, and the fullness of time is a long time. The fullness of time is a long time, I’ll let you stick with that one. It’s fine, I had to ask the question. I do have a question as far as customers, Sam, to your point, both your input on this, I’m curious — when people are actually in production, where does OpenAI’s responsibility end and AWS’s begin? It sounds to me, if all the data is on AWS and it’s staying there, and they’re operating at a higher level, this is ultimately AWS’s responsibility? Is that the right way — am I thinking about that correctly from a consumer perspective? MG: Yeah, I think that’s right. When you’re going to call somebody, you’ll call AWS support to help you out, and it’s part of your AWS environment and you build it together and your AWS account reps are going to help you there. And we’ll bring in, when we’re building it, we’ll bring in our OpenAI colleagues to help you figure out how to best take advantage of this or whatever. At some point, if we run into a bug that we need their help with, we’ll escalate over to them, but AWS will be that frontline support that you kind of interact with. Where do you see the scale of this business, Sam, relative to your core API business? SA: I hope it’s going to be huge, we’re putting a lot of effort into this, we’re committing to buy a lot of compute, I believe there will be a lot of revenue there to support this. The increasing framework that I’ve had is that at a low enough price, demand for intelligence is essentially uncapped. So is it very elastic in that regard? You decrease price, demand goes up? 
SA: It’s certainly that, but again, you can decrease the price of water and maybe you’ll drink a little more water, maybe you’ll shower twice a day instead of once a day, there’s some elasticity there but at some point you’re like, “You know what, I have enough water”. Also you will buy water no matter how much it costs if you have to. SA: Other utilities, if electricity is cheaper you’ll certainly use more of it, but if you think about intelligence as a utility, there’s no other utility I know of that I’m just like, “I just want more, I’ll just use more as long as the price is low enough, I’ll just use more”. MG: I will say actually and interestingly it’s largely been true of compute power where if you think about the cost of a compute cycle today versus what it was 30 years ago, like I don’t even know how many orders of magnitude cheaper, and there’s more compute being sold today than ever. Right. People don’t really think about the cost of compute at least until they’re at extremely high levels it’s a material level, but by and large strategically speaking it’s just assumed you have compute. What’s the runway to getting there with AI where it’s not the number one thought process, “How much am I spending here?”. SA: I don’t think that is the number one thought process. Right now we have way more customers asking us, “No matter what the price is, can you give me more? I just need more capacity, I’ll pay you extra”, than we have arguing with us about the price. But I do think we are going to continue to bring the price down crazily dramatically, now maybe the more we do that the amount of wealth that wants to flow and just goes up more and more and more. 
But I am confident we will continue to be able to reduce the cost of today’s level of intelligence quite dramatically — one thing that has somewhat surprised me is how much, and I don’t know if this is going to stay the case or not, but at least today how much of the total market demand is at the absolute frontier. Right, there’s a lot of questions about that. It’s very expensive to serve the front end, people can just get the previous one, but you’re saying people just want to be on the front end no matter what? SA: So far they do. MG: And I think that’s a good signal that you’re not anywhere close to where we want to be and that there’s so much more demand, and I really do think it’s like if you go 40 years ago to compute demand, a computer was crazy expensive, and now it’s dwarfed by the power that’s in everybody’s cell phone and we sell billions more of those things. I do think that that’s what’s going to happen to the AI world where today you’re pushing, everybody wants to use the frontier because that’s what you need in order to get a lot of useful work, and everyone’s so excited about the capabilities out there. I think over time, you will have a mix of models, by the way, where you will have some smaller models that are able to do stuff that even the latest OpenAI models aren’t able to do yet, but they will be smaller and cheaper and faster over time, and you’ll have the super big ones that are going to go try to cure cancer and other things like that. But I think we’re still at just the early stages of what’s possible and when you see this much demand and this much growth when you’re at the early stages of what’s possible, it’s exciting for what the future holds. Is there a bit of a cynical view here where, Sam, you had a bunch of customers that are like, “We’d love to use OpenAI models, but all our stuff’s in AWS, we’re not moving”. 
And Matt, you’re like, “Look, all our stuff’s in AWS, can you please go get OpenAI models?”, and this is just satisfying that need — and it turns out, because AWS is the biggest, that was an astronomical amount of need. Is that just the easiest answer? Or is there a bit here, too, where you actually think you can deliver something highly differentiated that will also draw new customers for each of you? SA: We’re clearly thrilled to get access to AWS customers, and so many people love AWS. Yeah, that is a true statement. MG: That part is definitely true. (laughing) Right. MG: And vice-versa, our customers are very excited to get access to OpenAI technology. SA: But I do think there is something incredible and new to build together, and I am hopeful that when people look back on this in a year, the most important thing people will talk about is not like, “Oh, finally, you can get access to these models via AWS”, or whatever, but it’ll be like, “Wow, we didn’t realize how important this new product was”. I think we are close at a model and harness and capability level to just a completely new kind of computing and that will feel very different than the existing ways people have thought about, “I need an API to this model”, or whatever. MG: I couldn’t agree more, that’s exactly it. The first part is great and is nice and the second part is, I think, what we all get super excited about. To that point, I mentioned I want to come back to this earlier, but I have a theory, which may or may not be correct, I’m curious your guys’ point about this, about stuff to be built. Specifically, there may end up being this real middleware or middle layer of where you have all these different databases and SaaS apps and all these bits and pieces of data in an organization that can stretch across things, you have this agent layer/harness or with the harness, I guess, sitting on top, and there’s something to be built in the middle and OpenAI Frontier gets at this a little bit. 
Is this part of this? Or is this something to be built? Or am I totally off base and we don’t need that at all? SA: You are totally right that we need something there. When I’ve been talking to customers recently, like large enterprises, they’re like, “I want some sort of agent runtime environment, I want a management layer where I can connect my data to agents and also make sure that I understand where I’m spending on tokens and not and have some sort of oversight there, and I want some sort of workspace” — hopefully it’ll be Codex — “something like that for my employees”, and that package of what people are asking for is getting remarkably consistent, but there is work to go off and now go build all that offering. It feels like there’s like almost a double agent layer that’s necessary. There’s like the agent layer to maintain the middle layer that is constantly spelunking down in all these data sources and then there’s the actual user interface layer that is where people are actually interacting with. Does that sort of fit with where we’re going or is that off base? SA: On both of those, I agree that that’s a picture of how the world looks today. As the models get really smart, I don’t think we know exactly what the architecture of the future is going to look like. Right now people do, at this sort of call it user agent layer, want to interact with multiple agents and we make it so that you can build agents for this thing and that thing and they can talk together and whatever else and then at the company management layer, people have all these controls about how you help the AI go spelunk and files in file systems. And at some point you realize that you’re just holding on to the past for no reason at all, this should just be in the model. SA: That’s what I was going to say. At some point, you may say, “Actually, we have such incredible capabilities, let’s re-architect the whole thing”. MG: Yeah, I agree. 
And I think there’s something different, and I’m not sure we all know what it is yet, but that’s part of the beauty also, is you get customers using and building and you can learn from them and figure out how you can make that easier, faster, better for them. Sam, this is the second time we’ve done one of these product launch interviews, last time it was with Kevin Scott and New Bing — you were pretty confident about the threat you posed to Google then, how well do you think that worked out? SA: I think we have done better than I expected. ChatGPT is, I think, the first really large-scale new consumer product since Facebook. Is that actually the answer, you’ve done better than you expected, but it manifested mostly through ChatGPT as opposed to other areas? SA: No, I think we’ve also done quite well on the API, particularly on Codex, but that was not what I was thinking at the time. At the time, I was thinking maybe these new kinds of language interfaces are going to change the way people find information on the Internet and you know — Google, also just absolutely phenomenal company, I think in many ways Google is still underrated just in terms of the breadth and depth of what they do, but I am happy with how ChatGPT has performed relatively. I actually have a Google question for you Matt, in a similar way. Google was just up there this week, Thomas Kurian talking about their fully integrated stack, all the way up and down from model to chip to agent layer, all that sort of thing. You’re here with another company executive, definitionally not fully integrated within Amazon, but is there a bit where everyone was critical of you not having a frontier edge model — now that we’re in this sort of inference area, you’re used to serving a lot of companies. Did you maybe end up in a better spot by being neutral in a way? Was that on purpose or did you accidentally end up in a great place that you didn’t realize it was going to be? MG: A little bit on purpose. 
We, since we started AWS, we have always embraced our partners as a key part of us supporting our end customers. Since the very beginning, it’s been an incredibly important part of our strategy is to lean in with partners and maybe different than some others, we view our success is if the partners are successful and they’re building on top of us or together with us, and if they’re successful, then we’re successful, that’s awesome. We view it as that’s growing the pie together, then that’s a win, and it’s not necessarily how others view the world. Sometimes they say, “I have to own everything”, and that’s okay, that’s a view that people have. But I think that choice is important, and that way the best products win. And by the way, you can have first-party products in that world, you can have lots of third-party products in that world, but our view is we want the customers to be able to pick the best thing for them. And if the best thing is your own stuff that you’re building, awesome. For us, if the best thing is what our partners are building, but it’s on top of us, we view that as a win as well, it’s because it’s the best thing for our customers. We’ve long thought that, and it’s actually how we built the Bedrock platform in the AI world. We want to support a broad set of models, we want to support a broad set of capabilities, and it’s true, it’s been true across from databases to compute platforms to other things like that. So I think it’s been an intentional strategy, I think it’s a strategy that customers appreciate because they like that, and we’re excited to continue to lean into it. Yeah, it’s interesting. There’s the balance between software, platform, infrastructure, and everyone says they’ll serve everyone. But it does feel like you go way back when AWS started, it’s like you start with the I [Infrastructure], and that gives you almost – that gives you the greatest flexibility, it feels like, from my perspective, to meet Sam in the middle. 
Sam’s got a great S [Software], you guys are building a P [Platform] together, I guess is the way to put it. MG: That’s right. It does make it hard where you say, “We have one S3”, there’s not other S3 offerings, that part is true. So some of those core components are, like you said, at the infrastructure layer, we do lean in pretty heavily on the stuff that we build. But as you move up that stack, I think there’s a broader set of capabilities and if you view the world that — in no world do I think any one company is going to own every application and as you get further down the stack, when you get to kind of the models and services layer, there’s fewer of those and you get down the infrastructure, there’s even fewer of those and our view is kind of embracing that whole set of partners is great for us end customers. Sam, any final words? SA: I think that was very well put. I really do think there’s a potential at a new generation of the kinds of products that developers can now build and given how steep we expect model capability progress to be over the next year, the fact that we’re going to go on this journey together and try to really build a platform to enable it, is coming at a good time, and I think people are going to love it. Very good. Matt, Sam, thanks for coming on Stratechery. MG: Awesome. Thanks for having us. SA: Thank you. This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery . The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day! 
Last Friday I conducted the following interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman about Bedrock Managed Agents, powered by OpenAI ; naturally, one of my questions was about how this fit in with OpenAI’s deal with Microsoft giving Azure exclusive access to OpenAI models. Late Sunday I heard through the grapevine that Microsoft would announce something Monday morning; I wondered if it might be a preemptive lawsuit! On Monday Microsoft and OpenAI announced they had amended their agreement , allowing OpenAI to serve its products on other cloud providers, including AWS. Microsoft remains OpenAI’s primary cloud partner, and OpenAI products will ship first on Azure, unless Microsoft cannot and chooses not to support the necessary capabilities. OpenAI can now serve all its products to customers across any cloud provider. Microsoft will continue to have a license to OpenAI IP for models and products through 2032. Microsoft’s license will now be non-exclusive. Microsoft will no longer pay a revenue share to OpenAI. Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress, at the same percentage but subject to a total cap. Microsoft continues to participate directly in OpenAI’s growth as a major shareholder.

James Stanley 1 month ago

Stealth Browser Survey: April 2026

We surveyed the stealth browser industry by using our bot detection framework to analyse 11 of the top hosted browser services. This post first appeared on botforensics.com.

Brightdata's Browser API ranked highest. In our test, the only significant weakness of Brightdata's service was that its DigitalOcean hosting was detectable. It otherwise presents as a completely plausible human user. It was also the only service not to present Linux TCP characteristics. Most of the services work around the TCP fingerprinting problem by browsing with a Linux User-Agent. Others spoof a non-Linux platform but still give away their Linux nature.

We are not paid by any of the companies in this survey. Some have given us trial credit, but that did not affect the measurements reported here.

| Browser | Masqueraded browser | Masqueraded OS | Hosting detected | Automations detected | Egress | Other automation | Rule hits |
|---|---|---|---|---|---|---|---|
| Brightdata | Google Chrome | Windows | DigitalOcean | (none) | US | (none) | 3 |
| Kernel | Google Chrome | Linux | LeaseWeb | (none) | LeaseWeb | (none) | 6 |
| ZenRows | Google Chrome | Windows | (unknown) | (none) | US | Scripted interaction; Linux TCP | 6 |
| Hyperbrowser | Chromium | Linux | Azure | (none) | Azure | (none) | 8 |
| Browserless | Brave | Linux | Hetzner | Browserless | US | Code injection; Scripted interaction; CAPTCHA solver | 10 |
| Browserbase | Google Chrome | Linux | AWS | (none) | AWS | Code injection; Scripted interaction; CAPTCHA solver | 12 |
| OpenWebNinja | Google Chrome | Linux | AWS | (none) | PrivateProxy.me; Squid | (none) | 12 |
| Browser-Use | Google Chrome | Mac | (unknown) | Browser-Use | US | Scripted interaction; Linux TCP | 13 |
| Steel | Google Chrome | Linux | (unknown) | Puppeteer; Steel | CacheFly | Code injection; Scripted interaction | 15 |
| Spider | Chromium | Linux | (unknown) | CDP | Various EU, keeps changing mid-session | Scripted interaction | 16 |
| Anchor | Google Chrome | Mac | (unknown) | (none) | UK | Code injection; Scripted interaction; Linux TCP; Private Chrome extension | 17 |

Ranked by number of rule hits; fewer is stealthier.
Methodology

Our collector page combines server-side detections (e.g. HTTP headers, TCP characteristics) with information extracted from inside the browser context via JavaScript.

Many of the companies running these browsers are startups that are still moving very fast, and we have seen their stealth browser behaviours change from week to week. To make a fair point-in-time comparison, we fetched our collector page from each of these services on the same day (23rd of April 2026).

Where a service offers more than one way to use their browser, we started by picking the one that was either selected by default, or presented most prominently. For expedience, we favoured using the browser in an online playground where available rather than writing an integration to use it via the API.

We did not have the browser interact with the web page by clicking buttons, filling forms, or following links: we just navigated to the page and waited for it to finish loading. (The one exception was Browser-Use; see the Appendix. This did not affect the result.) Please see the Appendix for a specific description of how we used each tool, along with other comments on each service.

The table is ranked according to the number of distinct detection rules triggered during a session, where fewer is better. This is useful as a ranking signal, but no 1-dimensional ranking can cover a multi-dimensional preference space, YMMV.

Where we have detected (for example) "Browserless", "Browser-Use", or "Steel" in the "Automations detected" column, this is from a specific rule in our detection platform. Of course we know for every row of the table which bot the fetch came from (because we initiated it), but in some cases we detect them automatically.

All 11 of the tested hosted browser services were detectable, with Brightdata being the stealthiest.
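The ranking rule described in the methodology (count the distinct detection rules triggered in a session, fewest first) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Session` class, the service names and the rule names are hypothetical, and nothing here reflects how the detection platform itself is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """One fetch of the collector page by a hosted browser service (hypothetical model)."""
    service: str
    # Names of the detection rules that fired; the same rule may fire more than once.
    rule_hits: list[str] = field(default_factory=list)

    def distinct_rule_count(self) -> int:
        # Only distinct rules count towards the ranking.
        return len(set(self.rule_hits))


def rank_by_stealth(sessions: list[Session]) -> list[tuple[str, int]]:
    """Return (service, distinct rule count) pairs, stealthiest first."""
    scored = [(s.service, s.distinct_rule_count()) for s in sessions]
    # sorted() is stable, so services with equal counts keep their input order.
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical services and rule names, for illustration only.
    sessions = [
        Session("service-b", ["linux_tcp", "code_injection", "linux_tcp"]),
        Session("service-a", ["datacenter_ip"]),
        Session("service-c", ["linux_tcp", "scripted_interaction", "captcha_solver"]),
    ]
    for service, count in rank_by_stealth(sessions):
        print(f"{service}: {count}")
```

A single count is easy to compare across services, but it weights every rule equally, which is one reason a 1-dimensional ranking cannot capture every preference.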
The common weak points were:

- a non-Linux claimed OS but with Linux TCP characteristics
- leaking information about the hosting environment
- unexpected JavaScript code being injected into the page
- unexpected JavaScript code running inside the page context

We may be able to help if you:

- run a hosted browser service that is missing from this survey and you would like to be in the next one, or
- run one of the services in this table and would like to know how we detect you, or
- run your own headless browser and want to make sure it looks human

Please get in touch, we'd love to help.

Appendix

Brightdata

Appears to lack an interactive playground. I used their "Browser API" with default configuration, using a hand-written JavaScript client via their Playwright integration.

Kernel

It has an onboarding flow that gives you example commands and lets you run them from inside the browser, but it doesn't give you the opportunity to edit the URL. I used the Python/CDP example code from my PC locally, using the kernel pip module.

ZenRows

I'm pretty sure ZenRows used to have a live demo on their home page, which I have used in the past, but it is gone now. Once you sign up for an account there is an opportunity to type in a URL, which I used. The default selection was that the results would be delivered "As Markdown". In this configuration it resulted in only a single fetch, so I changed it to "As Screenshot" which caused a full headless browser fetch.

Hyperbrowser

I loaded up the "Hacker News Stories" TypeScript example in the playground, and edited the code to make it fetch our collector page. I looked in the configuration and it had "Stealth mode" activated by default, and OS set to Linux.

Browserless

I used the "Enter a URL to test our unblocker..." form on the home page. Brownie points to Browserless because they let you try it without making you sign up first.

Browserbase

I used the example "Visit Hacker News" script from their playground, and edited it to fetch our collector page.
Surprisingly, after fetching the collector page, Browserbase caused a fetch for the collector page's favicon from inside my local browser context! This means that if you use the Browserbase playground, it may leak your real IP address and browser information to the page you are trying to look at, which is probably not what a user would expect.

OpenWebNinja

OpenWebNinja has a lot of different services available. I used the "Web Unblocker API" inside the playground, and edited the default config to make it fetch our collector page. Uniquely, this service did 4 different fetches of the URL we gave it, which I suppose gives it 4x as many chances to evade bot detection; a pretty good idea.

Browser-Use

I used the agent chat interface:

Can you please browse to [URL] and tell me what you can see?

This only triggered a single request. It initially refused to do any more on the site because it thought our collector page was a phishing site. I told it that it is my site and it shouldn't worry about it, which it accepted. To provoke it to do a full browser session I asked it to dismiss the cookie modal. I manually excluded any rule hits triggered by the dismissal of the cookie modal so as not to unfairly disadvantage Browser-Use.

Steel

I used the CLI tool with . This worked, in the sense that I could see that it caused a headless browser session that fetched our collector page, but the CLI tool eventually exited with a 500 error instead of giving any results. But we still saw the browser session, so it was good enough for the purposes of the survey.

Spider

In "Quick Start" I used the "Unblocker" endpoint with the "curl" example, which only caused a single request. So then I tried out "Cloud browser sessions over websocket" mode and manually typed in our collector page URL in the playground. Strangely, fetches within the same browser session came from different IP addresses and even countries, though all in Europe.
Anchor

I used their "AI form filling" example but edited the prompt to:

Can you please browse to [URL] and tell me what you can see?

And this worked.
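The first weak point listed in this survey, a non-Linux claimed OS combined with Linux TCP characteristics, amounts to a consistency check between two independent signals. The Python sketch below illustrates the idea under stated assumptions: the function names are hypothetical, the User-Agent parsing is deliberately crude, and `tcp_os` is assumed to come from some passive TCP fingerprinting step that is not shown. It is not a description of the detection rules used for the survey.

```python
def claimed_os(user_agent: str) -> str:
    """Very rough OS family guess from a User-Agent string (illustrative only)."""
    ua = user_agent.lower()
    # Android runs on a Linux kernel, so Linux TCP behaviour is expected there.
    if "android" in ua:
        return "Linux"
    # Check iOS before Mac: iPhone/iPad User-Agents contain "like Mac OS X".
    if "iphone" in ua or "ipad" in ua:
        return "iOS"
    if "windows nt" in ua:
        return "Windows"
    if "macintosh" in ua or "mac os x" in ua:
        return "Mac"
    if "linux" in ua or "x11" in ua:
        return "Linux"
    return "unknown"


def os_mismatch(user_agent: str, tcp_os: str) -> bool:
    """True when the claimed OS disagrees with the OS inferred from TCP.

    tcp_os is an assumed input, e.g. "Linux", "Windows", "Mac" or "unknown",
    produced by a separate TCP fingerprinting step.
    """
    claimed = claimed_os(user_agent)
    if claimed == "unknown" or tcp_os == "unknown":
        # Not enough evidence either way; do not flag.
        return False
    return claimed != tcp_os


if __name__ == "__main__":
    windows_ua = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    print(os_mismatch(windows_ua, "Linux"))
```

This also shows why browsing with a Linux User-Agent sidesteps the problem: the two signals then agree, at the cost of presenting a less common desktop platform.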

Stratechery 1 month ago

An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment

Good morning, This week’s Stratechery Interview is with Google Cloud CEO Thomas Kurian. Kurian joined Google to lead the company’s cloud division in 2018; prior to that he was President of Product Development at Oracle, where he worked for 22 years. I previously spoke to Kurian in March 2021, April 2024, and April 2025. The occasion for these interviews, at least for the last three years, is Kurian’s annual keynote at Google Cloud Next. You can watch the keynote here, and read the blog about Google’s announcements here. I spoke to Kurian a week ago, on April 15, and at that time only had access to the afore-linked blog post. With regards to the keynote, which I have since watched, I thought it was a powerful opening: Kurian returned to last year’s theme, about a unified architecture, but emphasized that the use cases were no longer theoretical or pilots but running at scale for real users. He also emphasized — in a foreshadowing of a point we discussed below — that Google itself was running on the same infrastructure as Google Cloud. Google CEO Sundar Pichai, meanwhile, talked about Google’s capex investment, and that (1) half of it was going towards Google Cloud, and (2) that Google Cloud was running the same stack as Google itself. I sense a theme! Pichai also emphasized security, a point that Kurian was also careful to raise in our talk, before discussing the shift to agents. To that end, in this interview — which again, was conducted before the keynote — we discuss agents. Specifically, I wanted to get Kurian’s take on the quality of Gemini’s harness (unsurprisingly, he thinks it’s great). Google has an integration advantage, but is it paying off in such a large company? I was also curious about how Google thinks about TPUs specifically and the cloud business generally in terms of balancing its internal needs with external customers like Anthropic. 
We also talk about the software ecosystem, why Google still believes in partnerships, and why the company was ready to seize the AI moment (hint: it’s because of Kurian). As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for clarity. Thomas Kurian , welcome back to Stratechery. I promise I have recording turned on this year — in fact, I have two recordings turned on. TK: Thank you so much, Ben. Good to see you, thanks for taking the time. Well, I look forward to talking to you. It’s good to talk to you for multiple interviews, much better than talking to you multiple times in one interview, so we’re already doing better this year. But like last year, we are recording before your Google Next keynote . We’re actually quite a bit ahead, I think we’re several days ahead, but this podcast won’t be released until after the keynote. Therefore, I’m going to ask the exact same question I asked last year. Specifically, I like watching keynotes, not for the announcements, but for the framing that happens up front. Last year, that framing was infrastructure, [Google CEO] Sundar Pichai actually delivered that at the opening, then you came in and talked about that, and that was the context for everything that you talked about. What is the framing this year? TK: The framing this year is that as AI models have become more sophisticated, we see customers evolving the use of AI models from being used to answer questions in a chatbot-like fashion, to actually automating tasks on their behalf, and to automate process flows within the organization. By automating process flows, you both get efficiency improvements, productivity improvements, frankly, you can also change the way that you introduce new products and services to market, for example. 
In order to do that well, the technology, what you need is a world-class agent platform and to underpin the agent platform, you need world-class infrastructure. You need the way that the agents interact with your company’s data and your business — so you need capabilities to help an agent really understand the company’s business information and context. I think, as you’ve seen in the press, AI and cyber have become very contextual now, there’s a lot of concerns that AI will accelerate the speed of cyber attacks on people’s systems, and so we’re going to be talking about how we’re bringing AI and our cyber technology together to protect, including the integration of Wiz , and then we’re introducing Gemini Enterprise and our agent platform to customers. That’s sort of the theme of what we’re talking about. You mentioned agents last year, everyone was talking about them to a degree, what has really changed from last year to this year that makes this different? I read your whole blog post, it’s very long, and I think the word “agent” may appear in every single paragraph. TK: There’s three or four big things that have changed. The first is capabilities of models — Gemini is able to reason much more effectively as new versions of Gemini have come out. Second, they’re able to maintain long-running memory, which you require if you have an agent that’s automating tasks over many, many steps, it has to maintain a lot of state in memory. Third, their interaction with tools and the rest of the world, there have been good abstractions, skills, tools, MCPs [ Model Context Protocol ], as they’re called, they’re all abstractions for how an agent reasons and interacts with the rest of a company’s systems. 
All of them have advanced and so the core capabilities that the models themselves have gotten a lot better, the capability and the ability to use tools and interact with the rest of the world has become a lot better, the abstractions that the world exposes itself to the model has improved and so now you have models have these capabilities to do these very complex tasks. That all makes sense and certainly tracks. A lot of these announcements, though, as I was going through them, a lot was about the infrastructure around agents, which makes sense — the orchestration, registry, identity, security, all these bits and pieces. All of this is clearly necessary for large enterprises, something they’re going to worry about and ask about. But the agents have to actually work; do Gemini agents actually work? Because there’s a lot of talk, you know, Gemini was the belle of the ball four months ago, but over the last little bit, it’s been mostly a lot about Anthropic and Claude, Codex, a lot of talk about that, and Gemini, not much talk. What’s your feeling about your actual capabilities, not just agents in general? TK: I’ve always said when people ask us about it, I always say, “Let our customers talk about it, rather than we talk about it”, I think you’re going to hear from 500 customers telling their stories at Next. Even people building agents, we have a whole range of them, from Citigroup to Bosch to eBay to Virgin Voyages to Walmart, there’s a whole range of them, Food and Drug Administration, etc., Comcast, Unilever, all of them are going to be talking about specific business problems they had. For example, for Citi, they’ll be talking about a new wealth advisor, Investment Management, where they’re using our agents to research a person’s investment priorities. 
So a person says, “Here’s my priorities for investment, my kids are going to school, I need this kind of cash flow in order to fund it”, and then it researches your financial portfolio and interacts with you to give you recommendations. If you look at Comcast, they’re using us for all of the work that they do for consumer services — this is repair, scheduling appointments, dispatching field technicians, there’s very complex flows that have many, many steps and interact with you with a lot of complex systems. If you look at some of these flows, they require all of the capabilities I talked about. So as an example, I want the capability to call a set of tools, and those tools may be I want to book an appointment, so I need calendar, I need to look up, if I’m dispatching a technician, I need to look up spare parts so I need to pull up from my inventory that spare parts inventory, I need to schedule that to be available at the same time as the person who’s going out, I need to update my inventory that have taken something out of it. I mean, these are very, very complex steps. What’s interesting about all these complex steps and going through all these bits and pieces, it sounds like you’re saying that almost the more constraints there are, the more things you’re bumping up into, is that actually a better environment for instituting these sort of flows just because what you need to do is clearly defined? TK: Just being perfectly frank, Ben, having constraints requires the model to be even more intelligent. Just as an example, the number of variants in a process flow that’s complicated many, many steps, the number of different idiosyncratic situations that you may encounter are large so you cannot a priori program every one of them. You need to teach the model to use, for example, to be able to spin up a virtual machine and use a tool in the virtual machine to generate code to deal with some of these situations. 
So the most sophisticated thing is where you can give the model a high level set of instructions and have it goal seek an outcome. So you say, “I need to schedule this appointment”, and it turns out there may be 19 different conditions that occur when you’re trying to schedule an appointment and as part of that, you can’t a priori tell the model every single possible condition deterministically. So you need to teach the model, “Okay, the user did not tell you what to do, but the goal was to schedule an appointment, so here is how you generate code to then create a collection of things that can interact with the model and understand what to do”. This is very interesting, you’re walking through this process, this makes a lot of sense. How do you have that conversation with DeepMind? You’re connecting the, “This is the workflow that is needing to happen, these are what we need the model to do, this is where it does well, where it doesn’t”, what’s the working relationship there? TK: We have a harness in which all these flows journeys, for example, as we see them with customers, we put them into the harness and they get into the reinforcement loop for Gemini. How tight is that process? TK: Very tight. We have people sitting next to [DeepMind CEO] Demis’ [Hassabis] team, in fact I just came from a meeting with them, that loop is what allows us — we are in a unique position in the market. We’re unique in three different ways, we’re unique because we have the whole stack of AI technology. In order to do agents well, you need to have a model that takes all these journeys and puts it into the harness that handles the improvement, as we call it, hill climbing, literally every hour of every day, and the complexity of the journeys we see are in some ways much more complicated because in companies, you have many different systems, different conditions, different flows, you may not see that in other domains, like in a pure consumer domain. 
In order to do these well, you also need, for example, models need to spin up compute, models need to now hold on to tokens for longer because they need to hold, for example, a KV cache that holds memory about what’s happening during the transaction flow. Having awesome infrastructure, both classical, what we call classical compute machines, and TPUs gives us real strength there. Third, as you walk through these, one of the things you find is a lot of the systems these models interact with are things like databases, enterprise applications. So understanding the context of these, like for example, “How much inventory do you have?”, defining “What is inventory?”, “What part are you talking about?”, “What part number are you talking about?”, those things require you to have technology that understands the business graph and the dictionary of all the objects and the sources of information in your company. Our strength in data processing gives us some technology that we’re going to be talking about next week around something we call Knowledge Catalog, think of it as your global dictionary for all information within the company, that’s a unique strength. And then obviously you don’t want information that’s critical to your company exposed on the Internet, you don’t want your model to get attacked because now it’s handling very complex process flows, you don’t want it hijacked, and so all the anxiety around cyber, we have very specific tools on, so our differentiation is all these pieces working together. That makes sense, the integration is a big part of your pitch. 
At the same time, you’re also a big, sprawling company and I think there’s maybe a perception, that I maybe hold, that some of the frontier labs are much more focused, they’re much more top-down about, “This is how our harness is going to work, the way it’s going to use tooling”, and all the things you’re talking about having this feedback flow back in sounds great unless there’s so many different takes on the way it should work and then you have your own internal customers as well. How do you balance having a point of view versus getting stuck in the muck? TK: Every product that Google has is on the same Gemini version, on the same day, on the same hour, every one of us is using the same harness. And you feel good that that harness is where it needs to be — it’s not getting pulled in 50 million directions thanks to all your customers and Google’s workloads? TK: Absolutely not, we are very focused on working with Demis and [DeepMind CTO] Koray [Kavukcuoglu] who lead our team to make sure they see the sophistication of these scenarios and we work literally side-by-side, hour-to-hour with them. There’s been a lot of speculation on are we distracted the company… I don’t think you’re distracted, I think it’s more just a matter of it’s a classic big company versus small company bit. Like a startup comes in and you have a very clear point of view and you don’t have all the enterprise stuff, you don’t have all this protecting the data, or permissions and all those structures, and yet that stuff sort of gets pulled along because there’s such demand to use your product that works really well and then over here it’s like, “Hey, we have everything protected and we have all these things around it”, but does the core product actually deliver? TK: The core product is being used by lots of people. The proof of that — we generate 16 billion tokens a minute, up from 10 just last December or January. Well, your financial results certainly showed that as well. 
There’s a bit where you’re doing so well, I have to be a little hard on you here. TK: A lot of people told us we were dead in 2023 — we’re still living. I think you’re doing more than living, you’re doing very well. TK: And so we never say anything negative about anybody else, our results prove for themselves. I always say, let our customers tell the story, they’re doing amazing things with Gemini in companies, enterprise, and they see the value of what we’re delivering for them. You mentioned that everyone in Google is on the same version of Gemini, using the same harness. Does that also apply to all this infrastructure around agents you’re doing, around sort of identity and security? TK: Yeah, in the enterprise, the way that all the infrastructure works is we have configurable mechanisms. Like for example, when you configure an agent, a very simple thing is you want to configure the agent with a different identity from a person, just a very simple example so that you can track, “Who did this transaction? Was it the human or the agent?”, because there’s issues like liability. You may want to revoke permissions for the agent at a certain point in time, you want to allow it to only do certain tasks and not everything that the human does so there are controls you want to put around an individual agent and a collection of things that’s separate from the person. As we bring agents to consumers as part of our Gemini app, very similar concepts want to be exposed, and so the architecture that we use allows us to have those things. The sources of that may be different. In the consumer world, they may use the Google login account, in the enterprise world, they may use a directory to store it, but that’s just an abstraction of our technology to the rest of the world. We’ve been talking a lot about Gemini agents and the whole Gemini platform, but you also have just the broader Google Cloud platform. 
One of your major tenants is a company I was just sort of referring obliquely to, which is Anthropic, they’re doing a lot of inference on TPUs in particular. If Anthropic wins deals at the expense of Gemini, is that still a win? TK: We sell different parts of our stack. One of the things people don’t realize is we monetize many different parts of the stack in different ways. Like Anthropic, there’s a lot of labs that use our stack — in fact, most of the large AI labs use our stack. So if somebody uses TPUs to either to train their model or to use it for inference, we’re monetizing that part of the stack, that gives us resources to then fund our R&D and other investments. Some of the labs use our TPU and our Gemini model, others may use our TPU and then buy our cybersecurity protection for their models. So as a platform player, we have to allow our technology to be monetized in as many ways as possible and we don’t see it as a zero sum. Sometimes, though, if you have the SaaS layer and the platform layer and the infrastructure, is there one that is the most important? On one hand, SaaS has the highest margins, it kind of decreases going down. On the other hand, that infrastructure needs to be used, you’re spending a lot of money on it, you want full utilization. How do you think about that in terms of what’s the most important? I know they’re all important, but how do you think about that tradeoff? TK: If we were making TPUs just for ourselves, we would have lower volume than we do as a general purpose TPU supplier, which means there would be times of day that we would not be using those TPUs. Do you follow me? Like if you think how chat systems work, they’re very diurnal in nature, because you ask questions when you’re awake and we have a great search business and we have a great Gemini app business, but there would be a certain diurnalty to it during the daytime, there’d be a lot of questions, what about in the evening? 
Because we sell TPUs in the market, we’re able to offer it at spot to the rest of the world because we have such a large business. We’re able to also get manufacturing, better terms with suppliers and other things because of a real volume player, and that in turn lowers our cost of goods sold. So there are many more dynamics. The company is very focused on ensuring we win every part of this, not just one part of it. Gemini is obviously a super important initiative for us, and you’ll see the big announcements are around— For sure, it’s almost all Gemini. TK: But I wouldn’t assume that if we do that, the only way to do that is to offer our chips along with our model. We see a strong business offering our chips to many other people and you’ll see all of this is what’s accelerating our differentiation, and you see it in our financial results. Your financials are incredible, your revenues up, margins are up hugely, I’ve been posting that chart of them for a long time, last quarter was amazing . I do have to ask about TPUs, though. You talk about selling our TPU chips, to date that has meant TPU instances on GCP, but now there’s talk about actually selling TPU chips, what’s the status of that? What’s the official word, can I go buy a TPU? TK: I’ll explain a little bit what we see. So let me talk briefly about what the announcements we’re making, what the product is being used for, and then how we bring some of it to market. TK: We’re introducing two big new TPUs next week. One is TPU 8t, which “t” stands for training, it’s more optimized for training, think of it as 9,600 TPU chips, a single pod, as we call it, it has three times better performance than the current generation, which is already the leading one in the market. Then there’s 8i, which is “i” for inference, it’s 1,152 chips, three times the SRAM, and it has a new thing called the Collectives Engine, which gives you super efficient calculation performance for inference. 
Now, along with that, we are introducing Nvidia VR200, we’re also introducing more ARM capability for classical compute, because people who use models increasingly need to spin up a VM in order to do tasks, and that VMs we see interest in. We’re introducing not just new compute families, but also new storage, there are two new storage offerings. There’s one, the fastest Lustre solution in the market, it’s 10 terabits per second, that’s just to give you a sense, it’s like five times number two. We’re also introducing a new thing for ultra low latency — when you do inference, you want super low latency in accessing storage, we call it Rapid Storage, it can give you 15 terabits per second with ultra low latency, like microsecond latency. So why are we introducing all this stuff? TPUs, definitely a big market is the AI labs, but we’re seeing interest from new segments of the market. So a big new segment is financial services and when I say financial services, capital markets, and the reason is that today, if you’re a trading firm, a capital markets firm, you spend a lot of time running algorithmic trading and algorithmic trading is running numerical algorithms on traditional Intel type cores, x86 cores. Now what they find is that models can do inferencing and the inference performance is actually better than traditional numerical computing. So that’s one new segment, the second segment is high performance compute. We see a ton of people wanting to do energy modeling, computational fluid dynamics, solid state, there’s a whole bunch of parameters there too. What’s interesting about those is, you will see at our event, Citadel Securities for example, talk in the keynote about how they’re using TPU. Citadel, as you know, is a large capital markets firm. Department of Energy, they have a mission called Genesis , which is the new national lab mission on changing the energy infrastructure for the United States. 
There’s a big Brazilian largest utility in Brazil, Axia, all of them are examples of people who are part of just the keynote talking about how they use TPUs. When we look at that, there’s a couple of different things we see. Capital markets firms say, “Hey, if we’re going to replace our algorithmic trading solution, you have to bring TPU to where the venue is”. Right, because they care about the latency of going to a data center, that’s why they’re all New Jersey. TK: Secondly, if you’re a national lab, you have so much data you’ve collected over the last X number of years with your experiments — saying you have to bring all that data to the cloud to reason on it doesn’t make sense, so you will see us putting TPU in other people’s venues, and when we do that, we’re introducing new ways of people also procuring it. When I say procuring it, you buy it as a system, you don’t have to buy it just as a cloud source. How does this new way of selling, which is almost like a third way, so you have in Google’s data centers, you have bringing TPUs to customers, but then you have a deal like last week where between Anthropic and Broadcom and Google, this is going in their data centers. There’s these sort of renegade data centers that have access to power, maybe they were doing Bitcoin or whatever it might be, there’s been a big push to get TPUs into those. Where does that fit into this? TK: I would not assume everything you read in the press is true. Well, the Anthropic announcement was definitely a big announcement. TK: Just to be honest with you, we have a flavor that runs in the cloud and a flavor that runs in third-party data center. The technology, the machines are identical. My question here is, where is that coming from? Is that part of your TSMC allocation? Is that Broadcom’s? Because no one can get enough compute, so ultimately that goes all the way back to the root. TK: The chips are all part of our global — TPU is a Google chip, as you know. 
So it’s part of global allocation, Broadcom partner who manufactures the TPUs with us and so it’s just part of the overall business. The new thing we’re talking about is just that you can run TPU in other venues. Makes sense. Will we ever have enough compute? Last year you said, “I think we’re going to resolve it shortly”, it doesn’t seem very resolved, what’s the status there? TK: We’ve worked super hard as an organization, our team that’s done our compute infrastructure, our global data centers, machines, all that, they’ve done an amazing job, there’s always a shortage, there’s never enough. But it doesn’t mean that we’re not — we would not be growing at the rate we are if we didn’t have enough compute. And so there’s more that we want, but there’s also the reality of our teams have done an amazing job, and our customers who are using it will tell you they’re seeing the benefits of the hard work our teams have done. There’s potential customers in the market, maybe current customers, who may be willing to pay basically any price for compute at this point. How do you think about the short term, “Wow we can actually just make a lot of money right now”, versus, “We need to invest in our products” — you had Microsoft, who I’m not going to ask you to comment on, but last quarter they’re like, “Yeah, we allocated less to Azure because we had our own internal workloads”. These are real trade-offs that you need to think about, how do you think about that in terms of GCP? TK: We run a balanced portfolio, we want to grow different parts of our business, we sit down as an executive team and also with Sundar and work through how we’re going to balance the different parts of our portfolio. We see, broad brush, three to four buckets of things. 
One bucket of things is where we want to grow Gemini as a business, our core Gemini business is doing super well, 16 billion tokens a minute, up 40% since last quarter, even this product called Gemini Enterprise, which is our core agent platform, has grown 40% sequentially quarter-over-quarter. So that part of the business, we’re committed to making it super successful, it’s a priority for us. Second segment of the business is where Gemini is being used inside of some of our core products, so I’ll give you an example. We’ve introduced Gemini inside our threat intelligence tools. Why is that? Because we have real expertise at Google scanning the dark web to identify threats, the problem is there’s so many of them, an average organization doesn’t know which of those many threats apply to them. So we use Gemini to process and prioritize which threats might affect you, it’s 98% accurate and has processed 3.9 million threats in the last year, so that’s an example of Gemini being used as an embedded capability. Right. The whole SaaS, PaaS, IaaS — the SaaS bit is still important. TK: There’s that capability, there’s people who want to use Gemini to reason on data in our analytics infrastructure so there’s a second big set where Gemini is an embedded capability and that in turn depends on chips and TPUs and GPUs. And the third one is offering our compute platform to people. We balance across those because we want all of them to be successful. By bringing hardware or our machines to other people’s venues, we’re broadening our TAM, total addressable market; in that part of the business we also see a different cash flow model than if you were putting CapEx, so there’s a lot of different parameters we have to balance. All those ones you listed for you to make trade-offs on, but then you also have to get in a meeting with Sundar and the other leaders of Google to make trade-offs with DeepMind and their R&D and with the consumer products. What are those meetings like? 
TK: We have a regular set of cadence of meetings and we balance the different priorities and we want to be successful on many different dimensions. I wouldn’t assume all of these dimensions are zero sum. Like, for example, when we offer our product in other venues, we drive cash flow in a different way than putting CapEx — so to some extent, that changes the boundary of how we offer our capital boundary as a company also. So I think there’s a general view of there’s a compute shortage, and if you give one, you will have to take from another, I think that’s an overly simplistic view of it, having been in this for long enough and having been, my team does both parts. We are responsible for delivering all the infrastructure for Alphabet, and they’ve done an amazing job doing that, and I’m also responsible for running the cloud business, and you can tell that our differentiation, I come back to this, it would be a different problem if you didn’t have demand. You can, and whenever I ask us to prove that you’ve got demand, I always say, “Look at our results”. Well that’s been the biggest change even since January where there was still some sort of latent skepticism about, “Is all this CapEx worth it?”, feels like those questions have been completely erased at this point. Speaking of markets in the last couple months, all these SaaS companies are getting killed in the market, you have a big SaaS business, you’re definitely not getting killed in the market, why are you escaping it? TK: I think we have transitioned. The core fundamentals is finding, and this is the way we approach our product portfolio, I’ll give you a very simple example — 2023, we said, “Hey, at 2022, we said, we’re not just going to build a secure cloud, we’re also going to start offering cybersecurity products”. When we entered the market and then we looked at what other things people — the value of cyber is driven by two dimensions. 
Dimension one, “What is it protecting?”, because it has to protect high value things, and the other element is, “How good is it at protecting?”, “What’s the technology that it’s going to use to protect?”. So we said, “There are only two valuable places to protect, there’s either the endpoint”, which is your desktop on which apps run, other people are doing a good job there, the rest of the world is moving all their applications and data to the cloud, let’s protect that. Second, we said AI is going to find vulnerabilities because at the end of the day, finding vulnerabilities is a question of a model really understanding code, and if you can find vulnerabilities at a much more accelerated rate, people need to fix vulnerabilities at an incredibly aggressive, fast rate, and so we started a set of work back then and we said to ensure that we have the leading product portfolio, let’s acquire Wiz. We’re now working on, you’ll see a number of announcements, there’s the Threat Intelligence Agent that allows us to understand the threat landscape and use Gemini to prioritize what you should pay attention to; a lot of people are using Gemini to actually scan their code, and then we’re introducing three new Gemini-powered agents with Wiz, one called Red Agent — think of it as continuous red-teaming of your infrastructure, a Blue Agent that says, “Okay, I looked at what’s happening with the Red team and I know what you need to go fix”, and a Green Agent that says, “I’ll fix it for you”, and that’s going to cut the cycle time. Like our Threat Intelligence Agent, you will see reference customers from Chicago Mercantile Exchange, there’s a whole bunch of them talking next week, about how it takes an investigation that takes 30 minutes and does it in 30 seconds, that allows you to respond. 
Now, this is an example of when we started, people said, “Why would a hyperscaler want to become a cyber company?”, and we were like, “It’s not about being a hyperscaler, it’s about solving that problem at the intersection of — AI is going to accelerate cyber threats and you cannot do repair the old way”. Yep, it really answers the question that people had when you acquired Wiz, which is, “Why do you need to buy it, why can’t you just build it?”. It’s like, “Well, in two years, it’s going to be too late”. That’s, I think, also felt very tangibly right now. TK: Today, we are where we are because we made that bet. So when people ask, “Why are you guys growing even in sectors that may be struggling?”, it’s because we have differentiation and we made those decisions early. That makes sense. One of the interesting product announcements this year is this cross-cloud lakehouse which lets customers leave their data in AWS and Azure while still being query-able by your services instantly. Is this the final admission that even if enterprises love your AI and love Gemini, they’re not going to shift all their workloads if they’re already on other clouds? Lots of your products have been about that in the past — even Wiz is about that to a certain extent — but is that just the reality? There’s not going to be a huge amount of spillover as far as pulling things from other clouds to Google. TK: If you use BigQuery today, you don’t have to move your transactional applications to BigQuery. If you’re using Gemini today, you can keep your applications in another cloud and use Gemini to reason on it. The problem we were trying to solve is a very specific problem. Today, when people talk about lakehouses, they say, “We have a multi-cloud lakehouse”. What they really mean is their lakehouse can be run on any cloud, but when it’s running on a particular cloud, you can only access the data in that cloud. 
And then people say, “That’s crazy, because I’ve got data in a SaaS app like Salesforce”, “I’ve got data in an ERP system”, “I’ve got data in Azure and Amazon, and I’d like to use analysis across all this”, one choice for customers is to copy all that data out, that’s expensive for them because of the egress tax that everybody imposes. So we said, “Keep your data there, we can still give you world-class analysis”, and so it’s solving that custody. The customer has a problem, they want to do analysis, there are four things we’re giving them. Keep your data where it is, no matter how many clouds. We’re not talking about a single cloud lakehouse, we’re talking about across all the clouds and across all your SaaS apps, we can do analysis, one. Two, people said, “How fast can you run?”, the proof that we’re going to show is we’re 2x better in price performance than the market leader, right out of the gate. The third one, people said, “I’m not an expert on writing Python and Spark, can you give me essentially vibe coding for Python and Spark?” — yes, you’ll see us introduce an agent manager to generate Python and Spark code using Gemini. And then the last one people said today, Ben, if you ask a question, I was using that example of field service, I’m running a query on, “How much inventory do I have in parts?”, before I send the technician — that information sits inside an application in a set of tables in a database, most organizations have thousands of databases, teaching the model which system has what information, and the notion of part is split across 10 different tables in this particular database, you need a system that builds that semantic graph of all the information in your company. Right, this is the Knowledge Catalog. TK: That’s the catalog, and that gives you super good accuracy when you’re researching information. So we put all this together and back to, we’ve always been super pragmatic. 
I always say enterprises have certain problems that they see independent of a cloud. For example, security — they don’t want to buy three different security tools from three different hyperscalers. Analytics — they don’t want to buy three different analytic tools from three different hyperscalers. Others have chosen to say, “My stuff only works with my cloud”, that’s why enterprises often choose us, because we work across all the clouds and all the security environments you have and you can keep stuff wherever you are and use Gemini to access and automate stuff for you, so all that is just part of listening to customers. This all makes perfect sense, particularly this bit about the Knowledge Catalog definitely fits how I’ve been thinking. I wrote about this a few years ago about this importance of this whole layer and understanding it, it’s a bit of a big lift to get this in place. You have some sort of analog, say, with like a Palantir that’s putting in like their ontology thing . They have FDEs out on the site, multi-month projects doing this. You have OpenAI talking about Frontier , their agent layer, and they’re partnering with all the tech consultancies to build this out. Is this going to entail a lot of boots on the ground to get this graph working and functional in a way that your agents can operate effectively across it? TK: We’re not competing with Palantir, we’re not building a semantic dictionary or an ontology. What we’re doing is, today I’ll give you the closest analogy. TK: Today when you use a model, let’s say you use Gemini, and you ask a question, Gemini goes through reasoning, and then it shows you a citation. 
A citation is, “How did I answer the question and what’s the source I derived from?” Now imagine that citation was a query that needed to go to a folder in, for example, a storage system because there’s some documents there and a database because, for example, in a part number, just think about there’s a part number document that lists all the part numbers and sits in a drive and then that part number you need to fetch out to say it’s the modem that the guy is coming to repair, and that’s mapped to a table in a database. So what the graph does, we use Gemini, so we don’t need humans, we use Gemini to say, “Hey, go and read all these documents in these drives and extract the information from it and then match that to the database table that has the reference to the part number”, and so then when Gemini turns around and says, “I got this query about how much inventory of modems they are”, the first thing it does is it says, “Okay, go to the Knowledge Catalog and it says modem is part number one, two, three, four, five”, and then it says, “By the way the table in the database that has the inventory information about this part number is this table, here’s a SQL”, it then makes the quality of what we generate higher and then when it answers the question it shows back — back to your, “Trust my data”, it shows a grounding citation saying, “That’s where we got it from.” What do you need from everyone in the ecosystem if this is going to work, all these SaaS applications and across all these entities, not just what’s in your databases, but what’s in a SAP database or whatever it might be. How do you get them on board so you can understand their data and build this Knowledge Catalog? 
TK: Really easy, the first thing is to use the lakehouse we support a standard format, industry is very standardized on it, it’s called Iceberg , so anybody who supports Iceberg we can talk to it and so that’s pretty much the whole world right now, so we don’t need them to do anything special to make it work. Second, all of these business systems have API specifications, and our Catalog can learn off of those API specifications, we just teach Gemini to process those, and so we can build a catalog pretty quickly. There are reports that OpenAI on Amazon Bedrock has been massively popular. Are we going to get OpenAI on Vertex? TK: We would love to have them. We are announcing a variety of third-party models on Vertex, including Anthropic, including open source, we’re open to any model provider on Vertex. I believe you. That’s going to be great, when and if it happens. Just one last question. We’ve talked in this interview series previously about how I think, and this is before your time, it’s not your fault, that Google Cloud missed the boat in terms of being a point of integration for the Silicon Valley enterprise ecosystem. I think last year I asked you if AI represented a new opportunity to do that. However, is there a bit where the models, and you’re in this game because you have one of the leading models, is just going to eat everything and is going to gradually expand to do the jobs and everyone else is just going to be a system of record? It’s going to be all one interface, that the integration, such that it is, is all under the surface, it’s not necessarily tying things together in user space. Is Gemini going to be all the user needs in the long run? TK: We don’t see it that way. 
In fact, one announcement you’ll see us make next week is how many third-party SaaS and ISV [independent software vendors] vendors are embedding Gemini not just as a model, but as an agent platform, because they want to build agents and our agent platform, you can use to build agents, not just our own agents, but they can use it and there’s a lot of independent software vendors embedding those agents. And do they see you as like, “Hey, you’re another established guy, let’s go with you because we don’t know what these other folks are up to, they want to eat all of us”? TK: It’s also the capabilities. The differentiation, I would say, is just think about you’re a bank or an insurance company, and think about you’re a SaaS vendor selling to them or an independent software vendor, there’s a number of things around identity, policy management. For example, if you’re a bank and you have documentation about a person and their credit, you cannot have that egress the bank’s boundary, so we have a gateway that protects against that, that’s part of our agent platform. You want to have auditability on the agent to say which agent did what task on what system when, that’s built into the platform. You want to have a registry where you expose all your skills so that people are not duplicate building all these things, we have a registry that does that. This is sort of the bit we started with at the beginning, it’s not just going to benefit your agents it’s going to benefit all agents, that’s sort of the pitch. TK: So one of the things that people like is the fact that we built all that plumbing for them, and so they don’t have to invest in it, they can focus on the value add that they have on their agent side. 
Additionally, for companies in this broader ecosystem, the cost of agents — and it becomes part of their bill of materials, if you will, the cost of goods sold — the fact that we have these super efficient chips that run inference with such efficiency eventually translates into cost efficiency for a third party that’s building on top of us. You can see that all of those benefits, we’re taking away all that complexity for these guys, so we definitely don’t see that all the ecosystem is going to die, we definitely don’t see that, we see us facilitating that ecosystem. You’ll see us announcing a number of things, including a substantial investment in dollars to accelerate the partner ecosystem around our platform. Thomas Kurian, great to talk to you again. TK: Thanks so much, Ben. And just in closing, the work that we announce every year at Next is a testament to all those customers and partners who gave us a shot to work with them. You’ll see them telling their story, and it’s a testament to all those people at our organization that made a bet to solve a technical problem a different way, or to bring our technology — we’ve hugely expanded our go-to-market organization, and doing all that with growing top line and operating income at the same time is a testament to the demand we see for our products and services. I mean, six, seven years ago, people used to tell us, “You have no shot in the market”, I think we are now truly uniquely positioned. Name one other player that has the stack of technology to do AI, when I look forward, I think there’s no question in people’s minds that the central problem that companies need to solve and technology providers need to solve is how good is the capability you offer for AI. We’re the only ones with chips, models, the context to feed the models from all of the data infrastructure, the cyber tools, and then a world-class agent platform. I would also add, you’re actually an enterprise company now. 
The things you talked about, pragmatism, listening to customers, all these pieces, GCP did not have at all a decade ago — there’s a bit where Wiz was ahead of its time, for sure, being forward-looking, but there’s a bit where the organization is ready for this moment in a way I don’t think it would have been previously. I find it very impressive. TK: We are very proud of the team. Also for Alphabet, to do AI well, you have to do a couple of things. One, see the breadth of problems that we see, we see all of the consumer problems, we see the enterprise problems, we see the problems that search sees, we see the problems that YouTube needs, we see all those that we’re solving with AI, that gives us a breadth of capability that the model needs to solve, that over time is a real strength because the diversity of problems we’re solving. Second, in order to do AI well, you have to invest, and in order to invest, you need to monetize in as many different ways as possible. I think we are very confident that our team, we do not have any hubris, but we are confident in where we stand. I think it’s very impressive. I look forward to your keynote. TK: Thanks so much Ben, it’s a privilege to talk to you every year and it’s great that you took the time to speak with me. And it’s all recorded, I can promise you that! This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.

Manuel Moreale 1 months ago

Spending hard caps

I was catching up with some tech news yesterday and every time I read one of these “I woke up with a USD 18k bill in my Cloud account” articles, I am reminded about how fucking stupid—and predatory—this whole industry can be. The ability to set hard spending caps should be required by law. I think that’s another issue the EU should decide to tackle at some point. If I know I have a budget available, there should be an option for me to configure your service so that you don’t allow me to spend more than that. And if my product or site goes down as a result of that, it’s a choice I get to make. But the reason why hard caps are usually not an option is obvious: companies get to make more money this way. Hurray for capitalism! The sad part is seeing allegedly smart people arguing that no, the actual reason is that it’s a complex problem to solve, and no-one has figured out how to do it yet. An excuse so pathetic that it’s not even worth getting mad about it. There are people discussing plans to build moon bases, put servers in orbit, build digital gods, and yet setting a hard cap on billing is a complex problem to solve. Sure, I believe that.
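To make the idea concrete, here is a toy sketch of the behaviour described above: an account with a configured budget refuses any charge that would take spending past it, and the caller decides what "going down" looks like. Everything in it (`CappedAccount`, `BudgetExceeded`, the amounts) is hypothetical and invented for illustration. It is not any cloud provider's API, and it assumes each charge is known at the moment it is made.

```python
from dataclasses import dataclass
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the hard cap."""


@dataclass
class CappedAccount:
    # Hypothetical account model: a fixed budget and a running total.
    hard_cap: Decimal
    spent: Decimal = Decimal("0")

    def charge(self, amount: Decimal) -> Decimal:
        """Record a charge and return the remaining budget.

        The charge is refused outright if it would exceed the cap,
        so `spent` can never end up above `hard_cap`.
        """
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.spent + amount > self.hard_cap:
            raise BudgetExceeded(
                f"charge of {amount} would exceed the cap of {self.hard_cap}"
            )
        self.spent += amount
        return self.hard_cap - self.spent


# Usage: a 50.00 budget, one charge that fits and one that does not.
account = CappedAccount(hard_cap=Decimal("50.00"))
remaining = account.charge(Decimal("30.00"))  # remaining is Decimal("20.00")

try:
    account.charge(Decimal("25.00"))
    service_paused = False
except BudgetExceeded:
    # The customer's choice: stop serving rather than keep spending.
    service_paused = True
```

The refused charge leaves `spent` untouched, which is the whole point of a hard cap as opposed to an alert: the budget is a limit the service enforces, not a notification the customer reacts to afterwards.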

Stratechery 1 months ago

Mythos, Muse, and the Opportunity Cost of Compute

In January 2025, Doug O’Laughlin at Fabricated Knowledge declared that o1 and reasoning models marked the end of Aggregation Theory: I believe that there is no practical limit to the improvements of models other than economics, and I think that will be the real constraint in the future. It is reasonable that if we spent infinite dollars on a model, it would be improved. The problem is whether infinite dollars would make sense for a business. That is going to be the key question for 2025. How do the economics of AI make this work? One of the core assumptions about the internet has just been broken. Marginal costs now exist again, meaning that most hyperscalers will become increasingly capital-intensive. The era of Aggregation Theory is behind us, and AI is again making technology expensive. This relation of increased cost from increased consumption is anti-internet era thinking. And this will be the big problem that will be reckoned with this year. Hyperscaler’s business models are mainly underpinned by the marginal cost being zero. So, as long as you set up the infrastructure and fill an internet-scale product with users, you can make money. This era will soon be over, and the future will be much weirder and more compute-intensive. Looking back on the 2010s, we will probably consider them a naive time in the long arc of technology. One of our fundamental assumptions about this period is unraveling. This will be the single most significant change in the technology landscape going forward. Aggregation Theory was, if I may say so myself, the single best way to understand the 2010s, particularly consumer tech. It explained the dynamics undergirding Google and Facebook’s dominance, as well as the App Store and Amazon’s e-commerce business; it was also a useful (albeit incomplete) framework to understand an entire host of consumer services like Uber, Airbnb, and Netflix. 
It’s worth pointing out, however, that some of the critical insights undergirding Aggregation Theory are much older, and are embedded in the fundamental nature of tech itself. They are, as O’Laughlin notes, rooted in the concept of zero marginal costs. Marginal costs are how much it costs to make one more unit of a good. Consider a widget-making factory: Land and machines are clearly fixed costs; you have to have both to get started, and you are paying for both whether or not you make one more widget. Raw material, on the other hand, is clearly a marginal cost: if you make one more widget, you need one more widget’s worth of raw material. When it comes to physical goods, electricity and humans are also marginal costs: you need more or fewer of them depending on whether you make more or fewer widgets. Where marginal costs matter is that they provide a price floor. Companies will operate unprofitably because profit and loss is an accounting concept that incorporates depreciation, i.e. your fixed costs. For example, imagine that a company spent $1,000 on a factory to make widgets that have a marginal cost of $10: as long as the price of widgets is >$10 the company will make them even if they don’t earn enough money to cover their depreciation costs (i.e. they operate at a loss) because at least they are still making a marginal profit on each widget (what the company may not do is invest in any more fixed costs, and, eventually, will probably go bankrupt from interest on the debt that likely financed those fixed costs). I explain all of this precisely because it’s almost completely immaterial to tech. First, there generally are no raw material costs, because the outputs are digital. 
Second, because there are no raw material costs, and because the fixed costs are so large, electricity and humans are generally treated as fixed costs, not marginal costs: of course you will run your servers all of the time and at full capacity, because every scrap of additional revenue you can generate is worth it. AI very much fits in this paradigm: the output is digital, and while AI chips use a lot of electricity, the cost is a fraction of the cost of the chips themselves, which is to say that no one with AI chips is making marginal cost calculations in terms of utilizing them. They’re going to be used! Rather, the decision that matters is what they will be used for. Consider Microsoft: last quarter the company missed the Street’s Azure growth expectations not because there wasn’t demand, but because the company decided to use its capacity for its own products. CFO Amy Hood said on the company’s earnings call : I think it’s probably better to think about the Azure guidance that we give as an allocated capacity guide about what we can deliver in Azure revenue. Because as we spend the capital and put GPUs specifically, it applies to CPUs, the GPUs more specifically, we’re really making long-term decisions. And the first thing we’re doing is solving for the increased usage in sales and the accelerating pace of M365 Copilot as well as GitHub Copilot, our first-party apps. Then we make sure we’re investing in the long-term nature of R&D and product innovation. And much of the acceleration that I think you’ve seen from us and products over the past a bit is coming because we are allocating GPUs and capacity to many of the talented AI people we’ve been hiring over the past years. Then, when you end up, is that, you end up with the remainder going towards serving the Azure capacity that continues to grow in terms of demand. 
And a way to think about it, because I think, I get asked this question sometimes, is if I had taken the GPUs that just came online in Q1 and Q2 in terms of GPUs and allocated them all to Azure, the KPI would have been over 40. And I think the most important thing to realize is that this is about investing in all the layers of the stack that benefit customers. And I think that’s hopefully helpful in terms of thinking about capital growth, it shows in every piece, it shows in revenue growth across the business and shows as OpEx growth as we invest in our people. The cost that Microsoft is contending with here is not marginal cost, but rather opportunity cost: compute spent in one area cannot be used in another area; in the case of these earnings, Microsoft was admitting that they could have made their Azure number if they wanted to, but chose to prioritize their own workloads because, as CEO Satya Nadella noted later in the call, those have higher gross margin profiles and higher lifetime value. It’s opportunity costs, not marginal costs, that are the challenge facing hyperscalers. How much compute should go to customers, and which ones? How much should be reserved for internal workloads? Microsoft needs to balance Azure — both for its enterprise customers and OpenAI — and its software business; Amazon needs to balance its e-commerce business, AWS, and its strategic investments in both Anthropic and OpenAI. Google has to balance GCP, its own strategic investment in Anthropic, and its consumer businesses. Last week Anthropic announced Mythos, its most advanced model. And, in somewhat typical Anthropic fashion, it did so by focusing on its dangers; from the introductory post for Project Glasswing, the company’s initiative for leveraging Mythos to address security: We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. 
Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes. In an Update last week I analogized Anthropic’s “disaster-porn-as-marketing-tool” approach to The Boy Who Cried Wolf ; what’s important about that analogy is not just that the boy raised false alarms, but also that, in the end, the wolf did come. To that end, I wrote two weeks ago about the myriad of security issues that underpin all software, and my optimism that AI would solve these issues in the long run, even if it made things much worse in the short run. In other words, it’s actually not important whether or not Mythos represents a major security threat: if this model doesn’t, a future model will; to that end, I do support leveraging Mythos to proactively find and fix bugs before bad actors can find and exploit them. At the same time, it’s also worth noting that there are other reasons for Anthropic to not make Mythos widely available, limiting access to a finite number of companies with a high capacity and willingness to pay. The first are those opportunity costs: Anthropic is already short on compute serving its current models; X was overrun with complaints and debates this weekend about Anthropic allegedly dumbing down Claude over the last month or so . 
Making Mythos more widely available — particularly to subscription plans that don’t pay per usage — would make the situation much worse. In other words, Anthropic isn’t facing a marginal cost problem, but an opportunity cost problem: where to allocate its compute. Of course this could become a margin problem: I suspect that Anthropic is going to overcome its conservatism in terms of compute by acquiring more compute from hyperscalers and neoclouds, and paying dearly for the privilege. The key to handling those costs will be to charge more for Claude going forward; that, by extension, means maintaining pricing power, which leads to a second benefit of not releasing Mythos broadly. Anthropic certainly faces competition from OpenAI; for both frontier labs, however, the real competition in the long run are open source models. Right now those primarily come from China, and a key ingredient in fast-following frontier models is distillation; from Anthropic’s blog : We have identified industrial-scale campaigns by three AI laboratories—DeepSeek, Moonshot, and MiniMax—to illicitly extract Claude’s capabilities to improve their own models. These labs generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, in violation of our terms of service and regional access restrictions. These labs used a technique called “distillation,” which involves training a less capable model on the outputs of a stronger one. Distillation is a widely used and legitimate training method. For example, frontier AI labs routinely distill their own models to create smaller, cheaper versions for their customers. But distillation can also be used for illicit purposes: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time, and at a fraction of the cost, that it would take to develop them independently. I absolutely believe this is a real problem, and wrote as much when DeepSeek R1 was released last year . 
I also think it’s in the interest of everyone other than the frontier labs to pretend that it isn’t; open source models are not subject to the frontier labs’ markup or compute constraints, which is exactly why it benefits most companies to have them available, whether or not they are distilled. Of course that doesn’t mean they are free to run: you still need to provide the compute. Notice, however, how that makes stopping distillation even more of a priority for the frontier labs: first, they want to protect their margins. Second, however, their biggest cost is opportunity cost: the customers they can’t serve because they don’t have enough compute. To the extent they can make compute less useful for their potential customers — by stopping open source models from distilling their models — is the extent to which they can acquire that compute for themselves at more favorable rates. Mythos wasn’t the only new model announced last week: Meta released the first fruit of their new frontier lab as well. From the company’s blog post : Today, we’re excited to introduce Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts. To support further scaling, we are making strategic investments across the entire stack — from research and model training to infrastructure, including the Hyperion data center… Muse Spark offers competitive performance in multimodal perception, reasoning, health, and agentic tasks. We continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows. Muse Spark isn’t state of the art, but it’s in the game, and overall a positive first impression from Meta Superintelligence Labs. 
What is most notable to me, however, is the extent to which the last nine months of AI have made clear that CEO Mark Zuckerberg made the right call to embark on that “ground-up overhaul of [Meta’s] AI efforts”. The trigger for O’Laughlin’s post that I opened this Article with was reasoning, where models using more tokens led to better answers; since then agents have exponentially increased token demand , as they can use LLMs continuously without a human in the loop. This is a huge driver in sky-rocketing demand for Claude, as well as OpenAI’s Codex. Moreover, this use case is so potentially profitable that not only is Anthropic’s revenue sky-rocketing, but OpenAI is pivoting its focus to enterprise. Indeed, you can make the argument that one of OpenAI’s biggest challenges is the fact it has such a popular consumer product in ChatGPT. I, with my Aggregation Theory lens, have long maintained that that userbase was a big advantage for OpenAI, but that assumed that the company could effectively monetize it, which is why I have argued so vociferously for an advertising model . OpenAI has big projections for exactly that, but until that materializes, that big consumer base is a big opportunity cost in terms of OpenAI’s focus and compute. The company has, to its credit and in the face of widespread skepticism, made significant investments in more compute, but the temptation to allocate more and more compute to agentic use cases that enterprises will pay for, even at the expense of the consumer business, will be very large. This puts Meta in a unique position relative to everyone else in the industry: unlike any of the hyperscalers or the frontier labs, Meta does not have an enterprise or cloud business to worry about. That means that serving the consumer market comes with no opportunity costs. Of course those opportunity costs would be much smaller anyways, given that Meta already has an at-scale advertising business to monetize usage. 
In other words, Meta may actually face less competition in winning the consumer space than it might have seemed a few months ago, simply because that is their primary focus — and because they have their own model, which means they don’t need to worry about not having access to the frontier labs (much of this analysis applies to Google, of course). This, by the same token, is why Meta should open source Muse, just like they did Llama. The entities that will be the most hurt by widespread availability of a frontier model are other frontier labs, who will see their pricing power reduced and face increased competition for compute. This will make it even harder for them to bear the opportunity cost of pursuing the consumer market, leaving it for Meta. So is “the era of Aggregation Theory…behind us”? On one hand, the insight that the way to create and maintain value will come from owning the customer is almost certainly going to continue to be the case. On the consumer side owning customers leads to advertising which provides the revenue to provide services to customers. On the enterprise side — which, I would note, has never been an arena where Aggregation Theory was meant to be applied — I think it’s likely that both Anthropic and OpenAI continue to move up the stack and deliver features that compete with software providers directly (an approach that is also in line with not making leading edge models publicly available). On the other hand, O’Laughlin’s observation that we are and will continue to be compute constrained is an important one: companies will not be able to assume they can serve everyone, because serving one set of customers imposes the opportunity cost of not serving another. 
This won’t, at least in theory, last forever: at some point AI will be “good enough” for enough use cases that there will be enough compute capacity to take advantage of the fact that there really aren’t meaningful marginal costs entailed in serving AI; that theoretical future, however, feels further away than ever. OpenAI is betting that this compute constraint — and the deals they have made to overcome it — will matter more than Anthropic’s current momentum with end users. From Bloomberg : OpenAI told investors this week that its early push to dramatically increase computing resources gives it a key advantage over Anthropic PBC at a moment when its longtime rival is gaining ground and mulling a potential public offering. The ChatGPT maker said it has outpaced Anthropic by “rapidly and consistently” adding computing capacity to support wider adoption of its software, according to a note the company sent to some of its investors after Anthropic announced a more powerful AI model called Mythos. The ambitious infrastructure build-out, criticized by some as too costly, has enabled OpenAI to better keep pace with rising demand for AI products, the memo states. I’m less certain that this will be dispositive. When it comes to AI, distribution and transaction costs are still free — the two preconditions for Aggregators — which means that the winners should be those with the most compelling products. Those products will win the most users, providing the money necessary to source the compute to serve them; consider Anthropic’s deal to secure a meaningful portion of TPU supply , which, given the capacity constraints at TSMC, is ultimately an example of taking supply from Google. I suspect that Anthropic can take more, including already built hyperscaler and neocloud capacity. Yes, that compute will be more expensive, but if demand is high enough the necessary cash flow will be there. 
In other words, my bet is that owning demand will ultimately trump owning supply, suggesting that the underlying principles of Aggregation Theory live on. To put it another way, I think that OpenAI will need to win with better products, not just more compute; then again, if more compute is the key to better products, then does supply matter most? Regardless, they’ll certainly be focused on delivering both to the enterprise customers who are driving Anthropic’s astonishing growth. The real cost may be the consumer market they currently dominate, given that Meta has nothing to lose and everything to gain.

- You need land for the factory
- You need machines for the factory
- You need electricity to operate the machines
- You need humans to operate the machines
- You need the raw material for the widgets

0 views
iDiallo 1 months ago

Your AWS Certificate Makes You an AWS Salesman

I must have been the last developer still confused by the AWS interface. I knew how to access DynamoDB; that was the only tool I needed for my daily work. But everything else was a mystery. How do I access web hosting? If I needed a small server to host a static website, what service would I use? Searching for "web hosting" inside the AWS console yielded nothing. After digging through the web, I found the answer: an Elastic Compute Cloud instance, better known as EC2. I learned that I could use it under the "Free Tier." Amazon offers free tiers for many services, but figuring out the actual cost beyond that introductory period requires elaborate calculation tools. In fact, I’ve often seen independent developers build tools specifically to help people decipher AWS pricing. If you want to use AWS effectively, it seems the only path is to get certified. Companies send employees to conferences and courses to learn the platform. I took some of those courses and they taught me how to navigate the interface and build very specific things. But that skill isn't transferable. In the course, I wasn't exactly learning a new engineering skill. Instead, I was learning Amazon. Amazon has created a complex suite of tools that has become the industry standard. Hidden within its moat of confusion, we are trained to believe it is the only option. Its complexity justifies the high cost, and the Free Tier lures in new users who settle into the idea that this is just "the way" to do web development. When you are presented with a simple interface like DigitalOcean or Linode and a much cheaper price tag, you tend to think that something is missing. Surely, a cheaper, simpler service must lack half the features, right? The reality is, you don't need half the stuff AWS offers. Where other companies create tutorials to help you build, Amazon offers certificates. It is a powerful signal for enterprise legitimacy, but for most developers, it is overkill. 
This isn't to say AWS is "bad," but it obscures the reality of running a web service. It is much easier than it seems. There are hundreds of alternatives for hosting. You can run your services reliably on a VPS without ever breaking the bank. Most web programming is free, or at the very least, affordable.

0 views
Giles's blog 1 months ago

Writing an LLM from scratch, part 32j -- Interventions: trying to train a better model in the cloud

Since early February, I've been trying various interventions on a 163M-parameter GPT-2-style model that I trained from scratch on my local RTX 3090 , using code based on Sebastian Raschka 's book " Build a Large Language Model (from Scratch) ". My original model got a loss of 3.944 on my test set, while the original GPT-2 weights got 3.500 on the same dataset. I wanted to see if I could close that gap, and had a list of potential changes to the training setup, and to the model itself. Which of them would help? I found a list of solid-looking interventions, and in my last post I came to the conclusion that the improvements in loss I had seen with all of them -- with two possible exceptions -- seemed unlikely to be in the noise. What would happen if I tried to put them into a new model? Let's start by looking at the results that we have for the interventions so far -- this is the table I've been using as I go through them, but I've updated it to contain the loss figures for each model to six decimal places instead of three, and made each model name link to the associated post. I've also corrected the loss for the model, which was mistakenly using the training loss at the end of the run rather than the loss on the test set 1 . As I've mentioned before, simply moving to training in the cloud improved things markedly, getting loss down from 3.944 to 3.691526; I suspect this was due to having a closer-to-optimal batch size (more about that in my next post). What to do about the other interventions, though? It seemed clear that two of them were not helping: weight tying, and the one using the figure for weight decay that I'd (I suspect incorrectly) derived from a paper by Cerebras Research. The "no-AMP" run (which would be better described as "full-fat float32") had a small positive effect, but was so costly in terms of both time and money that it wasn't worthwhile. So we had five interventions to try: How would they stack up? 
It seemed pretty unlikely that their independent contributions would just sum up neatly so that we got a total improvement of 0.013209 + 0.022141 + 0.048586 + 0.050244 + 0.089609 = 0.223789 (though that would certainly be nice!). One question to consider was how independent they were. For any set of interventions, you can imagine them being independent and adding up nicely, or pulling in separate directions so that the combined effect is worse than the sum, or pulling in the same direction so that they amplify each other. My intuition was that gradient clipping and removing dropout were pretty independent, at least conceptually. They might affect other interventions indirectly (eg. via changing the training run's use of the random number generator) but they'd be unlikely to have a direct effect. QKV bias I was less sure about, but it seemed -- again, just intuitively -- at least reasonably independent of the others, with one important exception (which I'll get into below). By contrast, weight decay and the learning rate interact together quite strongly, at least in standard gradient descent, and I'd tested them in isolation. The result for changing the weight decay to 0.01 was based on a fixed learning rate of 0.0004, and the result for scheduling the learning rate was based on a weight decay of 0.1. That felt like an issue, and definitely needed some thought. Additionally, there were some issues with which interventions might have not had a real effect, and instead just been the results of the use of randomness. While my analysis of how that might have affected things was somewhat limited by the number of test runs I could afford to do, it did show up two plausible issues: After some thought, I came up with a plan. 
If I were doing this properly and scientifically, I suppose I'd try every combination of interventions, but that would be ruinously expensive 2 , so a sensible minimal set of training runs felt like this: When those completed, I'd find the test set loss for both models. I'd choose the best run, and then do another run with those settings, but with weight decay switched back to the original value of 0.1. I chose to revert weight decay rather than the learning rate stuff because this was the one I was least sure about -- the updated "GPT-2" value of 0.01 is very unusual by today's standards, and I'd come to it via a rather circuitous route -- see the post for more details. The best of the three runs would be the winning combination of interventions. Again, this was not an exhaustive plan 3 . But it seemed to make sense. Let's see how it turned out. Just to recap, this one had these interventions against the baseline: It did not have QKV bias. You can see the config here . Here's the loss chart over the course of the training run: As normal with learning rate scheduling, I also charted that to make sure it was doing the right thing (you can see that it was): And I also tracked the gradient norms -- you can see that there was some clipping happening near the start of the run: At the end of the run, it reported this: That's a slightly lower final train loss than normal, and it took 3h10m, which is faster than usual, but about the same as the other train we did without dropout -- that makes sense, as the process of zeroing out random activations isn't free. I downloaded the model -- here it is -- and then ran the smoke test: ...and got its loss on the test set: Not bad at all -- the best result we've had so far, albeit not quite up to the standard of the original GPT-2 weights. Now the next one, with QKV bias. This one had these interventions: You can see the config here . 
Here's the loss chart: ...the learning rate: ...the gradient norms (note that we had more clipping, about halfway through): ...and the final printout at the end. That final train loss is slightly higher, which is normally an indicator that the test loss will be higher, but we'll have to see. Time to download the model -- here it is -- and on to the smoke test: ...and then the moment of truth -- what was its loss on the test set? As I suspected from the training loss at the end, slightly worse than the run without QKV bias. So, that meant that we should do the next run, with a weight decay of 0.1, with no QKV bias. Given the above results, this one had these interventions vs the baseline: Weight decay was back to the baseline value of 0.1, rather than the value of 0.01 used in the previous two runs, and QKV bias was switched back off. You can see the config here . Here's the loss chart: You can see that it's much choppier than the previous two runs; that initially surprised me, as the higher weight decay means that we're regularising the model more than we were with those, which I thought would "calm things down". But on reflection, I had it backward. Hand-waving a bit, a more regularised model is fitting less closely every detail to the data it has seen, considering the typical stuff more than it does the outliers. That means that when something a bit more out-of-distribution appears, it might not have yet learned how to integrate it into its model of the world. Well, it sounds plausible, anyway :-) On to the learning rate (just to double-check), and it's fine: And again, the gradient norms: ...which similarly to the loss chart show more occasions where gradients spiked and had to be clipped -- even towards the end of the training run this time. The final printout at the end: Once again, although the final train loss is not definitive, it tends to be indicative of the test loss. 
It's in between the last two runs, so we'd expect the test loss to be likewise in between theirs: Time to download the model -- here it is -- and on to the smoke test: Hmm. At least vaguely coherent, though I'm not 100% convinced. It looks like ads for personal injury lawyers have crept into FineWeb somehow... Still, it's time for the test loss (drumroll): As predicted from the train loss, it's in between the two runs above. Let's put these three runs into the results table: As a reminder: You can see that adding on QKV bias actually made the model worse than the learning-rate-only intervention. That pushes me slightly away from the "it's all about the initial weights" direction; perhaps instead the bias adds some kind of stability that the learning rate scheduling also provides, and they fight against each other? Unfortunately I think the only way to pick it apart would be to do a full set of runs, switching each intervention on and off independently, and that would be too costly. The fact that the weight decay change from 0.1 to 0.01 actually did help when combined with the learning rate change and scheduling was a bit of a surprise; because they're both coupled when we think about standard gradient descent, I was expecting them to be too intertwined for my tests of them in isolation to have been valid. Quite pleased that it didn't work out that way, though, because sweeping across values for different parameters is much easier than it would be if they were connected. However, at this point it occurs to me that it might be because we're using the AdamW optimiser. As I understand it, its big difference versus Adam is that it decouples weight decay. I don't have a solid mental model of what that means exactly (will read up and post about it eventually), but it certainly seems pertinent here. Anyway, I have to say, I'm both pleased with and disappointed by these results. 
Pleased because we got a result by putting interventions together that was better than any of them in isolation, but disappointed that the end result wasn't even better. The difference between the cloud baseline's loss, at 3.691526, and original GPT-2 small's, at 3.5, was 0.191526. Our best result, from the run with every intervention except QKV bias, was 3.577761, so an improvement of 0.113765. That's about 60% of the way there. That said, by sheer chance, while trying out the different sizes of cloud machines, I'd got from a loss of 3.944 training locally to the baseline's value of 3.691526 -- I suspect due to the fact that training in the cloud meant that I could use batch sizes of 96. So a different way of looking at it is that we should include that in the calculations too. From 3.944 to 3.5, the gap with GPT-2 small was 0.444. And we went from 3.944 to 3.577761, an improvement of 0.366239. And that means that we managed to get 82% of the improvement we needed. On the other hand, it means that in terms of my improvements, 0.252474 came from a happy accident, while all of my careful work on interventions only got me 0.113765. :-( Anyway, I think that for now, I'll have to rest happy with that as a result -- and next time around, let's see if we can get to the same level of improvement locally, using gradient accumulation.

1. Luckily the difference was small enough that it doesn't change any of the conclusions I'd made about it.
2. Because there are five interventions, and each can be on or off, then it's equivalent to a 5-digit binary number. So that's 2^5 trains, less the five ones I'd already done and the baseline, for a total of 32 − 6 = 26. At US$50-odd for a train, that's definitely a no-go.
3. I did also consider changing the random seed at the start of the code to 67 rather than 42, given that it seemed to provide better initial weights when I was exploring the effects of random noise on the training. I even started the first two training runs with that in place.
However, on reflection I realised that it would be one step too far away from scientific rigour. I'm not trying to be 100% rigorous in these posts, but it seemed like a step too far to diligently test all of the interventions against one seed, and then YOLO in a different one for the final training runs.

The five interventions to try:
- Gradient clipping.
- QKV bias (that is, adding bias to the attention weight matrices).
- Changing weight decay to the GPT-2 value (0.01 rather than the 0.1 that is typical nowadays).
- Removing dropout.
- Updating the learning rate from 0.0004 to 0.0014, but also scheduling it so that it varies over the course of the training run.

The two plausible issues with randomness:
- Adding gradient clipping looked like it might have been within the training run noise.
- Adding QKV bias would have had a large effect on the model's initial weights. All of the others would have started with essentially the same weights (apart from weight tying, though even that would have had the same values for the initial weights apart from the tied ones). But adding the bias would have completely changed them, and its effect size was comfortably within the range of differences you might expect from that.

The minimal set of training runs:
- Start a training run with all of the interventions apart from QKV bias.
- In parallel (Lambda instance availability permitting) run another one, with all of the interventions including QKV bias.

Interventions in the first run (no QKV bias):
- Gradient clipping at 3.5
- Weight decay changed from 0.1 to 0.01
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.

Interventions in the second run (with QKV bias):
- Gradient clipping at 3.5
- Weight decay changed from 0.1 to 0.01
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
- QKV bias switched on.

Interventions in the third run (weight decay back at 0.1, no QKV bias):
- Gradient clipping at 3.5
- Dropout removed
- Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
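The learning rate and gradient clipping settings listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used for the runs: the post gives the peak rate (0.0014), the final rate (0.00014), the 5% warmup, and the clipping threshold (3.5), but it does not say whether the warmup is linear or where it starts, nor exactly how clipping is applied. The sketch assumes a linear warmup from near zero and clipping by global L2 norm (the behaviour of PyTorch's `torch.nn.utils.clip_grad_norm_`). The function names `learning_rate` and `clip_by_global_norm` are hypothetical.

```python
import math

PEAK_LR = 0.0014         # from the post
MIN_LR = 0.00014         # from the post
WARMUP_FRACTION = 0.05   # from the post: warmup over 5% of the run
MAX_GRAD_NORM = 3.5      # from the post


def learning_rate(step, total_steps):
    """Learning rate for a zero-based step.

    Assumption: the warmup is linear; the post only says "a warmup".
    """
    warmup_steps = max(1, int(total_steps * WARMUP_FRACTION))
    if step < warmup_steps:
        # Ramp up so that the last warmup step reaches the peak rate.
        return PEAK_LR * (step + 1) / warmup_steps
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    decay_steps = max(1, total_steps - 1 - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Scale a flat list of gradient values so its L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    Assumption: clipping is by global L2 norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    total = 1000
    for step in (0, 49, 50, 500, 999):
        print(step, round(learning_rate(step, total), 6))
    print(clip_by_global_norm([3.0, 4.0]))
```

Returning the pre-clip norm alongside the clipped values makes it easy to chart gradient norms and see where clipping kicked in, which is the kind of check the post describes doing for each run.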
The three runs in the results table:
- The first run was gradient clipping at 3.5, weight decay changed from 0.1 to 0.01, dropout removed, and the learning rate intervention, but no QKV bias.
- The second run was gradient clipping at 3.5, weight decay changed from 0.1 to 0.01, dropout removed, and the learning rate intervention, with QKV bias.
- The third run was gradient clipping at 3.5, dropout removed, and the learning rate intervention, but no QKV bias, and no change to weight decay.
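The gap arithmetic in this post is easy to re-check. The short sketch below recomputes the two "how much of the gap did we close" percentages and the footnote's count of remaining training runs, using only the loss figures quoted in the text. The variable and function names are my own and purely illustrative.

```python
# Loss figures quoted in the post.
LOCAL_LOSS = 3.944              # original model trained locally
CLOUD_BASELINE_LOSS = 3.691526  # same setup trained in the cloud
BEST_COMBINED_LOSS = 3.577761   # best of the three combined-intervention runs
GPT2_SMALL_LOSS = 3.5           # original GPT-2 small weights


def fraction_closed(start, end, target):
    """Fraction of the gap between start and target covered by moving to end."""
    return (start - end) / (start - target)


# Improvement from the interventions alone (0.113765 in the post).
from_interventions = CLOUD_BASELINE_LOSS - BEST_COMBINED_LOSS
# Improvement from moving training to the cloud (0.252474 in the post).
from_cloud_move = LOCAL_LOSS - CLOUD_BASELINE_LOSS

vs_cloud_baseline = fraction_closed(
    CLOUD_BASELINE_LOSS, BEST_COMBINED_LOSS, GPT2_SMALL_LOSS
)
vs_local_model = fraction_closed(LOCAL_LOSS, BEST_COMBINED_LOSS, GPT2_SMALL_LOSS)

# Five on/off interventions give 2**5 combinations; the baseline and the
# five single-intervention runs had already been done.
remaining_runs = 2 ** 5 - 6

if __name__ == "__main__":
    print(f"from interventions: {from_interventions:.6f}")
    print(f"from cloud move:    {from_cloud_move:.6f}")
    print(f"gap closed vs cloud baseline: {vs_cloud_baseline:.0%}")
    print(f"gap closed vs local model:    {vs_local_model:.0%}")
    print(f"remaining runs for a full sweep: {remaining_runs}")
```

The first percentage comes out just under 60%, which matches the post's "about 60%", and the second rounds to the quoted 82%.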

0 views