
DeepSeek V4 - almost on the frontier, a fraction of the price

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are mixture-of-experts models with a 1 million token context. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license.

I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B). Pro is 865GB on Hugging Face, Flash is 160GB. I'm hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It's possible the Pro model may run on it too, if I can stream just the necessary active experts from disk.

For the moment I tried the models out via OpenRouter, using llm-openrouter. Here's the pelican for DeepSeek-V4-Flash, and for DeepSeek-V4-Pro. For comparison, take a look at the pelicans I got from DeepSeek V3.2 in December, V3.1 in August, and V3-0324 in March 2025.

So the pelicans are pretty good, but what's really notable here is the cost. DeepSeek V4 is a very, very inexpensive model family. Here's DeepSeek's pricing page. They're charging $0.14/million tokens of input and $0.28/million tokens of output for Flash, and $1.74/million input and $3.48/million output for Pro. Comparing those prices against the frontier models from Gemini, OpenAI and Anthropic: DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano, and DeepSeek-V4-Pro is the cheapest of the larger frontier models.
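To put those per-token prices in concrete terms, here's a quick back-of-envelope calculator. The PRICES table and function names are my own illustration, using the list prices quoted above:

```python
# USD per million tokens (input, output), from DeepSeek's pricing page as quoted above.
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single call at the quoted list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 100K-token input prompt with a 2K-token reply on Flash costs about 1.5 cents:
print(round(cost_usd("deepseek-v4-flash", 100_000, 2_000), 6))  # 0.01456
```

Even a long-context call on Flash stays well under two cents, which is the point of the comparison above.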
This note from the DeepSeek paper helps explain why they can price these models so low - they've focused a great deal on efficiency with this release, especially for longer context prompts:

In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.

DeepSeek's self-reported benchmarks in their paper show their Pro model competitive with those other frontier models, albeit with this note:

Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.

I'm keeping an eye on huggingface.co/unsloth/models as I expect the Unsloth team will have a set of quantized versions out pretty soon. It's going to be very interesting to see how well that Flash model runs on my own machine.

You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.


Extract PDF text in your browser with LiteParse for the web

LlamaIndex have a most excellent open source project called LiteParse, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that LiteParse uses to run in Node.js.

Refreshingly, LiteParse doesn't use AI models to do what it does: it's good old-fashioned PDF parsing, falling back to Tesseract OCR (or other pluggable OCR engines) for PDFs that contain images of text rather than the text itself. The hard problem that LiteParse solves is extracting text in a sensible order despite the infuriating vagaries of PDF layouts. They describe this as "spatial text parsing" - they use some very clever heuristics to detect things like multi-column layouts and group and return the text in a sensible linear flow.

The LiteParse documentation describes a pattern for implementing Visual Citations with Bounding Boxes. I really like this idea: being able to answer questions from a PDF and accompany those answers with cropped, highlighted images feels like a great way of increasing the credibility of answers from RAG-style Q&A.

LiteParse is provided as a pure CLI tool, designed to be used by agents. I explored its capabilities with Claude and quickly determined that there was no real reason it had to stay a CLI app: it's built on top of PDF.js and Tesseract.js, two libraries I've used for something similar in a browser in the past. The only reason LiteParse didn't have a pure browser-based version is that nobody had built one yet...

Visit https://simonw.github.io/liteparse/ to try out LiteParse against any PDF file, running entirely in your browser. The tool can work with or without running OCR, and can optionally display images for every page in the PDF further down the page.

The process of building this started in the regular Claude app on my iPhone.
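To make the "spatial text parsing" idea concrete, here's a toy sketch of the column-grouping heuristic - my own illustration of the general technique, not LiteParse's actual algorithm. Given text fragments with page coordinates, it clusters them into columns by horizontal position, then reads each column top to bottom:

```python
# Toy illustration of spatial text parsing: detect a multi-column layout from
# fragment coordinates and emit the text in linear reading order.
# This is NOT LiteParse's real code - just a sketch of the idea described above.
from dataclasses import dataclass

@dataclass
class Fragment:
    x: float   # left edge on the page
    y: float   # top edge (smaller = higher on the page)
    text: str

def linearize(fragments, column_gap=50.0):
    """Group fragments into columns by x position (a gap wider than
    column_gap starts a new column), then read each column top-to-bottom,
    leftmost column first."""
    ordered = sorted(fragments, key=lambda f: f.x)
    columns = [[ordered[0]]]
    for frag in ordered[1:]:
        if frag.x - columns[-1][-1].x > column_gap:
            columns.append([frag])      # big horizontal jump: new column
        else:
            columns[-1].append(frag)
    out = []
    for col in columns:
        out.extend(f.text for f in sorted(col, key=lambda f: f.y))
    return out

page = [
    Fragment(300, 10, "right top"), Fragment(10, 20, "left middle"),
    Fragment(10, 10, "left top"), Fragment(300, 20, "right bottom"),
]
print(linearize(page))  # ['left top', 'left middle', 'right top', 'right bottom']
```

Real PDFs need much cleverer heuristics (fonts, line heights, headers and footers), which is exactly the hard part LiteParse solves.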
I wanted to try out LiteParse myself, so I started by uploading a random PDF I happened to have on my phone along with a short prompt. Regular Claude chat can clone directly from GitHub these days, and while by default it can't access most of the internet from its container it can also install packages from PyPI and npm. I often use this to try out new pieces of open source software on my phone - it's a quick way to exercise something without having to sit down with my laptop. You can follow my full conversation in this shared Claude transcript.

I asked a few follow-up questions about how it worked, and then asked whether it could be made to run entirely in the browser. This gave me a thorough enough answer that I was convinced it was worth trying to get it working for real.

I opened up my laptop and switched to Claude Code. I forked the original repo on GitHub, cloned a local copy, started a new branch and pasted that last reply from Claude into a new file called notes.md. Then I told Claude Code to write up an implementation plan. I always like to start with a plan for this kind of project. Sometimes I'll use Claude's "planning mode", but in this case I knew I'd want the plan as an artifact in the repository so I told it to write directly. This also means I can iterate on the plan with Claude.

I noticed that Claude had decided to punt on generating screenshots of images in the PDF, and suggested we defer a "canvas-encode swap" to v2. I fixed that with a follow-up prompt. After a few short follow-up prompts, here's the plan.md I thought was strong enough to implement.

I told it to build it, and then mostly left Claude Code to its own devices, tinkered with some other projects, caught up on Duolingo and occasionally checked in to see how it was doing. I added a few prompts to the queue as I was working. Those don't yet show up in my exported transcript, but it turns out running the export in the relevant folder extracts them.
Here are the key follow-up prompts with some notes: I've started habitually asking for "small commits along the way" because it makes for code that's easier to understand or review later on, and I have an unproven hunch that it helps the agent work more effectively too - it's yet another encouragement towards planning and taking on one problem at a time.

While it was working I decided it would be nice to be able to interact with an in-progress version. I asked a separate Claude Code session against the same directory for tips on how to run it, and it told me the command to use. Running that started a development server with live-reloading, which meant I could instantly see the effect of each change it made on disk - and prompt with further requests for tweaks and fixes.

Towards the end I decided it was going to be good enough to publish. I started a fresh Claude Code instance and told it to set up deployment. After a bit more iteration here's the GitHub Actions workflow that builds the app using Vite and deploys the result to https://simonw.github.io/liteparse/. I love GitHub Pages for this kind of thing because it can be quickly configured (by Claude, in this case) to turn any repository into a deployed web-app, at zero cost and with whatever build step is necessary. It even works against private repos, if you don't mind your only security being a secret URL.

With this kind of project there's always a major risk that the model might "cheat" - mark key features as "TODO" and fake them, or take shortcuts that ignore the initial requirements. The responsible way to prevent this is to review all of the code... but this wasn't intended as that kind of project, so instead I fired up OpenAI Codex with GPT-5.5 (I had preview access) and told it to look for shortcuts. The answer I got back was enough to give me confidence that Claude hadn't taken any project-threatening shortcuts.

... and that was about it. Total time in Claude Code for that "build it" step was 59 minutes.
I used my claude-code-transcripts tool to export a readable version of the full transcript which you can view here, albeit without those additional queued prompts (here's my issue to fix that).

I'm a pedantic stickler when it comes to the original definition of vibe coding - vibe coding does not mean any time you use AI to help you write code, it's when you use AI without reviewing or caring about the code that's written at all. By my own definition, this LiteParse for the web project is about as pure vibe coding as you can get! I have not looked at a single line of the HTML and TypeScript written for this project - in fact while writing this sentence I had to go and check if it had used JavaScript or TypeScript. Yet somehow this one doesn't feel as vibe coded to me as many of my other vibe coded projects.

Most importantly, I'm happy to attach my reputation to this project and recommend that other people try it out. Unlike most of my vibe coded tools I'm not convinced that spending significant additional engineering time on this would have resulted in a meaningfully better initial release. It's fine as it is!

I haven't opened a PR against the origin repository because I've not discussed it with the LiteParse team. I've opened an issue, and if they want my vibe coded implementation as a starting point for something more official they're welcome to take it.

Some notes on those follow-up prompts:
- I've written more about red/green TDD here. (It was messing around with pdfium.)
- I had a new idea for how the UI should work - see below.
- It's important to credit your dependencies in a project like this!
- It was testing with Playwright in Chrome; it turned out there was a bug in Safari.
- Dropping screenshots of small UI glitches into the prompt works surprisingly well.
- It still wasn't working in Safari...
- ...but it fixed it pretty quickly once I pointed that out and it got Playwright working with that browser.

As a static in-browser web application hosted on GitHub Pages, the blast radius for any bugs is almost non-existent: it either works for your PDF or it doesn't. No private data is transferred anywhere - all processing happens in your browser - so a security audit is unnecessary. I've glanced at the network panel while it's running and confirmed that no additional requests are made while a PDF is being parsed.

There was still a whole lot of engineering experience and knowledge required to use the models in this way - identifying that LiteParse could be ported to run directly in a browser at all was critical to the rest of the project.

alikhil Today

How to Quickly Prepare for Software Engineering Interviews

A few months ago, I found myself needing to prepare for a series of job interviews within a very limited timeframe. It was a stressful experience, but it ultimately worked out well. I decided to share my notes and reflections in case they’re helpful to others in a similar situation. This is especially relevant if you’re not actively job hunting and suddenly receive an interview invitation, leaving you with limited time to prepare but a strong desire to maximize your chances of success.

Disclaimer: The tips described in this post may be more useful for senior engineers with hands-on experience and engineering intuition.

The internet is full of articles listing all possible HR interview questions. I recommend spending a bit of time on them just to understand what to expect and not be surprised. However, in my humble opinion, there are two main points to focus on during HR interview preparation. First, you need a short story that summarizes your experience. Avoid listing every bullet point from your CV. Instead, focus on highlighting your key achievements. Your story must also be aligned with the position you are applying for. Yes, you might need to adjust your story for different jobs at different companies. Second, it’s important to have a clear motivation. Why do you want to change your job, and why this company/role? What kind of job are you looking for?

Whether you have some experience with System Design interviews or have never done one, start by learning the Delivery framework. Understand each section. Watch at least one video on how it’s done - the more, the better. The videos from the Hello Interview channel are really good. If you are applying to a FAANG company, you may search for leaked system design questions from that company and spend some time preparing for them. But there is no guarantee that you will get the same topic, so I would not recommend spending all your time here. If you can, do a mock interview.
Ask a friend or find someone to practice with. If you can’t, then walk through a design alone, but talk through everything out loud. During the interview, treat the interviewer as a colleague: ask questions, ensure you understand the problem, and confirm you have not missed any important requirements before building the design of the system. Don’t rush.

The coding part is really tricky. If the company tends to use LeetCode-style interviews, there is no shortcut here: you need to solve hundreds of them to really feel confident. You may need to refresh your memory on algorithms you feel less confident about (for example, I always forget about corner cases for binary search). Again, if it’s a big / well-known company, you can try to search for leaked coding interview questions.

S.T.A.R (situation, task, action, result) & C.A.R.L (context, action, result, learning)

There are dozens of questions you could be asked in behavioral interviews, and you’re expected to structure your answers using the STAR framework. This means you need to tell a story by defining a context, your actions, and the results. You could just prepare a STAR-format answer to every such question, but that would take a lot of time, and it’s suboptimal. The good news is that the same stories can be used for different questions: you can prepare 7–10 stories that will cover most of them. During preparation, you can write them down as text, but don’t read them during the interview - it tends to sound unnatural.

When telling your story using the STAR method, make sure your final sentence clearly highlights a positive outcome. Adjust your tone to emphasize this closing part so it stands out. The STAR framework is the standard, but also consider CARL: for some questions it’s good to say what you learned from the story. Some materials that helped me prepare for the behavioral interview are listed at the end of this post.

Some companies have such an interview stage.
It’s quite unpopular but still exists. You’re asked to present a project or problem you worked on: you explain the context, problem, solution, results, and your role in the story. It’s like showing the result of your work to colleagues from different departments/teams. This stage is very open-ended. You are not given specific instructions, and there is not much information on the internet with recommendations on how to prepare for and conduct such interviews. When I found out I would have this interview, I was initially shocked and unsure how to prepare, as I didn’t know what to expect. Then I realized that in reality, it’s you, the interviewee, who rules this interview. You choose the project, decide what to include and omit, control the level of detail, and you come to it with a story you know inside out, with answers to all the possible questions, because it’s your story.

So, make the most of this stage. Prepare your story and make a few slides / notes / architecture sketches. Don’t dig into details too much - leave space for questions. And even if there is no dedicated interview, you may be asked to talk in detail about a certain problem/project you were working on. So, be prepared. Have your story!

When answering open-ended questions, aim to tell stories where the scale of the problem matches the level of the role you’re applying for. For example, if you are asked, “Tell me about a challenging/interesting problem/task you were working on recently”, optimizing an SQL query by adding an index may be fine for junior roles, but it won’t carry enough weight for senior positions. Interviewers would expect to hear something bigger: challenging, higher stakes, and often involving cross-team collaboration, such as migrating a large system to Kubernetes.

Ask questions back. You should ask questions to learn more about the company, their culture, the hiring manager’s management style, and what they like or dislike about their work.
Prepare a list of questions before the interview.

Start preparing in advance. Even if you’re not planning to change jobs anytime soon, you can begin investing in your future by:
- solving one LeetCode problem a day
- keeping track of tasks/projects you’ve completed, along with your achievements (many companies require this anyway for performance reviews) - this will be the foundation for your stories in behavioral and project walkthrough interviews
- keeping your CV and LinkedIn up to date

Materials that helped me prepare:
- Hello Interview - Behavioral Interview Discussion with Ex-Meta Hiring Committee Member - a must watch; I would recommend watching it even before the HR interview, because it gives a bunch of helpful tips about self-presentation
- https://thebehavioral.substack.com/ - strategies, tips, and resources to prepare for your next behavioral interview from a FAANG+ insider
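On the coding-round point earlier about forgetting binary search corner cases - here is a minimal reference version (my own sketch, not from any particular interview guide) with the classic traps called out in comments:

```python
def binary_search(items, target):
    """Return the index of target in sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1        # inclusive bounds on both sides
    while lo <= hi:                   # <= so a single remaining element is still checked
        mid = lo + (hi - lo) // 2     # written this way to avoid overflow in fixed-width languages
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1              # the +1/-1 matter: without them the loop never terminates
        else:
            hi = mid - 1
    return -1                         # an empty list falls straight through to here

print(binary_search([1, 3, 5, 7], 7))  # 3
print(binary_search([], 42))           # -1
```

The usual corner cases to rehearse: empty input, a single element, the target at either end, and a target that is absent but falls between two elements.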


A pelican for GPT-5.5 via the semi-official Codex backdoor API

GPT-5.5 is out. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!

There's one notable omission from today's release - the API:

API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.

When I run my pelican benchmark I always prefer to use an API, to avoid hidden system prompts in ChatGPT or other agent harnesses from impacting the results.

One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers. Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API. OpenClaw integrated directly with this mechanism, and was then blocked from doing so by Anthropic. This kicked off a whole thing.

OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool. Does this mean anyone can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions? The other day Jeremy Howard asked:

Anyone know whether OpenAI officially supports the use of the endpoint that Pi and Opencode (IIUC) uses?

It turned out that on March 30th OpenAI's Romain Huet had tweeted:

We want people to be able to use Codex, and their ChatGPT subscription, wherever they like!
That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code. That’s why Codex CLI and Codex app server are open source too! 🙂

And Peter Steinberger replied to Jeremy that the OpenAI sub is officially supported.

So... I had Claude Code reverse-engineer the openai/codex repo, figure out how authentication tokens were stored and build me llm-openai-via-codex, a new plugin for LLM which picks up your existing Codex subscription and uses it to run prompts! (With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview, it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)

The setup steps are at the end of this post. All existing LLM features should also work - use -a to attach an image, llm chat to start an ongoing chat, llm logs to view logged conversations, and -T to try it out with tool support.

Let's generate a pelican! Here's what I got back. I've seen better from GPT-5.4, so I tagged the xhigh reasoning effort option on and tried again. That one took almost four minutes to generate, but I think it's a much better effort. If you compare the SVG code (default, xhigh), the xhigh one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. xhigh used 9,322 reasoning tokens where the default used just 39.

One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's going to be priced at twice the cost of GPT-5.4: $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.50 and $15. GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens. GPT-5.4 will remain available, and at half the price it feels like 5.4 is to 5.5 as Claude Sonnet is to Claude Opus.

Ethan Mollick has a detailed review of GPT-5.5 where he put it (and GPT-5.5 Pro) through an array of interesting challenges.
His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.

To recap the setup:
- Install Codex CLI, buy an OpenAI plan, and log in to Codex
- Install LLM
- Install the new plugin
- Start prompting


Update on My Coffee-Ridden Framework 13

A week or so ago, I talked about how I might have killed my Framework 13 by dumping a full mug of coffee over it while it was running. In that last post I explained how I'd stripped the laptop down and was waiting for some isopropyl alcohol (IPA) to be delivered so I could more thoroughly clean it. Well dear reader, the IPA turned up, I cleaned it as best I could, and left it for 24 hours to dry off. The next day I came back to it, re-assembled it and hit the power button with a fair amount of trepidation.

I think it's dead, Jim. And I can't help thinking that turning the laptop on in haste after the first clean is what completely screwed it. Oh well, we live and learn.

In my desperation, I contacted Framework support and explained the whole saga to see if there was anything I was missing. There wasn't. They told me that the LED pattern I was seeing when powered on was indicative of a communication error with the board, so it's dead and needs to be replaced. Problem is, a new board is £700 (~$950) and I didn't fancy shelling out that much money from my own pocket, so I contacted my home insurance provider to make a claim, and to be fair they were great. A case was logged and a couple of days later I had a payout that would cover the whole amount. The payout from the insurance was more than the repair cost, so I decided to upgrade from my current Ryzen 7 7840 to an AI 300 series board instead - a nice little upgrade!

The Framework site said it would be shipped in 5 days, and would probably be subject to delays of a further 7 days due to global freight disruptions. So I bought myself a ThinkPad T480 to see me through (which I'm typing this post on) as I couldn't bear to be on macOS for another second. Framework overachieved again and the board is due for delivery tomorrow (Friday 24th April 2026). Once the board is delivered and my beloved Framework is (hopefully) working again, this nice little ThinkPad will go to my wife as an upgrade from her 2014(!) Gen 2 X1 Carbon.

I've had a few people reach out telling me that they'd done something similar and their devices had survived. Unfortunately I wasn't as lucky, so what happened? I think it's because I didn't spill the coffee on my laptop, but next to it. Then as the puddle of coffee made its way over my desk and inevitably under my laptop, the spinning fan must have sucked it up and perfectly spread the coffee all over the main board. Thanks for that. Stupid fan. 🤣️ Had I spilled the coffee on my laptop, it would have had to make its way through the keyboard and chassis before it got to the board, by which point I would have had the laptop switched off and draining. I can't say for sure, but that's my theory.

So anyway, wish me luck with the new board, folks!

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

Stratechery Yesterday

An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment

Good morning,

This week’s Stratechery Interview is with Google Cloud CEO Thomas Kurian. Kurian joined Google to lead the company’s cloud division in 2018; prior to that he was President of Product Development at Oracle, where he worked for 22 years. I previously spoke to Kurian in March 2021, April 2024, and April 2025.

The occasion for these interviews, at least for the last three years, is Kurian’s annual keynote at Google Cloud Next. You can watch the keynote here, and read the blog about Google’s announcements here. I spoke to Kurian a week ago, on April 15, and at that time only had access to the afore-linked blog post. With regards to the keynote, which I have since watched, I thought it was a powerful opening: Kurian returned to last year’s theme, about a unified architecture, but emphasized that the use cases were no longer theoretical or pilots but running at scale for real users. He also emphasized — in a foreshadowing of a point we discussed below — that Google itself was running on the same infrastructure as Google Cloud. Google CEO Sundar Pichai, meanwhile, talked about Google’s capex investment, noting that (1) half of it was going towards Google Cloud, and (2) Google Cloud was running the same stack as Google itself. I sense a theme! Pichai also emphasized security, a point that Kurian was also careful to raise in our talk, before discussing the shift to agents.

To that end, in this interview — which again, was conducted before the keynote — we discuss agents. Specifically, I wanted to get Kurian’s take on the quality of Gemini’s harness (unsurprisingly, he thinks it’s great). Google has an integration advantage, but is it paying off in such a large company? I was also curious about how Google thinks about TPUs specifically and the cloud business generally in terms of balancing its internal needs with external customers like Anthropic.
We also talk about the software ecosystem, why Google still believes in partnerships, and why the company was ready to seize the AI moment (hint: it’s because of Kurian). As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.

On to the Interview. This interview is lightly edited for clarity.

Thomas Kurian, welcome back to Stratechery. I promise I have recording turned on this year — in fact, I have two recordings turned on.

TK: Thank you so much, Ben. Good to see you, thanks for taking the time.

Well, I look forward to talking to you. It’s good to talk to you for multiple interviews, much better than talking to you multiple times in one interview, so we’re already doing better this year. But like last year, we are recording before your Google Next keynote. We’re actually quite a bit ahead, I think we’re several days ahead, but this podcast won’t be released until after the keynote. Therefore, I’m going to ask the exact same question I asked last year. Specifically, I like watching keynotes, not for the announcements, but for the framing that happens up front. Last year, that framing was infrastructure, [Google CEO] Sundar Pichai actually delivered that at the opening, then you came in and talked about that, and that was the context for everything that you talked about. What is the framing this year?

TK: The framing this year is that as AI models have become more sophisticated, we see customers evolving the use of AI models from being used to answer questions in a chatbot-like fashion, to actually automating tasks on their behalf, and to automating process flows within the organization. By automating process flows, you both get efficiency improvements, productivity improvements, and frankly, you can also change the way that you introduce new products and services to market, for example.
In order to do that well, what you need is a world-class agent platform, and to underpin the agent platform, you need world-class infrastructure. You need the way that the agents interact with your company’s data and your business — so you need capabilities to help an agent really understand the company’s business information and context. I think, as you’ve seen in the press, AI and cyber have become very contextual now, there’s a lot of concerns that AI will accelerate the speed of cyber attacks on people’s systems, and so we’re going to be talking about how we’re bringing AI and our cyber technology together to protect, including the integration of Wiz, and then we’re introducing Gemini Enterprise and our agent platform to customers. That’s sort of the theme of what we’re talking about.

You mentioned agents last year, everyone was talking about them to a degree; what has really changed from last year to this year that makes this different? I read your whole blog post, it’s very long, and I think the word “agent” may appear in every single paragraph.

TK: There’s three or four big things that have changed. The first is capabilities of models — Gemini is able to reason much more effectively as new versions of Gemini have come out. Second, they’re able to maintain long-running memory, which you require if you have an agent that’s automating tasks over many, many steps; it has to maintain a lot of state in memory. Third, their interaction with tools and the rest of the world: there have been good abstractions — skills, tools, MCPs [Model Context Protocol], as they’re called — they’re all abstractions for how an agent reasons and interacts with the rest of a company’s systems.
All of them have advanced: the core capabilities of the models themselves have gotten a lot better, their ability to use tools and interact with the rest of the world has become a lot better, and the abstractions through which the world exposes itself to the model have improved. So now you have models with the capabilities to do these very complex tasks.

That all makes sense and certainly tracks. A lot of these announcements, though, as I was going through them — a lot was about the infrastructure around agents, which makes sense: the orchestration, registry, identity, security, all these bits and pieces. All of this is clearly necessary for large enterprises, something they’re going to worry about and ask about. But the agents have to actually work; do Gemini agents actually work? Because there’s a lot of talk, you know, Gemini was the belle of the ball four months ago, but over the last little bit, it’s been mostly a lot about Anthropic and Claude, Codex, a lot of talk about that, and Gemini, not much talk. What’s your feeling about your actual capabilities, not just agents in general?

TK: I’ve always said when people ask us about it, “Let our customers talk about it, rather than we talk about it” — I think you’re going to hear from 500 customers telling their stories at Next. Even people building agents, we have a whole range of them, from Citigroup to Bosch to eBay to Virgin Voyages to Walmart, the Food and Drug Administration, Comcast, Unilever; all of them are going to be talking about specific business problems they had. For example, Citi will be talking about a new wealth advisor in Investment Management, where they’re using our agents to research a person’s investment priorities.
So a person says, “Here are my priorities for investment: my kids are going to school, I need this kind of cash flow in order to fund it”, and then it researches your financial portfolio and interacts with you to give you recommendations. If you look at Comcast, they’re using us for all of the work that they do for consumer services — repair, scheduling appointments, dispatching field technicians. These are very complex flows that have many, many steps and interact with a lot of complex systems. If you look at some of these flows, they require all of the capabilities I talked about. So as an example, I want the capability to call a set of tools. Those tools may be: I want to book an appointment, so I need a calendar; if I’m dispatching a technician, I need to look up spare parts, so I need to pull from my spare parts inventory; I need to schedule that part to be available at the same time as the person who’s going out; and I need to update my inventory to reflect that I have taken something out of it. I mean, these are very, very complex steps.

What’s interesting about all these complex steps and going through all these bits and pieces — it sounds like you’re saying that almost the more constraints there are, the more things you’re bumping up into, is that actually a better environment for instituting these sorts of flows, just because what you need to do is clearly defined?

TK: Just being perfectly frank, Ben, having constraints requires the model to be even more intelligent. Just as an example, in a process flow that’s complicated, with many, many steps, the number of variants — the number of different idiosyncratic situations that you may encounter — is large, so you cannot a priori program every one of them. You need to teach the model, for example, to be able to spin up a virtual machine and use a tool in the virtual machine to generate code to deal with some of these situations.
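The field-service flow TK walks through — reserve a spare part, book an appointment slot, dispatch the technician, update inventory — can be read as a chain of tool calls with shared state. Here is a minimal sketch of that idea; every name (the classes, the rollback step) is a hypothetical illustration, not a Google API:

```python
# Hypothetical sketch of the multi-step field-service flow: reserve a part,
# book a slot, and keep state consistent if any step fails. Not a real API.
from dataclasses import dataclass, field

@dataclass
class Inventory:
    parts: dict = field(default_factory=dict)  # part number -> units in stock

    def reserve(self, part_number: str) -> bool:
        """Take one unit out of inventory if available (the 'update my inventory' step)."""
        if self.parts.get(part_number, 0) > 0:
            self.parts[part_number] -= 1
            return True
        return False

@dataclass
class Calendar:
    slots: set = field(default_factory=set)  # open appointment slots

    def book(self, slot: str) -> bool:
        if slot in self.slots:
            self.slots.remove(slot)
            return True
        return False

def dispatch_technician(inventory, calendar, part_number, slot):
    """Run the whole flow; roll back earlier steps if a later one fails."""
    if not inventory.reserve(part_number):
        return {"ok": False, "reason": "part unavailable"}
    if not calendar.book(slot):
        inventory.parts[part_number] += 1  # roll back the reservation
        return {"ok": False, "reason": "slot unavailable"}
    return {"ok": True, "part": part_number, "slot": slot}

inv = Inventory(parts={"modem-12345": 2})
cal = Calendar(slots={"tue-9am"})
first = dispatch_technician(inv, cal, "modem-12345", "tue-9am")   # succeeds
second = dispatch_technician(inv, cal, "modem-12345", "tue-9am")  # slot gone, rolls back
```

The point of the sketch is the part TK emphasizes: when any step fails, earlier steps have to be undone so the whole transaction stays consistent — and a real agent has to reason its way to that behavior rather than have every branch programmed a priori.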
So the most sophisticated thing is where you can give the model a high-level set of instructions and have it goal-seek an outcome. So you say, “I need to schedule this appointment”, and it turns out there may be 19 different conditions that occur when you’re trying to schedule an appointment, and you can’t a priori tell the model every single possible condition deterministically. So you need to teach the model, “Okay, the user did not tell you what to do, but the goal was to schedule an appointment, so here is how you generate code to then create a collection of things that can interact with the model and understand what to do”.

This is very interesting, you’re walking through this process, this makes a lot of sense. How do you have that conversation with DeepMind? You’re connecting the, “This is the workflow that needs to happen, this is what we need the model to do, this is where it does well, where it doesn’t” — what’s the working relationship there?

TK: We have a harness into which we put all these flows and journeys, for example, as we see them with customers, and they go into the reinforcement loop for Gemini.

How tight is that process?

TK: Very tight. We have people sitting next to [DeepMind CEO] Demis’ [Hassabis] team — in fact I just came from a meeting with them. That loop is what allows us — we are in a unique position in the market. We’re unique in three different ways. We’re unique because we have the whole stack of AI technology. In order to do agents well, you need to have a model that takes all these journeys and puts them into the harness that handles the improvement — hill climbing, as we call it — literally every hour of every day, and the complexity of the journeys we see is in some ways much greater, because in companies you have many different systems, different conditions, different flows; you may not see that in other domains, like a pure consumer domain.
In order to do these well, you also need, for example, models that can spin up compute; models now need to hold on to tokens for longer because they need to hold, for example, a KV cache that holds memory about what’s happening during the transaction flow. Having awesome infrastructure, both what we call classical compute machines and TPUs, gives us real strength there. Third, as you walk through these, one of the things you find is that a lot of the systems these models interact with are things like databases and enterprise applications. So understanding the context of these — like, for example, “How much inventory do you have?”, defining “What is inventory?”, “What part are you talking about?”, “What part number are you talking about?” — those things require you to have technology that understands the business graph and the dictionary of all the objects and the sources of information in your company. Our strength in data processing gives us some technology that we’re going to be talking about next week, something we call Knowledge Catalog — think of it as your global dictionary for all information within the company — that’s a unique strength. And then obviously you don’t want information that’s critical to your company exposed on the Internet, you don’t want your model to get attacked because now it’s handling very complex process flows, you don’t want it hijacked, and so for all the anxiety around cyber, we have very specific tools, so our differentiation is all these pieces working together.

That makes sense, the integration is a big part of your pitch.
At the same time, you’re also a big, sprawling company, and I think there’s maybe a perception, that I maybe hold, that some of the frontier labs are much more focused, they’re much more top-down about, “This is how our harness is going to work, the way it’s going to use tooling”, and all the things you’re talking about with this feedback flowing back in sound great unless there are so many different takes on the way it should work — and then you have your own internal customers as well. How do you balance having a point of view versus getting stuck in the muck?

TK: Every product that Google has is on the same Gemini version, on the same day, on the same hour, and every one of us is using the same harness.

And you feel good that that harness is where it needs to be — it’s not getting pulled in 50 million directions thanks to all your customers and Google’s workloads?

TK: Absolutely not. We are very focused on working with Demis and [DeepMind CTO] Koray [Kavukcuoglu], who lead that team, to make sure they see the sophistication of these scenarios, and we work literally side-by-side, hour-to-hour with them. There’s been a lot of speculation about whether we as a company are distracted…

I don’t think you’re distracted, I think it’s more just a matter of a classic big company versus small company bit. Like, a startup comes in with a very clear point of view and doesn’t have all the enterprise stuff — protecting the data, permissions, all those structures — and yet that stuff sort of gets pulled along because there’s such demand to use their product that works really well, and then over here it’s like, “Hey, we have everything protected and we have all these things around it”, but does the core product actually deliver?

TK: The core product is being used by lots of people. The proof of that — we generate 16 billion tokens a minute, up from 10 just last December or January.

Well, your financial results certainly showed that as well.
There’s a bit where you’re doing so well, I have to be a little hard on you here.

TK: A lot of people told us we were dead in 2023 — we’re still living.

I think you’re doing more than living, you’re doing very well.

TK: And so we never say anything negative about anybody else; our results speak for themselves. I always say, let our customers tell the story. They’re doing amazing things with Gemini in companies, in enterprise, and they see the value of what we’re delivering for them.

You mentioned that everyone in Google is on the same version of Gemini, using the same harness. Does that also apply to all this infrastructure around agents you’re doing, around identity and security?

TK: Yeah. In the enterprise, the way that all the infrastructure works is we have configurable mechanisms. For example, when you configure an agent, a very simple thing is you want to configure the agent with a different identity from a person, just a very simple example, so that you can track, “Who did this transaction? Was it the human or the agent?”, because there are issues like liability. You may want to revoke permissions for the agent at a certain point in time; you want to allow it to only do certain tasks and not everything that the human does. So there are controls you want to put around an individual agent, and a collection of things that’s separate from the person. As we bring agents to consumers as part of our Gemini app, very similar concepts want to be exposed, and so the architecture that we use allows us to have those things. The sources of that may be different — in the consumer world, they may use the Google login account; in the enterprise world, they may use a directory to store it — but that’s just an abstraction of our technology to the rest of the world.

We’ve been talking a lot about Gemini agents and the whole Gemini platform, but you also have just the broader Google Cloud platform.
One of your major tenants is a company I was just referring to obliquely, which is Anthropic — they’re doing a lot of inference on TPUs in particular. If Anthropic wins deals at the expense of Gemini, is that still a win?

TK: We sell different parts of our stack. One of the things people don’t realize is we monetize many different parts of the stack in different ways. Like Anthropic, there are a lot of labs that use our stack — in fact, most of the large AI labs use our stack. So if somebody uses TPUs either to train their model or to use it for inference, we’re monetizing that part of the stack, and that gives us resources to then fund our R&D and other investments. Some of the labs use our TPU and our Gemini model; others may use our TPU and then buy our cybersecurity protection for their models. So as a platform player, we have to allow our technology to be monetized in as many ways as possible, and we don’t see it as zero sum.

Sometimes, though, if you have the SaaS layer and the platform layer and the infrastructure, is there one that is the most important? On one hand, SaaS has the highest margins, and it kind of decreases going down. On the other hand, that infrastructure needs to be used — you’re spending a lot of money on it, you want full utilization. How do you think about that in terms of what’s the most important? I know they’re all important, but how do you think about that tradeoff?

TK: If we were making TPUs just for ourselves, we would have lower volume than we do as a general-purpose TPU supplier, which means there would be times of day that we would not be using those TPUs. Do you follow me? If you think about how chat systems work, they’re very diurnal in nature, because you ask questions when you’re awake. We have a great search business and we have a great Gemini app business, but there would be a certain diurnality to it: during the daytime, there’d be a lot of questions — what about in the evening?
Because we sell TPUs in the market, we’re able to offer them at spot to the rest of the world because we have such a large business. We’re also able to get better manufacturing terms with suppliers and other things because we’re a real volume player, and that in turn lowers our cost of goods sold. So there are many more dynamics. The company is very focused on ensuring we win every part of this, not just one part of it. Gemini is obviously a super important initiative for us, and you’ll see the big announcements are around—

For sure, it’s almost all Gemini.

TK: But I wouldn’t assume that the only way to do that is to offer our chips along with our model. We see a strong business offering our chips to many other people, and you’ll see all of this is what’s accelerating our differentiation, and you see it in our financial results.

Your financials are incredible — your revenue is up, margins are up hugely, I’ve been posting that chart of them for a long time, last quarter was amazing. I do have to ask about TPUs, though. You talk about selling our TPU chips; to date that has meant TPU instances on GCP, but now there’s talk about actually selling TPU chips. What’s the status of that? What’s the official word, can I go buy a TPU?

TK: I’ll explain a little bit of what we see. So let me talk briefly about the announcements we’re making, what the product is being used for, and then how we bring some of it to market.

TK: We’re introducing two big new TPUs next week. One is TPU 8t, where “t” stands for training — it’s more optimized for training. Think of it as 9,600 TPU chips in a single pod, as we call it, and it has three times better performance than the current generation, which is already the leading one in the market. Then there’s 8i, where “i” is for inference — it’s 1,152 chips, three times the SRAM, and it has a new thing called the Collectives Engine, which gives you super efficient calculation performance for inference.
Now, along with that, we are introducing Nvidia VR200, and we’re also introducing more ARM capability for classical compute, because people who use models increasingly need to spin up a VM in order to do tasks, and we see interest in those VMs. We’re introducing not just new compute families, but also new storage — there are two new storage offerings. One is the fastest Lustre solution in the market: it’s 10 terabits per second, and just to give you a sense, that’s like five times number two. We’re also introducing a new thing for ultra-low latency — when you do inference, you want super low latency in accessing storage. We call it Rapid Storage; it can give you 15 terabits per second with ultra-low, microsecond-level latency. So why are we introducing all this stuff? For TPUs, definitely a big market is the AI labs, but we’re seeing interest from new segments of the market. A big new segment is financial services, and when I say financial services, I mean capital markets, and the reason is that today, if you’re a trading firm, a capital markets firm, you spend a lot of time running algorithmic trading, and algorithmic trading means running numerical algorithms on traditional Intel-type cores, x86 cores. Now what they find is that models can do inferencing, and the inference performance is actually better than traditional numerical computing. So that’s one new segment. The second segment is high-performance compute. We see a ton of people wanting to do energy modeling, computational fluid dynamics, solid state — there’s a whole bunch of parameters there too. What’s interesting about those is, you will see at our event Citadel Securities, for example, talk in the keynote about how they’re using TPU. Citadel, as you know, is a large capital markets firm. The Department of Energy has a mission called Genesis, which is the new national lab mission on changing the energy infrastructure for the United States.
There’s Axia, the largest utility in Brazil — all of them are examples of people who are part of just the keynote, talking about how they use TPUs. When we look at that, there are a couple of different things we see. Capital markets firms say, “Hey, if we’re going to replace our algorithmic trading solution, you have to bring TPU to where the venue is”.

Right, because they care about the latency of going to a data center, that’s why they’re all in New Jersey.

TK: Secondly, if you’re a national lab, you have so much data you’ve collected over the last X number of years with your experiments — saying you have to bring all that data to the cloud to reason on it doesn’t make sense. So you will see us putting TPU in other people’s venues, and when we do that, we’re introducing new ways for people to procure it. When I say procuring it, you buy it as a system, you don’t have to buy it just as a cloud service.

How does this new way of selling work? It’s almost like a third way: you have TPUs in Google’s data centers, you have bringing TPUs to customers, but then you have a deal like last week between Anthropic and Broadcom and Google, where this is going in their data centers. There are these sort of renegade data centers that have access to power — maybe they were doing Bitcoin or whatever it might be — and there’s been a big push to get TPUs into those. Where does that fit into this?

TK: I would not assume everything you read in the press is true.

Well, the Anthropic announcement was definitely a big announcement.

TK: Just to be honest with you, we have a flavor that runs in the cloud and a flavor that runs in third-party data centers. The technology, the machines, are identical.

My question here is, where is that coming from? Is that part of your TSMC allocation? Is that Broadcom’s? Because no one can get enough compute, so ultimately that goes all the way back to the root.

TK: The chips are all part of our global — TPU is a Google chip, as you know.
So it’s part of our global allocation; Broadcom is the partner who manufactures the TPUs with us, and so it’s just part of the overall business. The new thing we’re talking about is just that you can run TPU in other venues.

Makes sense. Will we ever have enough compute? Last year you said, “I think we’re going to resolve it shortly” — it doesn’t seem very resolved, what’s the status there?

TK: We’ve worked super hard as an organization. Our team that’s done our compute infrastructure, our global data centers, machines, all that — they’ve done an amazing job. There’s always a shortage, there’s never enough. But it doesn’t mean that we’re not — we would not be growing at the rate we are if we didn’t have enough compute. And so there’s more that we want, but there’s also the reality that our teams have done an amazing job, and our customers who are using it will tell you they’re seeing the benefits of the hard work our teams have done.

There are potential customers in the market, maybe current customers, who may be willing to pay basically any price for compute at this point. How do you think about the short term, “Wow, we can actually just make a lot of money right now”, versus, “We need to invest in our products”? You had Microsoft, who I’m not going to ask you to comment on, but last quarter they said, “Yeah, we allocated less to Azure because we had our own internal workloads”. These are real trade-offs that you need to think about — how do you think about that in terms of GCP?

TK: We run a balanced portfolio. We want to grow different parts of our business, and we sit down as an executive team and also with Sundar and work through how we’re going to balance the different parts of our portfolio. We see, broad brush, three to four buckets of things.
One bucket is where we want to grow Gemini as a business. Our core Gemini business is doing super well — 16 billion tokens a minute, up 40% since last quarter — and even this product called Gemini Enterprise, which is our core agent platform, has grown 40% sequentially quarter-over-quarter. So that part of the business, we’re committed to making super successful; it’s a priority for us. The second segment of the business is where Gemini is being used inside of some of our core products, so I’ll give you an example. We’ve introduced Gemini inside our threat intelligence tools. Why is that? Because we have real expertise at Google scanning the dark web to identify threats; the problem is there are so many of them, an average organization doesn’t know which of those many threats apply to them. So we use Gemini to process and prioritize which threats might affect you — it’s 98% accurate and has processed 3.9 million threats in the last year. So that’s an example of Gemini being used as an embedded capability.

Right. The whole SaaS, PaaS, IaaS — the SaaS bit is still important.

TK: There’s that capability, and there are people who want to use Gemini to reason on data in our analytics infrastructure, so there’s a second big set where Gemini is an embedded capability, and that in turn depends on chips and TPUs and GPUs. And the third one is offering our compute platform to people. We balance across those because we want all of them to be successful. By bringing hardware or our machines to other people’s venues, we’re broadening our TAM, total addressable market; in that part of the business we also see a different cash flow model than if we were putting in CapEx, so there are a lot of different parameters we have to balance.

Those are all ones you listed for you to make trade-offs on, but then you also have to get in a meeting with Sundar and the other leaders of Google to make trade-offs with DeepMind and their R&D and with the consumer products. What are those meetings like?
TK: We have a regular cadence of meetings where we balance the different priorities, and we want to be successful on many different dimensions. I wouldn’t assume all of these dimensions are zero sum. For example, when we offer our product in other venues, we drive cash flow in a different way than putting in CapEx — so to some extent, that changes the capital boundary of how we operate as a company too. So I think there’s a general view that there’s a compute shortage, and if you give to one, you will have to take from another. I think that’s an overly simplistic view of it, having been in this for long enough — my team does both parts. We are responsible for delivering all the infrastructure for Alphabet, and they’ve done an amazing job doing that, and I’m also responsible for running the cloud business, and you can tell our differentiation — I come back to this, it would be a different problem if you didn’t have demand. Whenever people ask us to prove that we’ve got demand, I always say, “Look at our results”.

Well, that’s been the biggest change even since January, where there was still some latent skepticism about, “Is all this CapEx worth it?” — it feels like those questions have been completely erased at this point. Speaking of markets in the last couple months, all these SaaS companies are getting killed in the market. You have a big SaaS business, and you’re definitely not getting killed in the market — why are you escaping it?

TK: I think we have transitioned. The core fundamental, and this is the way we approach our product portfolio, I’ll give you a very simple example: in 2022, we said, “We’re not just going to build a secure cloud, we’re also going to start offering cybersecurity products”. When we entered the market, we looked at what other things people needed — the value of cyber is driven by two dimensions.
Dimension one: “What is it protecting?”, because it has to protect high-value things. The other element is, “How good is it at protecting?”, “What’s the technology that it’s going to use to protect?”. So we said, there are only two valuable places to protect. There’s the endpoint, which is your desktop on which apps run — other people are doing a good job there — and the rest of the world is moving all their applications and data to the cloud, so let’s protect that. Second, we said AI is going to find vulnerabilities, because at the end of the day, finding vulnerabilities is a question of a model really understanding code, and if you can find vulnerabilities at a much more accelerated rate, people need to fix vulnerabilities at an incredibly aggressive, fast rate. And so we started a set of work back then, and we said, to ensure that we have the leading product portfolio, let’s acquire Wiz. We’re now working on — you’ll see a number of announcements. There’s the Threat Intelligence Agent, which allows us to understand the threat landscape and use Gemini to prioritize what you should pay attention to, where a lot of people are using Gemini to actually scan their code. And then we’re introducing three new Gemini-powered agents with Wiz: one called Red Agent — think of it as continuous red-teaming of your infrastructure; a Blue Agent that says, “Okay, I looked at what’s happening with the Red team and I know what you need to go fix”; and a Green Agent that says, “I’ll fix it for you” — and that’s going to cut the cycle time. For our Threat Intelligence Agent, you will see reference customers — the Chicago Mercantile Exchange, there’s a whole bunch of them — talking next week about how it takes an investigation that used to take 30 minutes and does it in 30 seconds, and that allows you to respond.
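The Red/Blue/Green division of labor TK describes can be pictured as a pipeline in which findings flow from one agent to the next. A minimal sketch of that shape, where the specific checks and fixes are invented placeholders and not real Wiz functionality:

```python
# Hypothetical sketch of a Red/Blue/Green cycle; the configuration keys and
# remediations are invented placeholders, not real Wiz agent behavior.
def red_agent(config: dict) -> list:
    """Continuous red-teaming: flag settings that look exploitable."""
    findings = []
    if config.get("admin_port_open"):
        findings.append("admin_port_open")
    if not config.get("mfa_enabled"):
        findings.append("mfa_disabled")
    return findings

def blue_agent(findings: list) -> list:
    """Turn each finding into a concrete remediation: (setting, new value)."""
    remediations = {"admin_port_open": ("admin_port_open", False),
                    "mfa_disabled": ("mfa_enabled", True)}
    return [remediations[f] for f in findings]

def green_agent(config: dict, fixes: list) -> dict:
    """Apply the fixes the Blue agent proposed."""
    for key, value in fixes:
        config[key] = value
    return config

config = {"admin_port_open": True, "mfa_enabled": False}
config = green_agent(config, blue_agent(red_agent(config)))
```

Chaining the three is what compresses the cycle time: once Green has run, a fresh Red pass over the same configuration comes back clean.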
Now, this is an example of where, when we started, people said, “Why would a hyperscaler want to become a cyber company?”, and we were like, “It’s not about being a hyperscaler, it’s about solving the problem at the intersection — AI is going to accelerate cyber threats and you cannot repair things the old way”.

Yep, it really answers the question that people had when you acquired Wiz, which is, “Why do you need to buy it, why can’t you just build it?”. It’s like, “Well, in two years, it’s going to be too late”. That’s, I think, also felt very tangibly right now.

TK: Today, we are where we are because we made that bet.

TK: So when people ask, “Why are you guys growing even in sectors that may be struggling?”, it’s because we have differentiation and we made those decisions early.

That makes sense. One of the interesting product announcements this year is this cross-cloud lakehouse, which lets customers leave their data in AWS and Azure while still being queryable by your services instantly. Is this the final admission that even if enterprises love your AI and love Gemini, they’re not going to shift all their workloads if they’re already on other clouds? Lots of your products have been about that in the past — even Wiz is about that to a certain extent — but is that just the reality? There’s not going to be a huge amount of spillover as far as pulling things from other clouds to Google.

TK: If you use BigQuery today, you don’t have to move your transactional applications to BigQuery. If you’re using Gemini today, you can keep your applications in another cloud and use Gemini to reason on them. The problem we were trying to solve is a very specific problem. Today, when people talk about lakehouses, they say, “We have a multi-cloud lakehouse”. What they really mean is their lakehouse can be run on any cloud — but when it’s running on a particular cloud, you can only access the data in that cloud.
And then people say, “That’s crazy, because I’ve got data in a SaaS app like Salesforce”, “I’ve got data in an ERP system”, “I’ve got data in Azure and Amazon, and I’d like to run analysis across all of this”. One choice for customers is to copy all that data out, which is expensive for them because of the egress tax that everybody imposes. So we said, “Keep your data there, we can still give you world-class analysis” — it’s solving that exact problem. The customer has a problem, they want to do analysis, and there are four things we’re giving them. One: keep your data where it is, no matter how many clouds. We’re not talking about a single-cloud lakehouse; we’re talking about doing analysis across all the clouds and across all your SaaS apps. Two: people said, “How fast can you run?”, and the proof that we’re going to show is we’re 2x better in price performance than the market leader, right out of the gate. Three: people said, “I’m not an expert on writing Python and Spark, can you give me essentially vibe coding for Python and Spark?” — yes, you’ll see us introduce an agent manager to generate Python and Spark code using Gemini. And then the last one: people said, today, Ben, if you ask a question — I was using that example of field service, I’m running a query on, “How much inventory do I have in parts?”, before I send the technician — that information sits inside an application, in a set of tables in a database. Most organizations have thousands of databases; teaching the model which system has what information, when the notion of a part is split across 10 different tables in a particular database, means you need a system that builds that semantic graph of all the information in your company.

Right, this is the Knowledge Catalog.

TK: That’s the catalog, and that gives you super good accuracy when you’re researching information. So we put all this together, and, back to it, we’ve always been super pragmatic.
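The semantic-graph lookup TK describes — a business term like “modem” resolving to a part number, which in turn resolves to the table that actually holds the inventory, with the answer grounded in a citation back to its source — could be sketched roughly like this. The catalog entries and table names here are hypothetical, not the actual Knowledge Catalog schema:

```python
# Hypothetical sketch of a Knowledge Catalog lookup: a business term resolves
# to a part number and to the table holding its inventory, and the answer
# carries a grounding citation. All entries are illustrative.
catalog = {
    "modem": {"part_number": "12345", "table": "field_ops.parts_inventory"},
}

tables = {
    "field_ops.parts_inventory": {"12345": 42, "67890": 7},  # part -> units
}

def answer_inventory_question(term: str) -> dict:
    entry = catalog[term]                     # 1. resolve the business term
    table = tables[entry["table"]]            # 2. find the table that holds it
    count = table[entry["part_number"]]       # 3. run the lookup ("the SQL")
    return {"answer": count,
            "citation": f"{entry['table']} (part {entry['part_number']})"}

result = answer_inventory_question("modem")
```

The design point is that the model never guesses which of the thousands of tables holds “modems”; the catalog carries that mapping, and the citation lets the user verify where the answer came from.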
I always say enterprises have certain problems that they see as independent of a cloud. For example, security — they don’t want to buy three different security tools from three different hyperscalers. Analytics — they don’t want to buy three different analytics tools from three different hyperscalers. Others have chosen to say, “My stuff only works with my cloud”; that’s why enterprises often choose us, because we work across all the clouds and all the security environments you have, and you can keep stuff wherever it is and use Gemini to access and automate stuff for you. So all that is just part of listening to customers.

This all makes perfect sense — particularly this bit about the Knowledge Catalog, which definitely fits how I’ve been thinking. I wrote a few years ago about the importance of this whole layer, and understanding it is a bit of a big lift to get in place. You have some sort of analog, say, with a Palantir that’s putting in their ontology thing. They have FDEs [forward-deployed engineers] out on site on multi-month projects doing this. You have OpenAI talking about Frontier, their agent layer, and they’re partnering with all the tech consultancies to build this out. Is this going to entail a lot of boots on the ground to get this graph working and functional in a way that your agents can operate effectively across it?

TK: We’re not competing with Palantir, and we’re not hand-building a semantic dictionary or an ontology. What we’re doing is — today, I’ll give you the closest analogy.

TK: Today, when you use a model, let’s say you use Gemini, and you ask a question, Gemini goes through reasoning, and then it shows you a citation.
A citation is, “How did I answer the question and what source did I derive it from?” Now imagine that citation was a query that needed to go to a folder in, for example, a storage system, because there are some documents there, and to a database. For example, for a part number: think about a document that lists all the part numbers and sits in a drive, and then from that part number you need to work out that it’s the modem the technician is coming to repair, and that’s mapped to a table in a database. So what the graph does — we use Gemini, so we don’t need humans — is we use Gemini to say, “Hey, go and read all these documents in these drives, extract the information from them, and then match that to the database table that has the reference to the part number”. Then, when Gemini gets the query about how much inventory of modems there is, the first thing it does is go to the Knowledge Catalog, which says the modem is part number one-two-three-four-five; then it says, “By the way, the table in the database that has the inventory information about this part number is this table, here’s the SQL”. That makes the quality of what we generate higher, and then when it answers the question — back to your “trust my data” point — it shows a grounding citation saying, “That’s where we got it from.”

What do you need from everyone in the ecosystem if this is going to work — all these SaaS applications and all these entities, not just what’s in your databases, but what’s in an SAP database or whatever it might be? How do you get them on board so you can understand their data and build this Knowledge Catalog?
TK: Really easy. The first thing is that, for the lakehouse, we support a standard format the industry is very standardized on, called Iceberg, so anybody who supports Iceberg, we can talk to — and that’s pretty much the whole world right now, so we don’t need them to do anything special to make it work. Second, all of these business systems have API specifications, and our Catalog can learn off of those API specifications — we just teach Gemini to process them, and so we can build a catalog pretty quickly.

There are reports that OpenAI on Amazon Bedrock has been massively popular. Are we going to get OpenAI on Vertex?

TK: We would love to have them. We are announcing a variety of third-party models on Vertex, including Anthropic, including open source; we’re open to any model provider on Vertex.

I believe you. That’s going to be great, when and if it happens. Just one last question. We’ve talked in this interview series previously about how I think — and this is before your time, it’s not your fault — Google Cloud missed the boat in terms of being a point of integration for the Silicon Valley enterprise ecosystem. I think last year I asked you if AI represented a new opportunity to do that. However, is there a bit where the models — and you’re in this game because you have one of the leading models — are just going to eat everything, gradually expanding to do the jobs, and everyone else is just going to be a system of record? It’s going to be all one interface; the integration, such as it is, is all under the surface, not necessarily tying things together in user space. Is Gemini going to be all the user needs in the long run?

TK: We don’t see it that way.
In fact, one announcement you’ll see us make next week is how many third-party SaaS and ISV [independent software vendors] vendors are embedding Gemini not just as a model, but as an agent platform, because they want to build agents and our agent platform, you can use to build agents, not just our own agents, but they can use it and there’s a lot of independent software vendors embedding those agents. And do they see you as like, “Hey, you’re another established guy, let’s go with you because we don’t know what these other folks are up to, they want to eat all of us”? TK: It’s also the capabilities. The differentiation, I would say, is just think about you’re a bank or an insurance company, and think about you’re a SaaS vendor selling to them or an independent software vendor, there’s a number of things around identity, policy management. For example, if you’re a bank and you have documentation about a person and their credit, you cannot have that egress the bank’s boundary, so we have a gateway that protects against that, that’s part of our agent platform. You want to have auditability on the agent to say which agent did what task on what system when, that’s built into the platform. You want to have a registry where you expose all your skills so that people are not duplicate building all these things, we have a registry that does that. This is sort of the bit we started with at the beginning, it’s not just going to benefit your agents it’s going to benefit all agents, that’s sort of the pitch. TK: So one of the things that people like is the fact that we built all that plumbing for them, and so they don’t have to invest in it, they can focus on the value add that they have on their agent side. 
Additionally, for companies in this broader ecosystem, the cost of agents — and it becomes part of their bill of materials, if you will, the cost of goods sold — the fact that we have these super efficient chips that run inference with such efficiency eventually translates into cost efficiency for a third party that’s building on top of us. You can see that all of those benefits, we’re taking away all that complexity for these guys, so we definitely don’t see that all the ecosystem is going to die, we definitely don’t see that, we see us facilitating that ecosystem. You’ll see us announcing a number of things, including a substantial investment in dollars to accelerate the partner ecosystem around our platform. Thomas Kurian, great to talk to you again. TK: Thanks so much, Ben. And just in closing, the work that we announce every year at Next is a testament to all those customers and partners who gave us a shot to work with them. You’ll see them telling their story, and it’s a testament to all those people at our organization that made a bet to solve a technical problem a different way, or to bring our technology — we’ve hugely expanded our go-to-market organization, and doing all that with growing top line and operating income at the same time is a testament to the demand we see for our products and services. I mean, six, seven years ago, people used to tell us, “You have no shot in the market”, I think we are now truly uniquely positioned. Name one other player that has the stack of technology to do AI, when I look forward, I think there’s no question in people’s minds that the central problem that companies need to solve and technology providers need to solve is how good is the capability you offer for AI. We’re the only ones with chips, models, the context to feed the models from all of the data infrastructure, the cyber tools, and then a world-class agent platform. I would also add, you’re actually an enterprise company now. 
The things you talked about, pragmatism, listening to customers, all these pieces, GCP did not have at all a decade ago — there’s a bit where Wiz was ahead of its time, for sure, being forward-looking, but there’s a bit where the organization is ready for this moment in a way I don’t think it would have been previously. I find it very impressive. TK: We are very proud of the team. Also for Alphabet, to do AI well, you have to do a couple of things. One, see the breadth of problems that we see, we see all of the consumer problems, we see the enterprise problems, we see the problems that search sees, we see the problems that YouTube needs, we see all those that we’re solving with AI, that gives us a breadth of capability that the model needs to solve, that over time is a real strength because the diversity of problems we’re solving. Second, in order to do AI well, you have to invest, and in order to invest, you need to monetize in as many different ways as possible. I think we are very confident that our team, we do not have any hubris, but we are confident in where we stand. I think it’s very impressive. I look forward to your keynote. TK: Thanks so much Ben, it’s a privilege to talk to you every year and it’s great that you took the time to speak with me. And it’s all recorded, I can promise you that! This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery . The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!

Zak Knill Yesterday

SSE token streaming is easy, they said

I wrote about AI having ‘durable sessions’ to support async agentic applications, and in the comments everyone said: “Token streaming over SSE is easy” . …so I figured I’d dig into that claim. Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.
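For context, the "easy" version of SSE token streaming that commenters presumably have in mind looks something like this. This is an illustrative Python sketch, not code from the post; the names format_sse and stream_tokens are made up:

```python
# The naive, synchronous version of token streaming over SSE: wrap each
# token in the SSE wire format (an optional "event:" line, one or more
# "data:" lines, and a blank-line terminator) and yield it to the client
# as it arrives. Hypothetical names, for illustration only.

def format_sse(data: str, event: str = "") -> str:
    """Encode one payload as a Server-Sent Event."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield each model token as an SSE message, then a final 'done' event."""
    for tok in tokens:
        yield format_sse(tok, event="token")
    yield format_sse("[DONE]", event="done")

if __name__ == "__main__":
    for chunk in stream_tokens(["Hel", "lo"]):
        print(chunk, end="")
```

This works fine while the client stays connected to the same process; the post's point is that once the agent runs in the background and the client disconnects and reconnects later, this simple generator model is exactly the part that breaks.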


Spending hard caps

I was catching up with some tech news yesterday and every time I read one of these “I woke up with a USD 18k bill in my Cloud account” articles, I am reminded about how fucking stupid—and predatory—this whole industry can be. The ability to set hard spending caps should be required by law. I think that’s another issue the EU should decide to tackle at some point. If I know I have a budget available, there should be an option for me to configure your service so that you don’t allow me to spend more than that. And if my product or site goes down as a result of that, it’s a choice I get to make. But the reason why hard caps are usually not an option is obvious: companies get to make more money this way. Hurray for capitalism! The sad part is seeing allegedly smart people arguing that no, the actual reason is that it’s a complex problem to solve, and no-one has figured out how to do it yet. An excuse so pathetic that it’s not even worth getting mad about it. There are people discussing plans to build moon bases, put servers in orbit, build digital gods, and yet setting a hard cap on billing is a complex problem to solve. Sure, I believe that. Thank you for keeping RSS alive. You're awesome. Email me :: Sign my guestbook :: Support for 1$/month :: See my generous supporters :: Subscribe to People and Blogs

annie's blog Yesterday

It’s a lot to process

… everything. I need to know less, but I know more. Trying to cultivate a life which allows me to know less while still participating in society requires me to know more and do more than simply laying back and passively allowing the unending flood of information to drown me. Please note that we are all being drowned. What is it that is drowning us? Information and misinformation. Part of the drowning is the effort required to try to distinguish between the two. You’re trying to keep your head above water and there are waves and in order to not be pulled under by a wave you have to quickly look at it (while it’s looming larger and larger above you) and decide: Real or not real? Looks real. Is it real? Decide! Quick! I think it’s worth noting that when people don’t seem interested in the distinction between real and not real it may not be that they don’t care about what’s real. It may be that their capacity, their energy, their ability to distinguish is less than yours. And here I am. I’m adding to the information by writing this and publishing it. How do I feel about that? Weird. Really terribly weird and odd and disjointed and uncertain. Perhaps it would be better to stfu, one part of me says. Sometimes that is absolutely what is best. But not always. I don’t know about a lot of things. I can’t have that many opinions. I can’t understand that many issues. I can’t research that many topics. And I don’t like the pressure to be certain about things. All the things, all the time. It’s okay to say I don’t know . What I do know: I am real. Here’s a vignette: I’m on my balcony. Of course I’m on my balcony. I love this tiny little space. I mention it. I post photos sometimes, the sunset view through power lines or my feet up on a small table that wasn’t meant to be outdoor furniture. I can hear a kiddo inside talking to his girlfriend on the phone. The traffic, slowing but still there, on the road. The sound of neighbors as they walk in, talking softly. 
Here’s what I want to tell you. First, let’s imagine you’re here on the balcony too. There’s another chair. Let me know if you want a beverage. We have options. Don’t worry about the cats. I promise they won’t jump off. I want to tell you that I am real and you are real and that’s enough to know right now.


Debugging WASM in Chrome DevTools

When I was working on the WASM backend for my Scheme compiler , I ran into several tricky situations with debugging generated WASM code. It turned out that Chrome has a very capable WASM debugger in its DevTools, so in this brief post I want to share how it can be used. I'll be using an example from my wasm-wat-samples project for this post. In fact, everything is already in place in the gc-print-scheme-pairs sample. This sample shows how to construct Scheme-like s-exprs in WASM using gc references and print them out recursively. The sample supports nested pairs of integers, booleans and symbols. To see this in action, we have to first compile the WAT file to WASM, for example using watgo : The browser-loader.html file in that directory already expects to load gc-print-scheme-pairs.wasm . But we can't just open it directly from the file-system; since it loads WASM, this file needs to be served with a local HTTP server. I personally use static-server for this, but you can use anything else - like Python's built-in http.server : Now it can be opened in the browser by following the printed link and selecting the browser-loader.html file. Open the Chrome DevTools, and in Sources , open the Page view on the left. It should have one entry under wasm , which will show the decompiled WAT code for our module. Note: this code is disassembled from the binary WASM, so it will lose some WAT syntactic sugar (like folded instructions): You can set a breakpoint by clicking on the address column to the left of the code, and then refresh the page. The DevTools debugger will run the program again and stop at the breakpoint: Here you can step over, into, see local values and call stack, etc - a real debugger! The most important use case for me while developing the compiler was debugging unexpected exceptions (coming from instructions like ref.cast ). Notice the checkboxes saying "Pause on ... exceptions" on the right-hand side of the previous screenshot. 
With these selected, the DevTools debugger will automatically stop on an exception and show where it is coming from. Let's modify the gc-print-scheme-pairs.wat sample to see this in action. The $emit_value function performs a set of ref.test checks to see which kind of reference it's dealing with before casting; let's add this line at the very start: It's clearly wrong to assume that $v is a bool reference without first testing it; this is just for demonstration purposes. Without setting any breakpoints, recompiling this code with watgo and reloading the page, we get: The debugger stopped at the instruction causing the exception; moreover, in the Scope pane on the right we can see that the actual type of $v is (ref $Pair) , so it's immediately clear what's going on. I've found this capability extremely valuable when writing (or emitting from a compiler) non-trivial chunks of WASM code using gc types and instructions. "Should I use a debugger or just printfs" is a common topic of debate among programmers. While I'm usually in the "printf debugging" camp, I'm not dogmatic, and will certainly reach for a debugger when the situation calls for it. Specifically, when investigating reference exceptions in WASM, two strong factors tilt the decision towards using a debugger: In general, WASM's printf capabilities aren't great. We can import print-like functions from the host (and - in fact - our sample does just that), but they're not very flexible and dealing with strings in WASM is painful in general. This is compounded even more when working with gc types, because these aren't even visible to the host (they're opaque references). If we want to do printf debugging of gc values, we have to build a lot of scaffolding first. Exception debugging - in general - is much easier with a supportive debugger in hand. Our ref.cast exception from the example above could have happened anywhere in the code. 
Imagine having to debug a very large WASM program (emitted by a compiler) to find the source of a failed ref.cast ; the debugger takes you right to the spot! In fact, even for C programming, I've always found gdb most useful for pinpointing the source of segmentation faults and similar crashes.
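As an aside, the "serve it with a local HTTP server" step from earlier needs nothing beyond the Python standard library. A minimal sketch (the port is arbitrary):

```python
# Serve the current directory over HTTP so the browser can fetch
# browser-loader.html and the compiled .wasm file (WASM modules can't
# be loaded from file:// URLs). Equivalent to `python -m http.server 8000`.
from http.server import HTTPServer, SimpleHTTPRequestHandler

if __name__ == "__main__":
    server = HTTPServer(("localhost", 8000), SimpleHTTPRequestHandler)
    print("Serving on http://localhost:8000")
    server.serve_forever()
```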

Corrode Yesterday

Helsing

Jon Gjengset is one of the most recognizable names in the Rust community, the author of Rust for Rustaceans , a prolific live-streamer, and a long-time contributor to the Rust ecosystem. Today he works as a Principal Engineer at Helsing, a European defense company that has made Rust a foundational part of its engineering stack. Helsing builds safety-critical software for real-world defense applications, where correctness, performance, and reliability are non-negotiable. In this episode, Jon talks about what it means to build mission-critical systems in Rust, why Helsing bet on Rust from the start, and what lessons from his years of Rust education have shaped the way he writes and thinks about production code. CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link . Founded in 2021, Helsing is a European defence company building AI-enabled software for some of the most demanding environments imaginable. Helsing’s software runs where correctness is non-negotiable. That philosophy led them to Rust early on and they’ve leaned into it fully. From coordinate transforms to CRDT document stores to Protobuf package management, almost everything they build ends up being written in Rust. Jon holds a PhD from MIT’s PDOS group, where he built Noria, a high-performance streaming dataflow database, and later co-founded ReadySet to continue that work commercially. He then spent time building infrastructure at AWS, before joining Helsing as a Principal Engineer. Outside of his day job, he’s been teaching Rust to the world through his livestreams and writing for years, which makes him a rare combination: someone who thinks deeply about both how to use Rust and how to explain it. 
Helsing AI selected for Eurofighter upgrade - Helsing’s Eurofighter Project CA-1 Europa - Helsing’s Autonomous Uncrewed Combat Aerial Vehicle Rust in Python cryptography - Rust being used in a Python library Clippy Documentation: Adding Lints - How to add custom lints to (your own fork of) clippy anyhow’s .context() - Use it everywhere, it’s very very helpful eyre - A fork of with support for customizable, pluggable error report handlers miette - Fancy, diagnostic-rich error reporting for Rust with source snippets and labels buffrs - Helsing’s Cargo-inspired package manager for Protocol Buffers, written in Rust sguaba - Helsing’s Rust crate for type-safe coordinate system math, preventing unit and frame mix-ups at compile time Sguaba: Type-safe spatial math in Rust - Jon’s talk at Rust Amsterdam introducing sguaba and the type-system techniques behind it Apache Avro - A compact binary serialization format for streaming data, with a Rust implementation available via the crate pubgrub - A Rust implementation of the PubGrub version-solving algorithm, as used in Cargo and uv CRDTs - Conflict-free Replicated Data Types: data structures that can be merged across distributed nodes without conflicts ADR (Architecture Decision Record) - A lightweight way to document important architectural decisions and their context DSON: JSON CRDT using delta-mutations for document stores - The 2022 paper that was the basis for Helsing’s CRDT implementation dson - Helsing’s Rust implementation of DSON Jon’s Livestreams on YouTube - Deep-dive Rust coding sessions where Jon implements real-world libraries and systems from scratch WebAssembly with Rust - The official Rust and WebAssembly book, covering a cool technology and useful skills to have as a Rust developer Rust for Rustaceans - Jon’s book for intermediate Rust developers covering ownership, traits, async, and the finer points of the language CVE-2024-24576: Cargo/tar supply chain vulnerability - A security issue in the crate that 
affected Cargo’s package extraction Wikipedia: Defence in Depth - The security principle of using multiple independent layers of protection; Even with Rust you need multiple layers, there is no silver bullet SBOMs (Software Bill of Materials) - A machine-readable inventory of all components in a software artifact; Cargo’s lock files make this tractable for Rust projects Helsing: AI-assisted vetting of software packages - Make it more efficient to review dependencies you take in Bevy - A game engine built entirely in Rust, and a notable example of a large, complex Rust dependency Tauri - A Rust-powered framework for building lightweight desktop and mobile apps from a web frontend, an alternative to Electron Helsing Website Helsing Tech Blog Helsing on GitHub Helsing on LinkedIn Jon Gjengset’s Website Jon Gjengset on GitHub Jon Gjengset on YouTube Jon Gjengset on Bluesky Rust for Rustaceans

Giles's blog Yesterday

Writing an LLM from scratch, part 33 -- what I learned from finally getting round to the appendices

After finishing the main body of " Build a Large Language Model (from Scratch) ", I set myself three follow-on goals . The first was training a full GPT-2-small-style base model myself. That was reasonably easy to do but unlocked a bunch of irresistible side quests ; having finally got to the end of those, it's time to move on to the others: reading through the book's appendices, and building my own GPT-2 style model in JAX. This post is about the appendices. The TL;DR: there was stuff in there that could have saved me time in my side-questing, but I think that having to work those things out from scratch probably helped me learn them better. This is an excellent overview of PyTorch, and given that I'm writing for people who are reading the book too, all I can really say is that it's well worth reading, even if you have some experience in it. He gives an intro to what it is, some details on how to choose to use GPUs (or Apple Silicon) if you have them, and an overview of tensors. He then goes on to explain the basics of automatic differentiation and back-propagation, with a bit of background detail about the chain rule. I think this bit is useful at a "how-to" level, but the mathematical details felt like they were summarised too briefly to be all that useful. I can see why -- this is an appendix to a book on an adjacent subject, not a textbook on the mathematics of training ML models. But something this brief feels like it would be confusing for people who don't know it already, but not really useful for those that do. Perhaps I'm underestimating the typical reader, but if and when I write up my own explanation of how this works (perhaps as a follow-up to " The maths you need to start understanding LLMs "), I'll go quite a lot slower and try to explain things in more detail. Anyway, as I said, the explanation is more of a bonus in this book, quite far from its main focus, so this is a nit. He then goes on to a high-level explanation of PyTorch's Datasets and DataLoaders.
This was quite useful for me. I must admit that I've been struggling a bit to see the value of DataLoaders -- indexing directly into Datasets has worked very nicely for me. I suspect this is a question of scale more than anything; even my big training runs, 44 hours of training a 163M-parameter model on 3 billion tokens, worked fine without a DataLoader. But after reading this section, I felt I was getting some way towards having more of a handle on how they might help. I'm not quite there yet, but hopefully soon... Next, there are sections on training loops, both with and without GPU support. Nothing new there for me, at least. Then came the real surprise: a really solid walkthrough on training models across multiple GPUs with DistributedDataParallel! That's something I learned from the documentation and various online tutorials back in January , and reading this appendix first would have saved some time. But thinking back on it, I think that the way I did it was better pedagogically for me. By having to grind through it from first principles -- following the docs, coding something, seeing it break, trying again, and eventually getting there -- I think I internalised the knowledge much better. It's a balance, really. If I read explanations, I learn faster, but the knowledge is shallower. Learning by doing is slower but deeper. Working out a good balance is hard. It feels like I've struck a good balance on this one, but I suppose it's difficult to know for sure. The one thing in the DDP section that did stand out for me, though, was the use of a for the . That might have made some of my DDP code a bit simpler! On to the next appendix. I won't go through this in detail; it does what it says on the tin, and there's a bunch of interesting stuff in there. I scanned through and nothing felt like a must-read right now, but I'll be checking it in the future if I'm looking for suggestions for things to read about. Another one that is exactly what it says it is. 
Once again, something I could have saved time by reading first! In it, he covers gradient clipping, which I went over back in February , and warming up and then doing a cosine decay on the learning rate, which was something I looked into in March . Just like with DDP, I think that having to learn about these from resources I could find on the Internet meant that I got to a deeper understanding than I would have if I'd just been following the book. This is not a point against the book, of course! Again, it's one of those balancing acts: do it yourself and learn more, or read about it and learn faster. Still well worth reading though. This was a really interesting read. I've been reading about LoRA on the side, but most treatments I've seen started with an explanation of the maths, but then essentially said "now, to do it, install PEFT" (or Unsloth, or something similar). Raschka gives the full code, showing how you can write your own LoRA stuff, and I think this is excellent. Digging into it right now would be a side quest, but I'm inspired by it and might do my own LoRA writeup after finishing this LLM from scratch arc. Let's see if I manage that or if I get distracted by something shiny first... The last page in the book. Well, the first page of the index. Done. Wow! But before I start the celebrations, there's one last step. As I said last November , I wanted to: [Build] my own LLM from scratch in a different framework, without using the book. That is, I think, essential, and perhaps would be the crowning post of this series. It would be a nice way to end it, wouldn't it? I think I was right, so that's what's next. I asked people on Twitter which framework I should use, and the winner was JAX -- and so that's what's coming next. Watch this space!
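For the curious, the warm-up-then-cosine-decay learning-rate schedule covered in that appendix is simple enough to sketch in a few framework-agnostic lines. The constants here are illustrative, not Raschka's exact code:

```python
# Learning-rate schedule: linear warm-up to a peak, then cosine decay
# down to a floor. All hyperparameter values are made up for illustration.
import math

def lr_at(step: int, *, peak_lr: float = 3e-4, min_lr: float = 3e-5,
          warmup_steps: int = 100, total_steps: int = 1000) -> float:
    if step < warmup_steps:
        # Linear warm-up from ~0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

if __name__ == "__main__":
    assert lr_at(0) < lr_at(50) < lr_at(99)   # warming up
    assert abs(lr_at(100) - 3e-4) < 1e-9      # at the peak after warm-up
    assert abs(lr_at(999) - 3e-5) < 1e-5      # decayed close to the floor
```

In a training loop you would call lr_at(step) each iteration and write the result into the optimizer's parameter groups.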


Exclusive: Microsoft Moving All GitHub Copilot Subscribers To Token-Based Billing In June

Documents viewed by Where’s Your Ed At shed additional light on Microsoft’s transition to token-based billing for GitHub Copilot, as the company grapples with spiraling costs of AI compute. As reported on Monday ( and as announced soon after by Microsoft ), the company has taken the step to suspend new sign-ups for individual and student accounts, has removed Anthropic’s Opus models from the cheapest $10-a-month plan, and plans to further tighten usage limits. According to the documents, the announcement for token-based billing will be tomorrow (4/23), with changes to GitHub Copilot rolling out at the beginning of June. Users will pay a monthly subscription to access GitHub Copilot, and receive a certain allotment of AI tokens based on their subscription level. Organizations paying for GitHub Copilot will have “pooled” AI credits, meaning that tokens are shared across the entire organization. GitHub Copilot Business Customers will pay $19 per-user-per-month and receive $30 of pooled AI credits, and Copilot Enterprise customers will pay $39 per-user-per-month and receive $70 of pooled AI credits. While the documents refer to moving “all” GitHub Copilot users to token-based billing, it’s unclear at this time how Microsoft will be handling individual Pro or Pro+ subscribers. If you liked this news hit and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large . I recently put out the timely and important Hater’s Guide To The SaaSpocalypse , another on How AI Isn't Too Big To Fail , a deep (17,500 word) Hater’s Guide To OpenAI , and just last week put out the massive Hater’s Guide To Private Credit . 
Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.  Internal documents reveal Microsoft’s planned rollout for token-based billing for all GitHub Copilot customers starting in June. Copilot Business Customers will pay $19 per-user-per-month and receive $30 of pooled AI credits. Copilot Enterprise customers will pay $39 per-user-per-month and receive $70 of pooled AI credits. Sources say that these amounts may change before the launch of token-based billing. It is unclear what will happen to individual subscribers. The company is expected to make the announcement on 4/23.


Systems Thinking Explained

☕ Welcome to The Coder Cafe! In a previous post , I briefly touched on systems thinking after reading Learning Systems Thinking . My honest take: it was an interesting introduction, but I wasn't fully convinced. The concepts felt abstract, the examples too sparse. Then I read Thinking in Systems by Donella Meadows. It might be one of the best books I've read in my career (and it's not even a computer science book). This post is my own introduction to the core concepts, grounded in a real example from my experience. Get cozy, grab a coffee, and let's begin! Introduction Have you ever fixed an incident, only to see it come back two weeks later? Or made a change that improved one metric while quietly degrading another? Or spent months firefighting without ever feeling like things were actually getting better? These aren't signs of bad engineering. They're signs of reacting to events without understanding the structures that produce them. Understanding those structures requires a different kind of thinking, and that's exactly what systems thinking is: the ability to shift from reacting to events through responsive patterns of behaviors to generating improved systemic structures. This post is an introduction to systems thinking, covering the core concepts through a real example from my experience at Google. First, let's define what a system is. In essence, a system is a set of elements, interconnected, to achieve something. Distributed systems are an obvious example. For example, a 3-node, single-leader database is composed of three nodes (elements), connections from the leader to the replicas (interconnections), and the goal of storing data reliably over time. Interestingly, this is why distributed systems can surprise even their own designers: add enough nodes, replication lag, and competing writes, and the system starts behaving in ways no single component would predict.
To reason about how systems change over time, we need two important concepts. A stock is an accumulation of material or information that has built up in a system over time. For example: the number of machines available in a cluster, the size of a message queue, the amount of technical debt in a codebase. A flow is what changes a stock: material or information entering or leaving it. For example: machines being added or removed from service, messages being enqueued and consumed, or requests being received and processed. The key thing to keep in mind: stocks take time to change because flows take time to flow . You can't instantly restore machine availability or drain a queue with a single action. This has real consequences for how systems behave under pressure. We will come back to it. One of the most important concepts in systems thinking is the feedback loop . A feedback loop is what the system does automatically because its own result feeds back into it. Said differently: if A causes B, then B influences A. Let's take a concrete example. Suppose you live in a house with a central thermostat set at 20°C. It turns the heating on when the temperature drops to 19°C, and off when it reaches 21°C. The feedback loop works like this: A is the temperature change; B is the thermostat turning heating on or off. The thermostat turning on or off (B) is caused by the temperature change (A). But the temperature change (A) is in turn influenced by the thermostat (B). Each effect feeds back into its own cause. This is a feedback loop. There are two kinds of feedback loops. A balancing feedback loop resists change : It pushes the system back toward a goal or limit. Think of it as a stabilizer: when something moves away from the target, the loop acts to bring it back. The thermostat is a perfect example. As the temperature drifts away from 20°C, the thermostat reacts, and the system returns to equilibrium. A reinforcing feedback loop amplifies change : More leads to more, less leads to less.
An action produces a result that drives more of the same action, generating growth or decline at an accelerating rate. The YouTube algorithm is a clear illustration: the more a video is viewed, the more the algorithm surfaces it; the more it’s surfaced, the more views it gets. More formally, there are four cases of feedback loops: Balancing ceiling: if more A causes more B, then more B influences less A (growth is capped from above). Balancing floor: if less A causes less B, then less B influences more A (decline is cushioned from below). Reinforcing growth: if more A causes more B, then more B influences more A. Reinforcing collapse: if less A causes less B, then less B influences less A. The more feedback loops a system contains, the more complex and surprising its behavior becomes, especially when those loops interact. An often overlooked but critical property of feedback loops is the delay between an action and its effects. Delays are pervasive in systems and strong determinants of behavior. When the gap between action and effect is long, two things happen: Foresight becomes essential: acting only when a problem becomes obvious means missing the window to address it early. Oscillations become likely: we overreact because the system hasn’t had time to respond, then overreact again in the other direction. Think of an autoscaler that takes 3 minutes to provision new instances. By the time the new capacity is ready, the traffic spike has already peaked. The window to act had opened before the problem was even visible on the dashboard. This is why foresight matters: when there is a significant delay between action and effect, reacting to what you see now means always acting too late. And the consequences compound. The autoscaler, still responding to the old signal, overshoots. Then it sees too much capacity and scales down, right before the next spike arrives. One example, two problems: a system that needed foresight got a reaction, and then oscillated because of it. The delay didn’t change the goal. It made the system work against itself. System boundaries are artificial.
They help us frame a problem, but in reality, everything is interconnected. The boundaries we draw determine what we see and, therefore, what we miss. Consider a microservices architecture in which each team owns a service. Every team has solid SLOs, careful on-call rotations, and clean dashboards. And yet end-to-end latency keeps creeping up, and users are complaining. Each team looks at its own service and sees green. The problem is that the boundary is wrong; no one is looking at the system as a whole. This is one of the most common traps in engineering: optimizing within a boundary while the real issue lives outside it. Before changing a system, it is worth asking: Am I looking at the right boundary? When something goes wrong in a system, what do we actually see? Usually just the surface: an incident, a spike, an outage. The iceberg model gives us a way to think beneath it. The model has four levels: Events are what’s visible: the incident alert, the latency spike on the dashboard. This is where most of our attention goes, and where reactive thinking lives. Patterns and trends are what you find when you zoom out. Has this happened before? At what frequency? Under what circumstances? Patterns reveal that what felt like a one-off event is actually part of a larger rhythm. Structure is the underlying system design: the feedback loops, the incentives, the processes that produce the patterns. You can’t fix a pattern without understanding the structure that generates it. Mental models are the beliefs and assumptions that shaped the structure in the first place. They’re the hardest to see and the hardest to change. Most incident response lives at the event level. Systems thinking asks us to go deeper. As an SRE, this model resonates: we’re trained not just to react to incidents but to understand the why: the patterns, the structures, and eventually the assumptions that caused them.
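The delayed-autoscaler problem described earlier can be made concrete with a toy simulation (all numbers invented). A controller that orders the full correction toward the capacity it wants right now, but whose orders take a few ticks to arrive, first overshoots, then undershoots; with this naive rule the oscillation doesn't just persist, it grows:

```python
# Toy model of a reactive autoscaler with a provisioning delay.
# Everything here (initial capacity, the full-correction rule, tick counts)
# is an illustrative assumption, not a real autoscaler algorithm.
from collections import deque

def simulate_autoscaler(demand: list[float], delay: int) -> list[float]:
    """Each tick the controller orders the full correction toward the
    capacity it wants *now*; orders only arrive `delay` ticks later."""
    capacity = 10.0
    in_flight = deque([0.0] * delay)  # corrections ordered but not yet provisioned
    history = []
    for d in demand:
        if delay:
            capacity += in_flight.popleft()  # an old order finally lands
            in_flight.append(d - capacity)   # react to what is visible now
        else:
            capacity = d                     # no delay: capacity tracks demand exactly
        history.append(capacity)
    return history

demand = [10.0] * 5 + [20.0] * 15  # a step spike in traffic
instant = simulate_autoscaler(demand, delay=0)
delayed = simulate_autoscaler(demand, delay=3)
# With no delay the controller settles at 20 immediately. With a 3-tick
# delay it keeps reacting to stale information: it overshoots past the
# target, then swings below it, with growing amplitude. (The model is
# deliberately unclamped so the instability is visible in the numbers.)
print(instant[-1], max(delayed), min(delayed[8:]))
```

The exact amplitudes are an artifact of the toy rule, but the shape is the general lesson: a reactive controller plus a delay is an oscillator, and a sufficiently aggressive one is unstable.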
Let me now bring all of these concepts together through a concrete example from my previous role at Google, where I worked on the systems powering Google’s ML infrastructure. I was heavily focused on a system called the Safe Removal Service 1 (SRS). This service had a simple API and one core responsibility: to say yes or no when another system requested permission to disrupt a given entity . Indeed, most disruptive services at Google, the ones that reboot machines, drain jobs, or take clusters offline, were designed to ask this service before acting. In our context, the key constraint was preserving capacity, meaning ML TPUs and GPUs. For example, within a given cluster, at least 90% of TPUs must remain available at all times. So if 95% were currently available, SRS could approve disruptions, as long as availability didn’t drop below 90%. NOTE : The threshold values and other details have been altered for confidentiality reasons. The API was deliberately simple: “ Can I reboot this machine? ” → Yes/No “ Can I drain this job? ” → Yes/No “ Can I take down this cluster? ” → Yes/No SRS implemented several balancing feedback loops . For example, when available capacity dropped toward 90%, the service would start refusing disruptive requests, pushing availability back up. This was the primary loop: a governor that kept the system in a safe zone. There was also an implicit reinforcing loop on the positive side: by allowing maintenance to proceed when capacity was healthy, the service enabled machines to be upgraded, patched, and kept in good shape, which in turn kept capacity high. So far, so good. But here’s where it gets interesting. The balancing loop protected current capacity. What it didn’t account for was what happened when capacity was already constrained. When available capacity hovered near 90%, SRS would block most maintenance requests. Machines couldn’t be patched. Hardware with known error trends couldn’t be swapped. Security upgrades were deferred. 
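The gating rule described above can be expressed in a few lines. Everything here is hypothetical (the function name, signature, and the integer machine counts are my own); it only illustrates the balancing loop, and how, at the floor, it also blocks maintenance:

```python
# Hypothetical sketch of an SRS-style capacity gate. The real service is
# not public and far more involved; the 90% floor mirrors the (altered)
# number in the post.

def can_disrupt(available: int, total: int, requested: int, floor: float = 0.90) -> bool:
    """Approve a disruption only if availability stays at or above the floor."""
    return (available - requested) / total >= floor

# 95 of 100 TPUs available: a small disruption is fine, a large one is not.
print(can_disrupt(available=95, total=100, requested=3))  # True: 92% >= 90%
print(can_disrupt(available=95, total=100, requested=6))  # False: 89% < 90%
# Hovering at the floor, *everything* is refused -- including maintenance,
# which is exactly how the hidden reinforcing loop gets fed.
print(can_disrupt(available=90, total=100, requested=1))  # False
```

The third call is the seed of the problem the post goes on to describe: the gate cannot distinguish a disruptive reboot from a health-preserving one, so near the floor it blocks both.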
Maintenance debt accumulated, silently, invisibly. This created a first hidden reinforcing loop: Less capacity → Deferred maintenance → More failures → Even less capacity The balancing loop was actively feeding the very problem it was trying to prevent. A second reinforcing loop emerged from human behavior: Low capacity → More incidents → Bypass mechanisms invoked → Riskier actions taken → Capacity lower still When the system was under stress, operators would sometimes override SRS to unblock critical work. Each bypass, reasonable in isolation, eroded the safety margins that the balancing loop was designed to protect. There’s a principle from Thinking in Systems that describes this precisely: System behavior is particularly sensitive to the goals of feedback loops . If the goals—the indicators of satisfaction of the rules—are defined inaccurately or incompletely, the system may obediently work to produce a result that is not really intended or wanted. Specify indicators and goals that reflect the real welfare of the system . Be especially careful not to confuse effort with result or you will end up with a system that is producing effort, not result. SRS was measuring the right-looking metric: current capacity. But the current capacity was not the same as the real health . A cluster at 92% availability, accumulating maintenance debt and hardware errors, was far more fragile than a cluster at 91% that was fully patched and stable. The balancing loop couldn’t tell the difference. The deeper fix wasn’t just tuning the threshold. It was making the controller health-aware, not just capacity-aware . Rather than gating only on “ % available right now ,” the system needed to incorporate slow indicators: maintenance backlog growth rate, share of fleet on known-bad firmware versions, hardware error trendlines, override and bypass rates. By the time the reinforcing loops made their effects visible, the stock (cluster health) had already been degrading for weeks. 
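One possible shape for that health-aware controller, as a rough sketch: the slow indicators are the ones named above, but the weights, the clamping, and the idea of expressing health as an adjusted availability floor are my own invented illustration, not the actual fix:

```python
# Hypothetical sketch of a health-aware gate: fold slow indicators into
# the decision instead of gating on current availability alone. All
# weights and field names below are invented for illustration.
from dataclasses import dataclass

@dataclass
class ClusterHealth:
    availability: float        # fast signal: fraction available right now
    backlog_growth: float      # slow: maintenance requests deferred per day
    bad_firmware_share: float  # slow: fraction of fleet on known-bad versions
    bypass_rate: float         # slow: operator overrides per week

def effective_floor(h: ClusterHealth, base_floor: float = 0.90) -> float:
    """Lower the floor for maintenance-class disruptions when the slow
    indicators say debt is piling up, so the balancing loop stops starving
    the very work that keeps capacity healthy."""
    debt_pressure = min(1.0, 0.02 * h.backlog_growth
                             + 0.5 * h.bad_firmware_share
                             + 0.1 * h.bypass_rate)
    return base_floor - 0.05 * debt_pressure  # grant at most 5 points of slack

healthy = ClusterHealth(0.92, backlog_growth=0.0, bad_firmware_share=0.0, bypass_rate=0.0)
indebted = ClusterHealth(0.92, backlog_growth=40.0, bad_firmware_share=0.3, bypass_rate=5.0)
# Same availability, different real health: the indebted cluster gets a
# lower floor, so maintenance can flow before the stock degrades further.
print(effective_floor(healthy), effective_floor(indebted))
```

The design choice worth noticing: the fast signal (availability) still gates each request, but the slow signals shift the goal of the balancing loop itself, which is exactly the leverage point the quoted Meadows principle is pointing at.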
The delay between cause and effect made the problem invisible until it was expensive to fix. This example was not about a flawed design. It was about a structure that, taken as a whole, was quietly working against itself. A system is a set of elements interconnected to achieve a goal. Stocks are accumulations that change over time through flows; stocks take time to change. A feedback loop occurs when an effect feeds back into its own cause. Balancing feedback loops resist change and push toward equilibrium; reinforcing feedback loops amplify change. Delays between action and effect can cause oscillations and make problems invisible until too late. System boundaries are artificial; the boundary we draw determines what we see and miss. The iceberg model: events are visible, but patterns, structure, and mental models lie beneath. System goals must reflect real welfare, not just what’s measurable; inaccurate goals lead to unwanted behaviors. A well-designed balancing loop can mask hidden reinforcing dynamics. The most dangerous moment is when a system appears to be working. AI is getting better every day. Are you? At The Coder Cafe, we serve fundamental concepts to make you an engineer that AI won’t replace. Written by a Google SWE, trusted by thousands of engineers worldwide. Working on Complex Systems Probabilistic Increment Thinking In Systems Learning Systems Thinking Leverage Points: Places to Intervene in a System Have you ever built or maintained a system that looked healthy on the dashboard while something was quietly accumulating underneath? I already mentioned that service in a previous post. You can find more information in this whitepaper: VM Live Migration At Scale .


Automation conformity

Rollo May asserts plainly in the opening pages of The Meaning of Anxiety that anxiety in fact has meaning, and that our aim cannot be to eliminate it but to work with it, and through it, to use it to propel our creativity and vigor for life. And yet, anxiety is often deeply, even intolerably, unpleasant, and the effort to embrace it can test us beyond our abilities. We are wont, then, to look for an escape hatch, an easy path to relief; but those paths always come with a cost. It is to be expected that certain “mechanisms of escape” from the situation of isolation and anxiety should have developed. The mechanism most frequently employed in our culture, [Erich] Fromm believes, is that of automation conformity. An individual “adopts entirely the kind of personality offered to him [ sic ] by cultural patterns; and he therefore becomes exactly as all others are and as they expect him to be.” This conformity proceeds on the assumption that the “person who gives up his individual self and becomes an automaton, identical with millions of other automatons around him, need not feel alone and anxious any more.” It does not take a hard look to spot the evidence of this conformity in our own time. Millions of nearly identical LinkedIn posts, all saying the same thing, in the same jittery staccato, the same strained performance of revelation when in fact nothing at all is being revealed. (I am picking on LinkedIn here, but it is a symptom of this phenomenon, not the cause.) Worse, we now have chatbots that will produce and reproduce this pablum at scale, bringing a kind of double edge to that conformity: we conform when we use those tools, when we accede to the assertion of their inevitability; and we conform again when we place them in our mouths and in our minds, when we outsource our speech and thinking to them. We become automatons twice over.
Fromm and May here posit that when we make this trade, when we adopt those cultural patterns, we give up our unique selves in exchange for a relief of anxiety. In the light of our current drive for automation, I will make a counter proposal: we give up our unique selves in the hope that it will bring some relief, but that relief is ever deferred. For at present, becoming an automaton nearly guarantees that you will be hung out to dry, as the promise of so-called AI is that the more you use it, the more it uses you. Thus, in the act of becoming automatons, we bring ourselves that much closer to the thing we really fear: being left alone, without any of the care or materials we need to survive. We give up our individual selves for the appearance of security, without any of the conditions that can actually create it. This is the trap anxiety lays for us: in our effort to escape it, we run further into its jaws. But perhaps there are yet alternatives. May connects that impulse to escape with the experience of isolation: can we become less isolated without becoming automatons? Can we find community not in the center, but on the outskirts, among the weirdos and the outsiders, the people who never seem to fit in, who are always playing a different game? There are fewer of them, by definition, but not so few that we cannot find them. We won’t find the comfort of the majority among them, of course, and as we have seen, that comfort is mere illusion. But perhaps we can find the community and camaraderie that is so necessary for our survival, and without giving up our precious selves to get it. View this post on the web , subscribe to the newsletter , or reply via email .


The Meaning of Anxiety

Rollo May refutes the assertion that mental health is living without anxiety, proposing instead that anxiety is a necessary condition for creativity, intellect, and freedom. He defines anxiety as the “experience of Being affirming itself against Nonbeing,” as that which propels us to more self-awareness, consciousness, and life. He likewise shows that the refusal to embrace this anxiety, to attend to it and work with it and through it, is an invitation to authoritarianism and fascism. When we lack the skills of being with our anxiety, and feel our only option is to flee, we often flee right into the hands of a strongman who promises security at the cost of liberty. May wrote during the height of fascism in the last century; we read it during the renewal of the same in this one. The lessons hold.

iDiallo Yesterday

How to Come Up With Great Ideas

There's a story about an art teacher who divides a class into two groups. The first group is given one task. Design a single, perfect pot. The second group has a different instruction entirely. Make as many pots as you can before time is up. The first group measures, plans, and deliberates. They sketch ratios, debate proportions, and handle the clay with care. They have one shot, so they treat it like one. The second group just makes pots. Terrible ones. pots that collapse on themselves, crack at the base, and lean sideways. They don't stop to mourn any of them, they just start the next. When time expires, the results are revealed. The first group has a pot... technically. But it wobbles. Before the session is over, it breaks. The second group's first pot is a disaster. But their second is better. Their third, better still. By the time they've burned through a dozen attempts, they've internalized something the first group never had the chance to learn. What doesn't work. Their final pot is flawless The second group won not because they were more talented, but because they were given the opportunity to experiment and gain experience. When it comes to writing, it's tempting to believe you need the entire story mapped out before you begin. That you need to fully understand the premise, complete the research, and know the ending before the first sentence. That's never been true for me. When I start writing, I often don't know what it's going to be until the words appear on the page. The research doesn't precede the writing. It follows it. I remember watching the Pixar documentary about Finding Nemo. They produced hundreds of story sketches and character drawings that never made it into the film. The director admitted that a lot of those ideas were pretty terrible. But without those sketches, they wouldn't have arrived at the final story. They learned from the volume of attempts. And we got to watch the final high quality result. 
I've been following what I call the 100-times rule . For any new skill, or anything new really, I give myself permission to do it 100 times before judging the results. The goal is simple. I want to narrow the gap between idea and execution. When I was learning to program, I'd open a blank JavaScript file and just start writing. Half the time, I'd finish a session with something unrelated to what I started. A half-baked startup idea buried in the comments, or a function that suggested an entirely new project. The work generated the ideas, not the other way around. When I decided to write regularly, I had about a dozen ideas ready to go. I dreaded the moment I'd exhaust them. But by the time I published the twelfth post, I had several dozen more. They didn't come from some secret source of inspiration. They came from momentum. Writing regularly has trained my mind to notice things worth writing down. Conversations, observations, small frustrations, fleeting questions. The ideas were always there. I just hadn't built the habit of catching them. This is the part most people get backwards. They wait for a great idea before they begin. But great ideas are rarely the starting point. They're the result of effort. You don't think your way to good work. You train yourself to good thinking by using repetition. When I have an idea now, I don't wait for the right moment. I write it down immediately in my phone, in a note, sometimes directly on my blog before I fully understand it. Publishing something unpolished is uncomfortable. But an unpolished idea that's out in the world can be refined, responded to, and built upon. A perfect idea that lives only in your head cannot. In fact, I've forgotten so many great ideas! Since the beginning of last year, I've written at least 230 articles. When I run low on new ideas, I have hundreds of old posts I can return to, deepen, and remake into something more considered. The volume created the options. 
If there is any secret to coming up with great ideas, it’s this: start before you’re ready. The first group in that classroom failed not because they were less capable, but because they optimized for perfection at the expense of experience. Every moment they spent planning was a moment they weren’t learning what the clay actually does in your hands. If you’re waiting for the perfect idea, the right time, or enough certainty before you begin, then you are the first group. You have one pot and you’re protecting it. Make more pots.


High-Quality Chaos

As I have been preparing slides for my upcoming talk at foss-north on April 28, 2026, I figured I could take the opportunity to share a glimpse of the current reality here on my blog. The high-quality chaos era, as I call it. I complained and I complained about the high-frequency junk submissions to the curl bug-bounty that grew really intense during 2025 and early 2026, to the degree that we shut it down completely on February 1st this year. At the time we speculated whether that would be sufficient or if the flood would go on. Now we know. In March 2026, the curl project went back to HackerOne again once we had figured out that GitHub was not good enough. From that day, the nature of the security report submissions has changed. The slop situation is not a problem anymore. [chart: AI slop rate] The report frequency is higher than ever. Recently it’s been about double the rate we had through 2025, which was already more than double that of previous years. [chart: Number of hours between security reports] The quality is higher. The rate of confirmed vulnerabilities is back to, and even surpassing, the 2024 pre-AI level, meaning somewhere in the 15-16% range. [chart: Confirmed vulnerability rate] In addition to that, the share of reports that identify a bug, meaning that they aren’t vulnerabilities but still some kind of problem, is significantly higher than before. [chart: Share of reports that were bugs, not vulnerabilities] Everything is AI now Almost every security report now uses AI to various degrees. You can tell by the way they are worded, how the report is phrased, and also by the fact that they now easily get very detailed duplicates in ways that couldn’t happen had they been written by humans. The difference now compared to before, however, is that they are mostly very high quality. The reporters rarely mention exactly which AI tool or model they used (and really, we don’t care), but the evidence is strong that they used such help.
I did a quick unscientific poll on Mastodon to see if other Open Source projects see the same trends and man, do they! Friends from the following projects confirmed that they too see this trend. Of course the exact numbers and volumes vary, but it shows it’s not unique to any specific project. Apache httpd, BIND, curl, Django, Elasticsearch Python client, Firefox, git, glibc, GnuTLS, GStreamer, Haproxy, Immich, libssh, libtiff, Linux kernel, OpenLDAP, PowerDNS, python, Prometheus, Ruby, Sequoia PGP, strongSwan, Temporal, Unbound, urllib3, Vikunja, Wireshark, wolfSSL, … I bet this list is just a random selection of projects that happened to see my question. You would find many more experiencing and confirming this reality. When we ship curl 8.20.0 in the middle of next week, at the end of April 2026, we expect to announce at least six new vulnerabilities. Assuming that the trend keeps up for at least the rest of the year, and I think that is a fair assumption, we are looking at an estimated explosion and a record number of CVEs published by the curl project this year. We might publish closer to 50 curl vulnerabilities in 2026. [chart: Number of published vulnerabilities] Given this universal trend, I cannot see why this pattern would not also show up, and be expected, in many other projects as well. The tools are still improving. We keep adding flaws when we do bugfixes and add new features. Someone has suggested it might work as with fuzzing: that we will see a plateau within a few years. I suppose we just have to see how it goes. This avalanche is going to make maintainer overload even worse. Some projects will have a hard time handling this kind of backlog expansion without any added maintainers to help. It is probably a good time for the bad guys, who can easily find this many problems themselves by just using the same tools, before all the projects get the time, manpower and energy to fix them.
Then everyone needs to update to the newly released fixed versions of all packages, which we know is likely to take an even longer time. We are up for a bumpy ride.

マリウス Yesterday

Privacy Setup for Android 16 with GrapheneOS

GrapheneOS is a free and open-source mobile operating system, built on top of the Android Open Source Project (AOSP) but with a strong focus on privacy and security. It’s developed independently, with no ties to Google or any hardware vendor, and it’s the operating system I’ve been recommending (and using on my own devices) for years, both on the phone side and on the tablet side. Compared to the Android you get out of the box on a new Samsung Galaxy, Nothing Phone or even Google Pixel, GrapheneOS is a fundamentally different thing. Where stock Android ships deeply integrated with Google’s services, which constantly sync contacts, calendars, search history, advertising identifiers, approximate location, and trickle telemetry back to Mountain View, GrapheneOS strips all of that out by default. Where vendor Android additionally ships with preloaded apps from Facebook, Microsoft, Amazon and the manufacturer’s own ecosystem, each with their own telemetry pipeline, GrapheneOS ships with almost nothing at all. And where stock Android relies on Google for things like push notifications, attestation, captive portal checks and time synchronization, GrapheneOS routes these through its own infrastructure, or makes them optional entirely. On top of that, GrapheneOS adds a substantial amount of hardening at every layer of the stack, from a hardened memory allocator and stricter sandboxing rules, all the way up to user-facing tools like per-app network and sensor permission toggles that simply don’t exist on stock Android. In short, GrapheneOS is what Android could look like if the people building it weren’t in the business of selling your data. And because it’s open source, independently audited and developed with a clear threat model in mind, it has earned the trust of journalists, activists, engineers and plenty of ordinary people who simply don’t want their phone to be a surveillance device.
With all that said, there’s a common misconception that I keep encountering: that simply flashing GrapheneOS onto a compatible device is enough to magically protect its owner from Big Tech or other adversaries spying on them and their data. While GrapheneOS goes to great lengths to disable and circumvent the tracking that smartphone vendors like Google usually build into their Android phones, and hardens various aspects of the system on top of that, the main cause for concern is usually not the bare Android system itself, but more often than not the apps running on top of it. If you are using apps like Facebook, TikTok, Outlook and Amazon, the surveillance happens within these apps and platforms, regardless of what operating system they’re running on. Common questions I encounter with regard to the use of GrapheneOS are along the lines of “I need to use this banking app on my phone, can I do that with GrapheneOS?”, or “I need to use Microsoft Teams for work, does GrapheneOS support it?”. While many of these questions can be answered with yes, there’s a fundamental issue with this approach: people think that if only they switch the base operating system of their smartphone, all of a sudden they will become invisible to the companies behind these apps. This is sadly a misconception. The operating system is, albeit an important part, only one layer of the stack. Flashing GrapheneOS protects you from a lot of what Google bakes into stock Android, and it adds a surprising amount of defense in depth via things like the hardened memory allocator, the network permission toggle or storage scopes. What it cannot do, however, is change what the apps you install are sending to their backends.
If you depend heavily on apps that are inherently privacy-invasive, it doesn’t make much sense to limit yourself to the few devices that an operating system like GrapheneOS is able to run on, and then jump through all the hoops of getting the apps you need to work on those devices. In such a case, compartmentalization is the better approach: Run these types of apps on e.g. a modern iOS device, a platform with industry-leading out-of-the-box security for the average user, and only use a GrapheneOS device for the apps and platforms that you have full control over or can reasonably trust not to spy on you. This is in my opinion the most important mental model to internalize before starting down this path. The goal isn’t “one device that does it all, perfectly private”, as that device doesn’t exist and chasing it will only give you a false sense of privacy. The goal is to make sure that the device which lives in your pocket, the one that knows where you drive, where you sleep and who you talk to, is running a minimal, trustworthy and hardened stack. Everything that brings known spyware into the mix, like corporate communication suites, banking apps, rideshare apps, airline loyalty clients, food delivery apps, all the usual suspects, belongs on a separate, deliberately untrusted device. That device can happily be a stock iPhone or a stock Pixel. Don’t fight that reality, use it in the most minimal way possible. That device does not need a copy of your full address book and calendar, nor does it need access to your primary password vault. And it most certainly doesn’t need your family vacation photos or your Taylor Swift concert videos. It can co-exist just fine on a dedicated SIM card, with a dedicated phone number and everything else that the corporate you needs. Using the spyware device in such a conscious way ultimately benefits your privacy alter-ego, as it maintains a public persona of yourself that hAs NoThInG tO hIdE.
Many people recoil at the idea of carrying two phones, but in practice the spyware device rarely needs to leave your desk or (Faraday-)bag. You pull it out when you need to check in for a flight, pay a bill, submit an expense report or hop on a corporate video call. For everything else, the GrapheneOS device is more than sufficient. And because it doesn’t carry the weight of two dozen chatty apps, its battery life and overall responsiveness will improve dramatically as a side effect. However, because life is never as clear-cut as this, with Android 16 there is a new Private Space feature that can be utilized to further compartmentalize apps within the same device. Private Space is essentially a separate user, nested inside the owner user, with its own isolated storage, its own set of installed apps and its own work/background state. The apps inside a Private Space don’t share any data with the rest of your apps, and they don’t even necessarily share the same network routes. Therefore, if you are using a VPN on your main profile, your Private Space apps won’t see it and hence won’t be using that connection, and vice-versa. That last bit is worth pausing on. You can have a completely different VPN configuration, a completely different set of DNS settings and, effectively, a completely different exit IP for the apps inside your Private Space, without having to juggle user profiles via the lockscreen. When the space is locked, the apps inside it are frozen: their processes are torn down, their notifications are silenced and their icons disappear from the app drawer and the recents view. When the space is unlocked, it’s as if you briefly teleported to a second phone, used the app you needed, and then went back. A good example of an app that makes sense to run inside the Private Space is Uber.
This app contains your private information (name, payment info) and is something you don’t want running in the background 24/7, as you quite likely only need it sporadically, whenever you have to hail a ride. By installing Uber only inside the Private Space, it will only be allowed to run once you unlock the space. You never again need to worry about Uber continuing to track your location after you’ve completed your ride. A similar argument can be made for a messenger like WhatsApp. I would not recommend relying on WhatsApp as your primary means of communication, but if you have that one group chat with family members that absolutely refuses to move off WhatsApp, or that one client who insists on sending you voice notes there, installing it inside the Private Space and only unlocking it when you actually need to check in is a reasonable middle ground. You get the communication channel, Meta doesn’t get a background service on your primary profile 24/7. However, this approach clearly only makes sense for apps that you need sporadically or in emergency situations in which you might not have your dedicated spyware device with you. If you need to use something like Microsoft Teams on a constant basis, putting it into the confined Private Space might not make much sense, as the app won’t deliver message notifications unless the space is unlocked. The official AOSP documentation even carries a warning that Private Space is not suitable for apps that need to run in the background or send critical notifications, such as medical apps. Treat it as the right tool for “occasional use”, not as a replacement for proper profile hygiene. People new to GrapheneOS often ask how Private Space differs from the traditional secondary user profiles that GrapheneOS has supported for years. The short answer is that Private Space is strictly more convenient, and secondary profiles are strictly more isolated.
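Under the hood, a Private Space is just another Android profile. If you have USB debugging enabled, you can see this over adb — a sketch for illustration only; the profile ids and names vary per device, and `uber.apk` below is a hypothetical file name:

```shell
# List all users/profiles on the device. On Android 15+ with a Private Space
# set up, it appears as an extra profile alongside the owner (user 0),
# e.g. "UserInfo{10:Private space:...}". Ids differ from device to device.
adb shell pm list users

# An app can be installed into one specific profile only, by passing its id.
# Here "10" stands in for the Private Space id taken from the output above.
adb install --user 10 uber.apk
```

Installing an app this way keeps it out of the owner profile entirely, which is exactly the isolation property described above.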
Secondary user profiles have their own encryption keys, derived from their own unlock credential. When you switch out of a profile or, even better, explicitly end its session, its data goes back to rest on disk and no longer resides in memory in a decrypted state. Private Space, on the other hand, lives inside the owner profile and piggybacks on its encryption context. When the owner profile is unlocked, the mere existence of data inside the Private Space can be inferred, even if the contents themselves remain protected. For most threat models this difference is purely academic, but it’s worth being aware of. In practice, my recommendation, and the one GrapheneOS itself tends to partially make, is the three-tier layout summarized at the end of this post: a lean owner profile, a secondary profile for sandboxed Google Play, and a Private Space for occasional-use apps. If you’re coming from a setup that relied solely on secondary profiles, you’ll notice that Private Space eliminates the lockscreen dance for the casual apps, while leaving the cryptographic isolation of secondary profiles available for the things that truly warrant it. The GrapheneOS installation itself is a breeze and, in my experience, the easiest way to put a non-stock operating system onto a smartphone. No fiddling with recovery images or sideloading obscure ZIPs: you unlock the bootloader, connect the phone to a computer and open GrapheneOS’ WebUSB installer in a compatible browser. From there, the installer walks you through the individual steps. The whole process takes around fifteen minutes and results in a factory-fresh GrapheneOS device. Make sure your device is in the list of officially supported models. Up until now, GrapheneOS has specifically targeted Google Pixel phones, because Pixels offer verified boot with a user-controllable root of trust, proper firmware and driver updates, the Titan M2 security chip and a bunch of other hardware-level properties that other Android vendors simply don’t match. This, however, is supposed to change with compatible devices from Motorola hitting the market in 2027.
Running a “privacy ROM” on an unsupported device is in many ways worse than running stock Android, since you lose verified boot and in some cases even timely security patches. Once the device boots into GrapheneOS for the first time, resist the urge to immediately install all the apps you’re used to. Walk through the setup wizard, set a strong PIN or passphrase (six digits minimum!) and then, before doing anything else, spend fifteen minutes in the settings. This is the part most guides gloss over. GrapheneOS ships with sensible defaults, but a handful of additional tweaks can noticeably harden the device against both remote and physical threats. GrapheneOS adds a network permission toggle that appears on the install dialog of every new app and as a toggle in the app’s permissions screen. Habitually uncheck network access for any app that has no business talking to the internet. A gallery viewer, a calculator, a local file manager, a launcher: none of these should need network access. It’s a tiny friction with a disproportionately large effect on the amount of telemetry and personal data leaving your device. The sensors permission toggle covers everything the regular Android permissions don’t: accelerometer, gyroscope, compass, barometer, and so on. You can block these on a per-app basis, which is particularly valuable for apps that have no legitimate reason to know how often you pick up your phone. GrapheneOS also exposes quick-toggle tiles for the camera and microphone in the pull-down menu, which cut access at the system level rather than the per-app level and are convenient for walking into a sensitive meeting or leaving the phone on the nightstand. Under Settings ➔ Network & internet ➔ Private DNS you can point the system resolver at a DNS-over-TLS provider of your choice. Quad9, Mullvad DNS and NextDNS are all reasonable options. Cloudflare is (sadly) GrapheneOS’ default fallback.
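For those who prefer the command line, the same Private DNS setting can also be written over adb via Android’s global settings provider — a sketch, assuming a connected device with USB debugging enabled and Quad9 as the chosen resolver:

```shell
# Switch Private DNS into "hostname" mode (DNS-over-TLS to a named provider)
# and point it at Quad9. Both keys live in the global settings namespace.
adb shell settings put global private_dns_mode hostname
adb shell settings put global private_dns_specifier dns.quad9.net

# Read the values back to confirm they were applied.
adb shell settings get global private_dns_mode
adb shell settings get global private_dns_specifier
```

The same two keys accept `off` and `opportunistic` as mode values, which correspond to the other options in the Settings UI.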
If you run your own recursive resolver, which I’d argue is the gold standard, even better. Keep in mind that the Private DNS hostname is looked up once via plaintext DNS, so use a provider you’re okay briefly touching in the clear. With the base system locked down, it’s time to think about what actually goes on it. My general recommendation is to solely use F-Droid for free-software apps. Yes, F-Droid has its well-documented issues and is far from perfect, but for technically literate users who can read source code it remains the best option available in terms of provenance and privacy. For a browser, Vanadium is the default and the safest pick from a pure security standpoint, as it’s a hardened Chromium fork maintained by the GrapheneOS team, with strict site isolation, JIT disabled by default and a per-site JavaScript toggle. The main tradeoff is the lack of proper extension support, which rules out more sophisticated content blocking. If that’s a dealbreaker, install Cromite alongside Vanadium and reserve it for sites where you really need content blocking, while keeping Vanadium as your default for general browsing and anything sensitive. Also make sure to disable JavaScript by default and only enable it for sites that you know and trust! Once the setup is done, the real work is maintaining the discipline; a few habits that have served me well over the years are listed at the end of this post. GrapheneOS on a recent Pixel remains, in my opinion, the closest thing to a genuinely private and secure mobile device that a non-state-actor can own today, despite Google’s hardware being absolute garbage from quality control and performance perspectives. What GrapheneOS is not, however, is a magic spell that undoes the surveillance business models of the companies whose apps we’ve allowed into our lives. If you take one thing away from this post, let it be the compartmentalization mindset.
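To make the plaintext bootstrap issue concrete, here is a small, self-contained Python sketch (the helper name is my own, not from any Android or GrapheneOS tooling) of what a DNS-over-TLS client actually transmits: a standard DNS wire-format query, framed with the two-byte length prefix that RFC 7858 mandates. The very first lookup of the provider’s own hostname, however, has to travel as plain DNS, which is exactly the exposure described above.

```python
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS wire-format query; qtype 1 = A record."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    # QTYPE and QCLASS (1 = IN).
    return header + qname + struct.pack(">HH", qtype, 1)

# The resolver's hostname must first be resolved via plain DNS (UDP/53);
# only afterwards can queries travel over TLS to port 853. Over TLS,
# RFC 7858 frames each message with a 2-byte big-endian length prefix:
query = build_dns_query("dns.quad9.net")
dot_frame = struct.pack(">H", len(query)) + query
```

Sending `dot_frame` over an `ssl`-wrapped TCP socket to port 853 would complete the exchange; the point here is simply that the name inside that very first query cannot itself be protected by DNS-over-TLS.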
Use a dedicated stock iOS or Android device for the stuff that absolutely demands surveillance-laden apps: banking portals that only ship as an app, corporate messaging suites, airline loyalty programs, food delivery, and rideshare. Use your GrapheneOS device for everything else, and save the Private Space on that GrapheneOS device for the in-between category, the apps you genuinely only need once in a while, like Uber while traveling, or a messenger like WhatsApp that a handful of people in your life refuse to leave behind. Reserve secondary user profiles for the hard cases that require Google services but that you don’t want bleeding into your daily profile. For new GrapheneOS users, the temptation will be to replicate your old app collection one-to-one. Don’t. Treat the move as an opportunity to audit what you actually need, and keep the owner profile as boring and empty as possible. For experienced users, the addition of Private Space in Android 16 is, I think, the single biggest quality-of-life improvement in years. It lets you retire a bunch of those one-off secondary profiles you created for “that one app”, without giving up meaningful isolation. Revisit your profile layout, consolidate where it makes sense, and lock the rest away behind a space that is off until you explicitly ask for it. None of this replaces thinking about your own threat model, your own habits and the people you communicate with. But on top of a thoughtful threat model, GrapheneOS with Android 16 is sadly about as good as it gets. Footnote: The cover image is a parody (“meme”) made from a screen capture of Google’s Made by Google event with Jimmy Fallon. The host sadly did not publicly endorse GrapheneOS the same way he e.g. endorsed the highly questionable Bored Ape NFT.

The recommended profile layout:

- Owner profile: Lean, minimal, no Google services. F-Droid, trustworthy apps, a solid browser like Vanadium or Cromite.
- Secondary user profile for sandboxed Google Play: Install sandboxed Google Play here, along with the handful of apps that genuinely require Play Services, like certain banking apps. Keep this profile as small as possible, enable notifications so you don’t miss a transfer confirmation, and end the session whenever you’re done.
- Private Space inside the owner profile: The occasional-use bucket. Uber, Lyft, food delivery, maybe WhatsApp for that one stubborn contact, loyalty apps that you open once a quarter. Lock it when you don’t need it.

Hardening tweaks worth making:

- Auto-reboot: Settings ➔ Security & privacy ➔ Auto reboot. By default GrapheneOS reboots the device after 18 hours of being locked, putting all data back at rest and rendering cold-boot and many forensic attacks significantly harder. I personally lower this to eight or twelve hours.
- Duress PIN: Settings ➔ Security & privacy ➔ Device unlock ➔ Duress Password. This lets you configure an alternate PIN or password that, when entered on the lockscreen, irreversibly wipes the device in the background without any warning or confirmation. Useful if you’re ever in a situation where you’re compelled to hand over the device unlocked.
- Lockdown: The standard Android lockdown action (long-press the power button ➔ Lockdown) disables biometrics and notification previews until the next successful PIN/passphrase entry. Make this a reflex whenever you hand the phone to someone or walk into a situation where you might be compelled to unlock it with your face or fingerprint.
- PIN scrambling and two-factor fingerprint unlock: Both are available in the lockscreen settings. The former randomizes the keypad layout to defeat shoulder-surfing, the latter requires a PIN after the fingerprint as a second factor.
- USB-C port control: Settings ➔ Security & privacy ➔ More security & privacy ➔ USB-C port. Set this to Charging-only when locked, or even Charging-only at all times if you rarely use the port for data.
This prevents a plugged-in cable from establishing a data connection without your explicit consent.

A few habits that have served me well:

- Resist re-installing apps you just removed. The whole point of going through this exercise is to shrink your attack surface. If you find yourself missing Instagram after two weeks, it’s worth asking whether you actually miss Instagram or whether you miss the dopamine loop.
- Review permissions periodically.
- Keep the spyware device actually separate. No shared WhatsApp account, no shared password manager vault. Treat it as a different person’s phone.
- Hit lockdown before boarding a plane or crossing a border. Biometrics offer essentially no legal protection in most jurisdictions. Lockdown forces the next unlock to require a passphrase, and if things go sideways there’s the duress PIN.
- Reboot the device before sleep. Before First Unlock is a meaningfully different security state from After First Unlock. A fresh reboot means the keys haven’t been touched since the last time you intentionally typed your passphrase.
