Latest Posts (20 found)

The Crumbling Workflow Moat: Aggregation Theory's Final Chapter

For decades, software companies commanded premium pricing not only for their data, but for their interfaces. The specialized keyboards. The Excel integrations. The workflow automations. Users spent years mastering these systems. Companies built processes hardcoded to specific tools. Switching meant massive productivity loss. The interface WAS the product. I haven’t used Google in a year. An LLM chat is my browser. Soon, knowledge workers won’t use specialized software interfaces either. The LLM chat will be their interface to everything. This isn’t incremental change. This is the completion of Ben Thompson’s Aggregation Theory. In this article: Why Aggregation Theory left suppliers with one critical asset: their interface How vertical software built empires on workflow complexity, not data Why LLMs absorb the interface layer entirely When interfaces are commoditized, it’s API versus API Valuation Framework: the math is brutal Who wins, who loses, and what comes next Ben Thompson’s framework reshaped how we think about internet economics. The value chain was simple: Suppliers → Distributors → Consumers. Pre-internet, high distribution costs created leverage for distributors. TV networks controlled what content got aired. Newspapers decided which stories mattered. Retailers chose which products reached shelves. Then distribution costs collapsed to zero. Transaction costs followed. Power shifted from distributors to a new species: aggregators. The classic aggregators emerged: Google aggregated websites via search. Facebook aggregated content via social graph. Amazon aggregated merchants via marketplace. Uber and Airbnb aggregated physical supply via mobile apps. Thompson identified the virtuous cycle: Better UX → More users → More suppliers → Better UX. The aggregator wins by owning the consumer relationship, commoditizing suppliers until they become interchangeable. THE WEB 2.0 AGGREGATION STACK But suppliers retained two critical assets. Their interface and their data. The paradox of Web 2.0 aggregation was structural. Google commoditized discovery. When you search “best Italian restaurant SF,” you don’t care which site ranks #1. The source is fungible. But you still visit that site. You see their brand. You experience their UX. You navigate their reservation system. This created a hard limit on commoditization: Discovery: Commoditized (Google owns it) Interface: Protected (suppliers own it) Data: Protected (suppliers own it) The interface layer mattered for four reasons: Brand persistence: Users saw the New York Times, not just “a news source.” Brand equity survived aggregation. UX differentiation: Suppliers could compete on design, speed, features. A better interface meant higher conversion. Switching costs: Users developed muscle memory, workflow habits. Learning a new system had real friction. Monetization control: Suppliers owned their conversion funnels. They controlled the paywall, the checkout, the subscription flow. Vertical software is the perfect case study. Financial data terminals, legal research platforms, medical databases, real estate analytics, recruiting tools. They all pull from data that’s largely commoditized or licensable. Yet they command premium pricing. Why? Because the interface IS the moat. THE INTERFACE MOAT IN VERTICAL SOFTWARE Same data. Different interfaces. Premium pricing. Knowledge workers spent years learning specialized interfaces. The muscle memory is real. They’re not paying for data.
They’re paying to not relearn a workflow they’ve spent a decade mastering. Companies built models and processes hardcoded to specific plugins. Changing providers means rebuilding workflows, retraining teams, risking errors during the transition. Switching costs weren’t about data. They were about the interface. This is why vertical software traded at 20-30x earnings. The market believed the interface was defensible. But is it today? LLMs don’t just aggregate suppliers. They absorb the interface itself. When LLMs commoditize the interface, what’s left? Just the data. And then it’s API against API. Pure commodity competition. The three-layer collapse: What changes structurally: THE VISIBILITY COLLAPSE Users never see the supplier’s brand Users never experience the supplier’s UX Users don’t know where information originated The entire web becomes a backend database Consider a knowledge worker today using specialized vertical software. They open the application. Navigate to the screening tool. Set parameters. Export to Excel. Build a model. Run scenarios. Each step involves interacting with the software’s interface. Each step reinforces the switching cost. Now consider a knowledge worker with an LLM chat: “Show me all software companies with >$1B market cap, P/E under 30, growing revenue >20% YoY.” “Build a DCF model for the top 5.” “Run sensitivity analysis on discount rate.” The user never touched any specialized interface. They don’t know (or care) which data provider the LLM queried. The LLM found the cheapest available source with adequate coverage. This is complete commoditization. Not just of discovery, but of the entire supplier experience. When interfaces are commoditized, all that remains is API versus API. What happens to pricing power when interfaces disappear: The old model (vertical software): $10-25K/seat/year Multi-year contracts with annual escalators 95%+ retention because switching means retraining Gross margins >80% The new model: Data licensing fees (pennies per query) No user lock-in (LLM can switch sources instantly) Margin compression to commodity levels Retention based purely on data quality and coverage The math is brutal. If a vertical software company’s interface was 60% of their value, and LLMs eliminate interface value entirely, what remains is pure data value. And if that data isn’t proprietary, if it can be licensed or replicated, there’s nothing left. VALUE DECOMPOSITION If you have no proprietary data, you are in big trouble. This is Aggregation Theory applied to its logical conclusion. Look at financial data software. Companies that built empires on interface complexity are watching their moats evaporate. A $20B market cap company with no truly proprietary data should trade at $5-8B once LLMs absorb their interface value. That’s not a bear case. That’s math. The same logic applies everywhere interfaces created moats: Financial data: Terminals that charge $12-24K/year for interfaces over largely commoditized data feeds. When an LLM can query the same data directly, the interface premium evaporates. Legal research: Platforms charging premium prices for interfaces over case law that’s largely public domain. The specialized search and citation tools become worthless when an LLM can do it better. Medical databases: Clinical decision support tools that charge physicians for point-of-care recommendations. Exactly what LLMs excel at. Real estate analytics: Comprehensive databases accessed through specialized workflow tools.
LLMs querying the same data through APIs eliminate the workflow lock-in. Recruiting: Search and outreach tools charging $10K+/year. When an LLM can query professional networks and draft personalized outreach, the interface value disappears. The only survivors: companies with truly proprietary data that cannot be replicated or licensed. If interfaces are irrelevant, what do suppliers need? The old stack: Frontend framework (React, Vue) Design system (component library) UX research (user testing, A/B tests) Brand marketing (differentiation) SEO optimization (Google discovery) The new stack: Clean, structured data (markdown, JSON) API/MCP endpoints (machine accessibility) Data quality monitoring (accuracy, freshness) That’s it. All software becomes API. A restaurant today invests in a beautiful website with parallax scrolling, professional food photography, reservation system integration, review management, local SEO. All to make humans want to click “Book Now.” A restaurant in the LLM era needs: # Bella Vista Italian Restaurant ## Location: 123 Main St, San Francisco ## Hours: Mon-Thu 5-10pm, Fri-Sat 5-11pm ## Menu: - Margherita Pizza: $22 - Spaghetti Carbonara: $24 ## Reservation API: POST /book {date, time, party_size} That’s everything an LLM needs. The $50K website becomes a text file and an API endpoint. Vertical software’s beautiful interfaces become: MCP endpoint: /query Parameters: {filters, fields, format} Returns: [structured data] No keyboard shortcuts to learn. No plugins to install. No interface to build. Just data, accessible via API. Traditional REST APIs had structural limitations that preserved switching costs: Rigid schemas requiring exact field names Extensive documentation humans had to read Bespoke integration for every service Stateless interactions without conversation context This created a moat: integration effort. Even if data was commoditized, the cost of switching APIs was non-trivial. Someone had to write new code, test edge cases, handle errors differently. MCP changes this. Model Context Protocol eliminates integration friction: When switching between data sources requires zero integration work, the only differentiator is data quality, coverage, and price. This is true commodity competition. SWITCHING COST COLLAPSE The New Aggregation Framework Reframing Thompson’s model for the LLM era: AGGREGATION EVOLUTION Original Aggregation Theory (2015): Suppliers → [Aggregator] → Consumers The aggregator (Google/Facebook) achieved zero distribution cost, zero transaction cost, and commoditized suppliers. But suppliers kept their interface and their data. LLM Aggregation Theory (2025): APIs → [LLM Chat] → Consumers The LLM achieves zero distribution cost, zero transaction cost, AND zero interface cost. Complete supplier invisibility. What remains is API versus API. The aggregator layer gets thicker while the supplier layer gets thinner. In Web 2.0, Google was a thin routing layer. It pointed you to suppliers who owned your attention once you clicked. The supplier had the relationship. The supplier had the interface. The supplier converted you. In the LLM era, the chat owns your entire interaction. Suppliers are invisible infrastructure. You don’t know where the information came from. You don’t experience their brand. You never see their interface. Vertical software in 2020: The product that owned the workflow. Vertical software in 2030: An API that the LLM queries. The moat wasn’t data.
It was that knowledge workers lived inside these interfaces 10 hours a day. That interface now lives inside the LLM chat. The New Value Matrix The Winners: LLM Chat Interface Owners: Whoever owns the chat interface owns the user relationship. OpenAI with ChatGPT. Anthropic with Claude. Microsoft with Copilot. Google with Gemini. They capture the interface value that vertical software loses. The new aggregators. Proprietary Data Owners: Companies with truly unique, non-replicable data. The key test: Can this data be licensed or scraped? If yes, not defensible. If no, you survive. MCP-First Startups : Companies building for agents, not humans. No legacy interface to protect. No beautiful UI to maintain. Just clean data served through MCP endpoints that LLMs can query. They can undercut incumbents on price because they have no interface investment to recoup. The Losers: Interface-Moat Businesses : Any vertical software where “workflow” was the value. The interface that justified premium pricing becomes worthless. A $20B company with no proprietary data becomes a $5-8B company. Traditional Aggregators (Maybe): Google and Meta commoditized suppliers. Now LLMs could commoditize them. But here’s the nuance: only if they fail to own the LLM chat layer themselves. Google has Gemini and insane distribution. Meta has Llama. The race is on. If they win the chat interface, they stay aggregators. If they lose it, they become the commoditized. Content Creators : UGC platforms lose relevance when AI generates personalized content. The creator economy inverts: infinite AI content, zero human creators needed for most use cases. The UI/UX Industry: Beautiful interfaces become irrelevant when the LLM chat is the only interface. Hundreds of billions per year in frontend development... for what? Figma (amazing product!) is down by 90%. The framework for repricing interface businesses is simple: How much of the business is interface versus data? Most vertical software is 60-80% interface, 20-40% data. When LLMs absorb the interface, that value evaporates. Is the data truly proprietary? If it can be licensed, scraped, or replicated, there’s no moat left. Pure commodity competition. This is not a bear case. This is math. The market hasn’t priced this in because LLM capabilities are new (less than 2 years at scale), MCP adoption is early (less than 1 year), enterprise buyers move slowly (3-5 year contracts), and incumbents are in denial. But the repricing is coming in my opinion. The arc of internet economics: Pre-Internet (1950-1995) : Distributors controlled suppliers. High distribution costs created leverage. Web 1.0 (1995-2005) : Distribution costs collapsed. Content went online but remained siloed. Web 2.0 (2005-2023) : Transaction costs collapsed. Aggregators emerged. Suppliers were commoditized but kept their interfaces. LLM Era (2023+) : Interface costs collapse. LLMs complete aggregation. Suppliers become APIs. It’s API versus API, and whoever has no proprietary data loses. What Thompson got right: Suppliers would be commoditized. Consumer experience would become paramount. Winner-take-all dynamics would emerge. What Thompson couldn’t have predicted: The interface itself would be absorbed. Suppliers would become invisible. The aggregator would BE the experience, not just route to it. All software would become API. In the LLM era, the internet becomes a database. Structured data in, natural language out. No websites, no interfaces, no brands. Just APIs serving data to AI. 
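To make "just APIs serving data to AI" concrete, here is a minimal, hypothetical sketch of a supplier reduced to a structured-data endpoint: a tiny HTTP service returning JSON for filter queries, the kind of backend an LLM agent could call directly or through an MCP wrapper. The route, field names, and data are invented for illustration; this is not any real vendor's API.

```python
# Hypothetical sketch: a "supplier" reduced to a structured-data endpoint.
# The /query route, field names, and records are illustrative only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy dataset standing in for a vertical data provider's records.
COMPANIES = [
    {"name": "ExampleSoft", "market_cap_b": 4.2, "pe": 28, "rev_growth_yoy": 0.24},
    {"name": "DataCo", "market_cap_b": 1.1, "pe": 35, "rev_growth_yoy": 0.18},
]

class QueryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expected body: {"filters": {"pe_max": 30, "rev_growth_min": 0.2}}
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        filters = request.get("filters", {})
        rows = [
            c for c in COMPANIES
            if c["pe"] <= filters.get("pe_max", float("inf"))
            and c["rev_growth_yoy"] >= filters.get("rev_growth_min", 0.0)
        ]
        body = json.dumps({"results": rows}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), QueryHandler).serve_forever()
```

The point of the sketch is what is missing: no screens, no branding, no workflow to learn. Swapping this provider for a competitor only requires that the competitor return equivalent structured data.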
For someone who spent a decade building beautiful interfaces, this is bittersweet. All those carefully crafted interactions, pixel-perfect layouts, workflow optimizations... obsolete. But this is what progress looks like. The UX of chatting with an LLM is infinitely better than navigating specialized software. And that’s all that matters. Aggregation Theory told us suppliers would be commoditized. LLMs are finishing the job. The interface moat is dead. What remains is data. And if your data isn’t proprietary, neither is your business.
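As a worked version of the article's repricing arithmetic: the $20B market cap and the interface-share range come from the article itself, and the snippet below is just that multiplication spelled out, not a valuation model.

```python
# The article's repricing arithmetic, spelled out. The $20B market cap and the
# interface shares come from the article; this is arithmetic, not a valuation model.
market_cap_b = 20.0
for interface_share in (0.60, 0.75):  # brackets the article's $5-8B outcome
    residual_b = market_cap_b * (1.0 - interface_share)
    print(f"interface share {interface_share:.0%} -> remaining data value ~${residual_b:.0f}B")
# interface share 60% -> remaining data value ~$8B
# interface share 75% -> remaining data value ~$5B
```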

0 views

A Note on Blogging Anonymously

This blog is anonymous. I wrote a bit about that in my blogging journey , how I made the mistake of announcing my first blog to all my friends and family then got self-conscious, and how that really stifled what I wanted to write about. I wrote about it in more detail in another post, but the simple version is this: this space is mine, a Room of My Own . Blogging felt like it belonged to a privileged few (a leftover belief from the early 2000s — I binged on those blogs like no other), and it wasn’t until Facebook that writing in public under my own name felt accessible. I also believed continuation had to be earned - that validation or “success” would give me permission to keep going. That whole thing around visibility and validation is captured so well in this quote from Baby Reindeer …because… because fame encompasses judgment, right? And I… I feared judgment my entire life. That’s why I wanted fame, because when you’re famous, people see you as that, famous. They’re not thinking all the other things that I’m scared they’re thinking. Like, “That guy’s a loser or a drip or a fucking fa*ggot.” They think, “It’s the guy from that thing.” “It’s the funny guy.” And I wanted so badly to be the funny guy. “Why keep your blog anonymous, why not just journal then?” someone asked me after we emailed about one of my blog posts. And although I do journal privately, writing publicly (even anonymously) does something different. When I know someone might read what I’m saying, I have to distil the idea. It forces clarity. I stop rambling and try to focus. And the bonus is that sometimes what I write resonates with someone else, and we exchange ideas. Over the last few years, and through my blogging struggle (I hate that it was a struggle: start, stop, change domains, shut down, start again), I’ve also realised that what I want to write about here isn’t something I know many people in real life are interested in. And even when I do try to have those conversations, I don’t really get anywhere in depth. It almost feels like there’s no real interest in topics that are admittedly a bit niche: do I put my notes in Obsidian or Bear? Where do admin notes live? How do I track the books I read? Or my thoughts on success, scarcity, work, life, and all that. There are probably people in real life who are interested in productivity and examining life this way, but maybe, like me, they keep those opinions elsewhere. I do sometimes talk about productivity. People love discussing it at a high level, but I want details: where do you put your meeting notes? How do you track your to-dos, personal vs team vs project? Every now and then I meet someone at work who enthusiastically walks me through their  system, how they streamline OneNote with Teams and Outlook (which I also use at work). I love picking up little bits and pieces. And on that note, I secretly admire people who don’t care about any of this and just… get on with it somehow. What I’m trying to say is that I don’t necessarily need people who know me to know what I think about certain topics. Some things just aren’t for your professional life. For me, there’s a clear separation between work and life, and I like to keep it that way. Even though I do make friends at work, as I wrote about in (my very first!) blog post What Happens When Your 9–5 Defines You I still want a professional boundary between what I say here and who I am at work. I want the freedom to write whatever I want, without worrying whether it’s work-appropriate. 
If I want to write about weight loss, menopause, or something else like that, I don’t need everyone (not that everyone would be reading it, but it would feel that way to me) at work knowing about it. If I want to write about relationships, I haven’t really, so far, but I want that option, without wondering who might read it. My blog has mostly been about my favourite topic in the world: obsessing over tools - how I use them, why I use them - and optimising processes, alongside examining the life topics I tend to fixate on. I want this blog to be a mix of everything I am. Maybe if I wasn’t working, I’d feel comfortable opening it up at this point. But I haven’t told anyone about this blog at all. And if someone ever read it and worked out it was me, fine. But that’s not likely to happen any time soon. I know a lot of people use their blog as a professional CV. In some ways, I wish I could do that. I even had a domain with my full name, which has just expired. But I don’t think I’d ever be comfortable with it, and I don’t really need a static personal site. I have LinkedIn for that, and, I suppose,  I’m quite Gen X in that way. What I do want is a blog. Something I can be prolific on, or not, as much as I want. And that freedom, that anonymity, is what makes it possible.

0 views

Date Arithmetic in Bash

Date and time management libraries in many programming languages are famously bad. Python's datetime module comes to mind as one of the best (worst?) examples, and so does JavaScript's Date class. It feels like these libraries could not have been made worse on purpose, or so I thought until today, when I needed to implement some date calculations in a backup rotation script written in bash. So, if you wanted to learn how to perform date and time arithmetic in your bash scripts, you've come to the right place. Just don't blame me for the nightmares.

0 views

Apple Earnings, Supply Chain Speculation, China and Industrial Design

Apple's earnings could have been higher, but the company couldn't get enough chips; then, once again, a new design meant higher sales in China.

0 views

Writing an LLM from scratch, part 32a -- Interventions: training a baseline model

I'm rounding out my series of posts on Sebastian Raschka's book "Build a Large Language Model (from Scratch)" by seeing how I could train the best base model I can from scratch on my own hardware. I started by training one in two days on my RTX 3090, and found that while it was a decent little model, it wasn't as good as the original GPT-2 small, either in terms of the loss it got on my test dataset, or in terms of how good it was at following instruction prompts after fine-tuning on them. I decided that I wanted to see what levers I could pull -- dropout, attention weight biases, and so on -- to make it better. For that, I didn't want to have my PC tied up for days at a time with multiple long training runs, so I learned how to train faster in the cloud. That led to some refinements in the prompt-following test I was using, and I also spent a bit of time on a side quest getting the various models I'd trained onto Hugging Face Hub. Now it's time to try the various "interventions", as I'll call them -- the levers to pull to see if I can make the model better. This post is to recap what they are, and to describe what I did to establish a baseline model to compare to. I listed a number of possible interventions at the end of the RTX 3090 post; I'm not going to do them all, but for completeness, here's the full list:
The amount of training data. I'm not going to dig into this one; it looks like it does help, but the returns diminish rapidly, so I think that in order to get any serious improvement we'd need to train for much more than two days locally. In the one "extended training" test I did, I managed to get the loss down from 4.167 to 4.135, which was... less-than-inspiring.
The number of epochs. I'm going to stick to single-epoch training -- that is, I'll train on a single pass through an amount of non-repeating data chosen to take 48 hours to handle on my local machine.
The bias on the Wq, Wk and Wv matrices. This one definitely sounds worth looking into -- easy, as it's just a change to a config flag, and makes the model more like the original GPT-2. I'll give that a go.
Dropout. I've read that for single-epoch training, dropout doesn't help (which doesn't quite work with my mental model of what it's for, but does sound plausible). Worth a look!
The learning rate, and weight decay. The values I've used for these are basically copypasta from the book. I think I should learn to understand these and try to optimise them a bit.
The precision. I'm using AMP, which means that some calculations are done in 16-bit rather than 32-bit, and letting PyTorch choose to use the GPU's tensor cores, which use TF32, a kind of "32-bit float lite" (see the post on the local train for details). Those both (at least potentially) reduce the precision of the train below what you'd get if you trained with full-fat 32-bit floats. Would reverting that be worth the longer train time? I should probably at least poke at that.
The batch size. I've already, in effect, tried playing with that. The different cloud machines I played with had different amounts of per-GPU VRAM, so supported different per-GPU micro-batch sizes. So I wound up trying batch sizes from 512 (the same as the original GPT-2 was trained with) down to 104 in the cloud, plus my local trains with a batch size of 6. I did a rough-and-ready calculation at the end of the cloud training post where I estimated that the ideal batch size might be something like 97. So, probably not worth much more investigation.
Exploding gradients. In one of my local trains, and in three out of the four cloud trains, I had sudden spikes in both training and validation loss. It generally took quite a bit of training -- maybe 10-15% of training time -- to get back on track after some of these, so we had what could be seen as wasted time in the training runs. Exploding gradients can be fixed by gradient clipping, which is relatively easy to do. Definitely worth investigating!
I'm going to work through each of those apart from the first two and the batch size (and will retrospectively add links to the posts when I do), trying a train with just that intervention and nothing else, on a cloud machine. Once that's done, I'll bake all of the things that helped into the training loop, and do another local train -- with gradient accumulation to make the batch size match the cloud instances'. The cloud machine size that I decided to use for this was the one that came out the most cost-effective (and due to its VRAM size, had the best loss) in my earlier cloud training test: an 8x A100 machine with 40 GiB VRAM per GPU. But first, we need a baseline model. I've already done a train on an 8x A100 40 GiB machine -- why do we need a new one? In my cloud training post, I came to the conclusion that the cost in terms of training time of running a periodic validation loop as we trained was not really worth it, at least in this case. Two of the biggest reasons to have validation during training are to work out when you're overfitting on a multi-epoch train, and to see how your model can handle datasets that it has not been trained on. In a single-epoch train like this, you're not going to overfit -- every sample it sees will be new to it -- and the training loss itself is over samples it's not been trained on at the time it was calculated, for the same reason (though of course it will be trained on them as soon as we do the backward pass starting with that loss). Of course, it's not perfect -- a big benefit of the validation loss is that it's over the same held-back dataset on every run -- and there are arguments for keeping it (albeit, perhaps doing full runs less frequently than I was). But for these experiments, I decided that I'd simply drop it. I also wanted to introduce a consistent random seed at the start of the training loop. I didn't have that in my cloud trains, and of course if we want to have solid results on whether each intervention really does improve matters, then we need one so that we can be sure they're all starting from the same point.
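For readers who want the shape of that change, here is a minimal sketch of the kind of seeding helper this implies -- assuming a PyTorch training script; the actual code linked above may do it differently:

```python
# Sketch of a reproducibility helper of the kind described above (assumed
# PyTorch setup; the post's actual training code may differ in detail).
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy (shuffling, sampling)
    torch.manual_seed(seed)           # PyTorch CPU (and, in recent versions, CUDA) RNGs
    torch.cuda.manual_seed_all(seed)  # explicit CUDA seeding for older versions

set_seed(42)  # called once, before the model and dataloaders are built
```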
Both of those changes meant that I couldn't use the earlier train on the 8x A100 40 GiB machine as a baseline; I'd need a new one, introducing those two changes: no validation during the training run (using training loss as a proxy), and setting a random seed at the start for reproducibility. So: what was the baseline train going to look like? The first step was to strip out the validation code and to replace it with code that just took periodic checkpoints, keeping track of which one had the best average training loss over the period since the previous one. Next, I decided to plot on the training chart that is generated during the run not just the training loss, but also an indicator of the maximum and minimum training loss over all of the steps in that period. Then I added the random seed, which I set to 42. A couple of bugfixes, and we were left with this version of the code. One thing to highlight: in the file that specifies the various training parameters, I set the per-GPU micro-batch size to 12 rather than the 13 I'd used on this size of machine earlier. Two reasons for that: Firstly, I'm going to want to do a local run with gradient accumulation later, using all of the helpful interventions. With gradient accumulation, you do a number of steps with batches that you can fit into your memory, but you don't update the gradients each time. After a number of those, you do one big update based on the accumulated gradients -- hence the name. The full batch is all of those smaller batches taken together. If I want that to closely match the cloud train, I'll want the accumulated batches to be the same size as each global batch in the cloud. Now, on my local machine, I can fit a batch of 6 into VRAM. So that means that the full batch needs to be divisible by 6 [1]. On the cloud train, with a micro-batch of 13 and 8 GPUs, we had an overall batch size of 104 in the previous train. 104 is not divisible by 6: no joy. But with a micro-batch size of 12, we have an overall batch of 12 × 8 = 96, which means we'd be able to do gradient accumulation and do a parameter update every 96 ÷ 6 = 16 steps. (There's a rough sketch of this pattern at the end of the post.) Secondly, while my estimate of the ideal overall batch size was based on a rather arbitrary bit of curve-fitting, it did say that 97 was the ideal size. So it could be interesting to see whether it did help! So, having coded that up and set up the configuration, it was time to run it. Here's the training chart it came up with: Note the loss spikes at around global steps 4,200, 13,000 and 23,000. Those are important, I'll explain why later. The training run reported this at the end: So it took about 3h24m to train, even less than we expected from the previous cloud experiments' estimates of how long it would take excluding validation. About US$35 in cost. Here is the model on Hugging Face Hub. Let's see how it looks. For these intervention posts, I won't run the instruction-following tests, as they can only be run against a batch of models in one go to get results that are consistent with each other. But the smoke test -- seeing how it completes a sample sequence -- is worthwhile: Looks good! Reasonably coherent. Now we can find the loss on our held-back test set: That's a bit worse than the 3.674 we got for the original cloud train. Either the calculations of the optimal batch size I did were not quite right (entirely likely, they were very ad-hoc) or the model weights we started with, given the random seed we're using, just happened to lead us in a slightly worse direction (also plausible).
Either way, it's in line with what we expected, and is still better than the test loss of 3.725 that we got with the second-best machine in the cloud comparison post (the 8x H100 80 GiB with a global batch size of 216). So: we have a solid baseline model -- before we wrap up, let's consider those spikes in the loss that I called out in the training chart. Random spikes in the loss are a Bad Thing, right? Certainly they're a bad thing for a train in general, especially if you don't know for sure what's causing them. But my working assumption has been that they're caused by exploding gradients -- for some specific sample in the dataset, the gradients have gone up to some insanely high value, and we've had a bad update to our parameters as a result. It hasn't completely knocked the model back to its starting point, but it does take some time to recover, so we lose the benefit of some of our training. If that is the case -- and it's not just something like a batch happening to have stuff that's wildly different to the rest of the training data, or something weird in the optimiser -- then gradient clipping is the solution. I wanted to see if it would help the model quality in general, but of course if we hadn't had any loss spikes in this baseline train it would have been hard to see if that was the case! So I was very glad to see them here, as if there had been none I would either have had to do a gradient clipping experiment with no real expectation of it helping -- or do another baseline train with a different random seed in the hope that that caused some spikes, which would have cost another US$35. All in all, it was good to see them there, as it sets us up well for that experiment. So, we've trained a baseline model that we can make changes to -- the interventions I listed at the start -- and get a pretty reliable understanding of whether or not they help the quality of the final model. With that in place, we're in a good position to start running those intervention tests! Given the loss spike situation in that chart, I think that a solid first one to go for -- even though it was the last in that list at the top of this post -- is gradient clipping. Where are those loss spikes coming from, and if it's exploding gradients, what happens if we limit the damage they do with gradient clipping? Stay tuned! I've already done the training run for that (while I wrote this one up), so I should be able to post about it tomorrow.
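Here's the rough sketch promised above of the gradient-accumulation pattern (stepping every 96 ÷ 6 = 16 micro-batches to match the cloud's global batch of 96), with gradient clipping added where it would go. The variable names, loss function and the clipping threshold of 1.0 are my assumptions, not the actual training code:

```python
# Hedged sketch: gradient accumulation plus gradient clipping.
# Assumes an LM-style model producing (batch, seq, vocab) logits.
import torch

ACCUM_STEPS = 16  # 96 (cloud global batch) / 6 (local micro-batch)

def train_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(loader):
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), targets.flatten()
        )
        # Scale so the accumulated gradient matches one big batch of 96.
        (loss / ACCUM_STEPS).backward()
        if (step + 1) % ACCUM_STEPS == 0:
            # Clip exploding gradients before the parameter update.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()
```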
[1] Well, you could potentially do something with batches of different sizes, but that would be fiddly.

0 views

Rendering 100k trace events faster with exponential search

We’ve recently been looking into optimizing rendering performance of the Perfetto UI on large traces. We discovered that there was some inefficiency in our data fetching logic, especially when you’re very zoomed out. In this case, there can be a lot of slices (spans) which are so small that they take less than one pixel of width. So for each pixel, we need to figure out “what is the event which we should draw for this pixel”. Over time we’ve come to the conclusion that the best thing to draw is the slice with the largest duration in that pixel. We can break this into two sub-problems: 1) What is the range of events which correspond to each pixel? 2) What is the event with the maximum duration for that pixel? We’re going to focus on 1) in this post as that’s where the slowdown was. 2) is fascinating but also surprisingly orthogonal. If you’re interested, I would suggest reading this excellent post from Tristan Hume explaining the basic algorithm we use.
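To illustrate what sub-problem 1) looks like in code, here is a simplified Python sketch (the Perfetto UI itself is TypeScript, and the real implementation also has to handle slices that start before a pixel but overlap it). The key trick is galloping forward from the previous pixel's answer, which is what makes exponential search a good fit when you sweep across the viewport pixel by pixel:

```python
# Simplified illustration of exponential (galloping) search over events sorted
# by start timestamp; not Perfetto's actual code.
import bisect

def exp_search(timestamps, target, lo=0):
    """Return the first index i >= lo with timestamps[i] >= target."""
    bound = 1
    while lo + bound < len(timestamps) and timestamps[lo + bound] < target:
        bound *= 2  # gallop: probe lo+1, lo+2, lo+4, lo+8, ...
    hi = min(lo + bound, len(timestamps))
    # The answer now lies in [lo + bound // 2, hi]; binary-search that window.
    return bisect.bisect_left(timestamps, target, lo + bound // 2, hi)

def events_per_pixel(timestamps, t_start, t_end, width_px):
    """For each pixel, the half-open index range [lo, hi) of events starting in it."""
    ranges = []
    lo = exp_search(timestamps, t_start)
    for px in range(1, width_px + 1):
        boundary = t_start + (t_end - t_start) * px / width_px
        hi = exp_search(timestamps, boundary, lo)  # resume from previous answer
        ranges.append((lo, hi))
        lo = hi
    return ranges
```

Because each search resumes where the previous one stopped, the cost per pixel grows with the logarithm of how far the index advances rather than with the size of the whole track.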

2 views

I just discovered Kokoro free Text To Speech that runs locally

GitHub: https://github.com/eduardolat/kokoro-web I can’t wait to try out the API. I am planning a series of video posts on Instagram, YouTube and TikTok for people who just don’t read (a lot more these days it seems). Just my blog posts being read out, with the web page scrolling in the background. The challenge will be automating the video creation from start to finish. Here is the first one – this was my first blog post ever (not very long!). The post I just discovered Kokoro free Text To Speech that runs locally appeared first on Circus Scientist.

0 views
Andy Bell Yesterday

I listen to a lot of I Prevail

I only discovered them in July 2025 and I’ll let this last.fm chart do the rest to illustrate the point. I think this chart sums me up as: Anyway, I think they rip. They’re a bit cheesy, sure, but this record especially sounds massive , so if you’re into metalcore etc, give ’em a try.

0 views
Rik Huijzer Yesterday

Picture of Epstein Eating Cake

A picture of Jeffrey Epstein eating a cake with what seems to be the Talmud visible behind him. Seems to have been released a few weeks ago. Source.

0 views
DHH Yesterday

Cloud gaming is kinda amazing

I fully understand the nostalgia for real ownership of physical-media games. I grew up on cassette tapes (C64 + Amstrad 464!), floppy disks (C64 5-1/4" then Amiga 3-1/2"), cartridges, and CDs. I occasionally envy the retro gamers on YouTube with an entire wall full of such physical media. But do you know what I like more than collecting? Playing! Anywhere. Anything. Anytime. We went through the same coping phases with movies and music. Yes, vinyl had a resurgence, but it's still a tiny sliver of hours listened. Same too with 4K Blu-rays. Almost everyone just listens to Spotify or watches on Netflix these days. It's simply cheaper, faster, and, thus, better. Not "better" in some abstract philosophical way (ownership vs rent) or even in a concrete technical way (bit rates), but in a practical way. Paying $20/month for unlimited music and the same again for a broad selection of shows and movies is clearly a deal most consumers are happy to make. So why not video games? Well, because it just wasn't good enough! Netflix tried for casual gaming, but I didn't hear much of that after the announcement. Google Stadia appears to have been just a few years ahead of reality (eerie how often that happens for big G, like with both AI and AR!) as they shut down their service already. NVIDIA, though, kept working, and its GeForce NOW service is actually, finally kinda amazing! I had tried it back in the late 2010s, and just didn't see anything worth using back then. Maybe my internet was too slow, maybe the service just wasn't good enough yet. But then I tried it again a few days ago, just after NVIDIA shipped the native GFN client for Linux, and holy smokes!! You can legitimately play Fortnite in 2880x1800 at 120 fps through a remote 4080, and it looks incredible. Yes, there's a little input lag, but it's shockingly, surprisingly playable on a good internet connection. And that's with the hardest possible genre: competitive shooters! If you play racing games like Forza Horizon or story-mode games like Warhammer 40K: Space Marine 2, you can barely tell! This is obviously a great option for anyone with a modest computer that can't run the latest triple-A titles, but also for Linux gamers who don't have access to run the cheat-protection software required for Fortnite and a few other games. And, like Spotify and Netflix, it's pretty competitively priced. It's $20/month for access to that 4080-tier. You'd quickly spend $2,000+ on a gaming rig with a 4080, so this isn't a half bad deal: it's a payback of 100 months, and by then you'd probably want a 6080 anyway. Funny how NVIDIA is better at offering the promise of cheap cloud costs than the likes of AWS! Anyway, I've been very impressed with NVIDIA GeForce NOW. We're going to bake the Linux installer straight into the next version of Omarchy, so you can just go to Install > Gaming > NVIDIA GeForce NOW to get going (just like we have such options for Steam and Minecraft). But of course seeing Fortnite running in full graphics on that remote 4080 made me hungry for even more. I've been playing Fortnite every week for the last five years or so with the kids, but the majority of my gameplay has actually been on tablet. A high-end tablet, like an iPad M5, can play the game with good-for-mobile graphics at 120 Hz. It's smooth, it's easy, and the kids and I can lounge on the couch and play together. Good Family Fun! Not peak visual fidelity, though.
So after the NVIDIA GeForce NOW experience, I found a way to use the same amazing game streaming technology at home through a local-server solution called Apollo and a client called Moonlight. This allowed me to turn my racing-sim PC that's stuck downstairs into a cloud-like remote gaming service that I can access anywhere on the local network, so I can borrow its 4090 to play 120-fps, ultra-settings Fortnite with zero perceivable input lag on any computer in the house. The NVIDIA cloud streaming is very impressive, but the local-server version of the same is mind-blowing. I'm mostly using the Asus G14 laptop as a client, so Fortnite looks incredible with those ultra, high-resolution settings on its OLED, but unlike when you use that laptop's built-in graphics card, the machine stays perfectly cool and silent pulling a meager 18 watts. And the graphics are of course a lot nicer. The Moonlight client is available for virtually every platform: Mac, iOS, Android, and of course Linux. That means no need to dual boot to enjoy the best games at the highest fidelity. No need for a honking big PC on my primary desk. I did not know this was an option!! Whether you give NVIDIA's cloud gaming setup a try or repurpose a local gaming PC for the same, you're in for a real treat of what's possible with streaming Fortnite on ultra settings at 120 fps on Linux (or even Mac!). GG, NVIDIA!

0 views

Re-architecting End-host Networking with CXL: Coherence, Memory, and Offloading

Re-architecting End-host Networking with CXL: Coherence, Memory, and Offloading Houxiang Ji, Yifan Yuan, Yang Zhou, Ipoom Jeong, Ren Wang, Saksham Agarwal, and Nam Sung Kim MICRO'25 This paper is the third one that I’ve posted about which deals with the subtleties of interfacing a NIC and a host CPU. Here are links to the previous posts on this subject: Disentangling the Dual Role of NIC Receive Rings CEIO vs rxBisect: Fixing DDIO’s Leaky DMA Problem The authors bring a new hammer to the construction site: CXL , which offers some interesting efficiencies and simplifications. This paper shows how CXL can address two specific problems with the HW/SW interface of a typical PCIe NIC: After the host prepares a packet to be transmitted, it notifies the NIC with a MMIO write. This MMIO write is expensive because it introduces serialization into the host processor pipeline. When the NIC sends a received packet to the host, ideally it would write data to the LLC rather than host DRAM. However, if the host CPU cannot keep up, then the NIC should have a graceful fallback. CXL Type-1 devices are asymmetric: the device has coherent access to host memory, but the host does not have coherent access to device memory. Practically speaking, both packet descriptors and packet payloads must still be stored in host memory (no change from PCIe based NICs). Because the NIC has coherent access to host memory, it can safely prefetch receive descriptors (RxDs) into an on-NIC cache. When a packet arrives, the NIC can grab a descriptor from the cache and thus avoid an expensive host memory read to determine where to write packet data. If the host CPU updates a RxD after the NIC has prefetched it, the CXL cache coherence protocol will notify the NIC that it must invalidate its cached data. Coherence also enables the tail pointers for transmit ring buffers to be safely stored in host memory. The host networking stack can update a tail pointer with a regular store instruction (rather than an MMIO write). The NIC can continually poll this value, using coherent reads. If the tail index pointer has not been updated since the last poll, the NIC will read a cached value and not generate any PCIe traffic. CXL Type-2 NICs allow packets and descriptors to be stored in NIC memory. The host CPU can cache data read from the NIC, as the NIC will generate the necessary coherence traffic when it reads or writes this data. The design space (what data goes into what memory) is large, and the results section has numbers for many possible configurations. Section 5.3 of the paper describes how a type-2 NIC can intelligently use the CXL operation to write received packet data directly into the host LLC. This is similar to DDIO (described in the two papers linked at the top of this post), but the key difference is that the NIC is in the driver’s seat. The CEIO paper proposes monitoring LLC usage and falling back to storing received packets in DRAM local to the NIC if the LLC is too full. With CXL, the NIC has the option to write data to host memory directly (bypassing the LLC), thus avoiding the need for DRAM attached to the NIC. The authors implemented a CXL NIC on an Altera FPGA. They compared results against an nVidia BlueField-3 PCIe NIC. Fig. 10 compares loopback latency for the two devices, normalized to the BlueField-3 latency (lower is better) for a variety of CXL configurations. 
Source: https://dl.acm.org/doi/pdf/10.1145/3725843.3756102 Dangling Pointers One fact I took away from this paper is that CXL coherence messages are much cheaper than MMIOs and interrupts. Burning a CPU core polling a memory location seems wasteful to me. It would be nice if that CPU core could at least go into a low power state until a relevant coherence message arrives.

0 views

Open Source security in spite of AI

The title of my closing keynote at FOSDEM, February 1, 2026. As the last talk of the conference, at 17:00 on the Sunday, lots of people had already left, and presumably a lot of the remaining people were quite tired and ready to call it a day. Still, the 1500 seats in Janson got occupied and there was even a group of people outside wanting to get in who had to be refused entry. Thanks to the awesome FOSDEM video team, the recording was made available this quickly after the presentation. You can also get the video off FOSDEM servers. The 59-slide PDF version.

0 views
Stratechery Yesterday

Microsoft and Software Survival

One way to track the AI era, starting with the November 2022 launch of ChatGPT, is by which Big Tech company was, at a particular point in time, thought to be most threatened. At the beginning everyone — including yours truly — was concerned about Google and the potential disruption of Search. Then, early last year, it was Apple’s turn, as its more intelligent Siri stumbled so badly it didn’t even launch. By the fall it was Meta in the crosshairs, as the company completely relaunched its AI efforts as Llama hit a wall. Now it’s Microsoft’s turn, which is a bit of a full circle moment, given that the company was thought to be the biggest winner from ChatGPT in particular, thanks to its partnership with OpenAI. I wrote in early 2023 in AI and the Big Five: Microsoft, meanwhile, seems the best placed of all. Like AWS it has a cloud service that sells GPUs; it is also the exclusive cloud provider for OpenAI. Yes, that is incredibly expensive, but given that OpenAI appears to have the inside track to being the AI epoch’s addition to this list of top tech companies, that means that Microsoft is investing in the infrastructure of that epoch. Bing, meanwhile, is like the Mac on the eve of the iPhone: yes it contributes a fair bit of revenue, but a fraction of the dominant player, and a relatively immaterial amount in the context of Microsoft as a whole. If incorporating ChatGPT-like results into Bing risks the business model for the opportunity to gain massive market share, that is a bet well worth making. The latest report from The Information, meanwhile, is that GPT is eventually coming to Microsoft’s productivity apps. The trick will be to imitate the success of AI-coding tool GitHub Copilot (which is built on GPT), which figured out how to be a help instead of a nuisance (i.e. don’t be Clippy!). What is important is that adding on new functionality — perhaps for a fee — fits perfectly with Microsoft’s subscription business model. It is notable that the company once thought of as a poster child for victims of disruption will, in the full recounting, not just be born of disruption, but be well-placed to reach greater heights because of it. I do, I must admit, post that excerpt somewhat sheepishly, as much of it seems woefully shortsighted: All of these factors — plus the fact that Azure growth came in a percentage point lower than expected — contributed to one of the worst days in stock market history. From Bloomberg last week: Microsoft Corp. shares got caught up in a selloff Thursday that wiped out $357 billion in value, second-largest for a single session in stock market history. The software giant’s stock closed down 10%, its biggest plunge since March 2020, following Microsoft’s earnings after the bell Wednesday, which showed record spending on artificial intelligence as growth at its key cloud unit slowed. The only bigger one-day valuation destruction was Nvidia Corp.’s $593 billion rout last year after the launch of DeepSeek’s low-cost AI model. Microsoft’s move is larger than the market capitalizations of more than 90% of S&P 500 Index members, according to data compiled by Bloomberg… The selloff comes amid heightened skepticism from investors that the hundreds of billions of dollars Big Tech is spending on AI will eventually pay off. Microsoft’s results showed a 66% rise in capital expenditures in its most recent quarter to a record $37.5 billion, while growth at its closely tracked Azure cloud-computing unit slowed from the prior quarter.
I laid out my base case for Big Tech back in 2020 in The End of the Beginning , arguing that the big tech companies would be the foundation on which future paradigms would be built; is Microsoft the one that might crack? It can, when it comes to vibe coding, be difficult to parse the hype on X from the reality on the ground; what is clear is the trajectory. I have talked to experienced software engineers who will spend 10 minutes complaining about the hype and all of the shortcomings of Claude Code or OpenAI Codex, only to conclude by admitting that AI just helped them write a new feature or app that they never would have otherwise, or would have taken far longer to do than it actually did. The beauty of AI writing code is that it is a nearly perfect match of probabilistic inputs and deterministic outputs: the code needs to actually run, and that running code can be tested and debugged. Given this match I do think it is only a matter of time before the vast majority of software is written by AI, even if the role of the software architect remains important for a bit longer. That, then, raises the most obvious bear case for any software company: why pay for software when you can just ask AI to write your own application, perfectly suited to your needs? Is software going to be a total commodity and a non-viable business model in the future? I’m skeptical, for a number of reasons. First, companies — particularly American ones — are very good at focusing on their core competency, and for most companies in the world, that isn’t software. There is a reason most companies pay other companies for software, and the most fundamental reason to do so won’t change with AI. Second, writing the original app is just the beginning: there is maintenance, there are security patches, there are new features, there are changing standards — writing an app is a commitment to a never-ending journey — a journey, to return to point one, that has nothing to do with the company’s core competency. Third, selling software isn’t just about selling code. There is support, there is compliance, there are integrations with other software, the list of what is actually valuable goes far beyond code. This is why companies don’t run purely open source software: they don’t want code, they want a product, with everything that entails. Still, that doesn’t mean the code isn’t being written by AI: it’s the software companies themselves that will be the biggest beneficiaries of and users of AI for writing code. In other words, on this narrow question of AI-written code, I would contend that software companies are not losers, but rather winners: they will be able to write more code more efficiently and quickly. When the Internet first came along it seemed, at first glance, a tremendous opportunity for publishers: suddenly their addressable market wasn’t just the geographic area they could deliver newspapers to, but rather the entire world! In fact, the nature of the opportunity was the exact opposite; from 2014’s Economic Power in the Age of Abundance : One of the great paradoxes for newspapers today is that their financial prospects are inversely correlated to their addressable market. Even as advertising revenues have fallen off a cliff — adjusted for inflation, ad revenues are at the same level as the 1950s — newspapers are able to reach audiences not just in their hometowns but literally all over the world. The problem for publishers, though, is that the free distribution provided by the Internet is not an exclusive. 
It’s available to every other newspaper as well. Moreover, it’s also available to publishers of any type, even bloggers like myself. To be clear, this is absolutely a boon, particularly for readers, but also for any writer looking to have a broad impact. For your typical newspaper, though, the competitive environment is diametrically opposed to what they are used to: instead of there being a scarce amount of published material, there is an overwhelming abundance. More importantly, this shift in the competitive environment has fundamentally changed just who has economic power. The power I was referring to was Google; this Article was an articulation of Aggregation Theory a year before I coined the term. The relevance to AI-written code, however, is not necessarily about Aggregators, but rather about inputs. Specifically, what changed for publishers is that the cost of distribution went to zero: of course that was beneficial for any one publisher, but it was disastrous for publishers as a collective. In the case of software companies, the input that is changing is the cost of code: it’s not going completely to zero, at least not yet — you still need a managing engineer, for one, and tokens, particularly for leading edge models actually capable of writing usable code, have significant marginal costs — but the relative cost is much lower, and the trend is indeed towards zero. If you want to carry this comparison forward, this is an argument against there even being a market for software in the long run. After all, the most consumed form of content on the Internet today, three decades on, is in fact user-generated content, which you could analogize to companies having AI write their own software. That seems a reasonable bet for 2056 — if we even have companies then ( I think we will ). In the shorter-term, however, the real risk I see for software companies is the fact that while they can write infinite software thanks to AI, so can every other software company. I suspect this will completely upend the relatively neat and infinitely siloed SaaS ecosystem that has been Silicon Valley’s bread-and-butter for the last decade: identify a business function, leverage open source to write a SaaS app that addresses that function, hire a sales team, do some cohort analysis, IPO, and tell yourself that you were changing the world. The problem now, however, is that while businesses may not give up on software, they don’t necessarily want to buy more — if anything, they need to cut their spending so they have more money for their own tokens. That means the growth story for all of these companies is in serious question — the industry-wide re-rating seems completely justified to me — which means the most optimal application of that new AI coding capability will be to start attacking adjacencies, justifying both your existence and also presenting the opportunity to raise prices. In other words, for the last decade the SaaS story has been about growing the pie: the next decade is going to be about fighting for it, and the model makers will be the arms dealers. While this battle is happening, there will be another fundamental shift taking place: yes, humans will be using software, at least for a while, but increasingly so will agents. What isn’t clear is who will be creating the agents: I expect every SaaS app to have their own agent, but that agent will definitionally be bound by the borders of the application (which will be another reason to expand the app into adjacent areas). 
Different horizontal players, meanwhile, will be making a play to cover broader expanses of the business, with the promise of working across multiple apps. Microsoft is one of those horizontal players, and the company's starting point for agents is what it is calling Work IQ; here is how CEO Satya Nadella explained Work IQ on the company's earnings call:

Work IQ takes the data underneath Microsoft 365 and creates the most valuable stateful agent for every organization. It delivers powerful reasoning capabilities over people, their roles, their artifacts, their communications and their history and memory all within an organization security boundary. Microsoft 365 Copilot's accuracy and latency powered by Work IQ is unmatched, delivering faster and more accurate work grounded results than competition, and we have seen our biggest quarter-over-quarter improvement in response quality to date. This has driven record usage intensity with average number of conversations per user doubling year-over-year.

This feels like the right layer for Microsoft, given the company's ownership of identity. Active Directory is one of the most valuable free products of all time: it was the linchpin via which Microsoft tied together all of its enterprise products and services, first driving upgrades up and down the stack, and later underpinning its per-seat licensing business model. That the company sees its understanding of the individual worker and all of his or her artifacts, permissions, etc. as the obvious place to build agents makes sense. There's one big problem with this starting point, however: it's shrinking. Owning and organizing a company by identity is progressively less valuable if the number of human identities starts to dwindle — and, with a per-seat licensing model, you make less money. That, by extension, means that Microsoft should feel a significant amount of urgency when it comes to fighting the adjacency battles I predicted above. First, directly incorporating more business functions into Microsoft's own software suite will make Microsoft's agents more capable. Second, absorbing more business functions into Microsoft's software offering will let the company charge more. Third, the larger Microsoft's surface area, the more power it will have to compel other software makers to interface with its agents, increasing their capability. This pressure explains the choices Microsoft made that led to its Azure miss in particular. Microsoft was clear that, once again, demand exceeded supply. CFO Amy Hood said in her prepared remarks:

Our customer demand continues to exceed our supply. Therefore, we must balance the need to have our incoming supply better meet growing Azure demand with expanding first-party AI usage across services like M365 Copilot and GitHub Copilot, increasing allocations to R&D teams to accelerate product innovation and continued replacement of end-of-life server and networking equipment.

She further explained in the Q&A section that Azure revenue was directly downstream from Microsoft's own capacity allocation:

I think it's probably better to think about the Azure guidance that we give as an allocated capacity guide about what we can deliver in Azure revenue. Because as we spend the capital and put GPUs specifically, it applies to CPUs, the GPUs more specifically, we're really making long-term decisions. And the first thing we're doing is solving for the increased usage in sales and the accelerating pace of M365 Copilot as well as GitHub Copilot, our first-party apps.
Then we make sure we’re investing in the long-term nature of R&D and product innovation. And much of the acceleration that I think you’ve seen from us and products over the past a bit is coming because we are allocating GPUs and capacity to many of the talented AI people we’ve been hiring over the past years. Then, when you end up, is that, you end up with the remainder going towards serving the Azure capacity that continues to grow in terms of demand. And a way to think about it, because I think, I get asked this question sometimes, is if I had taken the GPUs that just came online in Q1 and Q2 in terms of GPUs and allocated them all to Azure, the KPI would have been over 40. And I think the most important thing to realize is that this is about investing in all the layers of the stack that benefit customers. And I think that’s hopefully helpful in terms of thinking about capital growth, it shows in every piece, it shows in revenue growth across the business and shows as OpEx growth as we invest in our people. Nadella called this a portfolio approach: Basically, as an investor, I think when you think about our capital and you think about the gross margin profile of our portfolio, you should obviously think about Azure. But you should think about M365 Copilot and you should think about GitHub pilot, you should think about Dragon Copilot, Security Copilot. All of those have a gross margin profile and lifetime value. I mean if you think about it, acquiring an Azure customer is super important to us, but so is acquiring an M365 or a GitHub or a Dragon Copilot, which are all by the way incremental businesses and TAMs for us. And so we don’t want to maximize just 1 business of ours, we want to be able to allocate capacity, while we’re sort of supply constrained in a way that allow us to essentially build the best LTV portfolio. That’s on one side. And the other one that Amy mentioned is also R&D. I mean you got to think about compute is also R&D, and that’s sort of the second element of it. And so we are using all of that, obviously, to optimize for the long term. The first part of Nadella’s answer is straightforward: Microsoft makes better margins and has more lifetime value from its productivity applications than it does from renting out Azure capacity, so investors should be happy that it is allocating scarce resources to that side of the business. And, per the competition point above, this is defensive as well: if Microsoft doesn’t get AI right for its own software then competitors will soon be moving in. The R&D point, however, is also critical: Microsoft also needs to be working to expand its offering, and increasingly the way to do that is going to be by using AI to write that new software. That takes a lot of GPUs — so many that Microsoft simply didn’t have enough to meet the 40% Azure growth rate that Wall Street expected. I think it was the right decision. There are some broader issues raised by Microsoft’s capacity allocation. First, we have the most powerful example yet of the downside of having insufficient chips. Hood was explicit that Microsoft could have beat Wall Street’s number if they had enough GPUs; the fact they didn’t was a precipitating factor in losing $357 billion in value. How much greater will the misses be a few years down the road when AI demand expands even further, particularly if TSMC both remains the only option and continues to be conservative in its CapEx ? 
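To make Hood's "the KPI would have been over 40" remark concrete, here is a deliberately toy back-of-the-envelope sketch of the mechanism she is describing: reported Azure growth is downstream of how much of the incoming GPU supply is pointed at external Azure customers rather than at first-party workloads and R&D. Every number below is hypothetical, chosen only to show the arithmetic, not to reconstruct Microsoft's actual figures.

```python
# Toy sketch of the capacity-allocation arithmetic. All numbers are hypothetical;
# they only illustrate the mechanism, not Microsoft's actual financials.

def azure_growth(last_year_revenue, base_revenue, new_capacity_revenue, share_to_azure):
    """Reported growth if `share_to_azure` of the new capacity's revenue potential
    is sold to external Azure customers (the rest goes to first-party workloads
    and R&D)."""
    this_year = base_revenue + share_to_azure * new_capacity_revenue
    return (this_year / last_year_revenue - 1) * 100

LAST_YEAR = 100.0      # hypothetical Azure revenue last year (indexed to 100)
BASE = 130.0           # hypothetical revenue from capacity already in place
NEW_CAPACITY = 15.0    # hypothetical revenue potential of GPUs that just came online

for share in (1.0, 0.5, 0.2):
    growth = azure_growth(LAST_YEAR, BASE, NEW_CAPACITY, share)
    print(f"{share:.0%} of new GPUs to Azure -> {growth:.1f}% reported growth")

# 100% -> 45.0%, 50% -> 37.5%, 20% -> 33.0%: the same capex can land above or
# below a ~40% Street expectation purely depending on where the GPUs are pointed.
```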
Second, however, it's fair for Azure customers to feel a bit put out by Microsoft's decision to favor itself. It reminds me of the pre-TSMC world, when fabs were a part of Integrated Device Manufacturers like Intel or Texas Instruments. If you wanted to manufacture a chip you could contract for space on their lines, but you were liable to lose that capacity if the fab needed it for their own products; TSMC was unique in that they were a pure play foundry: their capacity was solely for their customers, who they weren't going to compete with. This isn't the case with Azure: Microsoft has first dibs, and then OpenAI, and then everyone else, and that priority order was made clear this quarter. Moreover, it's fair to assume that Amazon and Google will make similar prioritization decisions. I didn't, before writing this article, fully grok the potential for neoclouds, or Oracle for that matter, but the value proposition of offering a pure play token foundry might be larger than I appreciated. All that noted, the safest assumption is that Microsoft, like the rest of Big Tech, will figure this out. Some software may be dead, but not all of it, at least not yet, and the biggest software maker of them all is — thanks in part to that size — positioned to be one of the survivors. It's just going to need a lot of compute, not only for its customers, but especially for itself.


A moment with a message from the past

Visited Palmanova plenty of times in my life but never paid attention to the writings at the center of the main square. Thank you for keeping RSS alive. You're awesome.

Michael Lynch Yesterday

My Eighth Year as a Bootstrapped Founder

Eight years ago, I quit my job as a developer at Google to create my own bootstrapped software company. Every year, I post an update about how that’s going and what my life is like as an indie founder. I don’t expect you to go back and read my last seven updates. Here’s all you need to know: People are always most interested in how money works as an indie founder, so I’ll start there. Here’s what my revenue and profit looked like every month this year. In total, I had $8.2k in profit on $16.3k in revenue. That was my total income for the year, which is obviously not enough to support a family, but my wife also works, and we have savings/investments. My main source of revenue was my book. I’m writing a book to teach developers to improve their writing . I did a Kickstarter for it in March, which gave me $6k in pre-sales . As I worked on the book, I offered paid early access. In total, 422 readers purchased early access, for which I’m grateful. I also have an old business that makes $100-200/month without me touching it. My main expenses were computer hardware ($2.1k) and LLMs ($1.9k). I don’t use AI to write, but I use it for a lot of the accessory tasks like fixing rendering/layout issues and improving the website. I also use it for my open-source projects . Here’s how 2025 compared to previous years: The years I was running TinyPilot dominate the chart. Still, 2025 was my fourth most profitable year as a founder. My goal for the year was $50k in profit, so I fell quite short (more on that later ). When I tell other software developers that I’m writing a book, they usually say something like, “Oh, great!” Then, they pause, a little confused. “To give you time to freelance?” And I have to say, “No, I’m just writing a book. That’s my whole job.” When I tell friends and family I’m working on a book, they innocently ask, “Oh, so you’re still on paternity leave?” No! I’m writing a book. It’s a real job! But if I’m being honest, I understand their confusion. How can writing a book be my job? I’m not a novelist. When I started the book, I thought I’d be done in six months. I typically write almost a book’s worth of blog posts per year, and that’s just from an hour of writing per day. If I focus on a book, I should be done in 1/8th the time! It turns out that even when all I have to do is write, I can still only write for about an hour per day. After that, I feel drained, and my writing degrades rapidly. I also can’t just write a book. I also need to find people to read the book, so I’ve been writing blog posts and sharing chapter excerpts. I normally write 5-10 blog posts per year, but I ended up writing far more in the past year than I ever have before: I also started editing blog posts for other developers. That helped me discover other developers’ writing pain points and what advice they found effective. I worked with seven clients, including Tyler Cipriani on a post that reached #1 on Hacker News . And then there’s just a bunch of administrative tasks around writing and selling a book like setting up mailing lists , dealing with Stripe , debugging PDF/epub rendering issues , etc. This has been my favorite year of being a founder since I went off on my own eight years ago. There are a few factors, but the biggest is that I found a business that aligns with me. When I first started as a founder, I didn’t think the particulars of a business mattered. I just pursued any opportunity I saw, even if it was a market I didn’t care about. I’d still get to write software, so wouldn’t that make me happy? 
It turns out bootstrapped founders don't spend much time writing code. Especially at the beginning, I have to find customers and talk to them, which is hard when I don't particularly care about the market beyond the technical challenge of building something. Over several years, I found that there are five criteria that determine how much I enjoy a business: As a concrete example, one of my first businesses was called Is It Keto. It was a simple website that explained whether certain foods fit the keto diet. Here's how Is It Keto scored on my rubric: Now, let me compare Is It Keto to writing my book: The book doesn't check all my boxes perfectly, but it aligns better with my five criteria than any business I've created before. At the end of my first year as a founder, I wrote: As someone who has always valued independence, I love being a solo developer. It makes a world of difference to wake up whenever I want and make my own choices about how to spend my entire day. My friends with children tell me that kids won't complicate this at all. When I wrote that in 2019, I was in my early thirties, single, and living alone. A few weeks after writing that post, I met someone. We moved in together at the end of that year, married a few years later, and had our first child in 2024. Now, there are lots of people in our house, as my wife and I work from home, and members of our extended family come over every weekday to help with childcare. Despite all of those changes, my life is still how I described it seven years ago. Okay, things aren't exactly the same. My toddler decides when I wake up, and it's not always the time his independence-loving father would choose. But I still feel the joy of spending my workdays on whatever I choose. I joked back in 2019 about how kids would complicate my life as an indie founder, but it's actually less complicated than I expected. My workdays mostly look the same. Except they're more fun because anytime I want, I can take a break from work to go play with my son. After several years of just "enjoying" life as a bootstrapped founder, I'm happy to say that I love it again. I still want to do it forever. I originally thought I'd finish the book in six months, but I'm 13 months in and still have about 20% left. From reading about other developers' experiences writing books, underestimating time seems to be the norm. Teiva Harsanyi thought he'd be done in eight months, but it actually took him almost two years. Austin Henley started writing a book in 2023 and it dragged on for about two years before he got tired of working with his publisher and canceled his book deal. As much as I love writing code, programming itself isn't enough to make me enjoy my work. I need to find a business that matches my interests, values, and skills. Before I became a parent, I worried that I wouldn't have the flexibility to be a founder. In the first few months after my son arrived, I worried that parenting would take up so much time that I couldn't work at all, much less run my own business. Fortunately, I've been able to find a comfortable balance where I spend my workdays as a founder while still being the parent I want to be. Last year, I set three high-level goals that I wanted to achieve during the year. Here's how I did against those goals: I wasn't confident I'd earn $50k from the book, but I thought I'd have time while writing to launch side businesses.
I also expected to complete the book in just six months, giving me even more time for new business ideas in the second half of the year. Instead, I spent the full year on the book. It made $11.8k, which I'm proud of as pre-sales for a first-time author, but it's less than I hoped to earn this year. Okay, okay! I didn't finish the book! Enough of your cruel judgment, Michael from a year ago. I played around with Gleam and appreciated some aspects of it, but I never got deep enough to feel productive in the language. I learn best when I can find a project that takes advantage of a new technology, but I couldn't think of anything where Gleam had a compelling edge over languages I know well like Go or Python. I'd like to find at least five examples of readers who cite my book as a resource that helped them achieve something tangible (e.g., grow their blog readership, get a promotion). I earned $8.2k this year, so I just have to do 9x as well next year. But honestly, I think this is doable if I can keep finding new readers for the book and try a few business ideas. I've enjoyed a year of writing, but I'd like to do more software development, as that's still what I find most exciting. Cover image by Piotr Letachowicz.

2018 - 2020 - Quit my job and created several unprofitable businesses.
2020 - 2024 - Created a product called TinyPilot that let people control their computers remotely.
2024 - Sold TinyPilot, became a father.

13 blog posts (8 on my personal blog and 5 on my book's blog)
12 notes (shorter, less polished blog posts)
12 monthly retrospectives
150 pages of my book, including seven chapters I adapted into free excerpts

I enjoy the domain and relate to the customers
It leverages my skills
It earns money
It facilitates work-life balance
It aligns interests between me and my users

Result: I earned $8.2k in profit.
Result: I'm about 80% done with my book.
Result: I experimented with Gleam but didn't reach competence.

My First Year as a Solo Developer - Feb. 1, 2019
My Second Year as a Solo Developer - Jan. 31, 2020
My Third Year as a Solo Developer - Feb. 1, 2021
My Fourth Year as a Bootstrapped Founder - Feb. 1, 2022
My Fifth Year as a Bootstrapped Founder - Feb. 10, 2023
My Sixth Year as a Bootstrapped Founder - Feb. 16, 2024
My Seventh Year as a Bootstrapped Founder - Feb. 3, 2025
My Eighth Year as a Bootstrapped Founder - Feb. 3, 2026

Justin Duke Yesterday

January, 2026

This is not, if I'm being honest, the simple, structured start to 2026 that I had in mind. Rigor and early workouts have been replaced by pulled floors and sheets of ice. After spending a lovely week in Park City with the Third South folks, we came back home and had 12 hours of respite until, board by board, our floors were pulled up for replacement. The good news — it's always important to focus on the good news — is that the damage was less extensive than we expected. The bad news, because there is always bad news to go along with good news, is that this week we learned that we would be hit by an ice storm. And so, we decamped to my parents' house, the same one in which I spent my formative years reading Redwall and playing Final Fantasy even though my parents thought I was asleep. Haley and I are, to a fault, creatures of habit and routine, and it would be a lie to say that the past two weeks haven't been draining in the same way a day spent in transit is draining. We miss our house. We miss our things. For Lucy, though, this has been a permanent vacation — a whirlwind of delight that started in Utah and has extended without ceasing. In the span of two weeks, she went from walking, if she remembered about it, to quite literally sprinting through the house, chasing anything and everything she wanted. It is fascinating to watch a toddler learn about the world. There is a transparency to them, and her effortless and endless delight in discovering the cause and effect of things that I have cynically grown to consider mundane — such as a light switch — more than makes up for a little bit of inclement weather. I haven't been working much. I haven't been writing much. I haven't been reading much. I have been watching my daughter discover the world and run headlong into it, hands outstretched.

| Post | Genre |
| ---- | ----- |
| Tabula Rasa (Vol. 1) | Book |
| Levels of the Game | Book |
| A Shadow Intelligence | Book |
| Cameraperson | Film |
| Eternity | Film |
| Go | Film |
| Tinker Tailor Soldier Spy (2011) | Film |
| The Pigeon Tunnel | Film |
| Ocean's Twelve | Film |
| Uses (January 2026) | Personal |
| Terragon, Conductor, PyCharm | Technology |
| Migrating to PlanetScale | Technology |
| Refactoring a product is tricky | Technology |
| Every model should have a notes field | Technology |
| Pure strategy | Technology |
| The Diplomat (Season 2) | Television |


WhatsApp Encryption, a Lawsuit, and a Lot of Noise

It's not every day that we see mainstream media get excited about encryption apps! For that reason, the past several days have been fascinating, since we've been given not one but several unusual stories about the encryption used in WhatsApp. Or more accurately, if you read the story, a pretty wild allegation that the widely-used app lacks encryption. This is a nice departure from our ordinary encryption-app fare on this blog, which mainly deals with people (governments, usually) claiming that WhatsApp is too encrypted. Since there have now been several stories on the topic, and even folks like Elon Musk have gotten into the action, I figured it might be good to write a bit of an explainer about it. Our story begins with a new class action lawsuit filed by the esteemed law firm Quinn Emanuel on behalf of several plaintiffs. The lawsuit notes that WhatsApp claims to use end-to-end encryption to protect its users, but alleges that all WhatsApp users' private data is secretly available through a special terminal on Mark Zuckerberg's desk. Ok, the lawsuit does not say precisely that — but it comes pretty darn close: The complaint isn't very satisfying, nor does it offer any solid evidence for any of these claims. Nonetheless, the claims have been heavily amplified online by various predictable figures, such as Elon Musk and Pavel Durov, both of whom (coincidentally) operate competing messaging apps. Making things a bit more exciting, Bloomberg reports that US authorities are now investigating Meta, the owner of WhatsApp, based on these same allegations. (How much weight you assign to this really depends on what you think of the current Justice Department.) If you're really looking to understand what's being claimed here, the best way to do it is to read the complaint yourself: you can find it here (PDF). Alternatively, you can save yourself a lot of time and read the next few sentences, which contain pretty much the same amount of factual information. Here's the nut of it:

The plaintiffs (users of WhatsApp) have all used WhatsApp for years. Through this entire period, WhatsApp has advertised that it uses end-to-end encryption to protect message content, specifically, through the use of the Signal encryption protocol. According to unspecified "whistleblowers", since April 2016, WhatsApp (owned by Meta) has been able to read the messages of every single user on its platform, except for some celebrities.

The Internet has mostly divided itself into people who already know these allegations are true, because they don't trust Meta and of course Meta can read your messages — and a second set of people who also don't trust Meta but mostly think this is unsupported nonsense. Since I've worked on end-to-end encryption for the last 15+ years, and I've specifically focused on the kinds of systems that drive apps like WhatsApp, iMessage and Signal, I tend to fall into the latter group. But that doesn't mean there's nothing to pay attention to here. Hence: in this post I'm going to talk a little bit about the specifics of WhatsApp encryption; what an allegation like this would imply (technically); and how we can verify whether things like this are true (or not, as the case may be). More generally I'll try to add some signal to the noise. Full disclosure: back in 2016 I consulted for Facebook (now Meta) for about two weeks, helping them with the rollout of encryption in Facebook Messenger. From time to time I also talk to WhatsApp engineers about new features they're considering rolling out. I don't get paid for doing this; they once asked me if I'd consider signing an NDA and I told them I'd rather not. Instant messaging apps are pretty ancient technology. Modern IM dates from the 1990s, but the basic ideas go back to the days of time sharing. Only two major things have really changed in messaging apps since the days of AOL Instant Messenger: the scale, and also the security of these systems.
In terms of scale, modern messaging apps are unbelievably huge. At the start of the period in the lawsuit, WhatsApp already had more than one billion monthly active users. Today that number sits closer to three billion. This is almost half the planet. In many countries, WhatsApp is more popular than phone calls. The downside of vast scale is that apps like this can also collect data at similarly large scale. Every time you send a message through an app like WhatsApp, you're sending that data first to a server run by WhatsApp's parent company, Meta. That server then stores it and eventually delivers it to your intended recipients. Without great care, this can result in enormous amounts of real-time message collection and long-term storage. The risks here are obvious. Even if you trust your provider, that data can potentially be accessed by hackers, state-sponsored attackers, governments, and anyone who can compel or gain access to Meta's platforms. To combat this, WhatsApp's founders Jan Koum and Brian Acton took a very opinionated approach to the design of their app. Beginning in 2014 (around the time they were acquired by Facebook), the app began rolling out end-to-end (E2E) encryption based on the Signal protocol. This design ensures that all messages sent through Meta/WhatsApp infrastructure are encrypted, both in transit and on Meta's servers. By design, the keys required to decrypt messages exist only on a user's device (the "end" in E2E), ensuring that even a malicious platform provider (or hacker of Meta's servers) should never be able to read the content of your messages. Due to WhatsApp's huge scale, the adoption of end-to-end encryption on the platform was a very big deal. Not only does WhatsApp's encryption prevent Meta from mining your chat content for advertising or AI training, the deployment of this feature made many governments frantic with worry. The main reason was that even law enforcement can't access encrypted messages sent through WhatsApp (at least, not through Meta itself). To the surprise of many, Koum and Acton made a convert of Facebook's CEO, Mark Zuckerberg, who decided to lean into new encryption features across many of the company's products, including Facebook Messenger and (optionally) Instagram DMs. This decision is controversial, and making it has not been cost-free for Meta/Facebook. The deployment of encryption in Meta's products has created enormous political friction with the governments of the US, UK, Australia, India and the EU. Each government is concerned about the possibility that Meta will maintain large numbers of messages they cannot access, even with a warrant. For example, in 2019 a multi-government "open letter" signed by US AG William Barr urged Facebook not to expand end-to-end encryption without the addition of "lawful access" mechanisms: So that's the background. Today WhatsApp describes itself as serving on the order of three billion users worldwide, and end-to-end encryption is on by default for personal messaging. They haven't once been ambiguous about what they claim to offer. That means that if the allegations in the lawsuit proved to be true, this would be one of the largest corporate coverups since DuPont. The best thing about end-to-end encryption — when it works correctly — is that the encryption is performed in an app on your own phone. In principle, this means that only you and your communication partner have the keys, and all of those keys are under your control.
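As a rough illustration of what "the keys exist only on the device" means in practice, here is a minimal, deliberately simplified sketch in Python using the cryptography package. It is not the Signal protocol WhatsApp actually runs (which adds X3DH key agreement, the double ratchet for forward secrecy, and identity authentication); it only shows the shape of the idea: key pairs are generated on the phones, and the server in the middle only ever handles ciphertext it cannot read.

```python
# Minimal sketch of client-side ("end-to-end") encryption. NOT the Signal
# protocol (no X3DH, no double ratchet, no identity verification); it only
# illustrates where the keys live. Requires: pip install cryptography
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

def derive_key(my_private, their_public):
    # Diffie-Hellman over X25519, then HKDF to a 32-byte symmetric message key.
    shared = my_private.exchange(their_public)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"toy-e2e-demo").derive(shared)

# Key pairs are generated on each device and the private halves never leave it.
alice_priv, bob_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
alice_pub, bob_pub = alice_priv.public_key(), bob_priv.public_key()

# Alice encrypts on her phone...
key = derive_key(alice_priv, bob_pub)
nonce = os.urandom(12)
ciphertext = ChaCha20Poly1305(key).encrypt(nonce, b"meet at noon", None)

# ...the server stores and forwards only (nonce, ciphertext), which it cannot
# read, and Bob decrypts on his phone with the same derived key.
plaintext = ChaCha20Poly1305(derive_key(bob_priv, alice_pub)).decrypt(nonce, ciphertext, None)
assert plaintext == b"meet at noon"
```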
While this sounds perfect, there's an obvious caveat: while the app runs on your phone, it's a piece of software. And the problem with most software is that you probably didn't write it. In the case of WhatsApp, the application software is written by a team inside of Meta. This wouldn't necessarily be a bad thing if the code were open source, and outside experts could review the implementation. Unfortunately WhatsApp is closed-source, which means that you cannot easily download the source code to see if encryption is performed correctly, or performed at all. Nor can you compile your own copy of the WhatsApp app and compare it to the version you download from the Play or App Store. (This is not a crazy thing to hope for: you actually can do those things with open-source apps like Signal.) While the company claims to share its code with outside security reviewers, they don't publish routine security reviews. None of this is really unusual — in fact, it's extremely normal for most commercial apps! But it means that as a user, you are to some extent trusting that WhatsApp is not running a long-con on its three billion users. If you're a distrustful, paranoid person (or if you're a security engineer) you'd probably find this need for trust deeply unappealing. Given the closed-source nature of WhatsApp, how do we know that WhatsApp is actually encrypting its data? The company is very clear in its claims that it does encrypt. But if we accept the possibility that they're lying: is it at least possible that WhatsApp contains a secret "backdoor" that causes it to secretly exfiltrate a second copy of each message (or perhaps just the encryption keys) to a special server at Meta? I cannot definitively tell you that this is not the case. I can, however, tell you that if WhatsApp did this, (1) they would get caught, (2) the evidence would almost certainly be visible in WhatsApp's application code, and (3) it would expose WhatsApp and Meta to exciting new forms of ruin. The most important thing to keep in mind here is that Meta's encryption happens on the client application, the one you run on your phone. If the claims in this lawsuit are true, then Meta would have to alter the WhatsApp application so that plaintext (unencrypted) data would be uploaded from your app's message database to some infrastructure at Meta, or else the keys would. And this should not be some rare, occasional glitch. The allegations in the lawsuit state that this applied to nearly all users, and for every message ever sent by those users since they signed up. Those constraints would tend to make this a very detectable problem. Even if WhatsApp's app source code is not public, many historical versions of the compiled app are available for download. You can pull one down right now and decompile it using various tools, to see if your data or keys are being exfiltrated. I freely acknowledge that this is a big project that requires specialized expertise — you will not finish it by yourself in a weekend (as commenters on HN have politely pointed out to me.) Still, reverse-engineering WhatsApp's client code is entirely possible and various parts of the app have indeed been reversed several times by various security researchers. The answer really is knowable, and if there is a crime, then the evidence is almost certainly* right there in the code that we're all running on our phones. If you're going to (metaphorically) commit a crime, doing it in a forensically-detectable manner is very stupid.
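For a sense of what the Signal-style "build it yourself and compare" check looks like, here is a small sketch of the idea: unpack two APKs, hash every entry, and diff the results, skipping the signing metadata that legitimately differs. This is an illustration of the concept only, not Signal's actual reproducible-build tooling, and the file names at the bottom are hypothetical.

```python
# Sketch of the "reproducible build" comparison idea: build the open-source app
# yourself, download the store version, and diff the contents entry by entry.
# Signature files under META-INF/ are skipped because the store copy is signed
# with keys you don't have. Illustrative only; not any project's real tooling.
import hashlib
import zipfile

def apk_digests(path):
    """Return {entry name: SHA-256 hex digest} for an APK (which is a zip)."""
    digests = {}
    with zipfile.ZipFile(path) as apk:
        for name in apk.namelist():
            if name.startswith("META-INF/"):
                continue  # signing metadata legitimately differs
            digests[name] = hashlib.sha256(apk.read(name)).hexdigest()
    return digests

def compare(store_apk, local_apk):
    store, local = apk_digests(store_apk), apk_digests(local_apk)
    mismatches = [n for n in sorted(set(store) | set(local))
                  if store.get(n) != local.get(n)]
    for name in mismatches:
        print("DIFFERS:", name)
    print("identical" if not mismatches else f"{len(mismatches)} differences")

# Hypothetical file names:
# compare("app-from-play-store.apk", "app-built-from-source.apk")
```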
Several online commenters have pointed out that there are loopholes in WhatsApp's end-to-end encryption guarantees. These include certain types of data that are explicitly shared with WhatsApp, such as business communications (when you chat with a company on WhatsApp, for example). In fairness, both WhatsApp and the lawsuit are very clear about these exceptions. These exceptions are real and important. WhatsApp's encryption protects the content of your messages; it does not necessarily protect information about who you're talking to, when messages were sent, and how your social graph is structured. WhatsApp's own privacy materials talk about how personal message content is protected, and about the other categories of data that exist alongside it. Another big question for any E2E encrypted messaging app is what happens after the encrypted message arrives at your phone and is decrypted. For example, if you choose to back up your phone to a cloud service, this often involves sending plaintext copies of your messages to a server that is not under your control. Users really like this, since it means they can re-download their chat history if they lose a phone. But it also presents a security vulnerability, since those cloud backups are not always encrypted. Unfortunately, WhatsApp's backup situation is complex. Truthfully, it's more of a Choose Your Own Adventure novel:

- If you use native device backup on iOS or Android devices (for example, iCloud device backup or the standard Android/Google backup), your WhatsApp message database may be included in a device backup sent to Apple or Google. Whether that backup is end-to-end encrypted depends on what your provider supports and what you've enabled. On Apple platforms, for example, iCloud backups can be end-to-end encrypted if you enable Apple's Advanced Data Protection feature, but won't be otherwise. Note that in both cases, the backup data ends up with Apple or Google and not with Meta as the lawsuit alleges. But this still sucks.
- WhatsApp has its own backup feature (actually, it has more than one way to do it.) WhatsApp supports end-to-end encrypted backups that can be protected with a password, a 64-digit key, and (more recently) passkeys. WhatsApp's public docs are here and WhatsApp's engineering writeup of the key-vault design is here. Conceptually, this is an interesting compromise: it reduces what cloud providers can read, but it introduces new key-management and recovery assumptions (and, depending on configuration, new places to attack). Importantly, even if you think backups are a mess — and they often are — this is still a far cry from the effortless, universal access alleged in this lawsuit.

Finally, WhatsApp has recently been adding AI features. If you opt into certain AI tools (like message summaries or writing help), some content may be sent off-device for processing by a system WhatsApp calls "Private Processing," which is built around Trusted Execution Environments (TEEs). WhatsApp's user-facing overview is here, Meta's technical whitepaper is here, and Meta's engineering post is here. This capability should not reveal plaintext data to Meta, either; more importantly, it's brand new and much more recent than the allegations in the lawsuit. As a technologist, I love to write about the weaknesses and limitations of end-to-end encryption in practice. But it's important to be clear: none of this loophole stuff can account for what's being alleged in this lawsuit. This lawsuit is claiming something much more deliberate and ugly. When I'm speaking to laypeople, I like to keep things simple. I tell them that cryptography allows us to trust our machines. But this isn't really an accurate statement of what cryptography does for us. At the end of the day, all cryptography can really do is extend trust. Encryption protocols like Signal allow us to take some anchor-point we trust — a machine, a moment in time, a network, a piece of software — and then spread that trust across time and space. Done well, cryptography allows us to treat hostile networks as safe places; to be confident that our data is secure when we lose our phones; or even to communicate privately in the presence of the most data-hungry corporation on the planet. But for this vision of cryptography to make sense, there has to be trust in the first place. It's been more than forty years since Ken Thompson delivered his famous talk, "Reflections on Trusting Trust", which pointed out how there is no avoiding some level of trust. Hence the question here is not: should we trust someone. That decision is already taken. It's: should we trust that WhatsApp is not running the biggest fraud in technology history. The decision to trust WhatsApp on this point seems perfectly reasonable to me, in the absence of any concrete evidence to the contrary.
In return for making that assumption, you get to communicate with the three billion people who use WhatsApp. But this is not the only choice you can make! If you don't trust WhatsApp (and there are reasonable non-conspiratorial arguments not to), then the correct answer is to move to another application; I recommend Signal.

* Without leaving evidence in the code, WhatsApp could try to compromise the crypto purely on the server side, e.g., by running man-in-the-middle attacks against users' key exchanges. This has even been proposed by various government agencies, as a way to attack targeted messaging app users. The main problem with this approach is the need to "target". Performing mass-scale MITM against WhatsApp users in the manner described by this complaint would require (1) disabling the security code system within the app, and (2) hoping that nobody ever notices that WhatsApp servers are distributing the wrong keys. This seems very unlikely to me.
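On the footnote's point about the security code system: the reason server-side key substitution is risky for an attacker is that both ends can derive a short code from the identity keys their apps are actually using and compare it out-of-band (in person, or over a call). Below is a toy sketch of that idea. WhatsApp's real security code (the 60-digit number shown in the app) is constructed differently, so treat this purely as an illustration of why a man-in-the-middle produces mismatched codes that users can notice.

```python
# Toy sketch of the "security code" idea: both users derive a short code from
# the identity public keys their apps are using and compare it out-of-band.
# Not WhatsApp's actual construction; it only shows why a server that swaps in
# its own keys (a man-in-the-middle) yields codes that no longer match.
import hashlib

def security_code(pub_a: bytes, pub_b: bytes) -> str:
    # Sort so both sides compute the same code regardless of direction.
    material = b"".join(sorted([pub_a, pub_b]))
    digest = hashlib.sha256(material).digest()
    # Render the first bytes as groups of digits for easy human comparison.
    return " ".join(f"{int.from_bytes(digest[i:i + 2], 'big') % 100000:05d}"
                    for i in range(0, 12, 2))

# Placeholder byte strings stand in for real identity public keys.
alice_key, bob_key, mitm_key = b"alice-public-key", b"bob-public-key", b"server-owned-key"

print(security_code(alice_key, bob_key))   # what Alice and Bob should both see
print(security_code(alice_key, mitm_key))  # what Alice sees if the server MITMs her leg
print(security_code(mitm_key, bob_key))    # what Bob sees on his leg: a visible mismatch
```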


Please Don’t Feed the Scattered Lapsus ShinyHunters

A prolific data ransom gang that calls itself Scattered Lapsus ShinyHunters (SLSH) has a distinctive playbook when it seeks to extort payment from victim firms: Harassing, threatening and even swatting executives and their families, all while notifying journalists and regulators about the extent of the intrusion. Some victims reportedly are paying — perhaps as much to contain the stolen data as to stop the escalating personal attacks. But a top SLSH expert warns that engaging at all beyond a "We're not paying" response only encourages further harassment, noting that the group's fractious and unreliable history means the only winning move is not to pay. Image: Shutterstock.com, @Mungujakisa. Unlike traditional, highly regimented Russia-based ransomware affiliate groups, SLSH is an unruly and somewhat fluid English-language extortion gang that appears uninterested in building a reputation of consistent behavior whereby victims might have some measure of confidence that the criminals will keep their word if paid. That's according to Allison Nixon, director of research at the New York City-based security consultancy Unit 221B. Nixon has been closely tracking the criminal group and individual members as they bounce between various Telegram channels used to extort and harass victims, and she said SLSH differs from traditional data ransom groups in other important ways that argue against trusting them to do anything they say they'll do — such as destroying stolen data. Like SLSH, many traditional Russian ransomware groups have employed high-pressure tactics to force payment in exchange for a decryption key and/or a promise to delete stolen data, such as publishing a dark web shaming blog with samples of stolen data next to a countdown clock, or notifying journalists and board members of the victim company. But Nixon said the extortion from SLSH quickly escalates way beyond that — to threats of physical violence against executives and their families, DDoS attacks on the victim's website, and repeated email-flooding campaigns. SLSH is known for breaking into companies by phishing employees over the phone, and using the purloined access to steal sensitive internal data. In a January 30 blog post, Google's security forensics firm Mandiant said SLSH's most recent extortion attacks stem from incidents spanning early to mid-January 2026, when SLSH members pretended to be IT staff and called employees at targeted victim organizations claiming that the company was updating MFA settings. "The threat actor directed the employees to victim-branded credential harvesting sites to capture their SSO credentials and MFA codes, and then registered their own device for MFA," the blog post explained. Victims often first learn of the breach when their brand name is uttered on whatever ephemeral new public Telegram group chat SLSH is using to threaten, extort and harass their prey. According to Nixon, the coordinated harassment on the SLSH Telegram channels is part of a well-orchestrated strategy to overwhelm the victim organization by manufacturing humiliation that pushes them over the threshold to pay. Nixon said multiple executives at targeted organizations have been subject to "swatting" attacks, wherein SLSH communicated a phony bomb threat or hostage situation at the target's address in the hopes of eliciting a heavily armed police response at their home or place of work.
“A big part of what they’re doing to victims is the psychological aspect of it, like harassing executives’ kids and threatening the board of the company,” Nixon told KrebsOnSecurity. “And while these victims are getting extortion demands, they’re simultaneously getting outreach from media outlets saying, ‘Hey, do you have any comments on the bad things we’re going to write about you.” Nixon argues that no one should negotiate with SLSH because the group has demonstrated a willingness to extort victims based on promises that it has no intention to keep. Nixon points out that all of SLSH’s known members hail from The Com , shorthand for a constellation of cybercrime-focused Discord and Telegram communities which serve as a kind of distributed social network that facilitates instant collaboration . Nixon said Com-based extortion groups tend to instigate feuds and drama between group members, leading to lying, betrayals, credibility destroying behavior, backstabbing, and sabotaging each other. “With this type of ongoing dysfunction, often compounding by substance abuse, these threat actors often aren’t able to act with the core goal in mind of completing a successful, strategic ransom operation,” Nixon said. “They continually lose control with outbursts that put their strategy and operational security at risk, which severely limits their ability to build a professional, scalable, and sophisticated criminal organization network for continued successful ransoms – unlike other, more tenured and professional criminal organizations focused on ransomware alone.” Intrusions from established ransomware groups typically center around encryption/decryption malware that mostly stays on the affected machine. In contrast, Nixon said, ransom from a Com group is often structured the same as violent sextortion schemes against minors, wherein members of The Com will steal damaging information, threaten to release it, and “promise” to delete it if the victim complies without any guarantee or technical proof point that they will keep their word. She writes: The SLSH group steals a significant amount of corporate data, and on the day of issuing the ransom notification, they line up a number of harassment attacks to be delivered simultaneously with the ransom. This can include swatting, DDOS, email/SMS/call floods, negative PR, complaints sent to authority figures in and above the company, and so on. Then, during the negotiation process, they lay on the pressure with more harassment- never allowing too much time to pass before a new harassment attack. What they negotiate for is the promise to not leak the data if you pay the ransom. This promise places a lot of trust in the extorter, because they cannot prove they deleted the data, and we believe they don’t intend to delete the data. Paying provides them vital information about the value of the stolen dataset which we believe will be useful for fraud operations after this wave is complete. A key component of SLSH’s efforts to convince victims to pay, Nixon said, involves manipulating the media into hyping the threat posed by this group. This approach also borrows a page from the playbook of sextortion attacks, she said, which encourages predators to keep targets continuously engaged and worrying about the consequences of non-compliance. 
“On days where SLSH had no substantial criminal ‘win’ to announce, they focused on announcing death threats and harassment to keep law enforcement, journalists, and cybercrime industry professionals focused on this group,” she said. An excerpt from a sextortion tutorial from a Com-based Telegram channel. Image: Unit 221B. Nixon knows a thing or two about being threatened by SLSH: For the past several months, the group’s Telegram channels have been replete with threats of physical violence against her, against Yours Truly, and against other security researchers. These threats, she said, are just another way the group seeks to generate media attention and achieve a veneer of credibility, but they are useful as indicators of compromise because SLSH members tend to name drop and malign security researchers even in their communications with victims. “Watch for the following behaviors in their communications to you or their public statements,” Nixon said. “Repeated abusive mentions of Allison Nixon (or “A.N”), Unit 221B, or cybersecurity journalists—especially Brian Krebs—or any other cybersecurity employee, or cybersecurity company. Any threats to kill, or commit terrorism, or violence against internal employees, cybersecurity employees, investigators, and journalists.” Unit 221B says that while the pressure campaign during an extortion attempt may be traumatizing to employees, executives, and their family members, entering into drawn-out negotiations with SLSH incentivizes the group to increase the level of harm and risk, which could include the physical safety of employees and their families. “The breached data will never go back to the way it was, but we can assure you that the harassment will end,” Nixon said. “So, your decision to pay should be a separate issue from the harassment. We believe that when you separate these issues, you will objectively see that the best course of action to protect your interests, in both the short and long term, is to refuse payment.”

daniel.haxx.se 2 days ago

A third medal

In January 2025 I received the European Open Source Achievement Award. The physical manifestation of that prize was a trophy made of translucent acrylic (or something similar). The blog post linked above has a short video where I show it off. In the year that has passed since, we have established an organization for how to do the awards going forward, the European Open Source Academy, and we have arranged the creation of actual medals for the awardees. That was the medal we gave the award winners last week at the award ceremony where I handed Greg his prize. I was however not prepared for the fact that, as a direct consequence, I was handed a medal this year, in recognition of the award I got last year, because now there is a medal. A retroactive medal if you wish. It felt almost like getting the award again. An honor.

[Photos in the original post: the box, the backside, the front, and the medal design.]

The medal is made in a shiny metal, roughly 50mm in diameter. In the middle of it is a modern version (with details inspired by PCB looks) of the Yggdrasil tree from old Norse mythology – the "World Tree". A source of life, a sacred meeting place for gods. In a circle around the tree are twelve stars, to visualize the EU and European connection. On the backside, the year and the name are engraved above an EU flag, and the same circle of twelve stars is used there as a margin too, like on the front side. The medal has a blue and white ribbon, to enable it to be draped over the head and hung from the neck. The box is a sturdy thing in dark blue velvet-like covering with European Open Source Academy printed on it next to the academy's logo. The same motif is also on the inside of the top part of the box. I do feel overwhelmed and I acknowledge that I have received many medals by now. I still want to document them and show them in detail to you, dear reader. To show appreciation; not to boast.

Heather Burns 2 days ago

the other Turing test

Everything which was behind us is in front of us. It's up to you how you respond.
