Latest Posts (20 found)

Reverse-engineering Prose From Internet Lingo

The Internet has learned to speak a gibberish that doesn’t always coincide with literary text, but it can be converted back. Here’s my experiment along these lines.


A day in the life of a Japanese indie developer

In the morning, I saw my daughter off to her school bus stop. I recently came across comedian Atsushi Tamura’s conversational nodding technique and thought it sounded interesting, so I immediately tried it out during some small talk with my mama-tomo (mom friend). The results were instant. The method is simple: completely turn off the critical thinking mindset and pour 100% of my mental energy into active listening and nodding. I focused entirely on how to vary my responses, using things like "Ohh," "Yeah," "Hmm," and "I see." In Japanese, I say 「へぇ」「うん」「うーん」「なるほど〜」. You should really give it a shot—it helps you understand the other person's point much better, and natural follow-up questions or reactions just pop into your head more easily. It takes the pressure off because you don’t have to squeeze out an interesting story of your own. Conversations don't even need a solid conclusion; you can just wrap things up with a "Right, makes sense," "That's great," or "Alright, see you later!" In Japanese, 「そうなんですね」「いいですね」「ほんじゃお疲れ様です〜」. If you're ever stuck on how to respond, just saying "That's great" is super convenient! It works just as well when talking to guys, and I bet it's useful for interview content, too. Love it. The weather was gloomy and I felt sluggish, so I spent some time doing mindless tasks in my room until I could find some motivation, like taking photos of receipts. Once I snap the photos, I send them off to my back-office assistant. I’ll want to replace this process with AI eventually. My receipts are pretty much exclusively from cafes lol. I checked the user forum and saw a reply from the user who reported an Inkdrop bug yesterday. He seemed happy that we were able to track down the cause together. That's great. Moments like this are honestly one of the best parts of being an indie developer. I want to keep doing this until the day I die. After playing with my six-month-old baby for a bit, my motivation kicked in, so I headed to a cafe. 
I cleared the tasks I’d left unfinished from yesterday: adding exception handling to the AI features, maintaining plugins, and updating the manuals and API documentation. A guy sitting behind me was loudly holding forth about "how the younger generation leaves messages on 'read' (read-receipt ghosting/既読スルー)." Meanwhile, Claude Code drains the battery, so my PC is already down to half power. It completely ruins the energy efficiency of Apple Silicon. When I stepped out of the cafe, it was pouring rain. It felt nice and cool. I took a walk through the park while figuring out where to grab lunch and decided to check out a bookstore-slash-cafe I'd been curious about: Calo Bookshop & Cafe / Calo Gallery. I ordered the chicken curry. I noticed that a lot of the books on display seemed to blend art and politics. Just as I was thinking the themes and designs of the books were a bit quirky, I realized they were zines. That made total sense. The curry was good. Only one other customer came in during my stay, and he left quickly. By the time I walked out, the rain had stopped. Time to head back. The low atmospheric pressure was making me feel heavy. My eldest daughter was already home from kindergarten, and I ended up taking a whole one-hour nap. That was unexpected, hmm. She then left for her gymnastics class. Last week, one of my users, Adrián, shared his Claude Code Skills with me, which use Inkdrop as a persistence layer. While checking them out, I remembered a blog post by Nolan Lawson (PouchDB author) that I read yesterday, titled "Using AI to write better code more slowly", and tweeted about it in Japanese. I really like his point of view. He mentioned Matt Pocock's , which is included in Adrián’s Skills as well. Since Nolan runs his AI agents in parallel (Claude sub-agent, Codex, and Cursor Bugbot), I wanted to try doing that myself, so I downloaded and set up Antigravity CLI, which is a replacement for Gemini CLI. Lately, I've been really liking a Neovim plugin called . 
I suddenly remembered that I had added a small feature to it the other day, so I sent a PR. By evening, my daughter had returned from gymnastics. My focus was gone, so it was time to cook dinner. For dinner, I boiled some pasta I bought last weekend at the Italian Fair held at Hankyu Umeda. It was thick pasta that looked sort of like dreadlocks, and it tasted great. It makes me want to visit Italy again. I use Claude Code in English every day, and today I learned the word "idempotent" (冪等性). For example: "Make this event handler idempotent." Meaning: make it so that running this event handler many times leaves exactly the same outcome as running it once. 冪等 is also a difficult word in Japanese. Lately, I've been listening to Laura day romance almost exclusively. The literary lyrics combined with the melancholic vocals and expressions make for a really chill vibe. Tonight, I’m going to read a bit more of polar explorer Daisuke Kakuhata’s book, The 43-Year-Old Peak Theory, and head to bed. Good night.
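As an illustration of the idempotency idea mentioned above, here is a minimal Python sketch. The `Note` class and both handlers are hypothetical, invented for this example; they are not Inkdrop's code or API.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Hypothetical note model, invented for this illustration."""
    title: str
    tags: set[str] = field(default_factory=set)
    archive_count: int = 0


def archive_non_idempotent(note: Note) -> None:
    # Every call changes the state again, so duplicate events pile up.
    note.archive_count += 1


def archive_idempotent(note: Note) -> None:
    # Adding to a set is a no-op when the tag is already present, so
    # running the handler once or many times leaves the same state.
    note.tags.add("archived")


note = Note("Receipts")
for _ in range(3):  # simulate the same event being delivered three times
    archive_non_idempotent(note)
    archive_idempotent(note)

print(note.archive_count)  # 3
print(note.tags)           # {'archived'}
```

The idempotent handler describes the desired end state ("this note is archived") rather than a change to apply, which is usually what makes a repeated event harmless.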


Fragments: May 27

At the GOTO Conference in Copenhagen in 2025, Kent Beck and I spent some time on stage talking and answering questions from the audience - a format I refer to as “two old geezers on a park bench”. We talk about our experiences with LLM-augmented programming (at that point - October 2025), we show our frustration that things we’ve been saying for thirty years still need to be said, we say how anything like a manifesto reunion needs to be led by a younger generation, and we opine on what junior developers should focus on in their careers. ❄                ❄                ❄                ❄                ❄ Ian Johnson has written a series of posts about restructuring a gnarly codebase. The story follows a real Laravel + React codebase over ~3 months and ~258 commits from a legacy monolith with no tests to a well-structured application with automated quality gates, a React SPA migration in progress, and an AI agent that reliably ships production code with minimal supervision. The series covers the steps in decent detail, and his approach follows the kinds of steps I’d use. First get everything under the control of decent characterization tests, then add static analysis, then introduce the right patterns to make things flow easily. Running through all of this is his use of AI, which changed during the exercise: For the first two months of this project, I used Claude Code with auto-approve turned off. Every file edit, every terminal command, every change… I reviewed it before it executed. […] The results were good. The code was clean. But I was doing most of the thinking and half the typing. The agent was a fancy autocomplete with better suggestions. I wasn’t getting the leverage I’d hoped for. I read an article about “on-the-loop” versus “in-the-loop” human-AI collaboration. The framing clicked immediately […] I was micromanaging because I didn’t trust the agent to do the right thing. And I didn’t trust the agent because there was nothing forcing it to do the right thing. 
His early steps put in tests, static analysis, and the right architectural patterns. With those in place, he could let the agent do more work. In his words: My role shifted from writer to curator. I don’t write most of the code anymore. I: Define the patterns […]; Review the test specs […]; Review the output […]; Update the harness […]; Make strategic decisions […] He finishes the series with conclusions about how he’d generalize his experience to other circumstances. ❄                ❄                ❄                ❄                ❄ Back in the land of my birth, there were some notable groans when the National Health Service decided to close nearly all of their Open Source repositories, supposedly due to the security threat of LLMs. Closing repos like this isn’t an effective counter to LLM-augmented attackers. I suspect it’s no coincidence to see GDS (Government Digital Service), the highly-regarded IT enablers in the UK government, publish their position: Moving code from public to private as a substitute for investment in secure-by-design delivery, ownership and remediation is a warning sign because it reduces sharing and scrutiny, can slow coordinated improvement across government and suppliers, and does not remove the underlying weaknesses in a running service. Terence Eden memorably sums up his view on this: Within the UK’s Civil Service you occasionally hear the expression “being invited to a meeting without biscuits”. It implies a rather frosty discussion without any of the polite niceties of a normal meeting. ❄                ❄                ❄                ❄                ❄ I’ve seen a few cases where those developers who are most involved in working with LLMs find they are running into a problem with cognitive endurance. Adam Tornhill has joined this group: One of the big wins with agents is that they let us stay with the higher-level problem for longer. We get less sidetracked by details, dependency cleanup, and similar secondary tasks that used to break concentration. 
But there is a cost we are still underestimating. Agentic coding is mentally expensive. I can usually sustain the pace for a couple of hours. Then I need a break. The pace is simply too intense. And based on conversations with other engineers, I do not think I am alone in that. He explains that working with The Genie means we are making more decisions in less time, and this increase in decision density is hard on the brain. He responds by keeping agent tasks small, automating everything he can, and accepting that he won’t know every line of code as long as he has good verification mechanisms in place. Notably, he has not gone in the direction of doing his work with swarms of agents that he coordinates. Instead he has one long-running task that he babysits and one focus task. That last point is important given the running-twenty-agents-in-parallel hype. I cannot even think about twenty meaningful things to build, and even less so about the resulting cognitive tax of the likely interruptions. It’s exactly the wrong thing to even consider. At least for humans. (And yes, I understand sub-agents and machine parallelisation. That is not what I’m objecting to. It is the parallelisation of human attention that does not scale). I liked that he included some thoughts about what folks can do in the time outside these intense programming sessions. Not just “have a coffee” (although he includes that) but also learning about the domain that the software supports. ❄                ❄                ❄                ❄                ❄ A couple of pithy quotes from social media. Lorin Hochstein: “Metaphor debt” is when all of your metaphors involve the concept of “debt” because you can’t think of any other metaphors anymore. ❄                ❄ Daniel Terhorst-North: If a vegan crossfit fan is using Claude to write Rust, which thing do they tell you first? 
❄                ❄                ❄                ❄                ❄ Karl Bode reacts to speakers getting booed when mentioning AI during commencement addresses. He points out that younger folks are increasingly unhappy with the tech oligarchy and their fruits . The thing is the kids aren’t stupid. They see the field clearly. They see the difference between what’s being sold to them by tech companies, the press, and commencement speakers, and what they have repeatedly seen with their own eyes. They’ve watched tech oligarchs spend the last decade mired in scandal after scandal, hype cycle after hype cycle, steadily enshittifying everything they touch along the way. The percentage of Gen Z that think AI’s benefits don’t counterbalance the risks now sits around fifty percent, up 11 percentage points in just the last year. Eight out of every ten believe that using AI makes the process of actual learning more difficult. He sees young people saddled with the perception of entering a worsening world - which leads them to rage against this latest fruit of the tech oligarchy. A rage that is easy for folks like me - with a comfortable retirement off-ramp - to properly appreciate. A rage that could have marked political and social consequences. ❄                ❄                ❄                ❄                ❄ Relevant to these concerns are a couple of items in last week’s Economist newspaper. The newspaper argues that historically major technological advances haven’t led to significant unemployment or drops in wages ( paywalled article ). The closest was the original industrial revolution in 19th Century Britain. There was a stagnation in wages during this period, but there was also a massive increase in population, from 4½ million to 12 million. It also points out that we’ll probably only understand the full consequences of all this when a recession hits, as this is when most unproductive jobs tend to be flushed out of the system. 
A second article ( also paywalled ) indicates that AI is having some effect on graduate hiring. They did an analysis of surveys of recent graduates, looking to see if employment varied depending on a job’s exposure to AI. The least exposed quintile of subjects saw employment rate fall by 1.5% over the last couple of years, while the most exposed quintile’s drop was 6.6%. ❄                ❄                ❄                ❄                ❄ Lawfare isn’t impressed with the latest efforts by the US Government to regulate AI. On [last] Wednesday, the White House invited leaders of OpenAI, Google, Anthropic, Meta, and Microsoft to the Oval Office for a signing ceremony the following afternoon. President Trump was to sign an executive order on AI and cybersecurity—the administration’s most formal effort yet to establish a voluntary process for reviewing frontier models before their release. But roughly three hours before the ceremony, when some company executives were already in the air to Washington, the White House called it off. They see the proposed regulations as mild, and including some valuable measures to harden defenses against cyber threats. But it’s worth underscoring the implications of postponing (if not outright canceling) this order, which, by its own terms, was about as modest a frontier-AI intervention as the federal government could put on paper: voluntary, focused on the government’s own defenses, and explicitly barred from becoming a licensing regime. The objection isn’t so much about government coercion as about the government having any settled role at all. Voluntary, in other words, isn’t the floor of frontier AI policy in this administration; it’s the ceiling. This is a questionable position given that the concerns animating this draft order will likely grow in the near future. It is also self-defeating for those who applauded the order’s delay or demise. 
Far from resolving the risk of government meddling in AI, killing the order just leaves in place what Ball has described as the “opaque and essentially lawless” alternative: government access happening through back channels, on terms set case by case, with no stable rules at all. One of the problems here is a distinct lack of governmental expertise, either in AI or in software in general. Too much is being decided at the whims of the tech oligarchy, there isn’t any attempt to engage in the broader issues at hand. That’s not entirely a bad thing, trying to regulate something that’s still evolving so fast is usually a fool’s errand - but the problem here is the impact of AI is so big that there’s real danger in being too far behind. ❄                ❄ Which leads me to a rare thing, an endorsement of a candidate for political office. If you are voting in congressional district MA-06 (North Shore of Massachusetts), I’d seriously look at Beth Anders-Beck , who is running for congress in that district. Beth has a long background in software development (including developing the notion of Forest and Desert ), so would introduce expertise that Congress desperately needs. I’ve known Beth for decades, and have a high opinion of their intelligence, judgment, and ability to work with others. Congress doesn’t deserve Beth, but it does need her.


SQLAlchemy 2 In Practice - Solutions to the Exercises

To conclude with my SQLAlchemy 2 in Practice series, this article contains the solutions to all the exercises. If you'd like to support my work, I encourage you to buy this book, either directly from my store or on Amazon . Thank you!

Unsung Today

“The pipeline of future experts is thinning from both ends.”

I generally avoid think pieces about AI because a) a lot of them are boring, and b) they rarely match the pragmatic posture of this blog. But this essay on a new No One’s Happy blog was really interesting to read, and feels different in a few ways. First, it examines what happens as AI slop spreads in the context that is less discussed – in a workplace: This is a new form of slop, and it is more expensive than the public kind, because the people producing it are being paid a salary to do so. […] The cost of producing a document has fallen to nearly zero; the cost of reading one has not, and is in fact rising, because the reader must now sift the synthetic context for whatever the document was originally about. A lot in the essay feels pertinent to Unsung as real craft is not feelings or fluffiness. Real craft is deep expertise : Generative AI can produce work that looks expert without being expert, and the failure arrives in two shapes. The first is when novices in a field are able to produce work that resembles what their seniors produce, faster or more advanced than their judgment. The second is when people generate artifacts in disciplines they were never trained in. The two failures look similar from a distance and are not the same. Research has mostly measured the first. The second is what it is missing, and in my experience it is the riskier of the two. The term for this new challenge is, apparently, “output-competence decoupling.” Other parts of the essay come back to a topic – toxic velocity – we covered before : The current generation of agentic systems is built around the premise that the human is the bottleneck — that the loop runs faster and cleaner without the awkward delay of someone reading what is about to happen and deciding whether it should. This is, in a great many cases, exactly backwards. The human in the loop is not a vestige of an earlier era; the human is the only part of the loop with skin in the game. 
Removing the H from HITL [Human In The Loop – eds. note] is not an efficiency. It is the abandonment of the only mechanism the system has for catching itself. And one last thing that differentiates this essay from many others is the last “what to do about it” section. #ai #craft


I think Anthropic and OpenAI have found product-market fit

Anthropic are strongly rumored to be about to have their first profitable quarter. Stories are circulating of companies surprised at how expensive their LLM bills are becoming from usage by their staff. I think this is because OpenAI and Anthropic have both found product-market fit. I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI. If you are a heavy user of coding agents these plans are a fantastic deal. I just ran the ccusage tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got: That's $2,180.16 worth of tokens for $200 - not bad at all! I'm a moderately heavy user of these tools, but I'm certainly not running agents every hour of the day and night. I had assumed that companies making extensive use of agents were getting similar discounts. It turns out I could not have been more wrong about that. I haven't been able to track down the exact date, but at some point in the last six months Anthropic switched their Enterprise plan (originally "Claude seats include enough usage for a typical workday" back in August 2025 ) to $20/seat/month plus API pricing for usage. This story about the change from The Information is dated Apr 14, 2026, but cites an Anthropic spokesperson claiming that the pricing change occurred in November 2025. Existing customers are finding out about the change as they renew their contracts. OpenAI made a similar pricing change in April. The Codex rate card ( Internet Archive copy ) currently says: Note : On April 2, 2026, we updated Codex pricing to align with API token usage, instead of per-message pricing. This change was applicable to new and existing Plus, Pro, ChatGPT Business and new ChatGPT Enterprise plans. On April 23, 2026, we made this update for all existing ChatGPT Enterprise plans as well, inclusive of Edu, Health, Gov, and ChatGPT for Teachers. 
It's a little harder to decode as they quote prices in "credits", but as far as I can tell those credit costs are an exact match for the API token costs listed for those models. All of which is to say that as of April 2026 the "Enterprise" cost for both OpenAI Codex and Anthropic Claude Code/Cowork is the same as the listed API price. GPT-5.5 (released April 23rd) is 2x the API price of GPT-5.4. Opus 4.7 (April 16th) is around 1.4x the price of Opus 4.6 when you take their new tokenizer into account. So April saw both leading model companies release new frontier models with a higher API price, and both companies now have measures to lock their enterprise customers (who tend to sign year-long deals) at those API prices, not the previous extreme discounts. Why these sudden aggressive moves on pricing? Both Anthropic and OpenAI are planning to IPO, but I suspect there's a more important factor here: I think they've finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex. Tools like ChatGPT are wildly popular, but that wild popularity has been difficult to turn into revenue. In February OpenAI boasted more than 900 million weekly active users for ChatGPT, but only 50 million - 5.6% of that - were paying consumer subscribers. Charging $10-$20/month per user is an OK business, but you'd need 1-2 billion subscribers sticking around for four years to cover $1 trillion in infrastructure . Companies spending $200+/month/user will get you there a whole lot faster - and as noted above, as a power-user I'm at ~$1,000/month in API costs per vendor already. Coding agents really did change everything. These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals. Right now that's still mostly software engineers, but a coding agent is a tool that can automate anything you can do by typing commands into a computer... 
so they are clearly applicable to a much wider set of skilled knowledge workers. As I've discussed on this site at length , the models released in November 2025 elevated agents to being genuinely useful. We've had six months to get used to that idea now - it's no wonder companies are beginning to spend real money on this technology. You could argue that ChatGPT achieved product-market fit when it became the fastest-growing consumer app in history back in February 2023... but it certainly wasn't making any actual money back then. Coding agents plus enterprise pricing marks the point when these companies start making very real revenue. Maybe even enough to start covering their costs! As further evidence that enterprise agents represent product-market fit for these companies, consider their open job listings. OpenAI have 703 open jobs right now, of which I'd categorize 229 (32.6%) as relating to enterprise sales and support - account executives, "Go To Market", "Forward Deployed Engineers" and the like. Anthropic have 390 open jobs , 105 (26.9%) of which look enterprisey to me. It's pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor - enterprise sales contracts don't close themselves without a whole lot of humans in the mix! (I ran this analysis by scraping their job sites with Claude Code, then having it use Datasette's JSON API to pipe that data into Datasette Cloud where I used Datasette Agent for the analysis, exported here . Dogfood!) I started digging into this in response to a growing volume of stories claiming that large companies were sounding the alarm because their AI usage costs had grown so large. The most widely cited of these stories appear quite overblown to me. The most discussed has been Uber, based on this report where CTO Praveen Neppalli Naga indicated that Uber had "maxed out its full year AI budget just a few months into 2026", mostly thanks to Claude Code. 
Given that Claude Code only got really good in November it's entirely unsurprising to me that a budget set in 2025 may have failed to predict demand for that tool in 2026! That Uber story was further fueled by comments made by Uber's COO, Andrew Macdonald, on the Rapid Response podcast. I tracked down the segment and there really isn't much there. Here's what Andrew said: But then you sometimes go and talk to your senior engineering leaders and you're saying, OK, how many projects that were on the cutting room floor got moved above the line because of the productivity gains because 25% of our code commits were via Claude Code last quarter? That link is not there yet, right? I think maybe implicitly there's more that is getting shipped. But it's very hard to draw a line between one of those stats and, OK, now we're actually producing like 25% more useful consumer features, right? And that line is hard to draw. Somehow this fragment turned into headlines like Uber's COO says it's getting harder to justify the money spent on AI tokenmaxxing , because the market for stories about AI failures remains enormous. The other popular story around this is Microsoft starts canceling Claude Code licenses , ostensibly to encourage their engineers to dogfood their own Copilot CLI agent instead - but The Verge reporter Tom Warren says "sources tell me the decision is also a financial one", triggered by the June 30th end of Microsoft's financial year. I think both of these stories support my "product-market fit" hypothesis. The best advice I ever heard on pricing a product was that your customer should suck air through their teeth and then say yes. Uber's budget overrun and Microsoft's seat cancellations look like that effect playing out in practice. The big AI labs spend billions of dollars on both training and inference. Credible figures are hard to come by, but we did get one huge hint as to the figures involved from, oddly enough, the recent SpaceX S-1 : [...] 
in May 2026, we entered into Cloud Services Agreements with Anthropic PBC (“Anthropic”), an AI research and development public benefit corporation, with respect to access to compute capacity across COLOSSUS and COLOSSUS II . Pursuant to these agreements, the customer has agreed to pay us $1.25 billion per month through May 2029 [...] The Anthropic announcement said that this deal meant they could "increase our usage limits for Claude Code and the Claude API", heavily implying that Colossus is being used for inference, not model training. Anthropic already have vast amounts of compute from other providers. The fact that they're willing to spend $1.25 billion per month for extra capacity from just one of their vendors hints at how big these inference budgets have become. Over the past two years my impression has been that OpenAI made more of their income from subscription revenue while Anthropic made more from their API. Anthropic's API revenue was historically quite dependent on a small number of large API customers - this VentureBeat story from August 2025 quotes "sources familiar with the matter" suggesting that just Cursor and GitHub Copilot were responsible for $1.2 billion of the company's then-$4 billion revenue. Today Anthropic are rumored to hit $10.9 billion in the second quarter , potentially even operating at a profit for the first time. This pivot-to-Enterprise suggests that the labs have realized that the real money lies in cutting out the middlemen. Anthropic's Claude Code directly competes with Cursor and Copilot. No wonder Cursor are investing in their own models ! I've called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good - good enough that we've spent the last six months adapting to agent systems that can reliably get useful work done. 
I think April 2026 is a new inflection point where the revenue implications of this have started to land, to the benefit of the frontier AI labs and with material impacts on the budgets of large companies. We'll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into. (For reference, the ccusage estimate for my past 30 days breaks down as $1,199.79 for Anthropic Claude Code and $980.37 for OpenAI Codex.)
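The back-of-the-envelope arithmetic in this post can be checked with a short script. This is only a sketch: it uses the figures quoted in the post, and the variable and function names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative only.
weekly_active_users = 900_000_000
paying_subscribers = 50_000_000
infrastructure_cost = 1_000_000_000_000  # $1 trillion
months = 4 * 12                          # four years of subscriptions


def subscribers_needed(price_per_month: float) -> float:
    """Subscribers required to cover the infrastructure cost over four years."""
    return infrastructure_cost / (price_per_month * months)


paying_share = paying_subscribers / weekly_active_users
print(f"paying share: {paying_share:.1%}")  # paying share: 5.6%

for price in (10, 20, 200):
    billions = subscribers_needed(price) / 1e9
    print(f"${price}/month -> {billions:.2f} billion subscribers")
# $10/month -> 2.08 billion subscribers
# $20/month -> 1.04 billion subscribers
# $200/month -> 0.10 billion subscribers

# 30-day API-equivalent token cost versus the $200 paid in subscriptions.
api_equivalent = round(1199.79 + 980.37, 2)
print(api_equivalent)                  # 2180.16
print(f"{api_equivalent / 200:.1f}x")  # 10.9x
```

At $200/month per user the required subscriber count drops by an order of magnitude, which is the gap between consumer and enterprise pricing that the post is pointing at.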

Unsung Today

“It took months to find appliances that didn’t need apps to function.”

The Ringer journalist Brian Phillips asked on Bluesky : I’m working on a column about the tech annoyances that drive us crazy, and I want it to be as universal as possible, so tell me yours! E.g. scanning a QR code to read a menu, never receiving the one-time passcode they supposedly texted you, “verify you’re human” by IDing tiny motorcycles, etc. There are already many responses. I am drafting behind Phillips before he even writes his essay, because I like occasionally checking in with people this way. Not just for commiserating; perhaps scanning the answers will also give you some inspiration, or validation, or quotes for something you can push to make better, wherever you are. Some patterns I noticed: The way super sketchy bootleg websites used to look (written in questionable English, 2/3 of the window overtaken by ads, constant popups and redirects, incorrect information more often than not) is just how all websites are now. Also, this little beauty : My toaster says to unplug when not in use. It also has a digital clock that resets when I unplug it. #enshittification #software evolution A lot of logging in woes: password requirements, bouncing people from apps to web to log in, login flows forgetting context, “I trusted this device” settings you cannot trust. “Local news websites that crash under the weight of all their pop-up ads and auto-play videos.” This post had a great take: Hatred of QR codes, or perhaps what they represent: needing to install an app, removing people out of the equation, introducing phones where they weren’t needed before. Surprisingly little AI. Is that because of the audience or the way the question was phrased?

Martin Fowler Yesterday

The test suite as a regression sensor

Birgitta Böckeler finishes her post on sensors for coding agents by examining the role of a test suite as a regression sensor, focusing on the role mutation testing can play.

Martin Fowler Yesterday

The VibeSec Reckoning

Vibe coding has significantly accelerated software prototyping but AI agents frequently recommend insecure configurations, creating security problems. Gautam Koul, Lucian Moss, Neil Drew-Lopez, and Daberechi Ruth Edeokoh share their experience while building applications for Thoughtworks's global marketing. They learned that to combat this we need to write a security context file to guide the AI, be cautious with AI permission requests, create a daily security intelligence feed, and provide builders with a secure-by-default harness and templates.

0 views

Metastable Failures in Distributed Systems

☕ Welcome to The Coder Cafe! Today, we explore one of the nastiest failure patterns in distributed systems: metastable failures. Based on the Metastable Failures in Distributed Systems whitepaper, we break down why these failures happen, why they persist, what we can do about them, and why our instinct to fix them is probably wrong. Get cozy, grab a coffee, and let’s begin!

Stable, Vulnerable, Metastable

Metastable failures borrow their name from physics, where metastable means something that looks stable but isn’t. To understand how a distributed system can end up in such a state, we need to look at three distinct states it can be in:

Stable: The system recovers on its own after any disruption. This is what we call resilience in Resilient, Fault-tolerant, Robust, or Reliable.

Vulnerable: The system looks perfectly healthy, but it's operating above its hidden capacity (the load level below which it can self-heal from any disruption). It responds fast, metrics are green, and nothing is alarming. Many production systems deliberately operate here because it's more efficient: resources are used closer to their limit. But there's no slack left. And the deeper the system operates in a vulnerable state, the smaller the trigger needed to push it over the edge. Indeed, a system just above its hidden capacity can survive large disruptions; a system near its advertised capacity can be tipped by almost anything.

Metastable failure: A trigger (e.g., a network blip, a deployment, a traffic spike) pushes the system over its hidden capacity. The system is not fully broken: processes are alive, and it’s still running. But goodput collapses: it’s no longer doing any useful work. Technically up, effectively down. And unlike a regular outage, removing the trigger doesn’t fix it. Getting out requires a strong corrective push: a restart, a dramatic load reduction, a manual intervention.
NOTE: If you’re not familiar with the concept of goodput, it’s the throughput of useful work completed successfully. For example, in a web application receiving 1000 requests per second but returning errors for 800 of them, the goodput is only 200 RPS.

Figure: The three states of a metastable failure. A system can drift into the vulnerable state unnoticed, and a single trigger is enough to push it into the metastable state it cannot escape on its own.

The most disorienting property of a metastable failure: stopping the trigger doesn’t stop the failure. To understand why, we need to talk about feedback loops. In a previous post on Systems Thinking Explained, we defined a feedback loop as: if A causes B, then B influences A. A feedback loop is exactly the mechanism that keeps a system stuck in the metastable state. There is always a sustaining effect, a feedback loop, that prevents recovery. The trigger is just what pushes the system over the edge. The loop is what keeps it there. Blaming the trigger is the natural instinct, and almost always the wrong diagnosis.

Let’s discuss a concrete example to make this clear. Imagine a web application that queries a database. The database comfortably handles up to 300 QPS. The application retries any query that doesn’t respond within 1 second. The system is running at 280 QPS, healthy and fast, within the database’s capacity. Then, a transient network issue occurs for 10 seconds. When the issue is over, all the queued requests flood in at once. The database gets hit with a surge it can’t absorb: latency spikes and queries start timing out. So the application retries them. This doubles the effective load to 560 QPS. The database, already struggling, falls further behind. More timeouts. More retries. The loop is now self-sustaining:

High load → Timeouts → Retries → Higher load → More timeouts → More retries

The transient network issue was fixed minutes ago. Yet, the system is still completely broken.
The trigger is gone; the feedback loop is not. The only way out is to dramatically cut the load or disable retries entirely. This is a metastable failure. The system was vulnerable because it was operating well above its hidden capacity, close to its advertised capacity. A minor, transient trigger pushed it over the edge and into a self-sustaining failure state it couldn’t escape on its own. The retry mechanism, a feature designed to improve reliability, became the very thing that prevented recovery. This is one example, but the same pattern appears with caches, connection pools, failover logic, and more. The shape is always the same: a feedback loop that turns a temporary problem into a permanent one.

Two things make metastable failures particularly nasty.

First, we can be tempted to blame the wrong thing. When an outage happens, the trigger is what’s visible and recent: a spike, a deployment, a hardware fault. It’s the obvious culprit. But the trigger only exposed the problem; it didn’t create it. The sustaining feedback loop was already there, structural and invisible. When analyzing the problem in retrospect, teams focus on the trigger; fixes address the trigger; and the system remains vulnerable to the next one. The authors of the paper observed teams declare a metastable failure “resolved” multiple times before realizing the real cause had never been touched.

Second, the feedback loop grows stronger with scale. Small-scale tests won’t reveal it. A staging environment running at 10% capacity may handle the same trigger without falling into a metastable state, because the loop isn’t strong enough at that scale to be self-sustaining. This means these failures can slip past even rigorous testing regimes and only manifest in production at full load.

We defined hidden capacity earlier as the load level below which the system can self-heal from any disruption. It’s different from, and always lower than, the advertised capacity.
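The retry example above can be sketched as a toy model. This is only an illustration under stated assumptions, not the paper's model: the function names are hypothetical, time advances in whole ticks, and a failing system is assumed to see its base load multiplied by a fixed retry factor of 2 (one retry per timed-out query, as in the example).

```python
def offered_load(base_qps: float, failing: bool, retry_factor: float = 2.0) -> float:
    """Load the database sees: retries multiply it while queries are timing out."""
    return base_qps * retry_factor if failing else base_qps


def recovers_after_trigger(base_qps: float, capacity_qps: float,
                           retry_factor: float = 2.0, ticks: int = 60) -> bool:
    """Start just after a trigger (queries already timing out) and step the loop.

    The system recovers only if, with retries included, the offered load
    fits within the database capacity again.
    """
    failing = True
    for _ in range(ticks):
        failing = offered_load(base_qps, failing, retry_factor) > capacity_qps
        if not failing:
            return True
    return False


def hidden_capacity(capacity_qps: float, retry_factor: float = 2.0) -> float:
    """Largest base load that still recovers on its own in this toy model."""
    return capacity_qps / retry_factor


if __name__ == "__main__":
    print(offered_load(280, failing=True))     # load under retries at 280 QPS
    print(recovers_after_trigger(280, 300))    # the article's scenario
    print(recovers_after_trigger(140, 300))    # a lower base load
    print(hidden_capacity(300))
```

In this simplified model, the 280 QPS system never leaves the failing state once triggered, while the same database at 140 QPS recovers after one tick. Real systems have queues, timeouts, and partial goodput that this sketch deliberately ignores.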
In our example, the numbers make it concrete: the advertised capacity is 300 QPS, but the hidden capacity is only 150 QPS, because retries double the load under failure. The gap between those two numbers is where vulnerability lives.

Measuring the hidden capacity is not straightforward, though. One possible approach is to apply a trigger at a given load level and observe whether the system recovers on its own: if it does, we are below the hidden capacity; if it doesn’t, we are above it. We can also estimate it indirectly: in the retry example, retries double the load under failure, so the hidden capacity is roughly half the advertised capacity.

Metastable failures are not bugs. We can’t write a unit test that catches them. They are emergent behaviors: properties that arise from the interaction of a system’s components under specific conditions, not logic errors in any individual component. No single piece of code is buggy, no single configuration is wrong. The failure is a consequence of how everything fits together under load.

This changes how we need to think about them. The right question after an outage is not “What failed?” but “What loop sustained it?” And before an outage, the danger is not having bugs; it’s optimizing so aggressively for efficiency that we push the system deeper into the vulnerable state without realizing it.

Retries, caches, failover logic, connection pools: these are all features that improve reliability in the common case. They are also, under the right conditions, the sustaining mechanisms of metastable failures. The same design decision that makes a system more resilient in normal operation can also prevent it from recovering when things go wrong.

The paper describes several approaches to reduce the risk of metastable failures:

Retry budgets and circuit breakers: Instead of retrying indefinitely, cap the total number of retries in flight at any given time. This directly weakens the feedback loop by limiting work amplification.
LIFO scheduling under overload: Counterintuitively, switching from FIFO to LIFO when the system is overloaded allows some requests to complete within their deadline, preserving goodput instead of letting every request time out. NOTE: I already wrote a post about that approach in Adaptive LIFO.

Fast error paths: Success paths are heavily optimized, but error paths often aren’t. An expensive error path (stack traces, DNS lookups, disk writes) under high failure rates can itself become a sustaining mechanism. Optimizing error paths reduces this risk.

Read-through caches over look-aside caches: A read-through cache (where the cache itself fetches missing data from the database) can continue filling itself even when the application has given up on a request, steadily increasing the hit rate and helping the system recover. A look-aside cache (where the application is responsible for populating the cache) can’t.

Production stress testing: Small-scale tests won’t reveal metastable failures. Testing against a portion of production traffic, with engineers ready to intervene, is the most reliable way to surface them.

A note of humility from the paper: there is no systematic solution yet. These are ad-hoc mitigations developed in response to known failures. Detecting vulnerable states before they collapse remains an open problem.

A distributed system can pass through three states: stable, vulnerable, and metastable. The vulnerable state looks healthy, but it isn’t. The threshold between stable and vulnerable is invisible. Systems can operate in the vulnerable state for months without any sign of trouble. When a trigger pushes a vulnerable system into a metastable failure, a feedback loop sustains the failure even after the trigger is gone.
The trigger is not the root cause. The feedback loop is. Fixing the trigger leaves the system vulnerable to the next one. Reliability features like retries and caches can become the sustaining mechanism of a metastable failure under the right conditions. Metastable failures are emergent behaviors, not bugs. We can’t unit test for them, and optimizing for efficiency makes them more likely. Mitigations exist (retry budgets, circuit breakers, LIFO scheduling, fast error paths), but they are all ad-hoc responses to known failures. Detecting vulnerable states before they collapse remains an open problem.

Resilient, Fault-tolerant, Robust, or Reliable?
Adaptive LIFO
Fail Open vs. Fail Closed
Metastable Failures in Distributed Systems
Metastability and Distributed Systems
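The retry budget mitigation mentioned in this post can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `RetryBudget` class and `call_with_budget` helper are hypothetical names, and the budget here simply caps how many retries may be in flight across all callers at once.

```python
import threading


class RetryBudget:
    """Caps the number of retries in flight at any given time."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve one retry slot; return False when the budget is exhausted."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Give a retry slot back once the retry has finished."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def call_with_budget(operation, budget: RetryBudget, max_attempts: int = 3):
    """Call operation(); retry on failure only while the shared budget allows it.

    Assumes max_attempts >= 1. The first attempt is always allowed; only
    retries consume budget, so amplification stays bounded under failure.
    """
    last_error = None
    for attempt in range(max_attempts):
        is_retry = attempt > 0
        if is_retry and not budget.try_acquire():
            break  # budget exhausted: fail fast instead of adding load
        try:
            return operation()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
        finally:
            if is_retry:
                budget.release()
    raise last_error
```

With a budget of zero, a failing call is attempted exactly once, which is the behavior that breaks the High load → Timeouts → Retries loop. A production version would likely add backoff and tie the budget to a fraction of recent successful traffic.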

0 views
Andre Garzia Yesterday

We need to own our computing experience

Originally, when I talked about owning our own platform in this blog, I meant owning the stack that powers and serves the blog: moving to your own VPS, servers, or static pages, where you don't depend on some *Blog As A Service* company such as Wordpress.com. Eventually, I [started talking about owning the workflow that empowered your blog experience](https://andregarzia.com/2026/02/building-your-own-blogging-tools-is-a-fun-journey.html), not only your posting experience but your reading experience. To that effect, I showed how I created my own blog reader and integrated that into Firefox, and also my own blog editor. Recently, I have come to think that we need to move further into owning more and more of our computing experience. The avalanche of LLM/AI based slop solutions being force fed into our lives is radicalising me towards a very specific path in which owning my own platform now needs to mean controlling my own computing experience. I have been an Apple user for a very long time and have [spoken previously about my recent desire to leave the platform](https://andregarzia.com/2026/03/apple-just-lost-me.html) because of a recent decrease in the quality of macOS, a change in Apple's priorities regarding independent developers in their ecosystem, and a general feeling that I must move away from big tech. In that post, I outlined my desire to move to an [MNT Pocket Reform](https://mntre.com), a [Fairphone Gen 6 with potentially Murena /e/OS](https://fairphone.com) and maybe a NAS. I already purchased the Pocket Reform and am waiting for assembly and shipment, but I changed my approach for the next two items in that list. Instead of buying a NAS, I decided first to experiment with self-hosting and homelabbing by converting an old x86 MacBook Pro into a server using [Yunohost](https://yunohost.org). That server is going surprisingly well for me and I am moving more and more of my computing to inside the house.
I will eventually get a proper NAS or build one, but at the moment that server is all I need. I am even hosting my [fediverse account](https://social.soapdog.org/@soapdog) on it using [GoToSocial](https://gotosocial.org). I reckon that I will spend close to 500 pounds to get the Fairphone with /e/OS. I don't have that budget right now and am afraid of doing it blind, because I have been checking the forums and it seems like WhatsApp stopped working in the last update and not all features of the Halifax UK bank app are working. I don't want a switch to a deGoogled OS to prevent me from talking to my friends or using my bank. I know that sucks, but those are not easily solvable problems. Like my original plan with the NAS, I think I might be able to test the waters of /e/OS by buying an old second-hand smartphone, installing it, and seeing for myself how well it works. That will cost me much less, and then if I like it enough, I can make the move to a Fairphone. So now the issue is figuring out what phone to buy on a budget of 150 pounds or less. Moving back to Linux on open hardware and to deGoogled Android is my slow journey towards computing autonomy. Google was never worthy of trust, but the recent move to prevent side-loading on Android and stop showing links on their search result page, becoming a de facto slop-as-a-service engine, is something I can't really abide. Apple's hypermaniacal need to control the experience of their users and milk both developers and users as much as possible has reached a tipping point for me. My MacBook Air doesn't feel like mine, since the friction keeps piling up when trying to run software that is not coming from the App Store. I'm done with that. What is left then? We need to return to a human-focused FOSS community. Not the fast turnaround LLM/AI commits into every single repo because whoever is sponsoring this project needs it to move FAST. The best thing about the free and open source community has never been the code, but the ethos.
Made by humans, to be understandable by humans, to be modifiable by humans. This crazy trend towards LLM assisted coding is removing the understandable part. Lots of commits are being generated by machines and reviewed by machines without a single person actually having read the whole thing. That will erode skills and also lead to code that is impossible to maintain because no one has ever fully understood it. That is why I am also starting to build my own tools. There are of course tools I depend on that are too large for me to build from scratch (goddess forbid trying to build a web browser); in those cases it is okay to use a FOSS solution like Firefox. But for things that are dear to me, like blogging, I can build my own tools. Or epub manipulation tools, or small decentralisation apps. The more I build, the more I can be sure I can maintain it in the long run. I don't want a Web where all we do as creators is feed training models so that gigantic greedy corporations can get it all wrong and regurgitate shit to users. FAANG erected a wall inside the internet and creators are now on the outside. Fighting back is not done by creating local models or ethical AI companies; fighting back is done by walking away and playing a different game. We can't win over Google and Apple at their own game. It is rigged. But we can play a different game in which they don't matter. For me that game is building offline-first, local-first, decentralised tools and apps for my friends and whoever else can benefit from them. Create for those around you, for those that matter. Forget web scale, think in terms of a village. Get back to Linux, deGoogle yourself if you're able to. Create FOSS and also use the tools you create. Use repairable tech if you can afford it and make sure to step out of this consumption and slop cycle the digital world has become.

0 views

Kafka Share Groups and Parallelizing Consumption - Part 2: Producer Batches and share.acquire.mode

All tests were executed against Kafka 4.3.0 using Dimster.

In the last post we used simulated consumer processing time to reveal how important it is to set an appropriate value for max.poll.records to ensure the consumer parallelism that we expect. With a uniform distribution of messages over partitions, the rule of thumb was a value somewhat lower than the bound derived in that post. But there’s more to parallel consumption than max.poll.records. The size of producer batches also plays a role when using the default share.acquire.mode (batch_optimized).

Share group members are assigned to partitions like consumer group members are, except that share group assignment allows multiple consumers to be assigned to the same partition. If the number of share consumers is less than the partition count, then each consumer will be assigned multiple partitions. If the consumer count matches or exceeds the partition count, then each consumer will be assigned one partition.

Fig 1. Share consumer assignments. Left: consumer count < partition count. Right: consumer count > partition count.

When a consumer is assigned only one partition, it will always be fetching from one broker. If a consumer is assigned multiple partitions, it may fetch from multiple brokers concurrently.

There are two values for share.acquire.mode: batch_optimized (the default) and record_limit. The Javadoc says the following:

The application chooses between the two modes using the consumer share.acquire.mode configuration property. If the application sets the property to batch_optimized or does not set it at all, the share consumer fetches records based on batch boundaries which may mean that the number of records returned may exceed the max.poll.records configuration property. The share consumer may also prefetch records and buffer them temporarily awaiting the application's next call to poll(Duration). If the application sets the property to record_limit, the share consumer fetches no more than max.poll.records records at a time and does not prefetch. This is slower but gives the application tighter control on how many records are fetched and when the acquisition locks begin.
So why two modes? It comes down to efficiency (batch_optimized) and consumer control (record_limit).

First of all, the sentence “the share consumer fetches records based on batch boundaries” is correct but a little misleading. No matter what mode is used, whole batches are returned to the consumer over the wire. In other words, the data sent over the network is always based on batch boundaries, as the record batch is the unit of data delivery. What that sentence refers to is which records are acquired by the consumer and returned to the application:

With batch_optimized, the max.poll.records config is a soft cap. The consumer acquires any batches (in their entirety) that are covered by the offset range determined by max.poll.records. These acquired batches are returned to the consumer, and the consumer returns the records of those batches to the calling application (that invoked poll()).

With record_limit, the max.poll.records config is a strict cap. The consumer only acquires the records that are covered by the offset range determined by max.poll.records (though fewer if fewer records are available). However, the unit of data delivery is the record batch, so the consumer receives whole batches but only returns a specific offset range to the calling application.

For example, in the figure below we have three consumers sending fetch requests with share.acquire.mode=batch_optimized and max.poll.records=1.

Fig 2. Three consumers fetching with batch_optimized

Despite asking for only one record, each consumer acquires and receives records along batch boundaries. The result of consumer.poll(Duration) for c1 is three records, not one.

If we rerun this scenario with record_limit: c1 acquires record 0, c2 acquires record 1, and c3 acquires record 2. However, the batch is the unit of data delivery, so batch 1 is sent in its entirety to each consumer (the consumer internals only return the acquired records of the batch to the application).

Fig 3. Three consumers fetching with record_limit

This is obviously less efficient… We just sent the same batch three times!
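The repeated delivery in the record_limit example above can be estimated with a small calculation. This is a simplified model, not Kafka code: the function name and the `shift` parameter are hypothetical, and it assumes that consecutive fetches acquire back-to-back offset ranges of exactly max.poll.records records, with every batch touched by a range sent in full for that fetch.

```python
def deliveries_per_batch(records_per_batch: int, max_poll_records: int,
                         num_batches: int, shift: int = 0) -> list[int]:
    """Count how many fetch offset ranges touch each record batch.

    Batch j covers offsets [j * records_per_batch, (j + 1) * records_per_batch).
    Fetch range boundaries sit at shift + k * max_poll_records, so shift = 0
    means the first range starts exactly on a batch boundary.
    """
    counts = []
    for j in range(num_batches):
        first = j * records_per_batch
        last = first + records_per_batch - 1
        first_range = (first - shift) // max_poll_records
        last_range = (last - shift) // max_poll_records
        counts.append(last_range - first_range + 1)
    return counts


if __name__ == "__main__":
    # The example above: a 3-record batch, one record acquired per fetch.
    print(deliveries_per_batch(3, 1, num_batches=1))
    # Illustrative numbers: 64-record batches with ranges of 30 or 32 records.
    print(deliveries_per_batch(64, 30, num_batches=15))
    print(deliveries_per_batch(64, 32, num_batches=4))
    print(deliveries_per_batch(64, 32, num_batches=4, shift=10))
```

Under these assumptions, ranges that divide the batch size evenly and start on a batch boundary give the fewest deliveries, while misaligned ranges of the same size add one extra delivery per batch.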
Nonetheless, record_limit exists because sometimes that inefficiency over the wire is countered by other concerns (one of which is covered in this post).

Another efficiency gain that batch_optimized has is that, because each batch is only sent to one consumer, Kafka only needs to do share group housekeeping of the batch as a whole, not each record individually. This reduces CPU and makes metadata more compact. Only if we get mixed acknowledgments of the batch records (2 success, 1 reject) does the record tracking explode the metadata to be per-record. With record_limit, the housekeeping always tracks state per record, which is more expensive.

The final difference between the modes is that in batch_optimized mode, a consumer can send concurrent fetches to all the brokers of its assigned partitions. This further increases the number of records that a consumer might receive, as max.poll.records is a soft cap per broker. With record_limit, the consumer sends one fetch at a time, round-robin between the brokers of its partitions. This difference only manifests when the consumer count is less than the partition count. We’ll cover this aspect more in the next post.

The main implications are that:

With batch_optimized, the effective consumer parallelism can be impacted by the average number of records per record batch.

With record_limit, the network throughput will increase in most scenarios as offset ranges are unlikely to align with batch boundaries. If max.poll.records is larger than the average number of records per batch, then each batch may only be delivered twice. The network throughput can increase a lot if max.poll.records is much smaller than the average number of records per record batch.

Don’t worry if that isn’t clear yet; we’ll gather some empirical results next, which should make it clearer. Let’s test this out with Dimster’s interactive mode, using the same workload as the last post. In the last post, we calculated that the maximum theoretical consumption rate for 300 consumers with a processing time of 5 ms per message would be 60,000 msg/s.
By setting max.poll.records to 30 we reached 55,000 msg/s, and then finally reached 60,000 with low end-to-end latency by adding an additional 12 consumers (2 per partition). We use the same workload file here (no dimensional stuff in this one as we’re going to use live-interaction).

In this test we’re going to make the record batches bigger and see what happens to the consumption rate. First we start Dimster and ensure it’s handling the 60k msg/s. Once it has started and settled in, we see it’s coping well. If I look at the metrics, the current record batch size is around 5KB with 10 records per batch. The average fetch size is 7KB with 14 records. This means some consumers get 1 record batch per fetch and some get 2 record batches per fetch.

Let’s increase the batch size. To do this we’ll drop to 1 producer, and set linger.ms to 10 to reach the default batch.size of 16KB. We see that the batch size has risen to the default of 16KB, or 32 records per batch. The consumers should now, on average, receive 32 records per fetch (2 above the max.poll.records).

Fig 4. The record batches sent by the producers increase from 5.5 KB to 16 KB

The coordinator output shows that the consumers are still coping, as expected. With 500-byte records, the number of records returned per fetch will be 32, which is close enough to the max.poll.records of 30 to not impact consumption.

Now let’s double batch.size to 32768. We apply the change from a separate terminal window to the coordinator output, and we see the batch size increase again in the dashboard.

Fig 5. The record batches sent by the producers increase from 5.5 KB to 16 KB to 32 KB

The coordinator output shows that the consumers are no longer keeping up! They only manage 37K msg/s, with a fast-growing backlog. The problem is that each partition has an inflight budget of 2000 records and each record batch contains 64 records.
That allows up to 31 effective consumers per partition (2000 / 64), leaving 21 consumers starved at any point in time. This explains the 37K msg/s: 31 effective consumers × 6 partitions × 200 msg/s per consumer (5 ms per message) = 37,200 msg/s.

We can fix this problem in three ways: set a smaller batch.size in the producer, increase the per-partition inflight record limit on the brokers to create a larger inflight budget, or switch the consumers to share.acquire.mode=record_limit.

We already know the default 16KB batch size is ok. Let’s first increase the inflight budget. We’ll double the budget and see what happens. First we’ll stop the producers and remove the processing time on the consumers to drain the backlog. Next we need to update the broker config and restart the brokers: we add a line to the broker config that doubles the limit, then redeploy Kafka (again from a separate terminal window). Now we’ll start the producers again and apply the 5 ms processing time to the consumers. We’re in business! The consumers are now coping with the larger batch sizes with this increased inflight budget.

This time we’ll try record_limit. First let’s walk back that inflight budget change by 1) stopping the producers, 2) commenting out the added line in our broker config, 3) redeploying Kafka. While the producers are still stopped, I’ll change the consumers to use share.acquire.mode=record_limit, then start the producers again. In the coordinator, we see that the consumers are now coping with the 60K msg/s.

The reason that record_limit allows the consumers to keep up, despite the larger record batches, is that each consumer is only allocated a max of 30 records per fetch, even though each batch contains 64 records. However, each batch is now being delivered three times, as 30 doesn’t align well with 64. We can see this in the Kafka client metrics.

Fig 6. On the left, with the larger inflight budget and batch_optimized. The middle was when we stopped the producers to restart Kafka with the original inflight budget. The right is with record_limit and each batch being sent three times.

We could make this more efficient if we increase max.poll.records to 32 to align with the 64-record batches. If I simply change max.poll.records to 32, we don’t see much of an improvement, as most offset ranges of 32 records will touch two batches.
But if we stop the producers, ensure there is no backlog at all then set , the fetches will be perfectly aligned. Fig 6. On the left, with unaligned fetches with max.poll.records=32 (each batch delivered 3 times). Right: aligned fetches with max.poll.records=32 (each batch delivered 2 times). Let’s not over-index on this one case. The purpose of this post was to explain the underlying mechanics and back that up with some empirical benchmarks, sticking with the same workload example as the last post. What we’ve learned: Consumer parallelism is impacted by more than just consumer count and . It is also impacted by: Record batch sizes (determined by the producers) The inflight budget ( ) The share consumer config Record acquisition is along batch boundaries with , and record ranges with . Record batches are the unit of delivery, so can cause consumer network bandwidth to increase because fetches likely will not align on batch boundaries causing batches to be delivered at least twice (more if is much smaller than the average number of records per batch). In the next post we’re going to look a bit closer at . ps: you can run this whole scenario with two terminal windows: Window 1 - kick off the benchmark (using the workload yaml described in the post) Window 2 - wait a few minutes then run the following bash script: Happy testing! If the application sets the property to batch_optimized or does not set it at all, the share consumer fetches records based on batch boundaries which may mean that the number of records returned may exceed the max.poll.records configuration property. The share consumer may also prefetch records and buffer them temporarily awaiting the application's next call to poll(Duration). If the application sets the property to record_limit, the share consumer fetches no more than records at a time and does not prefetch. This is slower but gives the application tighter control on how many records are fetched and when the acquisition locks begin. 
With , the config is a soft cap. The consumer acquires any batches (in their entirety) that are covered by the offset range determined by . These acquired batches are returned to the consumer, and the consumer returns the records of those batches to the calling application (that invoked ). With , the config is a strict cap. The consumer only acquires the records that are covered by the offset range determined by (though fewer if fewer records are available). However, the unit of data delivery is the record batch, so the consumer receives whole batches but only returns a specific offset range to the calling application. c1 acquires record 0 c2 acquires record 1 c3 acquires record 2 With , the effective consumer parallelism can be impacted by the average number of records per record batch. With , the network throughput will increase in most scenarios as offset ranges are unlikely to align with batch boundaries. If the is larger than the average number of records per batch, then each batch may only be delivered twice. The network throughput can increase a lot if the is much smaller than the average number of records per record batch. Don’t worry if that isn’t clear yet, we’ll gather some empirical results next which should make it clearer.
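To make the alignment arithmetic in this post concrete, here is a minimal Python sketch. It is not part of the benchmark: the function names are hypothetical, and it assumes an idealized model in which fetches are fixed-size, back-to-back offset ranges. Real share consumers compete for ranges, so actual delivery counts will vary. The 2000-record inflight budget and the 64-record batches are the figures used in the post.

```python
def effective_consumers(inflight_budget: int, records_per_batch: int) -> int:
    """How many consumers can each hold one whole batch within the inflight budget."""
    return inflight_budget // records_per_batch


def deliveries_per_batch(batch_start: int, batch_size: int,
                         fetch_size: int, fetch_origin: int = 0) -> int:
    """Count the fixed-size fetch ranges that overlap a single record batch.

    Assumption: fetch k covers offsets
    [fetch_origin + k * fetch_size, fetch_origin + (k + 1) * fetch_size).
    Every overlapping fetch causes the whole batch to be delivered once.
    """
    first = (batch_start - fetch_origin) // fetch_size
    last = (batch_start + batch_size - 1 - fetch_origin) // fetch_size
    return last - first + 1


if __name__ == "__main__":
    # Inflight budget of 2000 records, 64 records per batch.
    print(effective_consumers(2000, 64))
    # 30-record fetches against a 64-record batch starting at offset 0.
    print(deliveries_per_batch(0, 64, 30))
    # 32-record fetches aligned on batch boundaries.
    print(deliveries_per_batch(64, 64, 32))
    # 32-record fetches shifted by 10 records (unaligned).
    print(deliveries_per_batch(64, 64, 32, fetch_origin=10))
```

Under these assumptions the sketch gives 31 effective consumers, three deliveries per batch for 30-record fetches, and two versus three deliveries for aligned versus shifted 32-record fetches.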

0 views
Stratechery Yesterday

The SpaceX IPO and Data Centers in Space

It’s hardly the biggest problem in the world — or perhaps the height of privilege to consider it a problem at all — but one of the most annoying consumer experiences is booking an Uber Black and realizing you got assigned a Tesla Model Y (Uber finally stopped allowing new Model Y’s onto Black last year). Buckle up for an uncomfortable back seat, basic plastic finishes, and, all-too-often, potential car sickness from a driver who hasn’t completely mastered the Tesla’s aggressive regenerative braking. Still, the fact that the Model Y ever made it to the Black level is a testament to the brand Elon Musk built. Back in 2016, when 300,000 people dropped $1,000 each in a matter of hours to reserve an as-yet-unreleased Model 3, I explained that the phenomenon was because It’s a Tesla: The real payoff of Musk’s “Master Plan” is the fact that Tesla means something: yes, it stands for sustainability and caring for the environment, but more important is that Tesla also means amazing performance and Silicon Valley cool. To be sure, Tesla’s focus on the high end has helped them move down the cost curve, but it was Musk’s insistence on making “An electric car without compromises” that ultimately led to 276,000 people reserving a Model 3, many without even seeing the car: after all, it’s a Tesla. This is the same brand halo that landed what is, if we’re honest, a pretty basic car on the Uber Black list. What actually makes these cars compelling is the extent to which they are computers on wheels: I know plenty of very rich people who drive a Tesla not for the finishes but rather the Full Self-Driving (Supervised); there is nothing like it on the market, at least when it comes to cars you can own. 
Tesla appears to be doubling down on this point of differentiation: the company stopped production of the Models S and X earlier this year, focusing production resources on the CyberCab and robots; if you want your car to drive itself, you’ll get the same model as everyone else. It reminds me of Andy Warhol’s famous quote : What’s great about this country is that America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke, Liz Taylor drinks Coke, and just think, you can drink Coke, too. A Coke is a Coke and no amount of money can get you a better Coke than the one the bum on the corner is drinking. All the Cokes are the same and all the Cokes are good. Liz Taylor knows it, the President knows it, the bum knows it, and you know it. That “tradition” is scale, and America is indeed better at it than any other country in the world; and, amongst Americans, no one pursues and seeks to leverage scale quite like Musk. From a press release from American Airlines: American Airlines today announced a sweeping modernization of its narrowbody inflight customer experience with the installation of Starlink, the fastest Wi-Fi in the sky, on more than 500 narrowbody aircraft beginning in Q1 2027. Starlink is widely regarded as the world’s most advanced satellite constellation using a low Earth orbit to deliver broadband Internet capable of supporting inflight streaming, online gaming, collaborative meeting tools and more. With thousands of satellites in low Earth orbit, Starlink can deliver multigigabit connectivity to aircraft using its Aero Terminal, which can support up to 1 Gbps per antenna. “As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden. 
“The addition of Starlink solidifies American as a leading airline in keeping passengers connected in flight.” As part of American’s commitment to an elevated onboard experience, Starlink will enable seamless streaming, browsing and real-time communication capabilities across American’s domestic and short-haul international routes. I linked to the press release just for the amusement of American Airlines, which has in recent years built its strategy around offering anything-but-premium on routes you need, billing their Starlink deal as a commitment to “an elevated onboard experience.” That may have been the argument for United’s Starlink deal when it was announced in 2024 , but by this point it’s tablestakes , which is surely exactly how Musk wants it. Starlink is the consumer-facing business of SpaceX, generating $8.7 billion in revenue last year and $4.4 billion in profit; while it’s not totally clear exactly how SpaceX accounts for launch costs, obviously Starlink benefits greatly from the fact that it has access to SpaceX’s launch capacity. That launch capacity has resulted in over ten thousand active satellites in low Earth orbit, delivering low latency high speed Internet anywhere in the world — including in the air. That’s the carrot for airlines; the stick is the prospect of everyone else having the same service, and customers making flight decisions based on the quality of Internet access available. There is a similarity to Tesla in this way. Musk companies at their best don’t win the game; they change the rules through scale, such that billionaires buy economy cars because they actually drive themselves (with supervision), and airlines transform the consumer experience on their own dime. Musk makes all-in bets — whether that be in terms of launch capacity or in autonomous driving — not by making rational short-term business decisions, but by starting with the desired end state and working backwards. 
Tech has a long history of silly charts — there is an entire category known as Bezos charts — and the SpaceX S-1 has one that made me laugh. It came in the discussion of SpaceX’s total addressable market: We believe we have identified the largest actionable total addressable market (“TAM”) in human history. We estimate that our quantifiable TAM is $28.5 trillion, consisting of $370 billion in Space from space-enabled solutions; $1.6 trillion in Connectivity across $870 billion in Starlink Broadband and $740 billion in Starlink Mobile as well as additional opportunities in enterprise and government; $26.5 trillion in AI across $2.4 trillion in AI infrastructure, $760 billion in consumer subscriptions, $600 billion in digital advertising, and $22.7 trillion in enterprise applications. For illustrative purposes of sizing our addressable market opportunity, we exclude China and Russia from our global estimates. This image is approximately to scale vertically, but certainly not horizontally: I could use the help in really wrapping my mind around the $26.5 trillion AI opportunity, given it’s more than 13 times the space and connectivity opportunity combined! In all seriousness, the numbers are obviously absurd, but then again, everything about this IPO is absurd. SpaceX is seeking a $2 trillion valuation on a mere $18.67 billion in revenue with $4.9 billion in losses last year, and growth actually slowed from 35% to 33%. That slowdown happened despite the addition of xAI (and thus also X), which tipped the company from a small profit to that massive loss, thanks to $5.1 billion in AI R&D expense. That R&D, keep in mind, went towards building a model that is in 5th place, and whose entire founding team recently left the company. But sure, $26.5 trillion AI opportunity! This is not to say that SpaceX won’t get its desired valuation. 
Tesla’s valuation never made any sense right up until the Models 3 and Y actually worked out, causing Tesla’s share price to soar (and even then it was hard to ever build a financial model that justified the new share price). Musk’s ability to make his own reality starts with investors; from 2021’s Mistakes and Memes and comparing Apple and Tesla: This comparison works as far as it goes, but it doesn’t tell the entire story: after all, Apple’s brand was derived from decades building products, which had made it the most profitable company in the world. Tesla, meanwhile, always seemed to be weeks from going bankrupt, at least until it issued ever more stock, strengthening the conviction of Tesla skeptics and shorts. That, though, was the crazy thing: you would think that issuing stock would lead to Tesla’s stock price slumping; after all, existing shares were being diluted. Time after time, though, Tesla announcements about stock issuances would lead to the stock going up. It didn’t make any sense, at least if you thought about the stock as representing a company. It turned out, though, that TSLA was itself a meme, one about a car company, but also sustainability, and most of all, about Elon Musk himself. Issuing more stock was not diluting existing shareholders; it was extending the opportunity to propagate the TSLA meme to that many more people, and while Musk’s haters multiplied, so did his fans. The Internet, after all, is about abundance, not scarcity. The end result is that instead of infrastructure leading to a movement, a movement, via the stock market, funded the building out of infrastructure. I explained in that Article why I generally did not cover Tesla’s financial results, and the reasoning extends to why I don’t expect to cover SpaceX’s: Musk is the master of memes, and is himself a meme. 
He offers a dream — Mars, fully autonomous vehicles, an addressable market of $28.5 trillion — and positions his companies and their stock as access to that dream, and through the alchemy of capital markets, transforms shared delusion into mass market reality. Musk’s track record matters in this regard. Building an electric car company was possible, as was full self-driving (supervised); at the same time there were ever increasing government mandates and programs around decreasing emissions that acted as the stick to Tesla’s carrot. Similarly, landing rockets was possible, and the new market creation downstream from correspondingly lower launch costs was comprehensible. That Musk succeeded in both instances gives him the benefit of the doubt. The question that matters, then, is not if the numbers make sense right now (they absolutely do not); what matters is if the dream is even possible, and if there are actual reasons to think it might happen. I think that data centers in space meet these conditions. The first question about data centers in space is if they are even possible, and I think the answer is clearly yes. The key thing to consider is that there is no requirement that these data centers look anything like data centers on earth. On earth we build massive buildings full of GPUs with massive infrastructure for cooling those GPUs and massive power plants (or a connection to a grid which connects to massive power plants) to power those GPUs. The idea of transporting these massive structures to space sounds implausible, and it is! However, there is no reason that space data centers would look like data centers on earth. What makes far more sense is to think about an individual satellite as something akin to a rack. 
Right now the largest Starlink satellite in orbit is the V2 Mini Direct-to-Cell, which measures 7.4 meters by 2.7 meters by 0.3 meters (estimated); an NVL72 rack from Nvidia, meanwhile, measures 2.2 meters by 1.1 meters by 0.6 meters, so we’re already in the right size range. The V2 Mini Direct-to-Cell consumes (and dissipates) up to an estimated 25kW of energy; the NVL72 up to 135kW, and it can fit a 1 trillion parameter model quantized to FP4. The big shortcoming for a rack-satellite is power and its dissipation, but going from 25kW to 135kW is certainly within the realm of possibility — and given that you don’t need much of the cooling and power distribution usage on earth, something closer to 100kW might deliver similar performance. There are other issues to address, including the problem of radiation screwing with calculations, reliability, etc., although those two concerns could be addressed in part by using larger chips (which are less efficient, but also use less power); these rack-satellites will also be disposable, like Starlink satellites, ameliorating reliability issues. The key factor, however, is that a fleet of racks, interconnected with lasers (as Starlink’s already are), each with their own solar panels and radiator arrays for cooling (deploying 200+ square meters of radiators per rack will be a huge challenge), is possible . The next question about data centers in space is if there is a use case for them — the carrot — and I already made the argument that there is in The Inference Shift . Specifically, there are three types of workloads developing around LLMs: training, answer inference, and agentic inference. From the section making the case for “agentic inference”: Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. 
If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. If delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either. This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away: At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute. It’s agentic inference that makes the most sense for racks in space, and conveniently enough, that is also the market that is likely to be the largest in the long run. The third question about data centers in space is if there is a stick. Specifically, while I think that racks-in-space are both a lot more viable than people think, and a lot more relevant to agentic inference than current modes of compute, it is at the end of the day cheaper and easier to build on earth, all things being equal. All things are not equal, however: right now we are at the very beginning of the AI buildout and already one of the biggest constraints is not just power (expected), but zoning (unexpected). 
I wrote in an Update last week : That leads to an interesting contrast to globalization: when companies were closing down American factories and laying off workers and moving operations to China, none of the affected towns or workers had a say. They just suddenly no longer had a job, and a huge number of cities across the Rust Belt no longer had a reason to exist. People simply had to move, or worse, retreat to things like alcohol or drugs. AI, however, is the opposite: building data centers requires permission, which is to say that people actually have a say. Again, I am not at all saying that these people are well informed about data centers, or about the economic impact on their communities, much less the economic impact of AI generally; what I am noting is that people who didn’t have a say in globalization are suddenly finding they do have a say about AI, and it’s not a surprise they are expressing their disapproval by blocking data centers. In that Update I made the case that data center builders — and by extension the companies that use them — should straight up pay people for permission to build data centers in their communities. At a minimum, however, that increases the costs of terrestrial data centers. What seems very plausible in the long run is that the demand for compute ends up being so large that there eventually is nowhere left to build, making the vast expanses of space not just an alternative but in fact the only choice. If all of this happens — and there are a lot of “if”s here! — then suddenly that $2 trillion valuation starts looking reasonable. SpaceX is already monetizing xAI’s first data center, Colossus 1, to the tune of $15 billion/year for 300MW of capacity; that’s 3,000 racks-in-space. Anthropic, meanwhile, will probably make 3x the revenue on that capacity; it remains to be seen if xAI can get back in the state-of-the-art game, but if so then the amount of revenue it can generate per rack-in-space will be commensurately higher. 
Even without xAI, however, SpaceX has the potential to be a monopoly provider of marginal compute capacity. There are, needless to say, a massive number of assumptions baked into this argument, including assuming a huge number of engineering challenges are solved, Starship actually works, SpaceX gets sufficient supply of the right kinds of chips, compute demand is massively larger, agentic inference unbundles current architectures, and data center opponents are successful. The risk attached to all of these assumptions should discount the valuation you put on this business, which is to say I still think this IPO is nuts. At the same time, I’m glad it exists, for multiple reasons. The first one is the most obvious one: Musk, for all of his faults, has already pushed humanity forward on multiple vectors, including electric cars, self-driving, reusable rockets, satellite Internet, etc., and I’m excited to see him try and do more. The second is that I am in fact concerned about our ability to muster enough compute to fully realize the gains from AI, and am very worried about a replay of nuclear power, where our failure to build denied us the opportunity to even imagine what could be invented in a world of unlimited energy; the fact Musk is proposing an alternative path to unlimited compute is a relief. The third is that I appreciate the extent to which this IPO is a return to what an IPO should be: the opportunity for people to contribute capital to actually build the business, and to benefit if it works out. As I noted, I can’t make a financial model that necessarily justifies this valuation, particularly based on current financials, but neither can a VC investing in the Series A of a company. SpaceX has already invented a lot, and its early investors are going to make a lot of money with this IPO; at the same time, there is still so much more to invent that there remains a lot of upside — and, to be very clear, a lot of risk. 
It’s a testament to SpaceX’s ambitions that retail investors get to play VC. And hey, you get Mars upside for free! Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate. Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here ) will be very useful. Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.

0 views
Heather Burns Yesterday

Born Crotchety

I spoke with The National about the proposed UK social media ban for teenagers.  That’s an archive link due to their unfortunate adwall. There’s nothing I offered in my delightfully crotchety comments that I wasn’t already saying four, five, six, and seven years ago, but if anyone had listened to me four, five, six, and […]

0 views
Jeff Geerling Yesterday

I patched iozone for better disk benchmarks on modern macOS

A decade ago, I settled on iozone for disk benchmarking on all my systems. Tools like fio ('Flexible IO' tester) are a little more capable for raw disk performance testing, and other tools test network-scale filesystems better, but iozone gives me an easy overview of real-world disk performance across hard drives and SSDs, and runs on Mac, Windows, and Linux (and a smattering of other OSes). It's been around since 1991, and is still updated today. In fact, the two latest updates (versions 509 and 510) contain patches I sent in to get iozone to compile on Apple Silicon Macs running newer releases of macOS.

0 views
iDiallo Yesterday

How Many Tokens Did You Burn Today

Early in my career, a manager at one of the big firms where I worked made a request so absurd it remains etched in my memory. I walked back to the team, repeated what he had asked, and couldn't finish the story without laughing. He wanted me to create a pie chart, of lines of code, per developer, per week. We all lost it. Our lead developer asked if, by any chance, the manager's eyes looked glassy. We laughed even harder. Because yes. Yes, they did. He was always high. That was twenty years ago. I've repeated that story countless times, and it always drew chuckles as we discussed the disconnect between software teams and management. Any software engineer could relate. We all knew that lines of code were a meaningless metric. A junior could write a thousand lines of spaghetti. A senior could fix the same problem with forty elegant ones. But then, last week, I found my name at the top of a leaderboard. My employer had been exploring productivity tools and trialed one they thought would be useful. After the trial, they were quoted $500k a year. The tool tracked developer productivity and integrated with Atlassian products, Microsoft, and many other services we used. The price was too steep, so it was dropped. A couple of months later, the same company came back with a discount. The exact same tool for just $50k a year. My employer jumped at the opportunity. How many bytes did you use today? I'm looking at this dashboard right now and I see my name at the top of the leaderboard. I click on the widget, and a pie chart appears. There it is: a breakdown of the total lines of code my team has produced using AI, by individual. This isn't limited to my employer. Every company is putting something together to track AI usage and justify the investment. Instead of tracking project completions, we're tracking how many lines of code each developer generated with AI. And the joke's on me, because nobody is laughing. 
The whole industry is applauding and encouraging employees to use more of it. I didn't become the champion because I have some neat agentic workflow. It was done by complete accident. While using an LLM, I accidentally selected "planning mode" for a request that had already been planned. The agent ran for several minutes, burning tokens to resolve a problem that didn't exist. Just like that, I made it to the top, without ever writing a single line of code. If this widget is taken at face value, it won't be long before developers start gaming it deliberately. Just let the agent run overnight, and your employer can claim a 10x improvement in productivity. We didn't use line count as a productivity metric in the past because it never made sense. Whenever we refactor code, we often end up with less than we started with. In fact, much of the time I spend modifying AI-generated code is spent deleting unnecessary things it created. Should we track negative lines of code? The better you are at programming, the worse your numbers look. We are assessing developers by the lines of code. I've watched AI evangelists ask "how many tokens did you burn today?" They were trying to convince an audience that productivity is directly proportional to token usage. It reminds me of the transition from paper to computers. A computer evangelist of that era might have asked: "how many bytes did you use today?" Token counts, lines of code, bytes, none of these have anything to do with actual productivity. Metrics are often entirely disconnected from what they're meant to measure. I've seen companies rely on story points only to watch employees point every ticket as high as possible. Choose lines of code as your metric, and lines of code will increase. Reward the highest contributor, and watch everyone double or triple their output by the next performance review. It's a silly metric but it serves a purpose, just not yours. 
AI companies promote token usage and associate it with productivity because they directly benefit from it. Imagine an internet service provider that charges by the byte. What would their recommendation for productivity be? "Use more bytes!" The best engineers I've ever known wrote less code, not more. They deleted things. They simplified. They understood that the goal was never the code itself. They solved problems, they made the system reliable, and they served the user. Measuring developers by output volume, whether that's lines, commits, or tokens, mistakes the exhaust for the engine. Every era of tooling brings a new class of metric that mistakes activity for value. The spreadsheet didn't make accountants more productive just because they could fill more cells. AI won't make developers more productive just because it can generate more code. We aren't even tracking if the right problems are being solved, and solved well. If the productivity dashboard can't answer that, it's not measuring productivity. It's measuring the subscription.

0 views
Maurycy Yesterday

Notes on optimizing battery life:

Ok, so you have something with a battery, and you want it to run for a long time. I'll be using the classic CR2032 non-rechargeable lithium "coin cell" as an example, but everything here applies to other types of battery (except the exact voltage and capacity numbers). First off, it helps to measure power draw in current and charge in, well, charge. It is tempting to convert everything into power and energy, but don't. Most circuits' power draw is much closer to constant current than constant power: a single clock cycle on a microcontroller involves charging or discharging some number of MOSFET gates. That requires some number of coulombs, not some number of joules. Linear regulators turn any circuit into a perfect current sink: no matter what potential is supplied, the device sees a constant voltage and will always draw the same current. Even if you don't use any, most chips will use a few to generate internal voltages. This is the "typical" current draw of an AVR32DD32 microcontroller over voltage from the datasheet: Black: 25 °C. Yellow: 125 °C. Also, battery capacity is nearly universally specified as charge, usually in milliamp hours: a 100 mAh battery can support 1 mA of current for 100 hours before it's "dead" (more on what this means later). Non-ideal batteries: This battery has 3 volts stamped right on it... but that's kind of a lie: Measuring the battery with a meter, the voltage is actually 3.3 volts. However, checking the datasheet, getting the manufacturer's claimed 235 mAh capacity requires operating down to 2 volts: From the datasheet (yes, these have one). With these "CR" Li/MnO₂ cells, the discharge curve is fairly flat: a device that only works down to 85% of nominal (2.6 volts) can still use a good 90% of the capacity. However, an "Alkaline" Zn/MnO₂ 1.5 volt cell falls below 80% of nominal with a quarter of its charge remaining. The manufacturer considers them dead at 0.8 volts, around half the original voltage. 
In a typical circuit, two batteries will be connected in series to produce a 3 V-ish supply. To get the advertised capacity, the device must be able to run down to 1.6 volts: the same as a (fresh) single cell! Think of supply voltage like a budget: If your battery will drop down to 2 volts and the MCU needs 1.8 V, any other components involved in supplying power must not drop more than 200 mV. It's not that the same MCU won't work on two AA batteries, but it won't be able to use the last 10% or so of capacity because it requires at least 1.8 / 2 = 0.9 volts per cell. Ok, so design for half the nominal supply voltage? Batteries have non-trivial internal resistance, which causes a voltage drop when any current is drawn: a coin cell is usually around 10 ohms, while large AA cells sit around 0.1 ohms. To understand what causes this, let's look at how a coin cell works: On the negative electrode, a piece of lithium metal loses its electron and dissolves into the electrolyte. Li → Li⁺ + e⁻ The resulting ions travel over to the positive electrode and steal oxygen from the manganese dioxide: 2 MnO₂ + 2 Li⁺ + 2 e⁻ → Li₂O + Mn₂O₃ This reaction releases a lot of energy because lithium is an alkali metal and the manganese doesn't really care. That released energy is actually what powers the connected circuit. Crucially, the whole thing depends on positive lithium ions reaching and reacting with the positive electrode: moving against the electric field produced by the battery. The open-circuit voltage, 3.3 volts, is enough to completely stop the reaction. This is why batteries only discharge once a circuit drains some of the accumulated electrons... but for the reaction to proceed at a reasonable rate, the voltage must drop quite a bit below the measured open-circuit voltage. If you've done any chemistry, it should come as no surprise that this is affected by temperature: As a rule of thumb, to operate down to -40 °C, plan for ten times the internal resistance at room temp. 
If you see the voltage rail dropping by 50 mV at 20 °C, make sure there's still enough voltage to go around if it drops 500 mV. Another thing that impacts reaction rate is the amount of reagents present, or in other words, the charge left in the battery: resistance increases as the battery is drained. As a test, I discharged an Alkaline battery at 400 mA: Orange: open circuit, blue: under load. With a fresh cell, pulling almost half an amp only results in 100 mV of drop, or 0.25 ohms. By the time the battery is half empty, the resistance doubled to around half an ohm. At 60% discharge, the under-load voltage has dropped below the 0.8 V "dead" threshold. Reducing the voltage requirement won't help here: shortly afterwards, the resistance increased so much my test rig needed to supply power to force those 400 mA through. The smaller CR2032 cells start at around 10 ohms, and reach several hundred ohms by the time the open-circuit voltage falls to 2 V. It follows that any circuit that draws a lot of current cannot use the full rated capacity. For pulsed loads, large capacitors can help, but they have their own problems which I'll discuss later. Also, batteries get worse as they age. Electrolytes can evaporate/leak and side-reactions can form layers that impede current. There's a good chance you've experienced this: a battery that tests fine on a meter but refuses to actually power anything. What's happened is that it developed a huge internal resistance (many kilohms). In series with a high-impedance multimeter, it doesn't create any noticeable voltage drop. When connected to an actual device, the voltage drops to almost nothing. This is why you should be skeptical of any claims of 20 year, 30 year, 50 year battery life. Sure, that might be what you get by dividing nominal capacity by average current draw, but there's no telling how well the battery will work after all that time: I doubt even the manufacturer really knows what happens past a decade or two. 
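The internal-resistance arithmetic above is easy to play with in a few lines of Python. This is only a rough sketch using the simple "ideal source plus series resistor" model, and the function names are made up for illustration. The 100 mV at 400 mA, 10 ohm, ten-times-when-cold and 235 mAh figures come from the text; the 1 mA pulse, the 300 ohm end-of-life resistance and the 1 uA average draw are assumptions picked for the example.

```python
HOURS_PER_YEAR = 24 * 365


def internal_resistance(drop_v: float, current_a: float) -> float:
    """Internal resistance (ohms) implied by a voltage drop at a given load current."""
    return drop_v / current_a


def loaded_voltage(v_open: float, resistance_ohm: float, current_a: float) -> float:
    """Terminal voltage under load for an ideal source with series resistance."""
    return v_open - current_a * resistance_ohm


def naive_life_years(capacity_mah: float, avg_current_ma: float) -> float:
    """Nominal capacity divided by average current. Ignores aging and self-discharge."""
    return capacity_mah / avg_current_ma / HOURS_PER_YEAR


if __name__ == "__main__":
    # Fresh alkaline cell: 100 mV of drop at 400 mA.
    print(internal_resistance(0.100, 0.400))
    # Fresh coin cell (3.3 V open circuit, about 10 ohms) with an assumed 1 mA pulse,
    # at room temperature and with the ten-times rule of thumb for -40 degrees C.
    print(loaded_voltage(3.3, 10, 0.001))
    print(loaded_voltage(3.3, 10 * 10, 0.001))
    # Nearly empty coin cell: 2 V open circuit, assumed 300 ohms, same 1 mA pulse.
    print(loaded_voltage(2.0, 300, 0.001))
    # The kind of number behind multi-decade claims: 235 mAh at an assumed 1 uA average.
    print(naive_life_years(235, 0.001))
```

The last figure is exactly the sort of estimate to be skeptical of: it only divides capacity by current and knows nothing about how the cell ages.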
There's also self-discharge, where leakage currents drain the battery even when it's sitting on a shelf. This is usually given by the manufacturer as a percentage of capacity per year. Because the cell's voltage doesn't change all that much during discharge (and the current is quite small), it's a fraction of the original capacity, not of what's remaining. This alone is enough to kill an AA battery in as little as 5 years, depending on temperature (hotter is worse)... but again, this is not the only mechanism at play: Just because self-discharge might suggest a hundred-year shelf life doesn't mean the battery will actually work in a hundred years.
Another "fun" effect is voltage droop: Drawing current can deplete the chemicals around the electrode, causing a temporary increase in resistance. Applying a 400 mA current pulse to a half-empty 500 mAh Zn/MnO₂ cell caused the internal resistance to triple over the course of 40 seconds:
(Scope trace: yellow is cell voltage, blue is current.)
Eventually, the battery does recover, but it took a good minute or so:
(Scope trace of the recovery. This is actually a trace of a different pulse, so the starting voltage is higher.)
What's interesting is that even though no current is being drawn, the battery's open-circuit voltage is still not back to where it should be. This is where the "resistance" model starts to break down. It's more accurate to say that the pulse temporarily pushed the cell down its discharge curve: increasing the resistance and decreasing the open-circuit voltage.
This gets worse when the battery is nearly empty: When I applied a similar 10 second pulse to an 80% drained cell, it took around 5 minutes for its open-circuit voltage to rise back above 0.8 volts. This effect is highly variable depending on temperature (colder is worse) and state of charge, so it's good to include a wide voltage margin when designing a circuit that will draw sustained current.
In short, internal resistance increases when...
... it's cold
... the battery is close to being empty
... the battery is used ...
you do nothing at all
Plan for a much worse voltage drop than what you see on your workbench: it's possible to lose as much as a volt for each mA drawn with a mostly empty coin cell on a cold night.
With that in mind, it's time to look at those capacity numbers. As already discussed, aiming for longer than a decade or so is largely pointless because of battery aging. These CR2032 batteries have a quoted shelf life of 10 years, so that's going to be my target:
From a CR2032 (~230 mAh), a device can draw an average of 2.6 uA if it runs down to 2 volts.
From an AA battery (~3000 mAh), a device can draw 34 uA if it runs down to 0.8 volts per cell.
... so we have a voltage budget and a target current. Keep in mind that internal resistance will cut into the voltage if the device draws pulses in excess of a few microamps.
Measurement techniques:
These small currents present a problem: most multimeters don't really do well below a microamp. Benchtop models that can measure down to the nanoamps exist but are quite expensive.
On paper, measuring current is easy: Insert a known resistor into the circuit and measure the voltage drop across it... except this either requires adding a large resistance or measuring a tiny voltage. A better way is to use an op-amp to hide the voltage drop from the device under test:
The amplifier tries to keep its two inputs at the same voltage, which requires it to exactly match the device's current through the feedback resistor. The output is exactly the voltage the resistor would show if it were used as a shunt, except with zero burden voltage.
Since most chips have two op-amps, I use the other one to create a VDD/2 supply rail, which is used as the ground. This gives the chip access to voltages both above and below it. Most modern chips are "rail-to-rail", meaning they are designed to operate close to the supply rails... but this doesn't work too well: Consider what happens when the input current drops to zero.
The amplifier has to pull the output (with a non-trivial amount of capacitance) down to zero. If the best the amplifier could do is connect the output to the negative rail, the voltage would decay exponentially, approaching zero but never reaching it.
Would this be a huge problem? Probably not. Is it a good idea to make the chip's job as easy as possible? Yes. As a bonus, this allows the device to measure currents in both directions.
Using the 100 pA/mV range, the circuit has an offset of ~10 pA, so it's not quite a picoammeter, but it's close. This makes it good for testing the leakage of MOSFETs, diodes, capacitors and the like.
However, this design has one huge snag: It only has zero burden voltage up to a fairly modest current. Once the output maxes out (100 nA to 100 uA depending on the range), the device will see the full shunt resistance. This is a non-issue for testing component leakage, but it becomes a problem when measuring the current drawn by a microcontroller.
For measuring sleep current, it's best to build a firmware image that never wakes up, and short the meter's input or connect a second power source during startup. Another option is to use a tiny feedback resistor: connecting a 1 kohm resistor between the input and output yields a 1 uA/mV range with a maximum of 1 mA. Once the microcontroller boots, the resistor can be removed to measure its sleep current. (If you are drawing more than this, you probably shouldn't be.) This is also a good trick to avoid crashing MCUs when switching ranges, which can cause a momentary disconnection depending on the geometry of your selector switch.
Shielding is not optional: 100 picoamps is the kind of current that floats around in the air. It's best to put the whole setup inside a metal box connected to the meter's ground. Running coax to a scope or meter is fine because the wire's sheath is connected to the rest of the shield: this isn't RF stuff.
If you don't have a box, wrapping the whole thing in aluminum foil works almost as well. (Make sure it's not touching anything!)
Also, it's a little silly to carefully screen out interference only to reintroduce it with a power supply, so it's best to run everything from batteries: Two 1.5 volt alkaline cells provide 3 volts, and four are close enough to 5 volts.
Also, be careful with what's touching the meter or part under test: a post-it note can easily conduct a whole nanoamp at 5 volts. Wood and fabric are similarly problematic. If in doubt as to whether something is a problem, test it.
When measuring capacitors, there's a really annoying property to be aware of: The dielectric material can slowly absorb or release charge over multiple hours. This effect is mostly known for recharging high-voltage capacitors after they've been removed from a circuit (with unpleasant results), but it can also result in a deceptively high leakage current that goes away if the capacitor is used in a real circuit. Unless you have fancy polypropylene capacitors, you'll have to leave them in the test rig for several hours before taking a reading.
Circuit testing:
Of course, it's not enough to test individual components. The whole system has to work correctly with an imperfect power supply: A device running on a coin cell should be able to tolerate a full 1 kohm of series resistance with a two volt supply.
... also, it's a good idea to simulate a dead battery: an empty battery shouldn't result in hardware damage or data loss.
Temperature can greatly affect leakage currents. If you expect the components to get up to 80 C, grab a heat gun and see how the circuit performs at those temperatures.
Practical advice:
Before considering any components, does the circuit board itself consume any power? There are lots of people on forums saying you shouldn't use a soldermask, or that flux on the board causes leakage... For testing, I used a nothing-special JLCPCB green FR4 2-layer board.
It had two quarter-millimeter traces, 30 mm long and separated by 2.7 mm. For the measurements, I used a 9 volt bias, which should represent worst-case results:
Clean: Testing the board as it came from the factory
Humid: Breathing on it for a few seconds (99% RH, no visible condensation)
Fingers: Touching it to get skin oils on the board
Rosin: Spreading some RMA flux on it and burning it with a soldering iron
Board condition and soldermask    Current
Soldermask, clean                 < 5 pA
Soldermask, fingers               < 5 pA
Soldermask, humid                 < 5 pA
Soldermask, rosin                 < 5 pA
No soldermask, clean              < 5 pA
No soldermask, fingers            10 pA
No soldermask, humid              30,000 pA
No soldermask, rosin              20 pA
The main troublemaker is humidity. If you are designing a circuit that needs to work outside, underwater or underground, it would be a good idea to include some desiccant: most plastics will allow water vapor to permeate inside.
The soldermask prevented any significant leakage between traces, but problems could still happen between component pins. Conformal coatings will protect against short exposures, but will suffer from the same permeation problem. Soldering residue and skin oils aren't a problem unless you are doing picoamp metrology.
Capacitors: Electrolytic or tantalum capacitors can leak multiple microamps at just a few volts: A jellybean 100 uF 16 V electrolytic pulled 26 uA at nine volts, which is ten times the entire current budget for a CR2032! That cap alone could discharge the battery in just a year or two.
Ceramic capacitors are a lot better: A random 1 uF capacitor from my parts bin initially pulled several hundred nanoamps, but it dropped down to 920 pA at 9 volts after two hours. Even a hundred of these would only draw 92 nA, which is only about 3% of the budget.
TL;DR: Don't use electrolytics or tantalums. Ceramic capacitors are fine in reasonable quantities and when run well below their rated voltage.
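The budget arithmetic behind these capacitor numbers is easy to redo for other cells or leakage figures. Here is a minimal sketch in Python using the capacities and measured leakage currents quoted above. The helper names are made up for illustration, and the calculation ignores self-discharge and aging, so treat the results as best-case figures.

```python
HOURS_PER_YEAR = 365.25 * 24


def average_current_ua(capacity_mah, years):
    """Average current (uA) that uses up the given capacity in the given time."""
    return capacity_mah * 1000.0 / (years * HOURS_PER_YEAR)


def runtime_years(capacity_mah, current_ua):
    """Years the given capacity lasts at a constant current draw (uA)."""
    return capacity_mah * 1000.0 / current_ua / HOURS_PER_YEAR


# Ten-year targets for the two battery types discussed in the text.
budget_cr2032_ua = average_current_ua(230, 10)
budget_aa_ua = average_current_ua(3000, 10)

# Measured leakage at 9 V, taken from the text.
electrolytic_ua = 26.0      # 100 uF 16 V electrolytic
ceramic_ua = 920e-6         # 1 uF ceramic after two hours (920 pA)

years_on_electrolytic = runtime_years(230, electrolytic_ua)
hundred_ceramics_share = 100 * ceramic_ua / budget_cr2032_ua

print(f"CR2032 budget: {budget_cr2032_ua:.2f} uA")
print(f"AA budget: {budget_aa_ua:.1f} uA")
print(f"CR2032 drained by the electrolytic alone: {years_on_electrolytic:.2f} years")
print(f"100 ceramics use {hundred_ceramics_share:.1%} of the CR2032 budget")
```

Swapping in your own capacity, target lifetime and measured leakage shows quickly whether a part fits the budget before it goes on the board.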
Diodes are very commonly used for reverse polarity protection, but there are two possible configurations:
A series diode uses a forward-biased diode to prevent reverse current from getting to the device.
A parallel diode adds a reverse-biased diode to clamp the reverse voltage before the device is damaged.
In the series configuration, voltage drop is very important: Real diodes are quite different from the idealized model. The voltage drop of a 1N4148 is only 0.6 V at 1 mA of draw and at 25 C.
The relationship between current and voltage drop is roughly exponential: For a silicon PN diode, passing 10 times the current requires an extra 100 mV. This also works in the other direction: A circuit that only needs 10 uA (peak) will only see around 0.4 volts of drop across that diode.
Temperature affects this: The threshold will rise ~2 mV for each degree the diode is cooled. At -40 C, expect 130 mV of extra voltage drop compared to room temperature.
A Schottky diode has a much lower threshold voltage: 1 mA of current only needs 0.25 V. This can be a huge improvement to your voltage budget, although it's still a non-trivial amount.
In the parallel configuration, reverse leakage matters. Because it's highly dependent on voltage, I measured a few diodes at 5 volts, which is closer to normal operating conditions:
1N4148 [PN] @ 5 V: 2.3 nA
BAT46 [Schottky] @ 5 V: 2.4 uA
In this test, the Schottky doesn't do so well: It's three orders of magnitude worse than a similar PN diode. So, use a PN diode, right? Well, if the battery can supply 50 mA into a short (fresh coin cell), there might be around a volt across the device. That can be enough to cause damage.
So, what's a good reverse polarity protection circuit?
(Schematic: MOSFET reverse polarity protection. An n-channel low-side switching version is also possible.)
A MOSFET can act as a near-ideal diode: If the gate (connected to the negative rail) is, in fact, the lowest voltage, it's switched on.
If the battery is inserted backwards, the gate now has the highest voltage in the circuit and the transistor stays off.
However, it's still important to consult the datasheet or conduct experiments: the battery voltage might not be enough to fully turn on the FET, and even a properly "on" MOSFET still has a voltage drop.
The final option is nothing: Battery clips that physically prevent a user from inserting a battery backwards exist. These have no electrical penalties except for the contact resistance (which is negligible when compared to the battery's).
Schottky leakage also poses a problem for dual power supply circuits. A microamp of backfeed into the backup battery can actually be enough to damage it. In these cases, you may be forced to use a PN diode or use a variation of the MOSFET trick: connect the gate to the primary supply rail.
This will, at a minimum, perform as well as a silicon diode because of the transistor's intrinsic body diode. Once the power rail drops down to zero, the MOSFET's gate will be negative and it will turn on. However, its performance won't be perfect if the main rail takes more than a millisecond or so to lose voltage. It's best to plan for a PN diode drop and consider any extra voltage a nice bonus.
Computers: In theory, CMOS logic doesn't draw any power when sitting idle. In practice, it absolutely does. An 8-bit AVR128DA28 microcontroller draws 1.5 uA during sleep mode. Connecting a 32 kHz crystal and using the integrated RTC to provide wake-ups brings it up to 1.8 uA. This leaves just 700 nA of average current to work with.
Ok, but at some point, the processor has to do something. Each clock cycle has a fixed cost: For the AVR, I measured it at ~0.28 nanoamp-seconds, meaning that the battery has enough charge for 3,000 billion cycles.
(Scope trace: individual clock cycles on an AVR128DA28 running at 32 kHz.)
However, it's almost always a good idea to use a slow clock: The chip will draw an extra 277 uA per MHz.
At the default four MHz clock speed, that's just over a milliamp. There's no guarantee the battery will be able to supply that kind of power.
Decoupling caps aren't going to save you here: 1 mA is enough to drain a rather big 1 uF capacitor at 1 volt per millisecond. (Remember, no electrolytics allowed.) Since the MCU has a minimum voltage of 1.8 volts, and the batteries can go as low as two, it's only safe to run like this for 200 microseconds / 800 cycles!
However, running at 32 kHz only draws an average of 10 microamps. There are still current pulses from each clock cycle, but they are small enough that they only drop a 1 uF capacitor by 0.27 millivolts.
The processor does draw a bit more quiescent current while running than in sleep mode. This is why some people suggest you should run at the maximum clock speed to save power... but while it is more efficient on paper, it simply doesn't work with real batteries.
This also lets us calculate how long it can run for: 10 microamps is 14 times the remaining 700 nanoamp budget, so the processor can be running 7% of the time.
Also, on this particular MCU, wake-ups cause a big current pulse: Because of stray capacitance, applying power to the processor costs a whole 2.62 nanoamp-seconds. With a 1 uF capacitor, this would drain it by 2.62 mV. However, with smaller caps like 6.8 nF, it would discharge them by a whole 385 mV.
Stuff like this is why I'd recommend using around a microfarad: A decent 1 uF ceramic (MLCC) rated at a few times the supply voltage will leak almost nothing.
To be fair, the datasheet does recommend this value, but plenty of people are in the habit of using smaller ones: When you have a 5 volt supply, losing a third of a volt is not a big deal. Using a 3-but-actually-2 volt battery, it's enough to drop below the chip's minimum operating voltage.
Some parts claim a much lower sleep current (in the nanoamps), but that's without retaining memory: Most applications can't use these modes.
Consider a data-logger. Because flash consumes the same amount of power when writing a few bytes or a kilobyte, being able to buffer readings actually saves power.
... although there are some applications where a feature like this does make sense. This is something you have to consider before taking sleep current specs at face value.
https://ww1.microchip.com/downloads/en/DeviceDoc/AVR128DA28-32-48-64-DataSheet-DS40002183B.pdf : The discussed microcontroller.
https://data.energizer.com/pdfs/cr2032.pdf : Example battery datasheet.
https://lcamtuf.substack.com/p/real-mlccs-and-inductors-have-curves : Another footgun with capacitors.

0 views

Introducing Headcode: A Unified API for UK Rail Data

Headcode is a unified, developer-friendly JSON API that takes the fragmented, legacy feeds of the UK rail network and turns them into clean, enriched real-time data.

0 views

Promises and perils

One of the just-so stories we keep hearing about AI is that it’s inevitable, that the technology is here and will continue to be here, and we better get on board or get left behind. These stories have the ring of a threat because they are, explicitly and otherwise, threatening. They are also familiar . Fear that there may be no alternative to the will of the AI arise because we have been told for decades that there is no alternative to neoliberalism, that there is no alternative to the mediation of all society by profit-driven markets, no alternative to the universal power of private self-interest that continually tries not to better the world, but to maximize it’s own profit and hence power. Stories about the “promises and perils” of AI ring true, not because the AI is poised to hunt all of us down, but because the stories reflect real experiences of technology, capitalism, and ideology; they reflect the capitalist developments of the incomprehensibility of technology, the invisibilization of labor, enclosures, proliferating neoliberal bureaucracies, and the sense that there is no alternative to capitalism and the status quo. Blix & Glimmer, Why We Fear AI , page 56 In other words, the threat isn’t so much that AI is inevitable as that the ongoing—and likely expanding—immiseration of workers is unstoppable. This is the subtext of the strange and conflicted messaging that we get from the hype men: when they say that you better learn AI or be left behind, they are admitting that a great many people will be left behind. And if you—smart and clever and hardworking person that you are—are somehow able to make it to the other side of the line, you’re supposed to find relief or pride at having done so, and not horror at all the people suffering in your wake. You’re supposed to be as uncaring as the capital that uses you. 
But getting through this gauntlet is no guarantee of getting through the next one—and there will be a next one, because the plain aim of the technocrats is to immiserate everyone, eventually. From the capitalist perspective, anyone with skills enough to negotiate a comfortable wage is a cost in need of cutting. Add to that the fact that AI’s whole pitch is that the more you use it, the more data it gathers, the more likely it becomes capable of mimicking you well enough to convince the fools above you that it can do your job. So get-in-or-get-left-behind is something of a trick—everyone is left behind, eventually. Which is both terrifying and clarifying. Terrifying in that the capitalists really do have the ability to do us harm—they have been doing so, already. Clarifying in that there really isn’t any reason to stay on the path they’ve laid out for us. It leads nowhere good. Meanwhile, there aren’t very many people up ahead, and there are a whole lot of us back here. Let’s see what we can do. View this post on the web , subscribe to the newsletter , or reply via email .

0 views

Why We Fear AI

Hagen Blix and Ingeborg Glimmer make a compelling case for why we fear AI: our fears of what AI will do to us are really just our fears of what capitalism is already doing. In this way, AI isn’t so much a novel new technology as an acceleration of long-existing patterns in neoliberal capitalism: automation, deskilling, unaccountability, surveillance, and increasing precarity amidst shrinking welfare systems. But therein also lies a clue as to how to counter it, in that only organized, democratic control of labor can stand up to capital. When we see through the hype, we know what work we have to do.

0 views