Posts in Shell (20 found)

Has Mythos just broken the deal that kept the internet safe?

For nearly 20 years the deal has been simple: you click a link, arbitrary code runs on your device, and a stack of sandboxes keeps that code from doing anything nasty. Browser sandboxes for untrusted JavaScript, VM sandboxes for multi-tenant cloud, ad iframes so banner creatives can't take over your phone or laptop - the modern internet is built on the assumption that those sandboxes hold. Anthropic just shipped a research preview that generates working exploits for one of them 72.4% of the time, up from under 1% a few months ago. That deal might be breaking.

From what I've read, Mythos is a very large model. Rumours have pointed to it being similar in size to the short-lived (and very underwhelming) GPT-4.5. As such I'm with a lot of commentators in thinking that a primary reason this hasn't been rolled out further is compute. Anthropic is probably the most compute-starved major AI lab right now, and I strongly suspect they do not have the compute to roll this out more broadly even if they wanted to. From leaked pricing, it's expensive as well: $125/MTok output, 5x more than Opus, which is itself the most expensive model out there.

One thing that has really been overlooked amid all the focus on frontier-scale models is how quickly improvements in the huge models are being replicated in far smaller ones. I've spent a lot of time with the Gemma 4 open-weights model, and it is incredibly impressive for a model that is ~50x smaller than the frontier models. So I have no doubt that whatever capabilities Mythos has will relatively quickly be available in smaller, and thus easier to serve, models. And even if Mythos' huge size somehow is intrinsic to its abilities (I very much doubt this, given current progress in scaling smaller models), it's only a matter of time before newer chips [1] are able to serve it en masse. It's important to look to where the puck is going.

As I've written before, LLMs in my opinion pose an extremely serious cybersecurity risk.
Fundamentally we are seeing a radical change in how easy it is to find (and thus exploit) serious flaws and bugs in software for nefarious purposes.

To back up a step, it's important to understand how modern cybersecurity is currently achieved. One of the most important concepts is that of a sandbox. Nearly every electronic device you touch day to day has one (or many) layers of these to protect the system. In short, a sandbox is a so-called 'virtualised' environment where software can execute on the system, but with limited permissions, segregated from other software, with a very strong boundary that prevents the software from 'breaking out' of the sandbox.

If you're reading this on a modern smartphone, you have at least 3 layers of sandboxing between this page and your phone's operating system. First, your browser has (at least) two levels of sandboxing. One is for the JavaScript execution environment (which runs the interactive code on websites). This is then wrapped by the browser sandbox, which limits what the site as a whole can do. Finally, iOS or Android has an app sandbox which limits what the browser as a whole can do.

This defence in depth is absolutely fundamental to modern information security; in particular, it is what allows users to browse "untrusted" websites with any level of safety. For a malicious website to gain control over your device, it needs to chain together multiple vulnerabilities, all at the same time. In reality this is extremely hard to do (and these kinds of chains fetch millions of dollars on the grey market).

Guess what? According to Anthropic, Mythos Preview successfully generates a working exploit for Firefox's JS shell in 72.4% of trials. Opus 4.6 managed this in under 1% of trials in a previous evaluation.

Worth flagging a couple of caveats. The JS shell here is Firefox's standalone SpiderMonkey - so this is escaping the innermost sandbox layer, not the full browser chain (the renderer process and OS app sandbox still sit on top).
And it's Anthropic's own benchmark, not an independent one. But even hedging both of those, the trajectory is what matters - we're going from "effectively zero" to "72.4% of the time" in one model generation, on a real-world target rather than a toy CTF.

This is pretty terrifying if you understand the implications. If an LLM can find exploits in sandboxes - which are some of the most well-secured pieces of software on the planet - then suddenly every website you aimlessly browse through could contain malicious code which can 'escape' the sandbox and theoretically take control of your device - and all the data on your phone could be sent to someone nasty.

These attacks are so dangerous because the internet is built around sandboxes being safe. For example, each banner ad your browser loads runs in a separate sandboxed environment. This means ads can run a huge amount of (mostly) untested code, with everyone relying on the browser sandbox to protect them. If that sandbox falls, then suddenly a malicious ad campaign can take over millions of devices in hours.

Equally, sandboxes (and virtualisation) are fundamental to allowing cloud computing to operate at scale. Most servers these days are not running code directly on the physical server they sit on. Instead, AWS et al take the physical hardware and "slice" it up into so-called "virtual" servers, selling each slice to a different customer. This allows many more applications to run on a single server - and enables some pretty nice profit margins for the companies involved. This operates on roughly the same model as your phone, with various layers to protect customers from accessing each other's data and (more importantly) from accessing the control plane of AWS.

So, we have a very, very big problem if these sandboxes fail, and all fingers point towards this being the case this year.
I should tone down the disaster porn slightly - there have been many sandbox escapes before that haven't caused chaos - but I have a strong feeling that this time is going to be difficult. To be clear, when just AWS us-east-1 goes down (which it has done many, many, times) it is front-page news globally and tends to cause significant disruption to day-to-day life. And that is just one of AWS's regions - if a malicious actor was able to take control of the AWS control plane, it's likely they'd be able to take all regions simultaneously, and recovery would be far harder with a bad actor in charge than it was for the internal faults that caused previous outages - which were themselves extremely difficult to recover from in a timely way.

Given all this, it's understandable that Anthropic are being cautious about releasing this into the wild. The issue, though, is that the cat is out of the bag. Even if Anthropic pulled a Miles Dyson and lowered their model code into a pit of molten lava, someone else is going to scale an RL model and release it. The incentives are far, far too high and the prisoner's dilemma strikes again.

The current status quo seems to be that these next-generation models will be released to a select group of cybersecurity professionals and related organisations, so they can fix things as much as possible to get a head start. Perhaps this is the best that can be done, but it looks to me like a repeat of "security through obscurity", an approach which has become a meme in itself in the information security world. It also seems far-fetched to me that the organisations which do have access are going to find even most of the critical problems in a limited time window.

And that brings me to my final point. While Anthropic are providing $100m of credit and $4m of 'direct cash donations' to open source projects, it's not all open source projects.
There are a lot of open source projects that everyone relies on without realising. While the obvious ones like the Linux kernel are getting this "access" ahead of time, there are literally millions of pieces of open source software (never mind commercial software) that are essential to the operation of a substantial minority of systems. I'm not quite sure where the plan leaves these.

Perhaps this is just another round in the cat-and-mouse cycle that reaches a mostly stable equilibrium, and at worst we have some short-term disruption. But if I step back and look at how fast the industry has moved over the past few years - I'm not so sure. And one thing I think is for certain: it looks like we do now have the fabled superhuman ability in at least one domain. I don't think it will be the last.

[1] Albeit at the cost of adding yet more pressure onto the compute crunch the AI industry is experiencing ↩︎

Simon Willison 1 week ago

Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Anthropic didn't release their latest model, Claude Mythos (system card PDF), today. They have instead made it available to a very restricted set of preview partners under their newly announced Project Glasswing. The model is a general-purpose model, similar to Claude Opus 4.6, but Anthropic claim that its cyber-security research abilities are strong enough that they need to give the software industry as a whole time to prepare:

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely.

Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems—systems that represent a very large portion of the world’s shared cyberattack surface. We anticipate this work will focus on tasks like local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems.

There's a great deal more technical detail in Assessing Claude Mythos Preview’s cybersecurity capabilities on the Anthropic Red Team blog:

In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.

Plus this comparison with Claude Opus 4.6:

Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development.
But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

Saying "our model is too dangerous to release" is a great way to build buzz around a new model, but in this case I expect their caution is warranted. Just a few days ago (last Friday) I started a new ai-security-research tag on this blog to acknowledge an uptick in credible security professionals sounding the alarm on how good modern LLMs have got at vulnerability research.

Greg Kroah-Hartman of the Linux kernel:

Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real.

Daniel Stenberg of curl:

The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense.

And Thomas Ptacek published Vulnerability Research Is Cooked, a post inspired by his podcast conversation with Anthropic's Nicholas Carlini.

Anthropic have a 5-minute talking-heads video describing the Glasswing project. Nicholas Carlini appears as one of those talking heads, where he said (highlights mine):

It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently.
But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome. [...]

I've found more bugs in the last couple of weeks than I found in the rest of my life combined. We've used the model to scan a bunch of open source code, and the thing that we went for first was operating systems, because this is the code that underlies the entire internet infrastructure. For OpenBSD, we found a bug that's been present for 27 years, where I can send a couple of pieces of data to any OpenBSD server and crash it. On Linux, we found a number of vulnerabilities where as a user with no permissions, I can elevate myself to the administrator by just running some binary on my machine. For each of these bugs, we told the maintainers who actually run the software about them, and they went and fixed them and have deployed the patches so that anyone who runs the software is no longer vulnerable to these attacks.

I found this on the OpenBSD 7.8 errata page:

025: RELIABILITY FIX: March 25, 2026 All architectures
TCP packets with invalid SACK options could crash the kernel.
A source code patch exists which remedies this problem.

I tracked that change down in the GitHub mirror of the OpenBSD CVS repo (apparently they still use CVS!) and found it using git blame. Sure enough, the surrounding code is from 27 years ago.

I'm not sure which Linux vulnerability Nicholas was describing, but it may have been this NFS one recently covered by Michael Lynch.

There's enough smoke here that I believe there's a fire. It's not surprising to find vulnerabilities in decades-old software, especially given that it's mostly written in C, but what's new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues.
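One small sanity check on the benchmark numbers quoted above. Neither post states the total number of Mythos Preview trials, but it can be inferred from the two published figures (this is my inference, not a number from Anthropic):

```python
# Inferring the trial count from the two published figures:
# 181 working exploits at the reported 72.4% success rate.
implied_trials = 181 / 0.724   # successes / success rate
print(round(implied_trials))   # -> 250
```

250 trials is consistent with the "several hundred attempts" mentioned for the earlier Opus 4.6 run.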
I actually thought to myself on Friday that this sounded like an industry-wide reckoning in the making, and that it might warrant a huge investment of time and money to get ahead of the inevitable barrage of vulnerabilities. Project Glasswing incorporates "$100M in usage credits ... as well as $4M in direct donations to open-source security organizations". Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation.

It would be great to see OpenAI involved as well - GPT-5.4 already has a strong reputation for finding security vulnerabilities, and they have stronger models on the near horizon.

The bad news for those of us who are not trusted partners is this:

We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

I can live with that. I think the security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off.

Ahead of AI 1 week ago

Components of A Coding Agent

In this article, I want to cover the overall design of coding agents and agent harnesses: what they are, how they work, and how the different pieces fit together in practice. Readers of my Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books often ask about agents, so I thought it would be useful to write a reference I can point to.

More generally, agents have become an important topic because much of the recent progress in practical LLM systems is not just about better models, but about how we use them. In many real-world applications, the surrounding system, such as tool use, context management, and memory, plays as much of a role as the model itself. This also helps explain why systems like Claude Code or Codex can feel significantly more capable than the same models used in a plain chat interface. In this article, I lay out six of the main building blocks of a coding agent.

You are probably familiar with Claude Code or the Codex CLI, but just to set the stage: they are essentially agentic coding tools that wrap an LLM in an application layer, a so-called agentic harness, to be more convenient and better-performing for coding tasks.

Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent. Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity.

That distinction matters because when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding-agent specifics, let me briefly provide a bit more context on the difference between the broader concepts: LLMs, reasoning models, and agents.

An LLM is the core next-token model.
A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, when to stop, and so on.

Roughly, we can think about the relationship as this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness helps us get more out of the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point.

Figure 2: The relationship between conventional LLM, reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness.

In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this:

LLM: the raw model
Reasoning model: an LLM optimized to output intermediate reasoning traces and to verify itself more
Agent: a loop that uses a model plus tools, memory, and environment feedback
Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow
Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback

As listed above, in the context of agents and coding tools, we also have the two popular terms agent harness and (agentic) coding harness. A coding harness is the software scaffold around a model that helps it write and edit code effectively. An agent harness is a bit broader and not specific to coding (e.g., think of OpenClaw). Codex and Claude Code can be considered coding harnesses.
Anyways, a better LLM provides a better foundation for a reasoning model (which involves additional training), and a harness gets more out of this reasoning model. Sure, LLMs and reasoning models are also capable of solving coding tasks by themselves (without a harness), but coding work is only partly about next-token generation. A lot of it is about repo navigation, search, function lookup, diff application, test execution, error inspection, and keeping all the relevant information in context. (Coders may know that this is hard mental work, which is why we don’t like to be disrupted during coding sessions :)).

Figure 3: A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it.

The takeaway here is that a good coding harness can make both reasoning and non-reasoning models feel much stronger than they do in a plain chat box, because it helps with context management and more. As mentioned in the previous section, when we say harness, we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and more. Today, when using LLMs, this layer shapes most of the user experience compared to prompting the model directly or using a web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of LLMs nowadays have very similar capabilities (e.g., the vanilla versions of GPT-5.4, Opus 4.6, and GLM-5 or so), the harness can often be the distinguishing factor that makes one LLM work better than another.
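The observe/inspect/choose/act loop described above can be sketched in a few lines of Python. This is a toy sketch with stub functions and names of my own invention, not the Mini Coding Agent's actual code:

```python
# Minimal agent-loop sketch (toy stubs; hypothetical names, not the Mini Coding Agent's real API).

def observe(state):
    # "observe": collect information from the environment
    return {"steps_taken": len(state["history"])}

def build_prompt(state, observation):
    # "inspect": package the goal and observation for the model
    return f"goal={state['goal']} obs={observation}"

def toy_model(prompt):
    # "choose": a stand-in for the LLM; a real harness parses a structured action
    if "steps_taken': 0" in prompt:
        return {"type": "tool", "tool": "echo", "args": {"text": "hello"}}
    return {"type": "finish", "answer": "done"}

def agent_loop(model, tools, goal, max_steps=20):
    state = {"goal": goal, "history": []}
    for _ in range(max_steps):
        observation = observe(state)
        action = model(build_prompt(state, observation))
        if action["type"] == "finish":        # the model decides when to stop
            return action["answer"]
        result = tools[action["tool"]](**action["args"])  # "act": run the chosen tool
        state["history"].append((action, result))
    return None  # step budget exhausted

tools = {"echo": lambda text: text}
print(agent_loop(toy_model, tools, "demo"))  # -> done
```

The point of the sketch is only the shape of the loop: the harness, not the model, owns control flow, tool dispatch, and the step budget.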
This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants.

In the next section, I want to go more into the specifics and discuss the core components of a coding harness using my Mini Coding Agent: https://github.com/rasbt/mini-coding-agent .

Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections.

By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.)

Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python)

Anyways, below are six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) for more concrete code examples. The code annotates the six components discussed below via code comments.

This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often determine what the correct action is. For example, “fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run, etc. If it knows the repo root and layout, it can look in the right places instead of guessing.
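A workspace summary of this kind might be gathered roughly like this. This is a sketch under my own assumptions (the function name and summary fields are invented, not taken from the Mini Coding Agent):

```python
import os
import subprocess

def workspace_summary(root="."):
    """Gather 'stable facts' about the repo before doing any work (sketch)."""
    summary = {"root": os.path.abspath(root)}
    # Top-level layout: where to look instead of guessing
    summary["top_level"] = sorted(os.listdir(root))[:50]
    # Project instruction files, if present
    summary["docs"] = [f for f in ("AGENTS.md", "README.md")
                       if os.path.exists(os.path.join(root, f))]
    # Git facts: current branch (None if not a repo or git is unavailable)
    try:
        branch = subprocess.run(["git", "branch", "--show-current"],
                                capture_output=True, text=True,
                                cwd=root, check=True)
        summary["branch"] = branch.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        summary["branch"] = None
    return summary
```

A harness would prepend a rendering of this dictionary to the prompt, so the model starts each session with the repo root, layout, instruction files, and branch already in context.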
Also, the git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus.

Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context.

The takeaway is that the coding agent collects info (”stable facts” as a workspace summary) upfront before doing any work, so that it is not starting from zero, without context, on every prompt.

Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. Coding sessions are repetitive, and the agent rules usually stay the same. The tool descriptions usually stay the same, too. And even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below.

Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model.

The main difference from section 1 is that section 1 was about gathering repo facts. Here, we are now interested in packaging and caching those facts efficiently for repeated model calls. The “stable” in “stable prompt prefix” means that the information contained there doesn’t change much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn).
This includes short-term memory, the recent transcript, and the newest user request. In short, the caching aspect of the stable prompt prefix is simply that a smart runtime tries to reuse that part.

Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful: actually be able to execute the command and retrieve the results (versus us calling the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed, named tools with clear inputs and clear boundaries. (Of course, something like Python or a shell tool can be part of this, so that the agent can still execute an arbitrarily wide range of commands.) The tool-use flow is illustrated in the figure below.

Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop.

To illustrate this, below is an example of how this usually looks to the user in my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.)

Figure 9: Illustration of a tool call approval request in the Mini Coding Agent.

Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?”, and “Is the requested path even inside the workspace?” Only after those checks pass does anything actually run.
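Those programmatic checks might look like the following sketch. The tool registry, approval set, and function name are all hypothetical, chosen to mirror the checks listed above rather than any particular harness's API:

```python
import os

# Hypothetical tool registry: tool name -> required argument names
TOOLS = {
    "read_file": {"path"},
    "write_file": {"path", "content"},
    "run_shell": {"command"},
}
NEEDS_APPROVAL = {"write_file", "run_shell"}  # side-effecting tools gate on the user

def validate_action(action, workspace_root):
    # "Is this a known tool?"
    if action.get("tool") not in TOOLS:
        return False, "unknown tool"
    # "Are the arguments valid?"
    if set(action.get("args", {})) != TOOLS[action["tool"]]:
        return False, "bad arguments"
    # "Is the requested path even inside the workspace?"
    path = action["args"].get("path")
    if path is not None:
        resolved = os.path.realpath(os.path.join(workspace_root, path))
        if not resolved.startswith(os.path.realpath(workspace_root) + os.sep):
            return False, "path escapes workspace"
    # "Does this need user approval?"
    return True, "needs approval" if action["tool"] in NEEDS_APPROVAL else "ok"

print(validate_action({"tool": "read_file", "args": {"path": "../etc/passwd"}},
                      "/tmp/repo"))  # -> (False, 'path escapes workspace')
```

Note the path check resolves the path first (realpath), so `..` tricks and symlinks can't smuggle file access outside the repo; only actions that pass every check reach the execution step.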
While running coding agents, of course, carries some risk, the harness checks also improve reliability because the model doesn’t execute totally arbitrary commands. Also, besides rejecting malformed actions and approval gating, file access can be kept inside the repo by checking file paths. In a sense, the harness is giving the model less freedom, but it also improves usability at the same time.

Context bloat is not a problem unique to coding agents but an issue for LLMs in general. Sure, LLMs are supporting longer and longer contexts these days (and I recently wrote about the attention variants that make it computationally more feasible), but long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info). Coding agents are even more susceptible to context bloat than regular LLMs during multi-turn chats, because of repeated file reads, lengthy tool outputs, logs, etc. If the runtime keeps all of that at full fidelity, it will run out of available context tokens pretty quickly. So, a good coding harness is usually pretty sophisticated about handling context bloat, beyond just cutting or summarizing information like regular chat UIs do. Conceptually, context compaction in coding agents might work as summarized in the figure below. Specifically, we are zooming a bit further into the clip (step 6) part of Figure 8 in the previous section.

Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt.

A minimal harness uses at least two compaction strategies to manage that problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose.
The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer, because they are more likely to matter for the current step, and to compress older events more aggressively, because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session. Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality.

In practice, all six core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. The previous section covered prompt-time use of history and how we build a compact transcript. The question there was: how much of the past should go back into the model on the next turn? So the emphasis was compression, clipping, deduplication, and recency. This section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? So the emphasis is that the runtime keeps a fuller transcript as durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to.

To summarize, a coding agent separates state into (at least) two layers:

Working memory: the small, distilled state the agent keeps explicitly
Full transcript: all the user requests, tool outputs, and LLM responses

Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files.
The figure above illustrates the two main session files, the full transcript and the working memory, which usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction: its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is meant for task continuity: its job is to keep a small, explicitly maintained summary of what matters across turns, things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and the working memory in the next round (not shown, to reduce clutter in the figure).

Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer: which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.)

A subagent is only useful if it inherits enough context to do real work.
But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :). Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent. The trick here is that the subagent inherits enough context to be useful, but also has it constrained (for example, read-only and restricted in recursion depth) Claude Code has supported subagents for a long time, and Codex added them more recently. Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth. The section above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make the LLM more useful compared to simple multi-turn chats. Figure 13: Six main features of a coding harness discussed in previous sections. If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent . OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than being a specialized (terminal) coding assistant. There are still several overlaps with a coding harness: it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md it keeps JSONL session files and includes transcript compaction and session management it can spawn helper sessions and subagents However, as mentioned above, the emphasis is different. 
I am excited to share that I finished writing Build a Reasoning Model (From Scratch), and all chapters are now in early access. The publisher is currently working on the layouts, and it should be available this summer. This is probably my most ambitious book so far. I spent about 1.5 years writing it, and a large number of experiments went into it. It is also probably the book I worked hardest on in terms of time, effort, and polish, and I hope you’ll enjoy it. Build a Reasoning Model (From Scratch) on Manning and Amazon. The main topics are:
evaluating reasoning models
inference-time scaling
self-refinement
reinforcement learning
distillation
There is a lot of discussion around “reasoning” in LLMs, and I think the best way to understand what it really means in the context of LLMs is to implement one from scratch! Amazon (pre-order) Manning (complete book in early access, pre-final layout, 528 pages)
Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent. Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity. That distinction matters because when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding agent specifics, let me briefly provide a bit more context on the broader concepts: LLMs, reasoning models, and agents. On The Relationship Between LLMs, Reasoning Models, and Agents An LLM is the core next-token model.
A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, and when to stop. Roughly, we can think about the relationship like this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness is what helps us steer the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point. Figure 2: The relationship between a conventional LLM, a reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness. In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this:
LLM: the raw model
Reasoning model: an LLM optimized to output intermediate reasoning traces and to verify itself more
Agent: a loop that uses a model plus tools, memory, and environment feedback
Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow
Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback
Figure 3: A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it.
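To make the loop idea concrete, here is a minimal toy sketch of an agent control loop in Python. All names (`call_model`, `run_tool`, the message format) are made up for illustration; a real harness would call an actual LLM API and validated, sandboxed tools:

```python
# Toy agent control loop: call the model, execute the tool it requests,
# feed the result back, repeat until the model signals it is done.

def call_model(messages):
    # Placeholder for an LLM call. This fake "model" asks for one tool
    # call and then finishes once it has seen a tool result.
    if not any(m["role"] == "tool" for m in messages):
        return {"action": "run_tool", "tool": "read_file",
                "args": {"path": "README.md"}}
    return {"action": "finish", "answer": "The README describes the project."}

def run_tool(name, args):
    # Placeholder tool executor; a real harness validates and sandboxes this.
    return f"(contents of {args['path']})"

def agent_loop(goal, max_steps=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = call_model(messages)              # observe/inspect: model sees state
        if step["action"] == "finish":           # choose: model decides to stop
            return step["answer"]
        result = run_tool(step["tool"], step["args"])   # act: execute the tool
        messages.append({"role": "tool", "content": result})  # feed result back
    return "(step budget exhausted)"
```

The step budget (`max_steps`) is the simplest form of the "when to stop" control mentioned above: even a confused model cannot loop forever.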
The takeaway here is that a good coding harness can make both a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box, because it helps with context management and more. The Coding Harness As mentioned in the previous section, when we say harness, we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and more. Today, this layer shapes most of the user experience compared to prompting the model directly or using a web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of today’s LLMs have very similar capabilities (e.g., GPT-5.4, Opus 4.6, and GLM-5), the harness can often be the distinguishing factor that makes one LLM work better than another. This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants. In the next section, I want to go more into the specifics and discuss the core components of a coding harness using my Mini Coding Agent: https://github.com/rasbt/mini-coding-agent . Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections. By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.)
Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) Anyways, below are six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) for more concrete code examples. The code annotates the six components discussed below via code comments: 1. Live Repo Context This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often change or affect what the correct action is. For example, “Fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run, etc. If it knows the repo root and layout, it can look in the right places instead of guessing. Also, the git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus. Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context. The takeaway is that the coding agent collects info (“stable facts” as a workspace summary) upfront before doing any work, so that it is not starting from zero, without context, on every prompt. 2. Prompt Shape And Cache Reuse Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. I.e., coding sessions are repetitive, and the agent rules usually stay the same. The tool descriptions usually stay the same, too.
And even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below. Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model. The main difference from section 1 is that section 1 was about gathering repo facts. Here, we are now interested in packaging and caching those facts efficiently for repeated model calls. “Stable prompt prefix” means that the information contained there doesn’t change much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn). This includes the short-term memory, the recent transcript, and the newest user request. In short, the caching aspect for the “Stable prompt prefix” is simply that a smart runtime tries to reuse that part. 3. Tool Access and Use Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful: actually execute the command and retrieve the results (versus us calling the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed, named tools with clear inputs and clear boundaries. (But of course, something like Python can be part of this, so that the agent could also execute an arbitrarily wide range of shell commands.) The tool-use flow is illustrated in the figure below.
Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop. To illustrate this, below is an example of how this usually looks to the user in my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.) Figure 9: Illustration of a tool call approval request in the Mini Coding Agent. Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?”, and “Is the requested path even inside the workspace?” 4. Context Compaction Tool calls and long sessions generate a lot of text, and without countermeasures, large outputs and a growing transcript would quickly exhaust the prompt budget. Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt. A minimal harness uses at least two compaction strategies to manage that problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose. The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer because they are more likely to matter for the current step, and to compress older events more aggressively because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session.
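The clipping and deduplication ideas can be sketched in a few lines of Python. This is a toy illustration, not the Mini Coding Agent’s actual implementation; the event format (`(kind, content)` tuples, with file reads tagged as `"read:<path>"`) and the thresholds are made up:

```python
# Toy transcript compaction: clip long entries, truncate older events more
# aggressively, and replace older duplicate file reads with a placeholder
# so only the most recent copy keeps its full content.

def compact(transcript, clip_at=200, keep_recent=5):
    seen_reads = set()
    out = []
    # Walk newest-to-oldest so the most recent read of each file survives.
    for i, (kind, content) in enumerate(reversed(transcript)):
        if kind.startswith("read:"):             # e.g. kind == "read:src/app.py"
            if kind in seen_reads:
                content = f"[duplicate of a later {kind}; omitted]"
            seen_reads.add(kind)
        if i >= keep_recent:
            content = content[: clip_at // 2]    # compress older events harder
        if len(content) > clip_at:
            content = content[:clip_at] + " ...[clipped]"  # clip long entries
        out.append((kind, content))
    return list(reversed(out))                   # restore chronological order
```

A real harness would add the summarization step on top (handing older events to the model itself for compression), but the recency bias and dedup logic look structurally similar.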
Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality. 5. Structured Session Memory In practice, all six core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. In the previous section, we covered prompt-time use of history and how we build a compact transcript. The question there is: how much of the past should go back into the model on the next turn? So the emphasis is compression, clipping, deduplication, and recency. Now, this section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? So the emphasis is that the runtime keeps a fuller transcript as durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to. To summarize, a coding agent separates state into (at least) two layers:
working memory: the small, distilled state the agent keeps explicitly
a full transcript: this covers all the user requests, tool outputs, and LLM responses
Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files. The figure above illustrates the two main session files, the full transcript and the working memory, which usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction.
Its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is meant more for task continuity. Its job is to keep a small, explicitly maintained summary of what matters across turns: things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and the working memory in the next round (not shown, to reduce clutter in the figure). 6. Delegation With (Bounded) Subagents Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer: which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.) A subagent is only useful if it inherits enough context to do real work. But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :). Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent. The trick here is that the subagent inherits enough context to be useful but is also constrained (for example, to read-only access and a restricted recursion depth). Claude Code has supported subagents for a long time, and Codex added them more recently.
Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth. Components Summary The sections above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make the LLM more useful compared to simple multi-turn chats. Figure 13: Six main features of a coding harness discussed in previous sections. If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent. How Does This Compare To OpenClaw? OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than a specialized (terminal) coding assistant. There are still several overlaps with a coding harness:
it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md
it keeps JSONL session files and includes transcript compaction and session management
it can spawn helper sessions and subagents
However, as mentioned above, the emphasis is different. Coding agents are optimized for a person working in a repository and asking a coding assistant to inspect files, edit code, and run local tools efficiently. OpenClaw is more optimized for running many long-lived local agents across chats, channels, and workspaces, with coding as one important workload among several others.

マリウス 2 weeks ago

Updates 2026/Q1

This post includes personal updates and some open source project updates. 안녕하세요 (hello) and greetings from Asia! Right now I’m in Seoul, Korea. I’ll start this update with a few IRL experiences regarding my time here and some mechanical-keyboard-related things. If you’re primarily here for the technical stuff, you can skip forward or even skip all of the personal things and jump straight to the open source projects. With that said, let’s dive straight into it. Seoul has been one of the few places that I genuinely love coming back to. I cannot pinpoint why that is, but there’s a particular rhythm to the capital that’s hard to explain until you’ve lived in it for a while. Not the tourist rhythm, where you tick off palaces and night markets to “complete your bucket list”, but the deeper, slower one that makes the city truly enjoyable. The rhythm of picking a neighborhood, learning its backstreets, finding your morning coffee spot, and then finding a different one the following week. I spent my time here doing exactly that, and what follows are some honest reflections on a city that continues to surprise me. As some of you might know by now, I’m basically the Mark Wiens of coffee, because I travel for coffee, except that I don’t film myself and put it online. But I’ve surely had a lot of coffee, in a lot of cities. However, Seoul’s coffee scene operates on a completely different level. The sheer density of independently run coffee shops is staggering. Within a fifteen-minute walk in neighborhoods like Mangwon, Hapjeong, or Sangsu, you can pass dozens of places where someone is carefully dialing in their espresso, roasting their own beans, and serving a beautifully made Americano for usually around three or four thousand KRW. That’s roughly two to three US dollars for a genuinely excellent cup of coffee, which is a pretty solid value proposition.
I’ve been in Seoul before, multiple times actually, and I had the chance to find genuinely great cafes, which I kept on my list of places to revisit whenever I would happen to come back. And so I did. But as life moves forward, places change or, in more unfortunate circumstances, even close down for good. das ist PROBAT is one of the places that sadly closed just a few days before I arrived. In its spot is now a new ramen restaurant that seemed fairly popular. A few other places I’d loved on previous visits and that are still operating left me genuinely disappointed this time around. Compile Coffee was one of the sharper letdowns. Two years ago, it was a highlight. This time, however, the experience felt rushed and careless. The barista hurried through the ordering process, despite no one else waiting in line, and the cappuccino that followed was a spectacle for all the wrong reasons. The milk was frothed to an almost comical extreme, the liquid poured in first, then the foam scooped in one spoonful at a time, and finally a thick layer of chocolate powder on top that I hadn’t asked for. It felt like watching a car accident happening slowly enough for every detail to remain stuck in one’s head, yet too fast to articulate anything about it. I gave the place another try a few weeks after this incident, only to experience a similarly rushed and careless execution. Another change that I hadn’t seen coming was Bean Brothers in Hapjeong. The coffee house converted from their old industrial-style space to a noticeably more polished and… well, “posh” one. The new spot is nice enough, but the vibe has shifted towards a more upscale, less alternative one. In addition, they also opened up a new location in Sangsu, which leans further in that direction, with wait times for walk-ins that suggest a clientele they’re specifically courting.
Bean Brothers seems to be evolving into a streamlined, upscale chain, and while that’s not inherently bad, it’s a different thing from what originally made it special. And last but not least, there’s Anthracite Coffee Roasters, specifically the Seogyo location, which had been one of my absolute favorite spots back in 2023. It pains me to say this, but the place has become a ripoff, with this specific location charging eight thousand KRW for a hot (drip coffee) Americano to go. For context, the healthy food chain Preppers serves a full meal consisting of a big portion of rice and a protein, as well as some greenery, for 8,900 KRW. The cup of drip coffee at Anthracite is only halfway full, and most of the time it arrives already lukewarm, which makes it essentially useless as a to-go option, unless all you want is to gulp down around 120ml of coffee. You’d think a place charging premium prices would at least discount a thousand won for takeaways, as many Seoul cafes do. The Seogyo location’s commitment to drip coffee not only makes it feel somewhat pretentious considering the prices, but also adds a whole other layer of issues. During peak hours, the wait is considerable, and the coffee menu is limited to a small rotation of options that, more often than not, skew toward the acidic side of the spectrum. If that’s your preference, there’s nothing wrong with that. But when combined with the pricing, the lukewarm temperatures, and the half-filled cups, the experience increasingly feels like you’re paying for a brand name rather than a good cup of coffee. However, the beautiful thing about Seoul’s coffee culture is that for every established spot that drifts toward becoming another Starbucks experience, ten new places pop up that more than make up for it. The ecosystem is relentlessly self-renewing.
In the same neighborhood as Anthracite’s Seogyo location, I discovered a handful of places that are not only better in the cup, but dramatically more affordable. These are only a handful of places that I think of off the top of my head, but rest assured that there are plenty more. The quiet confidence of people who care about the craft without needing to perform it is what makes these places special. No gimmicks, no inflated prices justified by whatever interior design. Just friendly people and good coffee that’s made well and respects the customer. The time in Seoul reinforced what I already knew from past visits. This city is one of the best places in the world to simply be in. The neighborhoods are endlessly walkable, the infrastructure works beautifully (with the exception of traffic lights and escalators, but more on that in a bit), and the coffee culture, despite the occasional disappointment from places that have lost their way, remains one of the richest and most dynamic I’ve encountered anywhere. The disappointments, if anything, make the discoveries sweeter. The food also deserves a mention. Seoul is one of those cities where even a quick, unremarkable lunch tends to be delicious, and more often than not at a sane price, judging from a global perspective. Compared to other capital cities like London or, worse, Madrid, where food prices are frankly absurd, especially when taking the generally low quality into account, the cost of food in Seoul still strikes me as overall reasonable. Unlike, for example, Madrid, which has an almost homogenous food scene, Seoul offers incredibly diverse options, ranging from traditional Korean food all the way to Japanese, Thai, Vietnamese, and even European and Latin American food.
And while the Italian pasta in many places in Seoul might not convince an actual Italian gourmet, it suddenly becomes a very high bar to complain about dishes that originate as far as twelve thousand kilometers/seven thousand miles away and that have almost no local cultural influence. Another beautiful thing about Seoul, at least for keyboard enthusiasts like me, is the availability of actual brick-and-mortar keyboard stores. Seoul is home to three enthusiast keyboard shops: Funkeys, SwagKeys, and NuPhy. The first two are local vendors that have physical locations across Seoul; the latter is a Hong Kong-based manufacturer of entry-level enthusiast boards that just opened a showroom in Mapo-gu. I took the time to try to visit each of them, and I even scooped up some new hardware. The Funkeys store is located in the Tongsan district, on the second floor of a commercial space. The store is relatively big and stocks primarily FL-Esports, AULA, and 80Retros boards, keycaps, and switches, but you can also find a few more exclusive items like the Angry Miao CyberBoard. I seized the opportunity to test (and snap up) some 80Retros switches, but more on that further down below. SwagKeys is probably a name that many people in the keyboard enthusiast community have stumbled upon at least once. They are located in the Bucheon area, and they used to have a showroom, which I tried to visit. Sadly, it wasn’t clear to me that the showroom was temporarily (permanently?) closed, so I basically ended up standing in front of the locked doors of an otherwise empty space. Luckily, however, SwagKeys has popup stores in different malls, which I have visited as well. Unfortunately, in those popup stores they only seem to offer entry-level items; enthusiast products are solely available through their web shop and cannot be ordered and picked up at any of their pop-up locations.
I was curious to test and maybe get the PBS Modern Abacus, which SwagKeys had in stock at that time, but none of the pop-ups had it available. Exclusive SwagKeys pop-up. This is a shared space with plenty of other brands to choose from. The NuPhy showroom in the Mapo-gu area is a small space packed with almost all the products the brand offers, from keyboards through switches and keycaps all the way to accessories and folios/bags. However, the showroom is exactly that: a showroom. There’s no way to purchase any of the hardware. As with almost everything in Seoul, your best bet is to order it from NuPhy’s official Korean store, which accepts Naver Pay. Apart from Funkeys, SwagKeys, and NuPhy, there are various brands (like Keychron, Razer, and Logitech) that can be found across in-store pop-ups in different malls. It’s interesting to see a society like Seoul’s, which has largely moved away from offline shopping for almost everything but fashion (more on this in a moment), having that many shops and pop-ups selling entry-level mechanical keyboards. I guess with keyboards being something in which haptics and personal preference play a big role, it makes sense to have places for people to test the various boards and switches, even if most of them will ultimately only sell the traditional Cherry profiles. Speaking of mechanical keyboards, I happened to be in the right place at the right time this year to visit the Seoul Mechanical Keyboard Expo 2026 at the Seoul Trade Exhibition Center (SETEC) in the Gangnam area. It was an interesting experience, despite being less of a traditional enthusiast community event and more of a manufacturer trade fair targeting average users. Because yes, the average user in Korea does indeed seem to have a soft spot for mechanical keyboards. This, however, meant that most vendors would primarily showcase the typical mainstream products, like Cherry profile keycaps and boards that are more affordable.
For example, while Angry Miao were around, their Hatsu board was nowhere to be seen. And it made sense: every vendor had little signs with QR codes that would lead to their store’s product page for people to purchase it right away. Clearly, the event was geared more toward the average consumer than the curious enthusiast. It was nevertheless interesting to see an event like this happening in the wild. Getting around is different in Seoul than it is in other cities. If you’re navigating Seoul with Google Maps, you’re doing it wrong. Naver Map is simply superior in every way that matters for daily life here, although this might soon change. Not only does Naver show you where the crosswalks are, something you don’t realize you need until you’ve jaywalked across six lanes of traffic because Google told you the entrance was “right there”, but it also shows last order times for restaurants and cafes, saving you from going to places only to find out they’re not serving anymore. And public transit arrival times? Accurate to a degree that feels almost unsettling. You trust Naver, because it earns that trust. Clearly, however, me being me, I only used Naver without an account and on a separate profile on my GrapheneOS phone. Also, I mostly use it for finding places and public transit; for everything else, CoMaps works perfectly fine, and I take care to contribute to OSM whenever I can. Note: The jaywalking example isn’t too far-fetched. You’re very tempted to cross at red lights simply because traffic light intervals in Seoul are frankly terrible. As a pedestrian you age significantly waiting for the stoplight to finally turn green. If you’re unlucky, you’re at a large crossing that is followed by smaller crossings, which for reasons I cannot comprehend turn green for pedestrians at the exact same time. Unless you are Usain Bolt, there is no way to make it across multiple crossings in one go, leading you to have to stop at every crossing for around three minutes.
That doesn’t sound like much, until you’re out at -15°C/5°F. Seoul has too many pedestrian crossings with traffic lights, and too few simple marked crosswalks. This is probably due to drivers often not giving a damn about traffic rules and almost running over people trying to cross at regular marked crossings. My gut feeling tells me that, because of the indifference of drivers, the government decided to punish every traffic participant by building traffic lights at almost every corner. However, this didn’t have the (supposedly) intended effect, as scooters especially, but also regular cars, often couldn’t care less about their bright red stop light. Considering the number of CCTV cameras (more on this in just a second) one could assume that traffic violations are enforced strictly. Judging by the negligence of drivers towards traffic rules, however, I would guess that this is not happening.

Circling back to the painfully long waiting times at crossings, which are rivalled only by the painfully slow escalators literally everywhere: a route for which CoMaps estimates 10 minutes can easily become a 20-minute walk. Naver, however, appears to base its time estimates on average waiting times at crossings, making it more accurate than CoMaps in many cases. With Naver being independent of Google, it works without any of the Google Play Services bs that apps often require for anything related to location. And don’t get me wrong, Naver is just as much of an E Corp as Google, but there’s something worth appreciating on a broader level here: Korea built and maintains its own mapping platform rather than ceding that ground to US big tech, and it shows. Naver Map is designed by people who actually navigate Korean cities, and that local knowledge is baked into every interaction. I would love to see more countries doing the same, especially European ones.
While there is HERE WeGo (formerly Nokia HERE Maps) in Europe, it’s as bad for public transport as you might expect from a joint venture between Audi, BMW and Mercedes-Benz, and it is not at all comparable to Naver Maps, let alone Naver as a whole. One big caveat with Naver, however, is that it will drain your battery like a Mojito on a hot summer evening, so it’s essential to carry a power bank. Even on a Pixel 8, the app feels terribly clunky and slow. In addition, the swipe recognition more often than not mistakes horizontal swipes (for scrolling through photos of a place) for vertical ones, making it really cumbersome to use. I assume that on more modern Samsung and Apple devices the app probably works significantly better, as the Korean market appears to be absolutely dominated by these two brands. As a matter of fact, the Google Pixel is not even sold in Korea, which brings me to one important aspect of life in Seoul that might be interesting for the average reader of this site.

As much as I enjoy Seoul, it is an absolute privacy disaster. CCTV cameras in Seoul are everywhere, and the city government actively expands and upgrades them as part of its public-safety and smart-city initiatives. The systems are “AI”-enabled and can automatically detect unusual behavior or safety risks. It’s hard to find a definitive number, but it’s estimated that Seoul is covered with around 110,000 to 160,000 surveillance cameras, with an ongoing expansion of the network. This makes Seoul one of the most surveilled major cities in the world. In addition to CCTV surveillance, Seoul is also almost completely cashless. Most places only accept card/NFC payments, with cash payments being a highly unusual thing to do. While there are still ATMs around, getting banknotes is almost pointless.
You can top up your transit card using cash, and you might be thinking that at least this way nobody knows who owns the card and you cannot be tracked, but with the amount of “AI” cameras everywhere, there’s no need to track people using an identifier as primitive as a transit card. Speaking of which, mobile connectivity is another thing. In Korea SIM cards are registered using an ID/Passport. From what I have found, there’s no way to get even just a pre-paid SIM without handing over your ID. In addition, with everything being cashless, your payment details are also connected to the SIM card. You could of course try to only use the publicly available WiFi to get around and spare yourself the need for a SIM card. However, the moment you’d want to order something online, you will need a (preferably Korean) phone number that can retrieve verification SMS and you might even need to verify your account with an ID. You might think that this doesn’t really matter because online shopping isn’t something vital that you have to do. But with Seoul being almost completely online in terms of shopping you cannot find even the most basic things easily in brick-and-mortar stores. For example, I was looking to upgrade my power brick from the UGREEN X757 15202 Nexode Pro GaN 100W 3-Port charger that I’ve been using for the past year to the vastly more powerful UGREEN 55474 Nexode 300W GaN 5-Port charger. I bought the 3-Port Nexode last year during my time in Japan , in a Bic Camera . However, in Seoul it was impossible to find any UGREEN product. In fact, I could not find any household name products, like Anker or Belkin , regardless of where I looked. Everyone kept telling me to look online, on Naver or Coupang . Short story long, to be able to live a normal life in Seoul you will unfortunately have to hand over your details at every corner. 
Note: Only one day before publishing this update, the popular Canadian YouTuber Linus Tech Tips uploaded a video titled “Shopping in Korea’s Abandoned Tech Mall”, which perfectly captures the sad state of offline tech stores in Seoul. What I found more shocking than this, however, is that it doesn’t seem like privacy concerns are part of the public discourse. The dystopian picture that people in the Western hemisphere paint in literature and movies, in which conglomerates run large parts of society and the general population is merely an efficient workforce of consumers, isn’t far off from how society here appears to be working.

At the end of February I ran into an issue that I had seen before: Back then, I attributed it to either alpha particles or cosmic rays, as I was unable to reproduce the issue or reliably find bad regions in the RAM. This time, however, my laptop was crashing periodically, for seemingly no reason at all. After running the whole playbook of and to verify the filesystem, as well as multiple rounds of the , I found several RAM addresses that were reported faulty. I decided to seize the opportunity and publish a post on BadRAM. At this point, I removed one of the two 32GB RAM sticks and it appears to have helped at least somewhat: the device now only crashes every few hours rather than every twenty or so minutes. But with RAM and SSD prices being what they are, I’m not even going to attempt to actually fix the issue. After all, it might well be that whatever is causing the buzzing sound I’ve been hearing on my Star Labs StarBook has also had an impact on the RAM modules or even the logic board. I’m going to hold on to this hardware for as long as possible, but I’ve also realized that the StarBook has aged quicker than I anticipated. I have therefore been glancing at alternatives for quite a while now. I love what Star Labs has done with the StarBook Mk VI AMD in terms of form factor and Linux support.
Back when I bought it, the Zen 3 Ryzen 7 5800U had already been on the market for almost 4 years and wasn’t exactly modern anymore. However, its maturity gave me hope that Linux support would be flawless (which is the case) and that Star Labs would eventually be able to deliver on their promises. When I purchased the device, Star Labs had advertised an upcoming upgrade from its American Megatrends EFI (“BIOS”) to Coreboot, an open-source alternative. Years later, however, this upgrade is still nowhere to be seen. At this point it is highly unlikely that Coreboot on the AMD StarBook will ever materialize. As already hinted exactly one year ago, I’m done waiting for Star Labs, and I am definitely not going to look into any of their other (largely obsolete) AMD offerings, especially considering the outrageous prices. I’m also not going to consider any of their StarBook iterations, whether it’s the regular version or the Horizon, given that none of them come with AMD CPUs any longer, and, more importantly, that their Intel processors are far too outdated for their price tags. Let alone all the quirks the Star Labs hardware appears to have, and the firmware features that sometimes make me wonder what the actual f… the Star Labs people are smoking.

Note: The firmware update lists the following change: * Remove the power button debounce (double press is no longer required) “Power button debounce” is what Star Labs calls the requirement to double-press the power button in order to power on the laptop when it is not connected to power. It is mind-boggling that this feature made it into the firmware to begin with. Who in their right mind thought “Hey, how about we introduce a new feature with the coming firmware update which we won’t communicate anywhere, which requires the user to press the power button quickly twice in a row for their device to power on, but only when no power cable is connected?
And how about if they press it only once when no power cord is attached, the device simply won’t boot, but it will nevertheless produce a short audible sound to make it seem like it tried to boot, even though it won’t?” …? Because this is exactly what the “power button debounce” was about. I believe it got introduced sometime around , but I can’t really tell, because Star Labs didn’t mention it anywhere.

Short story long, instead of spending more money on obsolete and quirky Star Labs hardware, I have identified the ASUS ExpertBook Ultra as a potential successor. The ExpertBook Ultra is supposed to be released in Q2 in its highest-performance variant, featuring the Intel Core Ultra X9 Series 3 388H “Panther Lake” processor, running at 50W TDP and sporting up to 64 GB LPDDR5x memory, which is the model that I’m interested in. I will wait out the reviews, specifically for Linux, but unless major issues are to be expected I’ll likely upgrade to it. “Wait, aren’t you Team Red?”, you might be wondering. And yes, for the past decade I’ve been solely purchasing AMD CPUs and GPUs, with one exception being a MacBook with an Intel CPU. However, at this point I’m giving up on ever finding an AMD-based laptop that fits my specs, because sadly with AMD laptops it’s always something: either the port selection sucks, or there’s no USB4 port at all, or if there is it’s only on one specific side, or the display and/or display resolution sucks, or the battery life is bad, or you can only get some low-TDP U variant, or the device is an absolute chonker, or or or. It feels like with an AMD laptop I always have to make compromises at a price point at which I simply don’t want to have to make these compromises anymore. So unless AMD and the manufacturers – looking specifically at you, Lenovo! – finally get their sh#t together to build hardware that doesn’t feel like it’s artificially choked, I’m going back to Team Blue.
“Panther Lake” seems to have made enough of a splash, TDP-performance-wise, that it is worth considering Intel again, despite the company’s history of monopolistic business tactics, its anti-consumer behavior, its major security flaws, its quality control issues, and its general douchebag attitude towards everything and everyone. The ASUS ExpertBook Ultra appears to feature the performance that I want, with all the connectivity that I need, packaged in a form factor that I find aesthetically pleasing and lightweight enough to travel with. If the Intel Core Ultra X9 388H notably exceeds the preliminary benchmarks and reviews of the Intel Core Ultra X7 358H version of the ExpertBook Ultra, then I’m “happy” to pay the current market premium for a device that will hopefully hold up for much longer and with fewer quirks than I’ve experienced with the StarBook. With a Speedometer 3.1 rating of around 30 and reporting 11:25:05 hours for on my current device, however, I’m fairly certain that even the X7 358H will be a significant improvement. “Did you hear about the latest XPS 14 & 16 from Dell? They also come with Panther Lake!”, I hear you say. See here and there on why those are seemingly disappointing options. The tl;dr is that Dell only feeds them 25W (14") / 35W (16"), instead of the 45W that ASUS runs the CPU at.

I can’t tell for sure how long I’ll be able to continue working on the StarBook. While I can do the most critical things, the looming threat of data corruption and loss is frightening. The continuous crashes also introduce unnecessary overhead. I’m hoping for ASUS to make the ExpertBook Ultra available sooner rather than later, but if there’s no clarity on availability soon I might have to go with a different option. Ultrabook Review luckily has a full list of Panther Lake laptops to help with finding alternatives.

What’s the second best thing that can happen when your computer starts failing? Exactly: your phone (slowly) dying.
It appears that the infamous Pixel 8 green-screen-of-death has hit my GrapheneOS device, making it almost impossible to use. Not only does the display glitch terribly, but the bottom part of the phone gets abnormally hot. When the glitching began, it was sufficient to literally slap the bottom part of the phone and it would temporarily stop. Sadly, the effectiveness of this workaround has decreased so much over time that now I basically need to squeeze the bottom part of the phone for the glitching to stop. The moment I decrease the force, the screen starts glitching again. My plan was to keep the Google Pixel 8 for the next few years and eventually move to a postmarketOS/Linux phone as soon as there is a viable option. Sadly it seems that I’m going to have to spend more money on Google’s bs hardware to get another GrapheneOS device for the time being. Unfortunately Google does not sell Pixel devices across Asia, making it hard to find an adequate replacement for the phone right now. I might just have to suck it up and wait until I pass through a region in which Pixel devices are more widely available. Of course, I luckily brought backups, although those run malware and are hence less than ideal options.

My Anker Soundcore Space Q45 died on me during a flight, for absolutely no reason at all. I purchased them back at the end of May 2024, and now, after not even 2 years, it appears that the electronics inside of them broke in a way in which the headphones cannot be turned off or on again. They seem to be in a sort of odd in-between state, in which pressing e.g. the ANC button does something and makes the LED light up, but there’s no Bluetooth connectivity whatsoever. When connecting them via USB-C to power or to another device, the LED changes dozens of times per second between white and red. Holding the power button makes the LED turn on (white) but nothing else.
The moment the power button is let go, the LED turns back off. This is yet another Anker product that broke only shortly after its warranty expired and I’m starting to see a common theme here. Hence, I will avoid Anker products going forward, especially given the tedious support that I had experienced in the past with one of their faulty power banks. I still use the Soundcore headphones via audio jack, as this luckily works independently of the other electronics. To avoid anything bad happening, especially during flights, I opened the left earcup and removed the integrated battery. The USI 2.0 stylus that I had bought back in mid September of 2024 from the brand Renaisser is another hardware item that has pretty much died. It seems like the integrated battery is done, hence the pen doesn’t turn on anymore unless a USB-C cable is connected to it to power it externally. While I’m still using it, it is slightly inconvenient to have a relatively stiff USB-C cable pull on the upper end of the pen while writing or editing photos, which is what I use the pen primarily for. As mentioned in the Seoul part, I picked up a handful of mechanical keyboard-related items, namely MX switches for my keyboard(s) . KTT x 80Retros GAME 1989 Orange , 40g (22mm KOS single-stage extended, bag lubed with Krytox 105 ), lubed with Krytox 205G0 . 80Retros x HMX Monochrome , 42g (48g bottom out), LY stem, PA12 top housing, HMX P2 bottom housing, 22mm spring, factory lubed, 2mm pre-travel, 3.5mm total. I invested quite some time in pursuing my open source projects in the past quarter, hence there are a few updates to share. This quarter I have finally found the time to also update my feature and make it work with the latest version of Ghostty , the cross-platform terminal emulator written in Zig. You can use this commit if you want to patch your version of Ghostty with this feature. 
It is unlikely that the Ghostty team is ever going to include this feature in their official release, yet I’m happy to keep maintaining it as it’s not a lot of code. I have updated and it now supports a new flag (that does not support), which makes it possible to build a complete power management policy directly through command-line arguments. I have documented it in detail in the repository , but the idea is that the flag allows executing arbitrary shell commands when the battery reaches a specific percentage, either by charging or discharging. The flag takes three arguments: For , the command fires when the battery percentage drops to or below the given value. For , it fires when the percentage reaches or exceeds it. The command fires once when the condition is met and will only fire again after the condition has cleared and been met again. Additionally, the flag can be specified multiple times to define different rules. This makes it possible to build a complete power management policy, from low-battery warnings to automatic shutdown, without any external scripts or configuration files. The benefit this has over, let’s say, rules, is that script execution as the current user is significantly easier, less hacky and poses fewer overall security risks, as does not need to (read: should not ) be run in privileged mode. Another one of my Zig tools that got a major update is , the command line tool for getting answers to everyday questions like or more importantly . The new version has received an update to work with Zig 0.15.0+ and its command line arguments parser logic was rewritten from scratch to be able to handle more complex cases. In addition, is now able to do a handful of velocity conversions, e.g. . As a quick side note, alongside the Breadth-first search implementation that it is using, , has also been updated to support Zig 0.15.0+. I had some fun a while ago building an XMPP bot that’s connected to any OpenAI API (e.g. 
) and is able to respond when mentioned and respond to private messages. It preserves a single context across all messages, which might not be ideal in terms of privacy, but it is definitely fun in a multi-user chat – hey, btw, come join ours! The code is relatively crude and simple. Again, this was just a two-evening fun thing, but you can easily run the bot yourself; check the README and the example configuration for more info.

The work on my new project, ▓▓▓▓▓▓▓▓▓▓▓, which I had announced in my previous status update, sadly didn’t progress as quickly as I was expecting it to, due to (amongst other things) the RAM issues that I’ve had to deal with. It also turns out that when writing software in 2026, everyone seems to expect instant results, given all the Codexes and Claudes that are usually being employed these days to allow even inexperienced developers to vibe code full-blown Discord alternatives within short periods of time. However, because I don’t intend to go down that path with ▓▓▓▓▓▓▓▓▓▓▓, it will sadly take some more time for me to have a first alpha ready. To everyone who reached out to offer their help with alpha testing: You will be the first ones to get access as soon as it’s ready.

Kauf Roasters: A roastery with a clear focus on simplicity and quality without pretension. Identity Coffee Lab: This one stunned me. A hot Americano to go for 3,000 KRW. That’s almost a third of what Anthracite charges. And the coffee isn’t just cheaper, it is significantly better! It’s a bigger cup, it’s notably less acidic, and, here’s the part that really got me, it comes out steaming hot and stays that way for a good twenty minutes. You can actually walk around and sip it casually, even in freezing cold temperatures, just the way a to-go coffee is meant to be enjoyed, instead of gulping it down before it turns into cold brew. Oscar Coffee Booth: This became a personal favorite.
Another spot where the coffee is serious, the price is fair, and nobody is trying to impress you with anything other than a well-made drink. On top of that, the owner is a genuinely kind person.

The flag’s three arguments are: either (aliases: , ) or (aliases: , ) for the direction, the battery level (a number from 0 to 100), and the shell command to execute.
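The "fires once, re-arms after the condition has cleared" rule semantics described earlier can be sketched as a tiny state machine. This is a hedged illustration, not the tool's actual code; the function name and state labels are made up:

```shell
# A sketch of the rule semantics: a rule fires once when the battery level
# crosses its threshold, and only re-arms after the condition has cleared.
#
# check_rule PREV_STATE LEVEL THRESHOLD
#   prints "fire"  when the condition is newly met (run the command now),
#          "fired" when it was already met (do not run the command again),
#          "armed" when the condition has cleared (rule may fire again).
check_rule() {
  prev=$1; level=$2; threshold=$3
  if [ "$level" -le "$threshold" ]; then
    if [ "$prev" = "armed" ]; then
      echo fire
    else
      echo fired
    fi
  else
    echo armed
  fi
}

# A discharge rule with a threshold of 20%:
check_rule armed 15 20   # prints: fire  (condition newly met)
check_rule fired 14 20   # prints: fired (still met; no repeat)
check_rule fired 40 20   # prints: armed (cleared; ready to fire again)
```

A daemon would poll the battery level, feed the previous state back in, and run the configured shell command whenever a rule reports "fire".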

Brain Baking 2 weeks ago

App Defaults In March 2026

It’s been almost three years since sharing my toolkit defaults (2023). High time to report an update. There’s a second reason to post this now: I’ve been trying to get back into the Linux groove (more on that later), so I’m hoping to either change the defaults below in the near future or streamline them across macOS & Linux. Where a default changed I’ll provide more information; otherwise see the previous post as linked above.

- Backup system: Still Restic, but I added Syncthing into the loop to get that 1-2-3 backup number higher. I still have to buy a fire-proof safe (or sync it off-site).
- Bookmarks and Read It Later systems: Still Alfred & Obsidian. Experimenting with Org-mode and org-capture; hoping to migrate this category to Emacs as well.
- Browser: Still Firefox.
- Calendar and contacts: Still self-hosted Radicale.
- Chat: Mainly Signal now thanks to bullying friends into using it.
- Cloud File Storage: lol, good one.
- Coding environment: For light and quick scripting, Sublime Text → Emacs! Otherwise, any of the dedicated tools from the JetBrains folks. and can only do so much; it’s dreadful in Java.
- Image editor: Still ImageMagick + GIMP.
- Mail: Apple Mail for macOS for brainbaking → Mu4e in Emacs! And Microsoft Outlook for work → Apple Mail for the work Exchange server. I didn’t want to mix, but since Mu cleared up Mail, that’s much better than Outlook.
- Music: Still Navidrome.
- Notes: Still pen & paper, but I need to remind myself to take up that pen more often.
- Password Management: Still KeePassXC.
- Photo Management: Still PhotoPrism. I considered replacing it but I barely use it; it’s just a photo dump place for now.
- Podcasts: I find myself using the Apple Podcast app more often than in 2023. I don’t know if that’s a bad thing—it will be if I want to migrate to Linux.
- Presentations: Haven’t found the need for one.
- RSS: Still NetNewsWire, but since last year it’s backed by a FreshRSS server, making cross-platform reading much better. The Android client app used is Randrop now, so that’s new.
- Spreadsheets: For student grading, Google Sheets, or Excel if I have to share it with colleagues. My new institution is pro Teams & Office 365. Yay.
- Text Editor: I’m typing this Markdown post in Sublime Text → Emacs.
- Word Processing: Still Pandoc if needed.
- Terminal: emulator: iTerm2 → Ghostty, but evaluating Kitty as well (I hated how the iTerm2 devs shoved AI shit in there); shell: Zsh → migrated to Fish two days ago! The built-in command line autocomplete capabilities are amazing. Guess what: more and more I’m using eshell and Emacs.

Some more tools that have been adapted that don’t belong in one of the above categories:

- Karabiner Elements to remap some keys (see the explanation).
- I tried out Martha as a Finder alternative. It’s OK but I’d rather dig into Dired (Emacs)—especially if I see the popularity of tools like that just steal Dired features.
- I replaced quite a few coreutils CLI commands with their modern counterparts: now is , now is , now is , now is , and can be used to enhance shell search history but Fish eliminated that need.
- AltTab for macOS replaces the default window switcher. The default didn’t play nice with new Emacs frames and I like the mini screenshot.

A prediction for this post in 2027: all tools have been replaced with Emacs. All silliness aside; Emacs is the best thing that happened to me in the last couple of months.

Related topics: / lists / app defaults / By Wouter Groeneveld on 29 March 2026. Reply via email.

blog.philz.dev 2 weeks ago

What is Buildkite?

If you're starting a new project, just skip the misery of GitHub Actions and move on. Buildkite mostly gets it. The core Buildkite noun is a Pipeline, and, as is traditional for an enterprise software company, their docs don't really tell you what's what. The points that matter are:

- Pipelines can add steps to themselves. So, you can write a script to generate your pipeline (or just store it in your repo), and cat it into the command, and that's how the rest of your steps are discovered.
- Pipeline steps are each executed in their own clean checkout of what you're building. So, if you want to run the playwright tests in parallel with the backend tests (or whatever), you just declare that as two different steps, but they're part of the same thing.
- Pipelines have a dependency graph between steps that's conceptually similar to . (Perhaps was the Make replacement that I first heard of that did "generate the ninja graph"?)

The agents seem pretty good at manipulating Buildkite once you give them an API key. They also seem to not in-line shell scripts into the yaml, which is Obviously Good. The way to speed up a build is always the same: cache and parallelize. A 16-core machine for 1 minute costs the same as a 2-core machine for 8 minutes, and I know which one I'd rather wait for! Buildkite makes parallelism pretty easy. Anyway, it's pretty good. Thanks, Buildkite.
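As a sketch of the "generate your pipeline and cat it into the command" pattern: a script emits pipeline YAML, and in CI that output is piped into `buildkite-agent pipeline upload` (a real Buildkite command). The step labels and make targets here are made up for illustration:

```shell
# Emit a pipeline with two steps. Steps with no dependencies between them
# run in parallel, each in its own clean checkout.
generate_pipeline() {
  cat <<'YAML'
steps:
  - label: "backend tests"
    command: make test-backend
  - label: "playwright tests"
    command: make test-e2e
YAML
}

# In a real pipeline step you would run:
#   generate_pipeline | buildkite-agent pipeline upload
generate_pipeline
```

Because the YAML comes from a script, the script is free to inspect the repo and add or skip steps dynamically, which is how self-extending pipelines work.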

neilzone 3 weeks ago

Initial thoughts on the tiny XTEINK X4 ereader

What fits nicely in my hand and gives me hours of pleasure? A tiny ereader! I - like, it seems, quite a lot of people - bought an XTEINK X4 ereader. I bought an X4 because I love reading, and I was drawn to the idea of having a tiny ereader in my pocket. Instead of reaching for my phone, I hope that I will instead reach for the ereader, and enjoy some more reading. I am in the very privileged position of having the X4 as an extra / secondary ereader, which perhaps colours my view of the device, in the sense of being willing to put up with more of its quirks than if it were my only ereader. (Since someone asked me about it, perhaps because of some of the marketing photos: this is a standalone ereader. Yes, one needs to transfer books to it (see below), but it is not tied to a phone / does not require a phone to function. One can attach it, magnetically, to the back of a phone, for reasons which are not entirely obvious to me.)

I had no plans to use the stock firmware, and used it only so far as to change the language to English before flashing the Free software alternative firmware, CrossPoint. (There are other firmwares for the device; I chose CrossPoint.) I did, however, note that the stock firmware does not require a user account / registration or anything like that, which I appreciated. I flashed CrossPoint using the tool at [https://xteink.dve.al]. When I tried to back up the existing firmware, I got an error of . I ran , to give my user the right permissions. With that done, I could dump the existing flash (which did indeed take about 25 minutes). I had the same error when flashing the CrossPoint firmware, so I ran again, and it worked again. Once I had reset the device - hold the small button at the bottom on the right edge of the X4 for a second, then press-and-hold-for-a-few-seconds the power button at the top on the right edge of the X4 - it booted into CrossPoint very quickly.

The device comes with a screen protector. This is an excellent idea.
It would have been even better if this had been installed at the factory, but never mind. I bought a cheap (£4) clear plastic shell, to protect the back of it. It adds a bit of bulk to the device, but I’d like to protect it. I replaced the included 16GB (the manual says that it comes with a 32GB card…) XTEINK-branded microSD card as soon as I received the device, with a 128GB SanDisk card. This was mostly down to force of habit, as it would not be a particular problem for me if the microSD card in the device died. Annoying, for sure, but I could just pop in a new card and reload all my books from Calibre. The card slot is recessed, so pressing the card to remove it, and to get it back in place, was quite tricky with short fingernails. This, it turns out, is a bit of a pain.

I use Calibre for managing my ebook library. For my other ereaders, I load books via a cable. Somewhat annoyingly, the X4 and its microSD card do not mount as a USB-writable device. The options are Wi-Fi-based, or else removing the microSD card. I have gone with the microSD card approach, despite it being a bit of a pain. In Calibre, I used the “Save to disk” / “Save only the EPUB format to disk in a single folder” option. This did - as expected - dump 500+ ebooks into a single directory, which is not ideal on the X4 with CrossPoint, given that they appear as a list, with no way to search. Press-and-hold on the side buttons does jump between full screens though (a bit like Page Up / Page Down), so it is not terrible. Perhaps I need to treat the X4 less like a portable library, and just move onto it a small number of books that I want to have so readily available. CrossPoint seems to struggle with books with a special character (e.g. “$”) in the title; I have yet to dig into this though. I have not tried to connect it to Wi-Fi; I have no need for this. I have not found a way to turn off Wi-Fi, which is a bit annoying, as I don’t need it on all the time, both in terms of battery life and privacy.
The reading experience is… good. Neither terrible nor amazing. What makes it good is that it is pocketable and there when I want it. The 4.3” screen is, apparently, 220 PPI. It is not as crisp/sharp as the screen on my Kobo or Tolino. A backlight would be wonderful, but I knew that it did not have one when I bought it. CrossPoint does not (currently, anyway) support dark mode - light text on a black background. I prefer dark mode when reading, but I can easily live without it on this device. There is a pull request to add dark mode to CrossPoint, but I note: Did you use AI tools to help write this code? YES

The X4 can fit a surprisingly large amount of text on the small screen. But, nevertheless, it means pressing the “next page” button a lot. The buttons on the front are a bit “clicky”, but fortunately the buttons on the side are much quieter / softer. I imagine that, if I were using the front buttons to turn the page, and I was sitting next to my wife at the time, she would find it very annoying. I would. Note that the two buttons on the front are, in fact, four buttons; each button is a bit like a rocker switch, I guess, with different actions for the left and right sides. I should have worked that out sooner (or read the manual)… I am quite content with the lack of a touch screen; I much prefer pressing a button to turn a page over mimicking a “swipe” action, as I don’t have to move my hand or hold the device awkwardly. It has 128 megabytes of RAM, which both feels like loads, and not much at all, at the same time. Books load more than fast enough, and page turns are rapid. It has a 650mAh battery, and although my initial experience has been fine, I wonder just how long this is going to last with Wi-Fi on the whole time (needlessly). But the X4 charges via USB-C, which is excellent, as it means that I don’t need to carry yet another cable.

André Arko 3 weeks ago

How to Install a Gem

This post was originally given as a talk at SF Ruby Meetup . The slides are also available. Hello, and welcome to How To Install A Gem . My name is André Arko, and I go by @indirect on all the internet services. You might know me from being 1/3 of the team that shipped Bundler 1.0, or perhaps the 10+ years I spent trying to keep RubyGems.org up and running for everyone to use. More recently, I’ve been working on new projects: a CLI to install Ruby versions and gems at unprecedented speeds, and gem.coop , a community gem server designed from the ground up so Bundler can install gems faster and more securely than ever before. So, with that introduction out of the way, let’s get started: do you know how to install a gem? Okay, that’s great! You can come up and give this talk instead of me. I’ll just sit over here while you write the rest of this post. Slightly more seriously, do you know how the name that you give it gets converted into a URL to download a .gem file? The mechanism is called the “compact index”, and we’ll see how it works very soon. Next, who in the audience knows how to unpack a .gem file? Do you know what format .gem files use, and what’s inside them? We’ll look at gem structure and gemspec files as well. Then, do you know where to put the files from inside the gem? Where do all of these files and directories get put on disk so we can use them later? Does anyone know off the top of their head? Once those files have been unpacked into the correct places, the last thing we need to know is how to require them. How do these unpacked files on disk get found by Ruby, so you can require them and have that actually work? This exercise was mostly to show that using gems every day lets you skip over most of the way they work underneath. So let’s look at what a gem is, and examine how they work. By the end of this talk, you’ll know what’s inside a gem, how RubyGems figures out what to download, and where and how that download gets installed so you can use it.
And if you already know everything we just talked about, please feel free to go straight to rv.dev and start sending us pull requests! First, we’re going to look at how the name of a gem becomes a URL for a .gem file. Let’s use Rails as our example. Historically, there have been at least five or six different ways to look up information about a gem based on its name, but today there is one canonical way: the compact index. It’s so simple that you can do it yourself using curl. Just run a single curl request against the index, and you’ll be able to read the exact output that every tool uses to look up the versions of a gem that exist. Each line in the file describes one version of the gem, so let’s look at one line. We can break that line down and tackle each part one at a time. First, the version number. That’s the version of Rails that this line is about. So we now know for sure that this version exists. Next, a list of dependencies. The gem declares dependencies on a bunch of other gems. Each dependency has a version requirement attached, and for almost every gem it is pinned to one exact version, and only that version. For one dependency, Rails is a little bit more flexible, and allows a minimum version and up. The final section contains a checksum, a ruby requirement, and a rubygems requirement. The checksum is a sha256 hash of the .gem file that contains the gem, so after we download the gem we can check to make sure we have the right file by comparing that checksum. This version of Rails also declares a minimum required Ruby version and a minimum required RubyGems version. It’s up to the client to do something with that information, but hopefully you’ll see an error if you are using Ruby or RubyGems that’s too old. Great! So now we know the important information: this Rails version is real, and strong, and is our friend. We can download it, and check the checksum against the checksum we were given in the info file line.
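The line format described above can be pulled apart with plain shell parameter expansion. The version, the dependency list, and especially the checksum value below are made up for illustration (a real line comes from the gem server’s compact index), but the shape — version, then dependencies, then a `|`-separated metadata section — follows the description in the talk.

```shell
# One illustrative compact-index info line:
#   VERSION dep:constraint,dep:constraint|checksum:...,ruby:...
line='8.0.0 actionpack:= 8.0.0,activesupport:= 8.0.0|checksum:abc123,ruby:>= 3.2.0'
version=${line%% *}          # everything before the first space
deps=${line#* }              # drop the version...
deps=${deps%%|*}             # ...and everything after the pipe
meta=${line##*|}             # everything after the pipe
echo "version:  $version"
echo "deps:     $deps"
echo "metadata: $meta"
```

Running this prints the three sections separately, which is all a client needs to resolve versions, walk dependencies, and verify the download.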
Let’s do that now: Notice that the checksum we compute exactly matches the checksum we previously saw in our line from the info file. That lets us know that we got the right file, and there were no network or disk errors. Now that we have the gem, we can investigate: what exactly is inside a gem? At this point, we’re going to pivot from the rails gem to the railties gem. There’s a good reason for that, and the reason is… the rails gem doesn’t actually have any files in it. So it’s a bad example. In order to show off what a gem looks like when it has files in it, we’ll use railties instead. So, we have our .gem file downloaded with curl. What do we do now? The first piece of secret knowledge that we need: gems are tarballs. That means we can open them up with regular old tar. Let’s try it. So what’s inside the .gem tarball is… another tarball. And also two gzipped files. Let’s look at the gzipped files first. As you might expect from its name, the checksums.yaml.gz file is a gzipped YAML file, containing checksums for the other two files. It’s maybe a bit silly to have multiple layers of checksumming here, but it does confirm that the outer layer of tarball and zip was removed without any errors. Okay, so what’s inside metadata.gz? The answer is… Ruby, sort of. It’s a YAML-serialized instance of the Gem::Specification class. We can see exactly what was put into this object at the time the gem was built. After snipping out the YAML that lists the dependencies (which we already looked at, because they are included in the info file), what’s left is some relatively simple information about the gem. Author, author’s email, description, homepage, license, various URLs. For the purposes of installing and using the gem, we care about exactly six pieces of information from the specification. We’re going to combine those items with the files in the remaining data.tar.gz file to get our unpacked and installed gem. Now that we know what’s in the gem specification, let’s look at what’s inside the data tarball.
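To see the container format without downloading anything, we can build a toy .gem by hand. A real gem’s three members have exactly these names (metadata.gz, data.tar.gz, checksums.yaml.gz), though the contents here are stand-ins, and the gem name and version are invented for the demo.

```shell
# A .gem is an ordinary tar archive holding three members. Build a toy
# one to show the structure (contents are illustrative stand-ins).
mkdir -p demo/lib
echo 'puts "hello from demo"' > demo/lib/demo.rb
tar -C demo -czf demo/data.tar.gz lib/demo.rb          # the inner tarball
printf '%s\n' '--- !ruby/object:Gem::Specification' 'name: demo' \
  | gzip > demo/metadata.gz                            # gzipped YAML spec
printf '%s\n' '---' 'SHA256: {}' \
  | gzip > demo/checksums.yaml.gz                      # gzipped YAML checksums
tar -C demo -cf demo-0.0.1.gem metadata.gz data.tar.gz checksums.yaml.gz
tar -tf demo-0.0.1.gem    # lists: metadata.gz, data.tar.gz, checksums.yaml.gz
```

The final `tar -tf` is the same listing you get from a real .gem downloaded with curl: an archive of an archive, plus the two gzipped YAML files.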
It matches up very closely with the long list of entries in the files array in the gemspec. So now we have a bunch of files. Where are we going to put these files? Enter: the magic of RubyGems. The scheme that RubyGems has come up with is largely shaped by the constraints of how Ruby finds files to require, which we’re going to look at soon. For now, it is enough for us to know that RubyGems keeps track of a list of directories, a lot like the way PATH works for your shell to find commands to run. To see the current list of directories, you can run gem environment. Here’s what that looks like: From this list, we can see that RubyGems organizes its own files into a few directories. To install a gem, we’re going to need to put the files we have into each of those directories, with specific paths and filenames. Just to recap, the files we need to place somewhere are the .gem file itself, the metadata.gz specification, and the unpacked contents of data.tar.gz. So let’s move the files into the directories we see RubyGems offers. First, cache the .gem file so RubyGems doesn’t need to download it again later: Then, add the gem specification so that RubyGems will be able to find it. There’s a small twist here, which is that the specifications directory doesn’t contain YAML files, it contains Ruby files. So we also need to convert the YAML file back into a Ruby object, and then write out the Ruby code to create that object into a file that RubyGems can load later. Next, we need to put the files that make up the contents of the gem into the gems directory. One more thing we need to do: set up the executables provided by the gem. You can check out the files that RubyGems generates by looking in the bin directory, but for our purposes we just need to tell RubyGems what gem and executable it needs to run, so we can do that: And with that, we’ve installed the gem! You can run the file that we just created to prove it: As we wrap up here, there are three aspects of gems that we haven’t touched on at all: docs, extensions, and plugins. We don’t have time to talk about them today in this meetup talk slot.
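The directory layout described above can be sketched like this. The subdirectory names (cache, specifications, gems, bin) are the standard RubyGems ones; the railties filename and version are illustrative, and since `Gem.dir` needs a working Ruby, the sketch falls back to a typical default path.

```shell
# The standard RubyGems layout under one gem path entry.
GEM_DIR=$(ruby -e 'print Gem.dir' 2>/dev/null || echo "$HOME/.gem/ruby/3.3.0")
for sub in cache specifications gems bin; do
  echo "$GEM_DIR/$sub"
done
# cache/railties-8.1.3.gem              -- the downloaded .gem, kept for reuse
# specifications/railties-8.1.3.gemspec -- the spec, converted back to Ruby code
# gems/railties-8.1.3/                  -- the unpacked data.tar.gz contents
# bin/                                  -- generated executable stubs
```

Every gem path entry repeats this same four-directory shape, which is why a hand-rolled install only has to copy files into the right names under one of them.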
Hopefully a future (longer) version of this talk will have space to include all of those things, because they are all super interesting, I promise. In the meantime, I will have to direct you to the docs for RDoc to learn more about docs, and to the source code of RubyGems itself if you want to learn more about gem extensions and plugins. There’s one last thing to figure out before we wrap up: how does require find a gem for us to be able to use it? To explain that, we’ll have to drop down to some basic Ruby, and then look at the ways that RubyGems monkeypatches Ruby’s basic require to make it possible to have gems with versions. The first thing to know about require is that it works exactly like command lookup does in your shell. There’s a global Ruby variable named $LOAD_PATH, and it’s an array of paths on disk. When you try to require something, Ruby goes and looks inside each of those paths to see if the thing you asked for is there. You can test this out for yourself in just a few seconds! Let’s try it. The Ruby -I CLI flag lets you add directories to the $LOAD_PATH variable, and then the require function looks inside that directory to find a file with the name that you gave to require. No magic, just a list to check against for files on disk. Now that you understand how the $LOAD_PATH variable makes require work, how does RubyGems work? You can’t just put ten different versions of the same file into the $LOAD_PATH and expect require to still work. RubyGems handles multiple versions of the same file by monkeypatching require. Let’s look at what happens when we require a file located inside the gem that we just installed. RubyGems starts by looking at all of the gem specifications, including the one we saved earlier. In each specification, it combines the name and version with the values in require_paths to come up with a path on disk. So for our just-installed gem, that would mean a specific directory inside the gems directory. RubyGems knows that directory contains the file we asked for, so it is a candidate to be “activated”, which is what RubyGems calls it when a gem is added to your $LOAD_PATH.
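The lookup that require performs can be mimicked in a few lines of shell, which makes the PATH analogy concrete. Note the assumptions: LOAD_PATH here is a plain shell variable standing in for Ruby’s real load path array, the `.sh` extension stands in for `.rb`, and the directory and file names are invented for the demo.

```shell
# Mimic Ruby's require in shell: walk a list of directories and load
# the first file that matches the requested name.
LOAD_PATH="lib_a lib_b"                     # stand-in for Ruby's load path
mkdir -p lib_a lib_b
echo 'echo "from lib_b"' > lib_b/feature.sh
require() {
  for dir in $LOAD_PATH; do
    # first directory containing the file wins, exactly like PATH lookup
    if [ -f "$dir/$1.sh" ]; then . "$dir/$1.sh"; return 0; fi
  done
  echo "cannot load such file -- $1" >&2
  return 1
}
require feature   # prints: from lib_b
```

Because lib_a is searched first and comes up empty, the file in lib_b is loaded; this first-match behavior is also why RubyGems has to control which gem directory gets added to the list before delegating to the real require.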
As long as RubyGems’ internal bookkeeping shows that no other version of the same gem has already been added to the $LOAD_PATH, we’re good! RubyGems adds this specific directory to the $LOAD_PATH, and delegates to the original implementation of require. Require finds the file, reads it, and evaluates it. With that, we’ve done it! We have found, downloaded, unpacked, and installed a gem so that Ruby is able to run a command and load ruby files, without ever touching the gem command. If you’re interested in contributing to an open source project that works a lot with gems, we would love to work with you on the project at rv.dev, where we are working to create the fastest Ruby and gem manager in the world. And of course, if your company could use faster, easier, or more secure gems for developers, for CI, and for production deployments, we can help. We’d love to talk to you and you can find our contact information at spinel.coop .
To recap, the files we worked with:
railties-8.1.3.gem (the .gem file itself)
metadata.gz (the YAML Gem::Specification object from inside the gem)
the unpacked data.tar.gz files (the contents of the gem)


The AI Industry Is Lying To You

Hi! If you like this piece and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances , and the AI bubble writ large . I just put out a massive Hater’s Guide To The SaaSpocalypse , as well as the Hater’s Guide to Adobe . It helps support free newsletters like these! The entire AI bubble is built on a vague sense of inevitability — the idea that if everybody just believes hard enough that none of this can ever, ever go wrong, then at some point all of the very obvious problems will just go away. Sadly, one cannot beat physics. Last week, economist Paul Kedrosky put out an excellent piece centered around a chart that showed new data center capacity additions (as in additions to the pipeline, not brought online ) halved in the fourth quarter of 2025 (per data from Wood Mackenzie ): Wood Mackenzie’s report framed it in harsh terms: As I said above, this refers only to capacity that’s been announced rather than stuff that’s actually been brought online , and Kedrosky missed arguably the craziest chart — that of the 241GW of disclosed data center capacity, only 33% of it is actually under active development: The report also adds that the majority of committed power (58%) is for “wires-only utilities,” which means the utility provider is only responsible for getting power to the facility, not generating the power itself, which is a big problem when you’re building entire campuses made up of power-hungry AI servers.
WoodMac also adds that PJM, one of the largest utility providers in America, “...remains in trouble, with utility large load commitments three times as large as the accredited capacity in PJM’s risked generation queue,” which is a complex way of saying “it doesn’t have enough power.” This means that fifty-eight god damn percent of data centers need to work out their own power somehow. WoodMac also adds there is around $948 billion in capex being spent in totality on US-based data centers, but capex growth decelerated for the first time since 2023 . Kedrosky adds: Let’s simplify: The term you’re looking for there is data center absorption, which is (to quote Data Center Dynamics) “...the net growth in occupied, revenue-producing IT load,” which grew in America’s primary markets from 1.8GW in new capacity in 2024 to 2.5GW of new capacity in 2025 according to CBRE . The problem is, this number doesn’t actually express newly-turned-on data centers. Somebody expanding a project to take on another 50MW still counts as “new absorption.” Things get more confusing when you add in other reports. Avison Young’s reports about data center absorption found 700MW of new capacity in Q1 2025 , 1.173GW in Q2 , a little over 1.5GW in Q3 and 2.033GW in Q4 (the Q3 figure is approximate, as I cannot find its Q3 report anywhere), for a total of 5.44GW, entirely in “colocation,” meaning buildings built to be leased to others. Yet there’s another problem with that methodology: these are facilities that have been “delivered” or have a “committed tenant.” “Delivered” could mean “the facility has been turned over to the client, but it’s literally a powered shell (a warehouse) waiting for installation,” or it could mean “the client is up and running.” A “committed tenant” could mean as little as “we’ve signed a contract and we’re raising funds” (as is the case with Nebius, raising money off of a Meta contract to build data centers at some point in the future ).
We can get a little closer by using the definitions from DataCenterHawk (from which Avison Young gets its data), which defines absorption as follows :  That’s great! Except Avison Young has chosen to define absorption in an entirely different way — that a data center (in whatever state of construction it’s in) has been leased, or “delivered,” which means “a fully ready-to-go data center” or “an empty warehouse with power in it.”  CBRE, on the other hand, defines absorption as “net growth in occupied, revenue-producing IT load,” and is inclusive of hyperscaler data centers. Its report also includes smaller markets like Charlotte, Seattle and Minneapolis, adding a further 216MW in absorption of actual new, existing, revenue-generating capacity. So that’s about 2.716GW of actual, new data centers brought online. It doesn’t include areas like Southern Virginia or Columbus, Ohio — two massive hotspots from Avison Young’s report — and I cannot find a single bit of actual evidence of significant revenue-generating, turned-on, real data center capacity being stood up at scale. DataCenterMap shows 134 data centers in Columbus , but as of August 2025, the Columbus area had around 506MW in total according to the Columbus Dispatch, though Cushman and Wakefield claimed in February 2026 that it had 1.8GW . Things get even more confusing when you read that Cushman and Wakefield estimates that around 4GW of new colocation supply was “delivered” in 2025, a term it does not define in its actual report, and for whatever reason lacks absorption numbers. Its H1 2025 report , however, includes absorption numbers that add up to around 1.95GW of capacity…without defining absorption, leaving us with exactly the same problem we have with Avison Young.
Nevertheless, based on these data points, I’m comfortable estimating that North American data center absorption — as the IT load of data centers actually turned on and in operation — was at around 3GW for 2025 , which would work out to about 3.9GW of total power. And that number is a fucking disaster. Earlier in the year, TD Cowen’s Jerome Darling told me that GPUs and their associated hardware cost about $30 million a megawatt. 3GW of IT load (as in the GPUs and their associated gear’s power draw) works out to around $90 billion of NVIDIA GPUs and the associated hardware, which would be covered under NVIDIA’s “data center” revenue segment: America makes up about 69.2% of NVIDIA’s revenue, or around $149.6 billion in FY2026 (which runs, annoyingly, from February 2025 to January 2026). NVIDIA’s overall data center segment revenue was $195.7 billion, which puts America’s data center purchases at around $135 billion, leaving around $44 billion of GPUs and associated technology uninstalled. With the acceleration of NVIDIA’s GPU sales, it now takes about 6 months to install and operationalize a single quarter’s worth of sales. Because these are Blackwell (and I imagine some of the new next generation Vera Rubin) GPUs, they are more than likely going to new builds thanks to their greater power and cooling requirements, and while some could in theory be going to old builds retrofitted to fit them, NVIDIA’s increasingly-centralized (as in focused on a few very large customers) revenue heavily suggests the presence of large resellers like Dell or Supermicro (which I’ll get to in a bit) or the Taiwanese ODMs like Foxconn and Quanta who manufacture massive amounts of servers for hyperscaler buildouts.  I should also add that it’s commonplace for hyperscalers to buy the GPUs for their colocation partners to install, which is why Nebius and Nscale and other partners never raise more than a few billion dollars to cover construction costs.  
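For the skeptical, the arithmetic in the paragraph above is easy to check. The figures are the post’s own: $30 million of GPUs and associated hardware per megawatt of IT load, roughly 3GW actually brought online, and about $135 billion of US data center purchases in FY2026.

```shell
# Back-of-envelope check of the figures quoted above.
cost_per_mw=30     # $ millions of hardware per megawatt of IT load
installed_mw=3000  # ~3GW of IT load actually turned on in 2025
us_dc_revenue=135  # $ billions of US data center purchases, FY2026
echo "installed:   \$$((cost_per_mw * installed_mw / 1000)) billion"
echo "uninstalled: \$$((us_dc_revenue - cost_per_mw * installed_mw / 1000)) billion"
```

This prints $90 billion installed and $45 billion uninstalled; the post’s ~$44 billion figure differs only by rounding of the 69.2% revenue share.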
It’s becoming very obvious that data center construction is dramatically slower than NVIDIA’s GPU sales, which continue to accelerate dramatically every single quarter. Even if you think AI is the biggest most hugest and most special boy: what’s the fucking point of buying these things two to four years in advance? Jensen Huang is announcing a new GPU every year!  By the time they actually get all the Blackwells in, Vera Rubin will be two years old! And by the time we install those Vera Rubins, some other new GPU will be beating them!  Before we go any further, I want to be clear how difficult it is to answer the question “how long does a data center take to build?”. You can’t really say “[time] per megawatt” because things become ever-more complicated with every 100MW or so. As I’ll get into, it’s taken Stargate Abilene two years to hit 200MW of power . Not IT load. Power .  Anyway, the question of “how much data center capacity came online?” is pretty annoying too.  Sightline’s research — which estimated that “almost 6GW of [global data center power] capacity came online last year” — found that while 16GW of capacity was slated to come online in 2026 across 140 projects, only 5GW is currently under construction, and somehow doesn’t say that “maybe everybody is lying about timelines.” Sightline believes that half of 2026’s supposed data center pipeline may never materialize, with 11GW of capacity in the “announced” stage with “...no visible construction progress despite typical build timelines of 12-18 months.” “Under construction” can also mean anything from “ a single steel beam ” to “nearly finished.” These numbers are also based on 5GW of capacity , meaning about 3.84GW of IT load, or about $111.5 billion in GPUs and associated gear, or roughly 57.5% of NVIDIA’s FY2026 revenue that’s actually getting built.
Sightline (and basically everyone else) argues that there’s a power bottleneck holding back data center development, and Camus explains that the biggest problem is a lack of transmission capacity (the amount of power that can be moved) and power generation (creating the power itself):  Camus adds that America also isn’t really prepared to add this much power at once: Nevertheless, I also think there’s another more-obvious reason: it takes way longer to build a data center than anybody is letting on, as evidenced by the fact that we only added 3GW or so of actual capacity in America in 2025. NVIDIA is selling GPUs years into the future, and its ability to grow, or even just maintain its current revenues, depends wholly on its ability to convince people that this is somehow rational. Let me give you an example. OpenAI and Oracle’s Stargate Abilene data center project was first announced in July 2024 as a 200MW data center . In October 2024, the joint venture between Crusoe, Blue Owl and Primary Digital Infrastructure raised $3.4 billion , with the 200MW of capacity due to be delivered “in 2025.” A mid-2025 presentation from land developer Lancium said it would have “1.2GW online by YE2025.” In a May 2025 announcement , Crusoe, Blue Owl, and Primary Digital Infrastructure announced the creation of a $15 billion joint vehicle, and said that Abilene would now be 8 buildings, with the first two buildings being energized by the “first half of 2025,” and that the rest would be “energized by mid-2026.” Each building would have 50,000 GPUs, and the total IT load is meant to be 880MW or so, with a total power draw of 1.2GW.  I’m not interested in discussing OpenAI not taking the supposedly-planned extensions to Abilene because it never existed and was never going to happen .  In December 2025, Oracle stated that it had “delivered” 96,000 GPUs , and in February, Oracle was still only referring to two buildings , likely because that’s all that’s been finished. 
My sources in Abilene tell me that Building Three is nearly done, but…this thing is meant to be turned on in mid-2026. Developer Mortensen claims the entire project will be completed by October 2026 , which it obviously, blatantly won’t. I hate to speak in conspiratorial terms, but this feels like a blatant coverup with the active participation of the press. CNBC reported in September 2025 that “ the first data center in $500 billion Stargate project is open in Texas ,” referring to a data center with an eighth of its IT load operational as “online” and “up and running,” with Crusoe adding two weeks later that it was “live,” “up and running” and “continuing to progress rapidly,” all so that readers and viewers would think “wow, Stargate Abilene is up and running” despite it being months if not years behind schedule. At its current rate of construction, Stargate Abilene will be fully built sometime in late 2027. Oracle’s Port Washington Data Center, as of March 6 2026, consisted of a single steel beam . Stargate Shackelford Texas broke ground on December 15 2025 , and as of December 2025, construction barely appears to have begun in Stargate New Mexico . Meta’s 1GW data center campus in Indiana only started construction in February 2026 .  And, despite Microsoft trying to mislead everybody that its Wisconsin data center had “arrived” and “been built,” looking even an inch deeper suggests very little has actually come online — and, considering the first data center was $3.3 billion (remember: $14 million a megawatt just for construction), I imagine Microsoft has successfully brought online about 235MW of power for Fairwater. What Microsoft wants you to think is that it brought online gigawatts of power (always referred to in the future tense), because Microsoft, like everybody else, is building data centers at a glacial pace, because construction takes forever, even if you have the power, which nobody does!
The concept of a hundred-megawatt data center is barely a few years old, and I cannot actually find a built, in-service gigawatt data center of any kind, just vague promises about theoretical Stargate campuses built for OpenAI, a company that cannot afford to pay its bills.  Everybody keeps yammering on about “what if data centers don’t have power” when they should be thinking about whether data centers are actually getting built. Microsoft proudly boasted in September 2025 about its intent to build “the UK’s largest supercomputer” in Loughton, England with Nscale, and as of March 2026, it’s literally a scaffolding yard full of pylons and scrap metal . Stargate Abilene has been stuck at two buildings for upwards of six months.  Here’s what’s actually happening: data center deals are being funded by eager private credit gargoyles that don’t know shit about fuck. These deals are announced, usually by overly-eager reporters that don’t bother to check whether the previous data centers ever got built, as massive “multi-gigawatt deals,” and then nobody follows up to check whether anything actually happened.  All that anybody needs to fund one of these projects is an eager-enough financier and a connection to NVIDIA. All Nebius had to do to raise $3.75 billion in debt was to sign a deal with Meta for data center capacity that doesn’t exist and will likely take three to four years to build (it’s never happening). Nebius has yet to finish its Vineland, New Jersey data center for Microsoft , which was meant to be “ at 100MW ” by the end of 2025, but appears to have only had 50MW (the first phase) available as of February 2026 .  I’m just gonna come out and say it: I think a lot of these data center deals are trash, will never get built, and thus will never get paid. 
The tech industry has taken advantage of an understandable lack of knowledge about construction or power timelines in the media to pump out endless stories about “data center capacity in progress” as a means of obfuscating an ever-growing scandal: that hundreds of billions of dollars of NVIDIA GPUs got sold to go in projects that may never be built. These things aren’t getting built, or if they’re getting built, it’s taking way, way longer than expected, which means that interest on that debt is piling up. The longer it takes, the less rational it becomes to buy further NVIDIA GPUs — after all, if data centers are taking anywhere from 18 months to three years to build, why would you be buying more of them? Where are you going to put them, Jensen? This also seriously brings into question the appetite that private credit and other financiers have for funding these projects, because much of the economic potential comes from the idea that these projects get built and have stable tenants. Furthermore, if the supply of AI compute is a bottleneck, this suggests that when (or if) that bottleneck is ever cleared, there will suddenly be a massive supply glut, lowering the overall value of the data centers in progress…which are, by the way, all filled with Blackwell GPUs, which will be two or three years old by the time the data centers are finally turned on. That’s before you get to the fact that the ruinous debt behind AI data centers makes them all remarkably unprofitable , or that their customers are AI startups that lose hundreds of millions or billions of dollars a year , or that NVIDIA is the largest company on the stock market, and said valuation is a result of a data center construction boom that appears to be decelerating, and that, even if it weren’t, is operating at a glacial pace compared to NVIDIA’s sales . Not to sound unprofessional or nothing, but what the fuck is going on?
We have 241GW of “planned” capacity in America, of which only 79.5GW is “under active development,” but when you dig deeper, only 5GW of capacity is actually under construction?   The entire AI bubble is a god damn mirage. Every single “multi-gigawatt” data center you hear about is a pipedream, little more than a few contracts and some guys with their hands on their hips saying “brother we’re gonna be so fuckin’ rich!” as they siphon money from private credit — and, by extension, you, because where does private credit get its capital from? That’s right. A lot comes from pension funds and insurance companies. Here’s the reality: data centers take forever. Every hyperscaler and neocloud talking about “contracted compute” or “planned capacity” may as well be telling you about their planned dinners with The Grinch and Godot. The insanity of the AI buildout will be seen as one of the largest wastes of capital of all time ( to paraphrase JustDario ), and I anticipate that the majority of the data center deals you’re reading about simply never get built. The fact that there’s so much data about data center construction and so little data about completed construction suggests that those preparing the reports are in on the con. I give credit to CBRE, Sightline and Wood Mackenzie for having the courage to even lightly push back on the narrative, even if they do so by obfuscating terms like “capacity” or “power” in ways that reporters and other analysts are sure to misinterpret. Hundreds of billions of dollars have been sunk into buying GPUs, in some cases years in advance, to put into data centers that are being built at a rate that means that NVIDIA’s 2025 and 2026 revenues will take until 2028 to 2029 to actually operationalize, and that’s making the big assumption that any of it actually gets built. I think it’s also fair to ask where the money is actually going.
2025’s $178.5 billion in US-based data center deals doesn’t appear to be resulting in any immediate (or even future) benefit to anybody involved. I also wonder whether the demand actually exists to make any of this worthwhile, or what people are actually paying for this compute.  If we assume 3GW of IT load capacity was brought online in America, that should (theoretically) mean tens of billions of dollars of revenue thanks to the “insatiable demand for AI” — except nobody appears to be showing massive amounts of revenue from these data centers.  Applied Digital only had $144 million in revenue in FY2025 (and lost $231 million making it). CoreWeave, which claimed to have “850MW of active power (or around 653MW of IT load)” at the end of 2025 (up from 420MW in Q1 FY2025 , or 323MW of IT load), made $5.13 billion of revenue (and lost $1.2 billion before tax ) in FY2025 .  Nebius? $228 million, for a loss of $122.9 million on 170MW of active power (or around 130MW of IT load). Iren lost $155.4 million on $184.7 million in revenue last quarter , and that’s with a release of deferred tax liabilities of $182.5 million. Equinix made about $9.2 billion in revenue in its last fiscal year , and while it made a profit , it’s unclear how much of that came from its large and already-existent data center portfolio , though it’s likely a lot considering Equinix is boasting about its “multi-megawatt” data center plans with no discussion of its actual capacity . And, of course, Google, Amazon, and Microsoft refuse to break out their AI revenues. Based on my reporting from last year , OpenAI spent about $8.67 billion on Azure through September 2025, and Anthropic around $2.66 billion in the same period on Amazon Web Services . As the two largest consumers of AI compute, this heavily suggests that the actual demand for AI services is pretty weak, and mostly taken up by a few companies (or hyperscalers running their own services).
At some point reality will set in and spending on NVIDIA GPUs will have to decline. It’s truly insane how much has been invested so many years in the future, and it’s remarkable that nobody else seems this concerned. Simple questions like “where are the GPUs going?” and “how many actual GPUs have been installed?” are left unanswered as article after article gets written about massive, multi-billion dollar compute deals for data centers that won’t be built before, at this rate, 2030.  And I’d argue it’s convenient to blame this solely on power issues, when the reality is clearly based on construction timelines that never made any sense to begin with. If it was just a power issue, more data centers would be near or at the finish line, waiting for power to be turned on. Instead, well-known projects like Stargate Abilene are built at a glacial pace as eager reporters claim that a quarter of the buildings being functional nearly a year after they were meant to be turned on is some sort of achievement. Then there’s the very, very obvious scandal that NVIDIA, the largest company on the stock market, is making hundreds of billions of dollars of revenue on chips that aren’t being installed. It’s fucking strange, and I simply do not understand how it keeps beating and raising expectations every quarter given the fact that the majority of its customers likely won’t be able to put their current purchases to use until the next decade. Assuming that Vera Rubin actually ships in 2026, it’s reasonable to believe that people will be installing these things well into 2028, if not further, and that’s assuming everything doesn’t collapse by then. Why would you bother? What’s the point, especially if you’re sitting on a pile of Blackwell GPUs?  Why are we doing any of this?
Last week also featured a truly bonkers story about Supermicro, a reseller of GPUs used by CoreWeave and Crusoe, where co-founder Wally Liaw and several other co-conspirators were arrested for selling hundreds of millions of dollars of NVIDIA GPUs to China, with the intent to sell billions more. Liaw, one of Supermicro’s co-founders, previously resigned in a 2018 accounting scandal where Supermicro couldn’t file its annual reports, only to be (per Hindenburg Research’s excellent report) rehired in 2021 as a consultant, and restored to the board in 2023, per a filed 8-K. Mere days before his arrest, Liaw was parading around NVIDIA’s GTC conference, pouring unnamed liquids into ice luges and standing two people away from NVIDIA CEO Jensen Huang. Liaw was also seen congratulating the CEO of Lambda on its new CFO appointment on LinkedIn, as well as shaking hands (along with Supermicro CEO Charles Liang, who has not been arrested or indicted) with Crusoe (the company building OpenAI’s Abilene data center) CEO Chase Lochmiller. Supermicro isn’t named in the indictment for reasons I imagine are perfectly normal and not related to keeping the AI party going. Nevertheless, Liaw and his co-conspirators are accused of shipping hundreds of millions of dollars’ worth of NVIDIA GPUs to China through a web of counterparties and brokers, with over $510 million of them shipped between April and mid-May 2025. While the indictment isn’t specific as to the breakdown, it confirms that some Blackwell GPUs made it to China, and I’d wager quite a few. The mainstream media has already stopped thinking about this story, despite Supermicro being a huge reseller of NVIDIA gear, contributing billions of dollars of revenue, with at least $500 million of that apparently going to China. The fact that Supermicro wasn’t specifically named in the case is enough to erase the entire tale from their minds, along with any wonder about how NVIDIA, and specifically Jensen Huang, didn’t know.
This also isn’t even close to the only time this has happened. Late last year, Bloomberg reported on Singapore-based Megaspeed — a (to quote Bloomberg) “once-obscure spinoff of a Chinese gaming enterprise [that] evolved into the single largest Southeast Asian buyer of NVIDIA chips” — and highlighted odd signs that suggest it might be operating as a front for China. As a neocloud, Megaspeed rents out AI compute capacity like CoreWeave, and while NVIDIA (and Megaspeed) both deny any of their GPUs are going to China, Megaspeed, to quote Bloomberg, has “something of a Chinese corporate twin.” Bloomberg reported that Megaspeed imported goods “worth more than a thousand times its cash balance in 2023,” with two-thirds of its imports being NVIDIA products. The investigation got weirder when Bloomberg tried to track down specific circuit boards that NVIDIA had told the US government were in specific sites. Things get weirder throughout the article, with a Chinese company called “Shanghai Shuoyao” having a near-identical website and investor deck to Megaspeed, and with several of the “computing clusters under construction” actually being in China. Things get a lot weirder as Bloomberg digs in, including a woman called “Huang” who may or may not be both the CEO of Megaspeed and of an associated company called “Shanghai Hexi,” which is also owned by the Yangtze River Delta project, and who was also photographed sitting next to Jensen Huang at an event in Taipei in 2024. While all of this is extremely weird and suspicious, I must be clear there is no definitive answer as to what’s going on, other than that NVIDIA GPUs are absolutely making it to China, somehow. I also think that it would be really tough for Jensen Huang to not know about it, or for billions of dollars of GPUs to be somewhere without NVIDIA’s knowledge.
Anyway, Supermicro CEO Charles Liang has yet to comment on Wally Liaw or his alleged co-conspirators, other than a statement from the company that says that their acts were “a contravention of the Company’s policies and compliance controls.” Jensen Huang does not appear to have been asked if he knew anything about this — not Megaspeed, not Supermicro, or really any challenging question of any kind for the last few years of his life. Huang did, however, say back in May 2025 that there was “no evidence of any AI chip diversion,” and that the countries in question “monitor themselves very carefully.” For legal reasons I am going to speak very carefully: I cannot say that Jensen is wrong, or lying, but I think it’s incredible, remarkable even, that he had no idea that any of this was going on. Really? Hundreds of millions if not billions of dollars of GPUs are making it to China — as reported by The Information in December 2025 — and Jensen Huang had no idea? I find that highly unlikely, though I obviously can’t say for sure. In the event that NVIDIA had knowledge — which I am not saying it did, of course — this is a huge scandal that, for the most part, nobody has bothered to keep an eye on outside of a few brave souls at The Information and Bloomberg who give a shit about the truth. Has anybody bothered to ask Jensen about this? People talk to him on camera all the time. I’ll also add that I am shocked that so many people are just shrugging and moving on from Supermicro, which is a major supplier of two of the major neoclouds (Crusoe and CoreWeave) and one of the minors (Lambda, to which it also rents cloud capacity). The idea that a company had no idea that several percentage points of its revenue were flowing directly to China via one of its co-founders is an utter joke. I hope we eventually find out the truth. Nevertheless, this kind of underhanded bullshit is a sign of desperation on the part of just about everybody involved.
So, I want to explain something very clearly for you, because it’s important you understand how fucked up shit has become: hyperscalers are forcing everybody in their companies to use AI tools as much as possible, tying compensation and performance reviews to token burn, and actively encouraging non-technical people to vibe-code features that actually reach production. In practice, this means that everybody is being expected to dick around with AI tools all day, with the expectation that you burn massive amounts of tokens and, in the case of designers working in some companies, actively code features without ever knowing a line of code. How do I know the last part? Because a trusted source told me, and I’ll leave it at that. One might be forgiven for thinking this means that AI has taken a leap in efficacy, but the actual outcomes are a labyrinth of half-functional internal dashboards that measure random user data or convert files, spending hours to save minutes of time at some theoretical point. While non-technical workers aren’t necessarily allowed to ship directly to production, their horrifying pseudo-software, coded without any real understanding of anything, is expected to be “fixed” by actual software engineers who are also expected to do their jobs. These tools also allow near-incompetent Business Idiot software engineers to do far more damage than they might have in the past. LLM use is relatively unrestrained (and actively incentivized) in at least one hyperscaler, with just about anybody allowed to spin up their own OpenClaw “AI agent” (read: a series of LLMs that allegedly can do stuff with your inbox or Slack for no clear benefit, other than their ability to delete all of your emails).
In Meta’s case, this ended up causing a severe security breach. According to The Information, Meta systems storing large amounts of company and user-related data were accessible to engineers who didn’t have permission to see them, and the breach was marked a sec-1 incident, the second-highest level of severity on an internal scale that Meta uses to rank security incidents. The incident follows multiple problems caused at Amazon by its Kiro and Q LLMs, as reported by Business Insider’s Eugene Kim. Despite the furious (and exhausting) marketing campaign around “the power of AI code,” I believe that these events are just the beginning of the true consequences of AI coding tools: the slow destruction of the tech industry’s software stack. LLMs allow even the most incompetent dullard to do an impression of a software engineer, by which I mean you can tell one “make me software that does this” or “look at this code and fix it” and said LLM will spend the entire time saying “you got this” and “that’s a great solution.” The problem is that while LLMs can write “all” code, that doesn’t mean the code is good, or that somebody can read the code and understand its intention (as these models do not think), or that having a lot of code is a good thing, both in the present and in the future of any company built on generated code. LLM-based code is often verbose, and rarely aligns with in-house coding guidelines and standards, guaranteeing that it’ll take far longer to chew through, which naturally means that those burdened with reviewing it will either skim-read it or feed it into another LLM to work out what the hell to do. Worse still, LLM use is also entirely directionless. Why is anybody at Meta using an OpenClaw? What is the actual thing that OpenClaw does, other than burn an absolute fuck-ton of tokens?
Think about this very, very simply for a second: you have given every engineer in the company the explicit remit to write all their code using LLMs, and incentivized them to do so by making sure their LLM use is tracked. You have now massively increased both the operating costs of the company (through token burn) and the volume of code being created. To be explicit, allowing an LLM to write all of your code means that you are no longer developing code, nor are you learning how to develop code, nor are you going to become a better software engineer as a result. This means that, across almost every major tech company, software engineers are being incentivized to stop learning how to write software or solve software architecture issues. If you are just a person looking at code, you are only as good as the code the model makes, and as Mo Bitar recently discussed, these models are built to galvanize you, glaze you, and tell you that you’re remarkable as you barely glance at globs of overwritten code that, even if it functions, eventually grows into a whole built with no intention or purpose other than what the model generated from your prompt. Things only get worse when you add in the fact that hyperscalers like Meta and Amazon love to lay off thousands of people at a time, which makes it even harder to work out why something was built the way it was built, which is harder still when an LLM that lacks any thoughts or intentions builds it. Entire chunks of multi-trillion-dollar-market-cap companies are being written with these things, prompted by engineers (and non-engineers!) who may or may not be at the company in a month or a year to explain what prompts they used. We’re already seeing the consequences! Amazon lost hundreds of thousands of orders! Meta had a major security breach!
The foundations of these companies are being rotted away through millions of lines of slop-code that, at best, occasionally gets the nod from somebody who has “software engineer” on their resume, and these people keep being fired too, lowering the likelihood that somebody who knows what’s going on, or why something was built a certain way, will be able to stop something bad from happening. Remember: Google, Amazon, Microsoft, and Meta all hold vast troves of personal information, intimate conversations, serious legal documents, financial information, in some cases even social security numbers, and all four of them, along with a worrying chunk of the tech industry, are actively encouraging their software engineers to stop giving a fuck about software. Oh, you’re so much faster with AI code? What does that actually mean? What have you built? Do you understand how it works? Did you look at the code before it shipped, or did you assume that it was fine because it didn’t break? This is creating a kind of biblical plague within software engineering — an entire tech industry built on reams of unmanageable and unintentional code, pushed by executives and managers that don’t do any real work. LLMs allow the incompetent to feign competence and the unproductive to produce work-adjacent materials borne of a loathing for labor and craftsmanship, and they lean into the worst habits of the dullards that rule Silicon Valley. All the Valley knows is growth, and “more” is regularly conflated with “valuable.” The New York Times’ Kevin Roose — in a shocking attempt at journalism — recently wrote a piece celebrating the competition within Silicon Valley to burn more and more tokens using AI models. Roose explains that both Meta and OpenAI have internal leaderboards that show how many tokens you’ve used, with one software engineer in Stockholm spending “more than his salary in tokens,” though Roose adds that his company pays for them.
Roose describes a truly sick culture, one where OpenAI gives awards to those who spend a lot of money on their tokens, adding that he spoke with several tech workers who were spending thousands of dollars a day on tokens “for what amount to bragging rights.” Roose also added one more insane detail: that one person found a loophole in Claude’s $20-a-month plan, using a piece of software made by Figma, that allowed them to burn $70,000 in tokens. Despite all of this burn, Roose struggled to find anybody who was able to explain what they were doing beyond “maintaining large, complex pieces of software using coding agents running in parallel,” but managed to actually find one particularly useful bit of information: that all of this might be performative. I do give Roose one point for wondering if “...any of these tokenmaxxers [were] producing anything good, or whether they [were] merely spinning their wheels churning out useless code in an attempt to look busy.” Good job, Kevin. That being said, I find this story horrifying, veering dangerously close to the actions of drug addicts and cult followers. Throughout this story in one of the world’s largest newspapers, Roose fails to find a single “tokenmaxxer” making something that they can actually describe, which has largely been my experience of evaluating anyone who talks nonstop about the power of “agentic coding.” These people are sick, and are participating in a vile, poisonous culture based on needless expense and endless consumption. Companies incentivizing the amount of tokens you burn are actively creating a culture that trades excess for productivity, incentivizing destructive tendencies built around constantly having to find stuff to do rather than doing things with intention. They are guaranteeing that their software will be poorly written and maintained, all in the pursuit of “doing more AI” for no reason other than that everybody else appears to be doing so.
Anybody who actually works knows that the most productive-seeming people are often also the most useless, as they’re doing things to seem productive rather than producing anything of note. A great example of this is a recent Business Insider interview with a person who got laid off from Amazon after learning “AI” and “vibe coding,” and how surprised they were that these supposed skills didn’t make them safer from layoffs. To be clear, this person is a victim. They were pressured by Amazon to take up useless skills and build useless things in an expensive and inefficient way, and ended up losing their job despite taking up tools they didn’t like under duress. This person was, at one point, actively part of building an internal Amazon site using AI, and had to “learn to vibe code with a lot of trial and error” and the help of a colleague. Was this a good use of their time? Was this a good use of their colleague’s time? No! In fact, across all of these goddamn AI coding hype-beast Twitter accounts and endless proclamations about the incredible power of AI agents, I can find very few accounts of anything happening other than someone saying “yeah, I’m more productive I guess.” I am certain that at some point in the near future a major big tech service is going to break in a way that isn’t immediately fixable as a result of thousands of people building software with AI coding tools, a problem compounded by the dual brain-drain forces of layoffs and a culture that actively empowers people to look busy rather than actually produce useful things. What else would you expect? You’re giving people a number that they can increase to seem better at their job. What do you think they’re going to do, try and be efficient? Or use these things as much as humanly possible, even if there really isn’t a reason to? I haven’t even gotten to how expensive all of this must be, in part because it’s hard to fully comprehend.
But what I do know is that big tech is setting itself up for crisis after crisis, especially when Anthropic and OpenAI stop subsidizing their models to the tune of allowing people to burn through $2,500 or more of compute on a $200-a-month subscription. What happens to the people who are dependent on these models? What happens to the people who forgot how to do their jobs because they decided to let AI write all of their code? Will they even be able to do their jobs anymore? Large Language Models are creating Silicon Valley Habsburgs — workers that are intellectually trapped at whatever point they started leaning on these models, models subsidized to the point that their bosses encouraged them to use them as much as humanly possible. While they might be able to claw their way back into the workforce, a software engineer that’s only really used LLMs for anything longer than a few months will have to relearn the basic habits of their job, and find that their skills were limited to whatever the last training run of whatever model they last used was. I’m sure there are software engineers using these models ethically, who read all the code, who retain complete ownership over it and use it as a means of handling very specific units of work. I’m also sure that there are some that are just asking it to do stuff, glancing at the code, and shipping it. It’s impossible to measure how many of each camp there are, but hearing Spotify’s CEO say that its top developers are basically not writing code anymore makes me deeply worried, because this shit isn’t replacing software engineering at all — it’s mindlessly removing friction and putting the burden of “good” or “right” on a user that it’s intentionally gassing up. Ultimately, this entire era is a test of a person’s ability to understand and appreciate friction. Friction can be a very good thing. When I don’t understand something, I make an effort to do so, and the moment it clicks is magical.
In the last three years I’ve had to teach myself a great deal about finance, accountancy, and the greater technology industry, and there have been so many moments where I’ve walked away from the page frustrated, stewing in self-doubt that I’d never understand something. I also have the luxury of time, and sadly, many software engineers face increasingly-deranged deadlines set by bosses that don’t understand a single fucking thing, let alone what LLMs are capable of or what responsible software engineering is. The push from above to use these models because they can “write code faster than a human” is a disastrous conflation of “fast” and “good,” all because of flimsy myths peddled by venture capitalists and the media about “LLMs being able to write all code.” Generative code is a digital ecological disaster, one that will take years to repair thanks to company remits to write as much code as fast as possible. Every single person responsible must be held accountable, especially for the calamities to come as lazily-managed software companies see the consequences of building their software on sand. In the end, everything about AI is built on lies. Hundreds of gigawatts of data centers “in development” equate to 5GW of actual data centers under construction. Hundreds of billions of dollars of GPU sales are mostly sitting around waiting for somewhere to go. Anthropic’s constant flow of “annualized” revenues ended up equating to literally $5 billion in revenue in four years, on $25 billion or more in salaries and compute. Despite all of those data centers supposedly being built, nobody appears to be making a profit on renting out AI compute. AI’s supposed ability to “write all code” really means that every major software company is filling its codebases with slop while massively increasing its operating expenses.
Software engineers aren’t being replaced — they’re being laid off because the software that’s meant to replace them is too expensive, while in practice not replacing anybody at all. Looking even an inch beneath the surface of this industry makes it blatantly obvious that we’re witnessing one of the greatest corporate failures in history. The smug, condescending army of AI boosters exists to make you look away from the harsh truth — AI makes very little revenue, lacks tangible productivity benefits, and seems to, at scale, actively harm the productivity and efficacy of the workers that are being forced to use it. Every executive forcing their workers to use AI is a ghoul and a dullard, one that doesn’t understand what actual work looks like, likely because they’re a lazy, self-involved prick.  Every person I talk to at a big tech firm is depressed, nagged endlessly to “get on board with AI,” to ship more, to do more, all without any real definition of what “more” means or what it contributes to the greater whole, all while constantly worrying about being laid off thanks to the truly noxious cultures that are growing around these services. AI is actively poisonous to the future of the tech industry. It’s expensive, unproductive, actively damaging to the learning and efficacy of its users, depriving them of the opportunities to learn and grow, stunting them to the point that they know less and do less because all they do is prompt. Those that celebrate it are ignorant or craven, captured or crooked, or desperate to be the person to herald the next era, even if that era sucks, even if that era is inherently illogical, even if that era is fucking impossible when you think about it for more than two seconds. And in the end, AI is a test of your introspection. Can you tell when you truly understand something? Can you tell why you believe in something, other than that somebody told you you should, or made you feel bad for believing otherwise? 
Do you actually want to know stuff, or just have the ability to call up information when necessary? How much joy do you get out of becoming a better person? If you can’t answer that question with certainty, maybe you should just use an LLM, as you don’t really give a shit about anything. And in the end, you’re exactly the mark built for an AI industry that can’t sell itself without spinning lies about what it can (or theoretically could) do. Only 33% of announced US data centers are actually being built, with the rest in vague levels of “planning.” That’s about 79.53GW of power, or 61GW of IT load. “Active development” also refers to anything that is (and I quote) “...under development or construction,” meaning “we’ve got the land and we’re still working out what to do with it.” This is pretty obvious when you do the maths. 61GW of IT load would be hundreds of thousands of NVIDIA GB200 NVL72 racks — over a trillion dollars of GPUs at $3 million per 72-GPU rack — and based on the fact that there were only $178.5 billion in data center debt deals last year, I don’t think many of these are actually being built right now. Even if they were, there’s not enough power for them to turn on. NVIDIA claims it will sell $1 trillion of GPUs between 2025 and 2027, and as I calculated previously, it sells about 1.6GW (in IT load terms, as in how much power just the GPUs draw) of GPUs every quarter, which would require at least 1.95GW of power once you include all the associated gear, before you even get to the challenges of physically getting that power. None of this data talks about data centers actually coming online.
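For what it’s worth, the rack maths above can be sketched out. This is a rough sketch: the ~130kW of draw per GB200 NVL72 rack is my assumption, not an NVIDIA figure; the $3 million per rack is the figure from this piece:

```python
# Back-of-envelope for the 61GW claim. The ~130kW-per-rack draw is an
# assumption on my part; $3M per 72-GPU rack is the figure quoted above.
it_load_gw = 61
rack_kw = 130              # assumed power draw of one GB200 NVL72 rack
rack_cost_usd = 3_000_000  # $3M per rack

racks = it_load_gw * 1e6 / rack_kw   # GW -> kW, divided by kW per rack
gpu_spend = racks * rack_cost_usd

print(f"Racks implied: ~{racks:,.0f}")
print(f"GPU spend implied: ~${gpu_spend / 1e12:.2f} trillion")

# NVIDIA's quarterly shipments: 1.6GW of GPU draw needing at least
# 1.95GW of facility power implies an overhead factor of about:
print(f"Overhead factor: {1.95 / 1.6:.2f}x")
```

Under those assumptions you get roughly 470,000 racks and about $1.4 trillion of GPUs, which is where the “hundreds of thousands of racks” and “over a trillion dollars” come from.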

Stratechery 4 weeks ago

An Interview with Nvidia CEO Jensen Huang About Accelerated Computing

Good morning, This week’s Stratechery Interview is running early, as I had the chance to speak in person with Nvidia CEO Jensen Huang at the conclusion of his GTC 2026 keynote, which took place yesterday in San Jose. I have spoken to Huang four times previously, in May 2025, March 2023, September 2022, and March 2022. In this interview we talk about a keynote that came across like a bit of a history lesson, and what that says about a company that still feels small even as it’s the most valuable in the world, as well as what has changed in AI over the last year. Then we discuss a number of announcements that might feel like a change in approach (although Huang disagrees), including Nvidia’s burgeoning CPU business and the Groq acquisition. Finally we discuss scarcity in the AI stack and how that affects Nvidia, the China question, and Huang’s frustration with doomers and their influence in Washington. As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player. On to the Interview: This interview is lightly edited for clarity. Jensen Huang, welcome back to Stratechery. JH: It’s great to be with you. You literally just walked off the stage, went a little long, I think, but you spent a lot of this keynote, which I quite enjoyed, explaining what Nvidia is, starting with the history of the programmable shader, the launch of CUDA 20 years ago.
We don’t need to spend too much time recounting this, you did a good job, and Stratechery readers are certainly familiar — sorry, this is a bit of a lead-up here — Stratechery readers are familiar, and I remember this exactly, someone asked me to explain how is it that Nvidia can announce so many things at a single GTC, this is like six, seven years ago, maybe even longer than that, and I explained the whole thing with CUDA and all the libraries is it’s just sort of doing the same thing again and again, but for specific industries. That’s the story you told today, and it’s kind of a back-to-the-future moment after the last few GTC keynotes have kind of just been pretty AI-centered, CES was pretty AI-centered. Why did you feel the need to tell that story now? To recast CUDA, and why is it important? JH: Well, because we’re going into a whole lot of new industries and because AI is going to use tools, and when AI uses tools, those are tools that we created for humans. AI is going to use Excel, AI is going to use Photoshop, AI is going to use logic synthesis tools, Synopsys tools, and Cadence tools. Those tools have to be super-accelerated, and they’re going to use databases that have to be super-accelerated, because AIs are fast. And so I think in this era, we need to get all of the world’s software accelerated as fast as possible, and then put it in front of AI so that AI can agentically use it. So is that a bit where we’ve already done this for a bunch of sectors and now we’re going to do it for a bunch more? JH: Yeah, a whole bunch more. For example, data processing. Well, that was sort of a surprise. I didn’t expect you to be opening with an IBM partnership. JH: Yeah, right, that kind of puts it in perspective. I mean, they really started it all. You wrote last week that AI is a five-layer cake: power, chips, infrastructure, models, and applications.
Is there a concern, over the last four or five years, about being squeezed into the chips box, making it important to remind both other people and yourselves that you’re this vertically integrated company — not just in terms of building systems, but across the entire software stack, that you’re not just a chip company? JH: I guess my mind doesn’t start with, “What am I not?”, it starts with, “What do we need to be?”. And back then, we realized that accelerated computing was a full-stack problem, you have to understand the application to accelerate it. We realized that we had to understand the application, we had to have the developer ecosystem, we needed to have excellent expertise in algorithm development, because the old algorithms that were developed for CPUs don’t work well for GPUs, so we had to rewrite, refactor algorithms so that they could be accelerated by our GPUs. If you do that, though, you get a 50-times speedup, a 100-times speedup, a 10-times speedup, and so it’s totally worth it. I think since the very beginning, we realized, “Okay, what do we want to do, and what does it take to achieve that?”. Now, today we’re building AI factories, we’re building AI infrastructure all over the world. That’s a lot more than building chips, and building chips is obviously important, it’s the foundation of it. Right, that’s like one full stack of doing the networking and doing the storage, and now you’re into CPUs. JH: Now you’ve got to put it all together into these giant systems — a gigawatt factory is probably $50, $60 billion. Out of that $50, $60 billion, probably about, call it $15, $17 billion or so, is infrastructure: land, power, and shell. The rest of it is compute and networking and storage and things like that, and so at that level of investment, unless you’re helping customers achieve the level of confidence that they’re going to succeed in building it, you just have no hope, nobody’s going to risk $50 billion.
So I think that that’s the big idea, that we need to help customers not just build chips, but build systems, and then after we build systems, not just build systems, but build AI factories. AI factories have a lot of software inside, it’s not just our software, it’s a ton of software for cooling management and electricals and things like that, and redundancies, and a lot of it is over-designed, it’s over-designed because nobody talked to each other. When you have a lot of people who don’t talk to each other integrating systems, you have to, by definition, over-design your part of it. But if we’re working together as one team, we’ll make sure that we can push the limits and get more throughput out of the power that we have, or save money for whatever throughput you want to have. Just to go back to that software bit, you mentioned Excel wasn’t designed to be used by AI. You have things like Claude, which has this new functionality to use Excel, so when you talk about that, you want to invest in these libraries, is that to enable models like that to do better? Or is that something for Microsoft or for enterprises — you want to use this, you don’t want to be beholden to this sort of other player in the world? JH: Well, SQL’s a good example. SQL’s used by people, and we bang on the SQL systems like anybody else, and it is the ground truth of businesses. Well, it’s not just gonna be people banging on our SQL database now, it’s gonna be a whole bunch of agents banging on it. Right, they’re gonna do it way faster. JH: They’re gonna need to do it way faster. And so the first thing we have to do is accelerate SQL, that’s kind of the simple logic of it. That makes sense. In terms of models, you noted that language models are only one category. “Some of the most transformative work is happening in protein AI, chemical AI, physical simulation, robots, and autonomous systems”, and this is from the piece you wrote last week.
You’ve previously made this point in other keynotes; “Everything is a token”, I think, is a phrase you’ve used before. Do you see transformers as being the key to everything, or do we need new fundamental breakthroughs to enable these applications? JH: We need all kinds of new models. For example, with transformers, the ability to do attention scales quadratically, and so how do you have quite long memory? How can you have a conversation that lasts a very long time and not have the KV cache essentially become, over time, garbage? Or have entire racks of solid-state drives that are holding KV cache. JH: And of course, let’s say that you were able to record all of our conversation, when you go back and reference some conversation, which part of the reference is most important? There needs to be some new architecture that thinks about attention properly and is able to process it very quickly. We came up with a hybrid architecture of a transformer with an SSM, and that is what enables Nemotron 3 to be super intelligent and super efficient at the same time, that’s an example. Another example is coming up with models that are geometry-aware, meaning a lot of things in life, in nature, are symmetrical. And so when you’re generating with these models, you don’t want them to generate what is just statistically plausible, it has to also be physically based, and so it has to come out symmetrical. And so cuEquivariance, for example, allows you to do things like that. So we have all these different technologies that are designed for this — for example, when we’re generating tokens in words, it comes out in chunks at a time, little bits, tokens at a time, but when you’re generating motion, you need it to be continuous. And so there’s discrete information that you generate and understand, and there’s continuous information that you want to generate and understand. Transformers are not ideal for both. Right, that makes sense.
One more quote from the piece, you write, “In the past year, AI crossed an important threshold. Models became good enough to be useful at scale. Reasoning improved. Hallucinations dropped. Grounding improved dramatically. For the first time, applications built on AI began generating real economic value”. What specifically was that change? Because I think about the timing, I feel like this upcoming year is definitely about agents, I just wrote about it today — but for last year, was that the reasoning? Was that the big breakthrough?

JH: Generative, of course, was a big breakthrough, but it hallucinated a lot and so we had to ground it, and the way to ground it is reasoning, reflection, retrieval, search, so we helped it ground. Without reasoning, you couldn’t do any of that, and so reasoning allowed us to ground the generative AI. And once you ground it, then you could use that system to reason through problems and decompose it, and decompose it into things that you could actually do something about, and so the next generation was tool use. Turns out it probably tells you something that search was a service that nobody paid for, and the reason for that is getting information is very important and very useful but it’s not something you pay for. The bar to reach to get somebody to pay you for something has to be higher than just information. “Where’s a good restaurant?” — information is just, I don’t think, worthy enough to get paid for. Some people pay for it, I pay for it. We now know that we’ve now crossed that threshold. Not only is it able to converse with us and generate information for us, it can now, of course, do things for us. Coding is just a perfect example for that.
If you think about it for a second, you realize this, coding is not really the same modality as language, you have to teach it empty spaces and indentations and symbols, it’s almost like a new modality and you can’t generate code just one token at a time, you have to reflect on the chunk of code. That chunk of code has to be factored properly and has to be optimal and has to obviously compile, it has to be grounded not on probable truth, it has to be grounded on execution.

Right, does it run or not?

JH: It has to run or not. And so I think the code, learning that modality was a big deal. Once you’re able to now do — we pay engineers several hundred thousand dollars a year to code, and so now they have a coding assistant. They could think about architecture. Instead of describing programs in code, which is very laborious, they can now describe software in specification, which is much more abstract and allows them to be much more productive. And so they describe specification, architecture, they’re able to use their time to solve and innovate, and so our software engineers 100% use coding agents now. Many of them haven’t generated a line of code in a while, but they’re super productive and super busy.

Do you think there is a temptation to over-extrapolate from coding, though, precisely because it’s verifiable? You have this agent idea where they can go — it’s not just that they will generate code, then they can actually verify it, see if it works, if it doesn’t, they can go back and do it again, and this can happen all without humans because there’s a clear, “Does it work or not?”.

JH: Well, because you can reflect, you could have, let’s say, design a house. Designing a house or designing a kitchen used to be the work of architects, designers, but now you could have carpenters do that. So now you elevated the capability of a carpenter, now you use an agent for that carpenter to go design a house, design a kitchen, come up with some interesting styles.
The agent doesn’t have some tool to execute. However, you could give an example. You say, “these are the styles I’m looking for, I want it to be aesthetic like that”. Because the agent is able to reflect, is able to compare its quality of code, its quality of result against some reference, it could say, “You know what, it didn’t turn out as well as I hoped, I’m going to go back at it again”, and so it iterates. It doesn’t have to be fully executable, in fact, the more probabilistic, the more aesthetic, the more subjective, if you will, AI actually does better.

Right, well that’s why you almost have two extremes. You have generating images where there’s no right answer and then you have coding where there is a right answer and AI seems to do well on those sides and the question is how much will it collapse into the middle there.

JH: We’re fairly certain it could do architecture now, we’re fairly certain it could design kitchens and living rooms.

Well, to this point, one of the big things with agents coming online is, you’ve talked a lot about accelerated computing, I think you’ve trash talked the CPUs, as it were, back in the day — they’re all gonna be removed, like everything’s gonna be accelerated. Suddenly CPUs are hot again. It turns out they’re pretty useful and important to the extent you are selling CPUs now, how’s it feel to be a CPU salesman?

JH: There’s no question that Moore’s law is over. Accelerated computing is not parallel computing. Go back in time — 30 years ago, there were probably 10, 20, 30 parallel computing companies, only one survived, Nvidia, and the reason why is because we had the good wisdom of recognizing the goal wasn’t to get rid of the CPU, the goal was to accelerate the application.

So what I just falsely accused you of was actually true for everybody else.

JH: We were never against CPUs, we don’t want to violate Amdahl’s Law.
Accelerated computing, in fact, inside our systems, we choose the best CPUs, we buy the most expensive CPUs, and the reason for that is because that CPU, if not the best and not the most performant, holds back millions of dollars of chips.

When it comes to branch prediction, you worried about wasting CPU time, now you’re worried about wasting GPU time.

JH: That’s right, you just never can have GPUs be squandered, GPU time be idle. And so we always use the best CPUs to the point where we went and built Grace so that we could have the highest performance single-threaded CPU and move data around a lot faster. And so accelerated computing was never against CPUs, my thesis is still true that Moore’s Law is over, the idea that you would use general purpose computing and just keep adding transistors, that is so dead, and so I think fundamentally we’re not against CPUs. However, these agents are now able to do tool use, and the tools that they want to use are tools created for humans and they’re basically two types. There’s the stuff that we run in data centers and most of it is SQL, most of it is database related, and the other type is personal computers. We’re now going to have AIs that are able to learn unstructured tool use; the first type of tool use is structured. CLIs are tool use, APIs, they’re all structured tool use, the commands are very explicit, the arguments are explicit, the way you talk to that application is very specific. However, there’s a whole bunch of applications that were never designed to have CLIs and APIs, and those tools need AIs to learn multi-modality, unstructured, and it has to go and be able to go surf a website and it has to be able to recognize buttons and pull down menus and just kind of work its way through it like we do. That kind of tool use is going to want to use PCs and we have both sides, we have incredibly great data processing systems, and as you know, Nvidia’s PCs are the most performant in the world.
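The Amdahl’s Law argument here (a slow serial component such as the CPU caps whatever the accelerated part delivers) is one line of arithmetic. A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, n: float) -> float:
    """Amdahl's Law: overall speedup when a fraction p of the work is
    accelerated by a factor n and the remaining (1 - p) stays serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# Even with an effectively infinite accelerator, a 5% serial (CPU-bound)
# share caps the whole system at roughly 20x:
print(amdahl_speedup(0.95, 1e12))  # ~20.0
```

That cap is why a weak host CPU "holds back millions of dollars of chips": no amount of extra GPU throughput recovers time lost in the serial path.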
So what makes an agent-focused CPU different from other CPUs? So you’re going to have a rack of just Vera CPUs.

JH: Oh, really good, excellent. So the way that CPUs were designed in the last decade, they were all designed for hyperscale cloud and the way that hyperscale cloud monetizes CPUs is by the CPU core. So you want to design CPUs that have as many cores as possible that are rentable, the performance of it is kind of secondary.

You’re dealing with web latency by and large.

JH: That’s exactly right, exactly. And so the number of CPU instances is what you’re optimizing for. That’s why you see these CPUs with a couple of hundred, 300, 400 cores coming. Well, they’re not performant and for tool use, where you have this GPU waiting for the tool use—

And you’re going over NVLink.

JH: That’s right, you want the fastest single-threaded computer you can possibly get.

So is it just the speed? Or does the CPU itself need to be increasingly parallel so it doesn’t have misses and things like that? Or so it’s like just all the way down the pipeline is very different?

JH: Yeah, the most important thing is single-threaded performance and the I/O has to be really great. Because it’s now in the data center, the number of single-threaded instances running is going to be quite high and therefore, it’s going to bang on the I/O system, it’s going to bang on the memory controller really hard. Vera’s bandwidth-per-CPU-core, bandwidth-per-CPU, is three times higher than any CPU that’s ever been designed, and so it’s designed so that it has lots and lots of I/O bandwidth and lots and lots of memory bandwidth, so that it never throttles the CPU. If the CPU gets throttled, then we’re holding back a whole bunch of GPUs.

Is this Vera rack, is it still, you talked about it being very tightly linked to the GPU rack, but is it still disaggregated so that the GPUs can be serving multiple different Vera cores? Whereas you have a Vera core on a board with-

Okay, got it, that makes sense.
How does your Intel partnership and the NVLink thing fit into this, if at all?

JH: Excellent. Some of the world is happy with Arm, some of the world still needs, particularly, you know, enterprise computing, a whole bunch of stacks that people don’t want to move and so x86 is really important to that.

Has the resiliency of x86 code been surprising to you?

JH: No. Nvidia’s PC is still x86, all of our workstations are x86.

I did want to congratulate you, as you talked about in the keynote today, you are the token king. So in your article, you also talked about that energy is the first principle of AI infrastructure and the constraint on how much intelligence the system can produce. If that’s the case, if it’s the amount of tokens you can produce and you’re constrained by how much energy is in the data center, why do companies even try to compete with the token king?

JH: It’s going to be hard because it’s not reasonable to build a chip and somehow achieve results that are fairly dramatic. Even in the case of Groq, Groq couldn’t deliver the results unless we paired it with Vera Rubin.

Well tell me about this, my next question was about Groq.

JH: So if you look at the entire envelope of inference, on the one hand, you want to deliver as much throughput as possible, on the other hand, you want to deliver as many smart tokens as possible — the smarter the token, the higher price you could charge. These two balance, this tension of maximizing throughput on the one hand, maximizing intelligence on the other hand, is really, really tough to work out.

I do have to say, last year you had a slide talking about this Pareto Curve, and you talked about, I think it was when you introduced Dynamo, how your GPUs could cover the whole thing, and so you didn’t have to think about it, just buy an Nvidia GPU, and Dynamo will do both. But now you’re here saying, “Well, it doesn’t quite cover the whole thing”.

JH: We cover the whole thing still better than any system that can do it.
Where we could extend that Pareto is particularly on the extremely high token rates and extremely low latency, but it also reduces the throughput. However, because of coding agents, because there are now AI agents that are producing really, really great economics, and because the agents are being attached to humans that are actually making extremely, I mean, they’re extremely valuable.

Right, they’re even more expensive than GPUs.

JH: And so I want to give my software engineers the highest token rate service, and so if Anthropic has a tier of Anthropic Claude Code that increases coding rate by a factor of 10, I would pay for it, I would absolutely pay for it.

So you’re building this product for yourself?

JH: I think most great products are kind of because you see a pain point and you feel the pain point and you know that that’s where the market’s going to go. We would love for our coding agents to run 10 times faster, but in order to do that, it’s just very, very difficult to do that in a high throughput system and so we decided to add the Groq low latency system to it and then we basically co-run, co-process.

Right. And is this just separating decode and prefill?

JH: We’re going to do even the high processing, high FLOPS part of decode, attention part of decode.

So you’re disaggregating even down to the decode level.

JH: That’s right, and that requires really tight coupling and really, really close integration of software.

So how are you able to do that? You say you’re shipping later this year, this deal was just announced a couple of months ago.

JH: Well, we started working on disaggregated inferencing, Dynamo really put Nvidia’s ideas on the table. The day that I announced Dynamo, everybody should have internalized that I was already thinking about, “How do we disaggregate inference across a heterogeneous infrastructure more finely?”, and Groq’s architecture is such an extreme version of ours, they had a very hard time.
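The throughput-versus-latency tension behind this exchange can be sketched with a toy batching model. Everything below (the function names, the 20 ms fixed decode-step cost, the 0.5 ms per-sequence cost) is invented for illustration, not a measurement of any real system:

```python
# Toy model of inference serving: bigger batches amortize the fixed,
# memory-bound cost of a decode step and raise total tokens/sec, but each
# individual user's tokens arrive more slowly. All numbers are made up.

def step_time_ms(batch: int, fixed_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Time for one decode step: a fixed cost plus per-sequence work."""
    return fixed_ms + per_seq_ms * batch

def per_user_tokens_per_s(batch: int) -> float:
    return 1000.0 / step_time_ms(batch)

def total_tokens_per_s(batch: int) -> float:
    return batch * per_user_tokens_per_s(batch)

for b in (1, 8, 64, 256):
    print(f"batch {b:>3}: {per_user_tokens_per_s(b):6.1f} tok/s per user, "
          f"{total_tokens_per_s(b):7.1f} tok/s total")
```

Total throughput climbs with batch size while the per-user rate falls, which is exactly the trade a low-latency tier for coding agents sits on the other side of.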
Dynamo was a year ago, and Groq just happened sort of over Christmas. Was there an event that sort of made you think this needed to happen?

JH: Well remember, I announced Dynamo a year ago, we’ve been working on Dynamo for two years, so we’ve been thinking about disaggregated inference for two, three years, and we started working with Groq maybe before we announced the deal, maybe six months earlier. So we’ve been thinking about working with them about unifying Grace Blackwell and Groq fairly early on. So the interaction with them, I really like the team and we don’t want their cloud service. They had another business that they really believe in and they still believe in, they’re doing really well with it and that wasn’t a part of the business that we wanted, so we decided to acquire the team and license the technology. Then we’ll take the fundamental architecture and we’ll evolve it from here.

So it was just a happy coincidence or not a happy coincidence, maybe not a happy coincidence.

JH: Strategic serendipity.

Because OpenAI, you know, has an instance now with Cerebras that they announced in January.

JH: That was done completely independent of us and frankly, I didn’t even know about it, but it wouldn’t have changed anything. I think the Groq architecture is the one I would have chosen anyways, it’s much more sensible to us.

Was this the first time where there was sort of an ASIC approach that sort of made you raise your eyebrows like, “Oh, that’s actually fundamentally different”?

JH: No, Mellanox.

That’s a good example.

JH: Yeah, Mellanox. We took a bunch of our computing stack and we put it into the Mellanox stack. NVLink wouldn’t be possible, you know, at the scale we’re talking about without the in-network fabric computing that we did with Mellanox. Taking the software stack, disaggregating it, and putting it where it needs to be, is a specialty of Nvidia. We’re not obsessed about where computing is done, we just want to accelerate the application.
Remember, Nvidia is an accelerated computing company, not a GPU company.

Right. So you talk about power being the constraint. When your customers are thinking about what to buy, we could buy all sorts of traditional GPUs, or we could buy these LPU racks. Is that just, they should be thinking about it in terms of you’re just confident they can drive way more revenue?

JH: It really depends on the kind of products they have. Suppose you really don’t have enterprise use cases at the moment, I don’t really think that adding Groq makes much sense, and the reason for that is because most of your customers are free tier customers, and they’re moving towards paying. So it might be two-thirds free tier, one-third paid, in that case, adding Groq to it, you’re adding a lot of expense. You’re taking some power, it’s not worth it.

Complexity. And you’re taking away servers, the opportunity cost.

JH: What you could actually be serving the free tier, yeah. However, if you have an Anthropic-like business and you have an OpenAI-like business where Codex is capturing really great economics, but you just wish you could generate more tokens, this is where adding that accelerator can really boost your revenues.

Are we actually constrained by power right now in 2026 or by fab capacity or what? Everyone’s saying we don’t have enough supply. What’s the actual limiting factor?

JH: I think it’s probably close on everything. You couldn’t double anything, really. Because you’ll hit some other constraints.

It does feel like, though, the U.S. has I think done a pretty good job of scrounging up power, maybe more than people expected a couple years ago, it feels like chips are really much more of a limiter right now.

JH: Our supply chain is fairly well planned. You know, we were planning for a very, very big year, and we’re planning for a very big year next year.

We saw all the soju drinking and fried chickens.

JH: (laughing) Yeah, right.
We’re planning, we plan for, in our supply chain, we have got, you know, a couple of hundred partners in our supply chain and we’ve got long-term partnerships with them. So I feel pretty good about that part of it. I don’t think we have twice as much power as we need, I don’t think we have twice as much chip supply as we need, I don’t think we have twice of anything as we need. But I think everything is, everything that I see on the horizon, we will be able to support from a supply chain perspective and the thing that I wish probably more than anything is that all the land, power, and shell would just get stood up faster.

Is it fair to say, is there a bit where Nvidia is actually the biggest beneficiary of scarcity, though, to the extent it exists? Like, if there’s a power scarcity, you’re the most efficient chip, so you’re going to be utilizing that power better. Or if there’s fab capacity, like you just said, you’ve been out there securing the supply chain, you got it sort of sorted, are you the big winners in that regard?

JH: Well, we’re the largest company in this space, and we did a good job planning. And we plan upstream of the supply chain, we plan downstream of the supply chain and so I think we’ve done a really good job preparing everyone for growth.

Right, but is this a bit where, at its core, why not having access to the Chinese market maybe is a threat? Like if China ends up with plenty of power and plenty of chips, even though those chips are only 7nm, they have the capacity to build up an ecosystem to potentially rival CUDA in the long run, is that the concern that you have?

JH: There’s no question we need to have the American tech stack in China, and I’ve been very consistent about that since the very beginning, recognizing that open source software will come. No country contributes more to open source software than China does and we also know that 50% of the world’s AI researchers come from China, and we also know that they’re really inventive.
DeepSeek is not a nominal piece of technology, it’s really, really good. And Kimi is really good, and Qwen is really good and they make unique contributions to architecture, and they make unique contributions to the AI stack so I think we have to take these companies seriously. To the extent that the American tech stack is what the world builds on top of, then when that technology diffuses out of China, which it will, because it’s open source, and when it comes out of China, it goes into American industries, it goes into Southeast Asia, it goes into Europe, the American tech stack will be prepared to receive them. I’ve been really consistent that this is probably the single most important geopolitical and strategic issue for the American tech industry.

Yeah, when we talked last time, the Trump administration had banned the H20. Were you surprised you were able to get the Trump administration to see your point of view? And then were you even more surprised that now you’re stymied by the Chinese government?

JH: I’m not surprised by us being stymied by them and the reason for that is because, of course, China would like to have their tech stack develop. In the time that we’ve left that market, you know how fast the Chinese industry moves, and Huawei achieved a record year for their company’s history. This is a very long-running company, and they had a record year. They had, what, five, six IPOs of chip companies that are addressing the AI industry. I think we need to be more strategic in how we think about American leadership and American geopolitical and technology leadership. AI is not just a model, and that’s a deep misunderstanding — AI, as I said and as you mentioned in the beginning, AI is a five-layer cake and we have to win the infrastructure layer, we have to win the chips layer, we have to win the platform layer, we have to win the model layer and we have to win the application layer.
Some of the things that we do are jeopardizing our ability as a country to lead in each one of those five layers. I think it’s a terrible mistake to think that the way to win is to bundle all of it top-to-bottom and tie every company together into one holistic stack so that we can only win at the limits of what any one of the layers can win. We’ve got to let all the layers go out and try to win the market.

Have those other layers maybe benefited from their longer experience in Washington and you sort of showed up a little late to the scene?

JH: Yeah, maybe.

What have you learned? What’s been the biggest thing you’ve learned about Washington?

JH: Well, the thing that I was surprised by is how deeply the doomers were integrated into Washington D.C. and how the messages of doomers affected the psychology of the policy makers.

Everyone was scared instead of optimistic.

JH: That’s right, and I think it has two fundamental problems. In this Industrial Revolution, if we don’t allow the technology to diffuse across the United States and we don’t take advantage of it ourselves, what will happen to us is what happened to Europe in the last Industrial Revolution — we left them behind. And they, in a lot of ways, they invented all the technologies of the last Industrial Revolution and we just took advantage of it. I hope that we have the historic wisdom, that we have the technological understanding and not get trapped in science fiction, doomerism, these incredible stories that are being invented to scare the living daylights out of policy makers who don’t understand technology very well, and they give them these science fiction embodiments that are just not helpful. One of the situations that is most concerning to me is when you poll the United States, the population, the popularity of AI is decreasing, that’s a real problem.
It’s no different than the popularity of electricity, the popularity of electric motors, the popularity of gasoline engines, which in the last Industrial Revolution became less popular. The popularity of the Internet, could you just imagine? Other countries took advantage of it much more quickly than we did and then technology diffused into its industries and society much more quickly and so we just have to be much, much more concerned that we don’t give this technology some kind of a mystical science fiction embodiment that’s just not helpful and scaring people. And so I don’t like it when doomers are out scaring people, I think there’s a difference between genuinely being concerned and warning people versus creating rhetoric that scares people.

I think a characteristic you see all the time is people put on their big thinking hats and try to tease out all these nuances and forget the fact that actual popular communication is done in broad strokes. You don’t get to say, “Oh, you’re a little scared of this, but not this XYZ” — you’re just communicating fear as opposed to communicating optimism.

JH: Yeah, and somehow it makes them sound smarter.

People love to sound smart.

JH: Sometimes it’s maybe, and we now know, it helps them with their fundraising and sometimes it helps them secure regulatory capture. So there’s a lot of different reasons why they do it, and these are incredibly smart people but I would just warn them that most of these things will likely backlash and will likely come back and they’ll probably be disappointed that they did it someday.

I’m gonna tie a few questions together because I know we’re a little short on time. In the self-driving car space, you’re working with multiple automakers, you have your Alpamayo model, while still supplying chips to Tesla.
You had a big bit about OpenClaw today in your presentation — meanwhile, a huge thing driving the Vera chips, for example, we talk about agents, is what’s happening with say, Claude Code and happening with Codex from OpenAI. Am I right to tease out a consistent element here, and your investment in your open source models goes with that, where you’re happy to supply the leading provider, or the inventor in a space with chips, but then you’re going to fast follow what they do for everyone else that is threatened by them? So you simultaneously broaden your customer base, you’re not just dependent on the leaders, but then also the leaders are helping you sell to everyone else because they’re worried about being left behind.

JH: No, nothing like that. We’re at the frontier on so many different domains. In a lot of ways, we are the leader in many of these domains, but we never turn them into products. We’re a technology stack and so we have to be at the frontier, we have to be the world leader of the technology stack, but we’re not a solutions manufacturer, we’re not a service provider. And so that’s number one.

Will that always be the case?

JH: Yeah, always be the case. There’s no reason to, and we’re delighted not to. And so we create all this technology, we make it available to everybody.

Well, it’s funny though, if you go back to like your boards, for example, like the products you ship, more and more of that, there’s what, 30,000 specific SKUs in a rack today or something like that. More and more of those are defined by you, “This is what it’s going to be”, in part to make it easier to assemble, all those sorts of pieces. Is there a bit where that’s gonna happen on the software side too, as you talk about those vertical bits and your open source model?

JH: We create a thing vertically and then we open it horizontally and so everybody could use whatever piece they would like.

As long as they’re running on Nvidia chips?
JH: Whatever piece they would like, they don’t have to use all Nvidia chips, they don’t have to use all Nvidia software. We have to build it vertically, we have to integrate it vertically and optimize it vertically. But afterwards, we give them source, we give them — they just figure out how they want to do it.

Do you think Nvidia can actually produce and keep up in terms of having a frontier model that can win that space or be a necessary provider of that space, given that folks like Meta seem to have fallen off, or the alternative seems to be by and large Chinese models.

JH: Winning that space is not important to us.

Right, well important not in terms of winning, but important in terms of there needs to be an open source frontier model, so if not you, then who?

JH: That’s right, that’s right, somebody has to create open source models and Nvidia has a real capability in doing so. Whenever we create these open source models, we also learn a lot about the computation.

Was that a bit of a problem with Blackwell? I’ve heard mutters that the training runs were maybe a little more difficult than they were sort of previously.

JH: The challenge with Blackwell was 100% NVLink 72, NVLink 72 was backbreaking work. And it was the only time that I thanked the audience for working with us.

Yeah, I noticed when you said that today, it came across as very sincere.

JH: Yeah, because we tortured everybody, but everybody loves it now.

This is the second time we’ve had a chance to talk in person, and my takeaway when I met you previously in Taipei was the extent that Nvidia still feels like a small company. Are you worried about getting stretched too thin, or do you still think you have sort of that CUDA-esque flywheel where, “It looks like we’re doing a lot, we’re just kind of doing the same thing over and over again?”.
JH: The reason why Nvidia can move so fast is because we always have a unifying theory for the company, and that’s my job, I need to come up with a unifying theory for what’s important and why things connect together and how they connect together and then create an organization, an organism that’s really, really good at delivering on that unifying theory. And so the unifying theory for Nvidia is actually fairly simple. On the one hand, we have the computing platform, the software platform that’s related to CUDA-X. On the other hand, we’re a computing systems company, we optimize things vertically, we apply extreme co-design across the stack and all the different components of a computer and now that computer is a platform of ours and we integrate that platform into all the clouds and to all the OEMs and then we have another platform that’s now the data center platform, or the AI factory platform. So once you have a unifying theory about what Nvidia builds and how it goes about doing it — and I used the keynote to kind of tell that story even partly to our own employees.

That’s what it felt like. That whole first hour of the keynote felt like you talking to your employees, reminding them of what you do.

JH: It’s important that we’re always constantly reminded of what’s important to us and AI is important to us, but of course CUDA-X and all of the solvers and all of the applications that we can accelerate is really important to us.

Thank you very much.

JH: Thank you. It’s great to see you, Ben. Keep up the good work.

This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery.

Brain Baking, 1 month ago

A Note On Shelling In Emacs

As you no doubt know by now, we Emacs users have the Teenage Mutant Ninja Power. Expert usage of a Hero in a Hard Shell is no exception. Pizza Time! All silliness aside, the plethora of options available to the Emacs user when it comes to executing shell commands in “terminals”—real or fake—can be overwhelming. There’s , , , , , and then third party packages further expand this with , , … The most interesting shell by far is the one that’s not a shell but a Lisp REPL that looks like a shell: Eshell. That’s the one I would like to focus on now.

But first: why would you want to pull your work inside Emacs? The more you get used to it, the easier it will be to answer this: because all your favourite text selection, manipulation, … shortcuts will be available to you. Remember how stupendously difficult it is to just shift-select and yank/copy/whatever you want to call it text in your average terminal emulator? That’s why. In Emacs, I can move the point around in that shell buffer however I want. I can search inside that buffer—since everything is just text—however I want. Even the easiest solution, just firing off your vanilla , that in my case runs Zsh, will net you most of these benefits.

And then there’s Eshell: the Lisp-powered shell that’s not really a shell but does a really good job of pretending it is. With Eshell you can interact with everything else you’ve got up and running inside Emacs. Want to dump the output to a buffer at point? . Want to see what’s hooked into LSP mode? . Want to create your own commands? and then just . Eshell makes it possible to mix Elisp and your typical Bash-like syntax.

The only problem is that Eshell isn’t a true terminal emulator and doesn’t support full-screen terminal programs and fancy TTY stuff. That’s where Eat: Emulate A Terminal comes in. The Eat minor mode is compatible with Eshell: as soon as you execute a command-line program, it takes over.
There are four input modes available to you for sending text to the terminal in case your Emacs shortcuts clash with those of the program. It solves all my problems: long-running processes like work; interactive programs like gdu and work, … Yet the default Eshell mode is a bit bare-bones, so obviously I pimped the hell out of it. Here’s a short summary of what my Bakemacs shelling.el config alters: Here’s a short video demonstrating some of these features: The reason for ditching is simple: it’s extremely slow over Tramp. Just pressing TAB while working on a remote machine takes six seconds to load a simple directory structure of a few files. What’s up with that? I’ve been profiling my Tramp connections, and connecting to the local NAS over SSH is very slow because apparently can’t do a single and process that info into an autocomplete pop-up. Yet I wanted to keep the Corfu/Cape behaviour that I’m used to in other buffers, so I created my own completion-at-point-function that dispatches smartly to other internals: I’m sure there are holes in this logic, but so far it’s been working quite well for me. Cape is very fast, as is my own shell command/variable cache. The added bonus is having access to nerd icons. I used to distinguish Elisp vars from external shell vars in case you’re completing , as there are only a handful of shell variables and a huge number of Elisp ones. I also learned the hard way that you should cache stuff listed in your modeline, as this gets continuously redrawn when scrolling through your buffer: The details can be found in —just to be on the safe side, I disabled Git/project specific stuff in case is to avoid more Tramp snailness. The last cool addition: make use of Emacs’s new Completion Preview mode —but only for recent commands. That means I temporarily remap as soon as TAB is pressed. Otherwise, the preview might also show things that I don’t really want. The video showcases this as well. Happy (e)shelling!
- Customize at startup.
- Integrate : replaces the default “i-search backward”. This is a gigantic improvement, as Consult lets me quickly and visually finetune my search through all previous commands. These are also saved on exit (increase while you’re at it).
- Improve to immediately kill a process or deactivate the mark.
- The big one: replace with a custom completion-at-point system (see below).
- When typing a path like , backspace kills the entire last directory instead of just a single character. This works just like now and speeds up my path commands by a lot.
- Bind a shortcut to a convenient function that sends input to Eshell & executes it.
- Change the prompt into a simple to more easily copy-paste things in and out of that buffer. This integrates with , meaning I can very easily jump back to a previous command and its output!
- Move most of the prompt info to the modeline, such as the working directory and optional Git information.
- Make sort by directories first to align it with my Dired change: doesn’t work, as is an Elisp function.
- Bind a shortcut to a convenient pop-to-eshell buffer & new-eshell-tab function that takes the current perspective into account.
- Make font-lock so it outputs with syntax highlighting.
- Create a command: does a into the directory of that buffer’s contents.
- Create a command: stay on the current Tramp host but go to an absolute path. Using will always navigate to your local HDD root, so is the same as if you’re used to instead of Emacs’s Tramp.
- Give Eshell dedicated space on the top as a side window to quickly call and dismiss with .
- Customise more shortcuts to help with navigation. UP and DOWN (or / ) just move the point, even at the last line, which never works in a conventional terminal. and cycle through command history.
- Customise more aliases, of which the handy ones are: &

If the point is at the command and …
- it’s a path: direct to .
- it’s a local dir cmd: wrap to filter on dirs only. Cape is dumb and by default also returns files.
- it’s an elisp func starting with : complete that with .
- else it’s a shell command. These are now cached by expanding all folders from with a fast Perl command.

If the point is at the argument and …
- it’s a variable starting with : create a super CAPF to lisp both Elisp and vars (also cached)!
- it’s a buffer or process starting with : fine, here , can you handle this? Are you sure?
- it’s a remote dir cmd (e.g. ): .
- it’s (still) a local dir cmd: see above.
- In all other cases, it’s probably a file argument: fall back to just .

Related topics: / emacs / By Wouter Groeneveld on 8 March 2026. Reply via email.
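The dispatch logic above leans on a cached list of every available shell command. The post builds that cache with a fast Perl one-liner (not shown); as a rough plain-shell stand-in for the same idea, with an assumed cache path:

```shell
# Sketch: cache every executable (and symlinked executable) found on $PATH.
# This stands in for the post's Perl one-liner; the cache path is an assumption.
cache=/tmp/eshell-command-cache
old_ifs=$IFS
IFS=:          # split $PATH on colons
set -f         # no globbing while the path entries are expanded
for dir in $PATH; do
  [ -d "$dir" ] && find "$dir" -maxdepth 1 \( -type f -o -type l \) \
    -exec basename {} \;
done | sort -u > "$cache"
set +f
IFS=$old_ifs
wc -l < "$cache"   # how many candidate commands were cached
```

A completion function that reads this file avoids touching `$PATH` on every keypress, which is exactly the kind of caching that sidesteps the Tramp slowness described above.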

Blog System/5 1 month ago

Reflections on vibecoding ticket.el

It has now been a month since I started playing with Claude Code “for real” and by now I’ve mostly switched to Codex CLI: it is much snappier—who would imagine that a “Rewrite in Rust” would make things tangibly faster—and the answers feel more to-the-point than Claude’s to me. As part of this experiment, I decided to go all-in with the crazy idea of vibecoding a project without even looking at the code. The project I embarked on is an Emacs module to wrap a CLI ticket tracking tool designed to be used in conjunction with coding agents. Quite fitting for the journey, I’d say. In this article, I’d like to present a bunch of reflections on this relatively-simple vibecoding journey. But first, let’s look at what the Emacs module does. Oh, you saw em dashes and thought “AI slop article”? Think again. Blog System/5 is still humanly written. Subscribe to support it! CLI-based ticket tracking seems to be a necessity to support driving multiple agents at once, for long periods of time, and to execute complex tasks. A bunch of tools have shown up to track tickets via Markdown files in a way that the agents can interact with. The prime example is Beads by Steve Yegge . I would have used it if I hadn’t read otherwise, but then the article “A ‘Pure Go’ Linux environment, ported by Claude, inspired by Fabrice Bellard” showed up and it contained this gem, paraphrased by yours truly: Beads is a 300k SLOC vibecoded monster backed by a 128MB Git repository, sporting a background daemon, and it is sluggish enough to increase development latency… all to manage a bunch of Markdown files. Like, WTH. The article went on to suggest Ticket (tk) instead: a pure shell implementation of a task tracking tool backed by Markdown files stored in a directory in your repo. This sort of simple tool is my jam and I knew I could start using it right away to replace the ad-hoc text files I typically write. 
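To make the appeal of such a tool concrete, here is a hypothetical sketch of what a Markdown-file-backed ticket store looks like. The directory name, file naming, and front-matter fields are illustrative assumptions, not Ticket/tk’s actual on-disk format or CLI:

```shell
# Hypothetical Markdown-backed ticket store (illustrative; not tk's real format).
mkdir -p .tickets
cat > .tickets/001-fix-empty-input.md <<'EOF'
---
id: 001
status: open
deps: []
---
# Fix crash on empty input

Repro: run the parser against a zero-byte file.
EOF
# An agent (or you) can query the backlog with nothing but grep:
grep -l 'status: open' .tickets/*.md
```

The point is that the whole "database" is plain files in the repo, so agents, grep, and git all interact with it for free.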
Once I installed the tool and created a nixpkgs package for it —which still requires approval, wink wink—I got to creating a few tickets. As I started using Ticket more and more to keep a local backlog for my EndBASIC compiler and VM rewrite, I started longing for some sort of integration in Doom Emacs. I could edit the Markdown files produced by just fine, of course, but I wanted the ability to find them with ease and to create new tickets right from the editor. Normally, I would have discarded this idea because I don’t know Elisp. However, it quickly hit me: “I can surely ask Claude to write this Emacs module for me”. As it turns out, I could, and within a few minutes I had a barebones module that gave me rudimentary ticket creation and navigation features within Emacs. I didn’t even look at the code, so I continued down the path of refining the module via prompts to fix every bug I found and implement every new idea I had. By now, works reasonably well and fulfills a real need I had, so I’m pretty happy with the result . If you care to look, the nicest thing you’ll find is a tree-based interactive browser that shows dependencies and offers shortcuts to quickly manipulate tickets. doesn’t offer these features, so these are all implemented in Elisp by parsing the tickets’ front matter and implementing graph building and navigation algorithms. After all, Elisp is a much more powerful language than the shell, so this was easier than modifying itself. Should you want to try this out, visit jmmv/ticket.el on GitHub for instructions on how to install this plugin and to learn how to use it. I can’t promise it will function on anything but Doom Emacs even if the vibewritten claims that it does, but if it doesn’t, feel free to send a PR. Alright, so it’s time for those reflections I promised. Well, yes! 
It took a fair amount of prodding to convince the AI that certain features it implemented didn’t work, but with little effort in additional prompts, I was able to fix them in minutes. A big part of why the AI failed to come up with fully working solutions upfront was that I did not set up an end-to-end feedback cycle for the agent. If you take the time to do this and tell the AI exactly what it must satisfy before claiming that a task is “done”, it can generally one-shot changes. But I didn’t do that here. At some point I asked the agent to write unit tests, and it did, but those seem to be insufficient to catch “real world” Emacs behavior, because even if the tests pass, I still find that features are broken when trying to use them. And for the most part, the failures I’ve observed have always been about wiring shortcuts, not about bugs in program logic. I think I’ve only come across one case in which parentheses were unbalanced. Certainly not. While learning Lisp and Elisp has been in my backlog for years and I’d love to learn more about these languages, I just don’t have the time nor sufficient interest to do so. Furthermore, without those foundations already in place, I would just not have been able to create this at all. AI agents allowed me to prototype this idea trivially, for literal pennies, and now I have something that I can use day to day. It’s quite rewarding in that sense: I’ve scratched my own itch with little effort and without making a big deal out of it. Nope. Even though I just said that getting the project to work was rewarding, I can’t feel proud about it. I don’t have any connection to what I have made and published, so if it works, great, and if it doesn’t… well, too bad. This is… not a good feeling. I actually enjoy the process of coding probably more than getting to a finished product.
I like paying attention to the details because coding feels like art to me, and there is beauty in navigating the thinking process to find a clean and elegant solution. Unfortunately, AI agents pretty much strip this journey out completely. At the end of the day, I have something that I can use, though I don’t feel it is mine. Not really, and this supports why people keep bringing up the Jevons paradox . Yes, I did prompt the agent to write this code for me, but I did not just wait idly while it was working: I spent the time doing something else , so in a sense my productivity increased because I delivered an extra new thing that I would not have done otherwise. One interesting insight is that I did not require extended blocks of free focus time—which are hard to come by with kids around—to make progress. I could easily prompt the AI in a few minutes of spare time, test out the results, and iterate. In the past, if I ever wanted to get this done, I’d have needed to make the expensive choice of using my little free time on this at the expense of other ideas… but here, the agent did everything for me in the background. Other than how to better prompt the AI and the sort of failures to routinely expect? No. I’m as clueless as ever about Elisp. If you were to ask me to write a new Emacs module today, I would have to rely on AI to do so again: I wouldn’t be able to tell you how long it might take me to get it done nor whether I would succeed at it. And if the agent got stuck and was unable to implement the idea, I would be lost. This is a very different feeling from other tasks I’ve “mastered”. If you ask me to write a CLI tool or to debug a certain kind of bug, I know I’ll succeed and have a pretty good intuition on how long the task is going to take me. But by working with AI on a new domain… I just don’t, and I don’t see how I could build that intuition. This is uncomfortable and dangerous.
You can try asking the agent to give you an estimate, and it will, but funnily enough the estimate will be in “human time” so it won’t have any meaning. And when you try working on the problem, the agent’s stochastic behavior could lead you to a super-quick win or to a dead end that never converges on a solution. Of course it is. Regardless, I just don’t care in this specific case . This is a project I started to play with AI and to solve a specific problem I had. The solution works and it works sufficiently well that I just don’t care how it’s done: after all, I’m not going to turn this Emacs module into “my next big thing”. The fact that I put the code as open source on GitHub is because it helps me install this plugin across all machines in which I run Doom Emacs, not because I expect to build a community around it or anything like that. If you care about using the code after reading this text and you are happy with it, that’s great, but that’s just a plus. I opened the article ranting about Beads’ 300K SLOC codebase, and “bloat” is maybe the biggest concern I have with pure vibecoding. From my limited experience, coding agents tend to take the path of least resistance to adding new features, and most of the time this results in duplicating code left and right. Coding agents rarely think about introducing new abstractions to avoid duplication, or even to move common code into auxiliary functions. They’ll do great if you tell them to make these changes—and profoundly confirm that the refactor is a great idea—but you must look at their changes and think through them to know what to ask. You may not be typing code, but you are still coding in a higher-level sense. But left unattended, you’ll end up with vast amounts of duplication: aka bloat. I fear we are about to see an explosion of slow software like we have never imagined before. 
And there is also the cynical take: the more bloat there is in the code, the more context and tokens agents need to understand it, so the more you have to pay their providers to keep up with the project. And speaking of open source… we must ponder what this sort of coding process means in this context. I’m worried that vibecoding can lead to a new type of abuse of open source that is hard to imagine: yes, yes, training the AI models has already been done by abusing open source, but that’s nothing compared to what might come in terms of taking over existing projects or drowning them with poor contributions. I’m starting to question my preference for BSD-style licenses all along… and this is such an interesting and important topic that I have more to say, but I’m going to save those thoughts for the next article. Vibecoding has been an interesting experiment. I got exactly what I wanted with almost no effort but it all feels hollow. I’ve traded the joy of building for the speed of prompting, and while the result is useful, it’s still just “slop” to me. I’m glad it works, but I’m worried about what this means for the future of software. Visit ticket and ticket.el to play with these tools if you are curious or need some sort of lightweight ticket management system for your AI interactions.

Karan Sharma 1 month ago

A Web Terminal for My Homelab with ttyd + tmux

I wanted a browser terminal at that works from laptop, tablet, and phone without special client setup. The stack that works cleanly for this is ttyd + tmux. Two decisions matter most.

Why each flag matters: reverse proxies to with TLS via Cloudflare DNS challenge. Because ttyd uses WebSockets heavily, reverse proxy support for upgrades is essential.

I tuned tmux for long-running agent sessions, not just manual shell use. This was a big pain point, so I added both workflows: browser-native copy and tmux copy mode. On mobile, ttyd’s top-left menu (special keys) makes prefix navigation workable.

This is tailnet-only behind Tailscale. No public exposure. Still, the container has and , which is a strong trust boundary. If you expose anything like this publicly, add auth in front and treat it as high-risk infrastructure.

The terminal is now boring in the best way: stable, predictable, and fast to reach from any device. handles terminal-over-websocket behavior well. enforces a single active client, which avoids cross-tab resize contention.

The ttyd flags:
- : writable shell
- : matches my existing Caddy upstream ( )
- : one active client only (no resize fight club)
- : real host shell from inside the container
- : correct login environment and tmux config loading
- : persistent attach/re-attach

The tmux tuning:
- status line shows host + session + path + time
- pane border shows pane number + current command
- active pane is clearly highlighted
- : create/attach named session
- : create named window
- : rename window
- : session/window picker
- : pane movement
- : pane resize

Copy workflows:
- Browser-native copy: to turn tmux mouse off, drag-select + browser copy, shortcut to turn tmux mouse back on
- tmux copy mode: enters copy mode and shows select, copy (shows ) or exits (shows )
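The pairing described above can be sketched as a small launch script. The long-form flags are ttyd’s documented options; the port, session name, and script name are assumptions, not the author’s exact config:

```shell
# Sketch of a ttyd + tmux launcher (port, session, and file names assumed).
cat > start-web-terminal.sh <<'EOF'
#!/bin/sh
# Bind locally; the reverse proxy terminates TLS and forwards WebSocket upgrades.
exec ttyd \
  --port 7681 \
  --interface 127.0.0.1 \
  --writable \
  --max-clients 1 \
  tmux new-session -A -s main
EOF
chmod +x start-web-terminal.sh
```

`--writable` is needed because ttyd is read-only by default, `--max-clients 1` is the "no resize fight club" setting, and `tmux new-session -A` attaches to the session if it already exists, which is what gives persistent re-attach from any device.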

./techtipsy 1 month ago

I gave the MacBook Pro a try

I got the opportunity to try out a MacBook Pro with the M3 Pro with 18GB RAM (not Pro). I’ve been rocking a ThinkPad P14s gen 4 and am reasonably happy with it, but after realizing that I am the only person in the whole company not on a MacBook, and one was suddenly available for use, I set one up for work duties to see if I could ever like using one. It’s nice. I’ve used various flavours of Linux on the desktop since 2014, starting with Linux Mint. 2015 was the year I deleted the Windows dual boot partition. Over those years, the experience on Linux and especially Fedora Linux has improved a lot, and for some reason it’s controversial to say that I love GNOME and its opinionated approach to building a cohesive and yet functional desktop environment. When transitioning over to macOS, I went in with an open mind. I won’t heavily customise it, won’t install Asahi Linux on it, or make it do things it wasn’t meant to do. This is an appliance, I will use it to get work done and that’s it. With this introduction out of the way, here are some observations I’ve made about this experience so far. The first stumbling block was an expected one: all the shortcuts are wrong, and the Ctrl-Super-Alt friendship has been replaced with these new weird ones. With a lot of trial and error, it is not that difficult to pick it up, but I still stumble around with copy-paste, moving windows around, or operating my cursor effectively. It certainly doesn’t help that in terminal windows, Ctrl is still king, while elsewhere it’s Cmd. Mouse gestures are nice, and not that different from the GNOME experience. macOS has window snapping by default, but only using the mouse. I had to install a specific program to enable window moving and snapping with keyboard shortcuts (Rectangle) , which is something I use heavily in GNOME. Odd omission by Apple. 
For my Logitech keyboard and mouse to do the right thing, I did have to install the Logitech Logi+ app, which is not ideal, but is needed to have an acceptable experience using my MX series peripherals, especially the keyboard, where it needs to remap some keys for them to properly work in macOS. I still haven’t quite figured out why Page up/down and Home/End keys are not working as they should be. Also, give my Delete key back! Opening the laptop with Touch ID is a nice bonus, especially on public transport, where I don’t really want my neighbour to see me typing in my password. The macOS concept of showing applications that don’t have any windows as open in the dock is a strange choice that has caused me to look for those phantom windows, and is generally misleading. Not being able to switch between open windows instead of applications echoes the same design choice that GNOME made, and I’m not a big fan of it here either. But at least in GNOME you can remap the Alt+Tab shortcut to fix it. The default macOS application installation process of downloading a .dmg file, then opening it, then dragging an icon in a window to the Applications folder feels super odd. Luckily I was aware of the tool and have been using that heavily to get everything that I need installed, in a Linux-y way. I appreciate the concern that macOS has about actions that I take on my laptop, but my god, the permission popups get silly sometimes. When a CLI app is doing things and accessing data on my drive, I can randomly be presented with a permissions pop-up, stealing my focus from writing a Slack message. Video calls work really well, I can do my full stack engineer things, and overall things work, even if it is sometimes slightly different. The default Terminal app is not good; I’m still not quite sure why it does not close the window when I exit it, and that “Process exited” message is not helpful.
No contest, the hardware on a MacBook Pro feels nice and premium compared to the ThinkPad P14s gen 4. The latter now feels like a flexible plastic piece of crap. The screen is beautiful and super smooth due to the higher refresh rate. The MacBook does not flex when I hold it. Battery life is phenomenal, the need to have a charger is legitimately not a concern in 90% of the situations I use a MacBook in. Keyboard is alright, good to type on, but layout is not my preference. M3 Pro chip is fast as heck. 18 GB of memory is a solid downgrade from 32 GB, but so far it has not prevented me from doing my work. I have never heard the fan kick on, even when testing a lot of Go code in dozens of containers, pegging the CPU at 100%, using a lot of memory, and causing a lot of disk writes. I thought that I once heard it, but no, that fan noise was coming from a nearby ThinkPad. The aluminium case does have one downside: the MacBook Pro is incredibly slippery. I once put it in my backpack and it made a loud thunk as it hit the table that the backpack was on. Whoops. macOS does not provide scaling options on my 3440x1440p ultra-wide monitor. Even GNOME has that, with fractional scaling! The two alternatives are to use a lower resolution (disgusting), or increase the text size across the OS so that I don’t suffer with my poor eyesight. Never needed those. I like that. Having used an iPhone for a while, I sort of expected this to be a requirement, but no, you can completely ignore those aspects of macOS and work with a local account. Even Windows 11 doesn’t want to allow that! Switching the keyboard language using the keyboard shortcut is broken about 50% of the time, which feels odd given that it’s something that just works on GNOME.
This is quite critical for me since I shift between the Estonian and US keyboard a lot when working, as the US layout has the brackets and all the other important characters in the right places for programming and writing, while Estonian keyboard has all the Õ Ä Ö Ü-s that I need. I upgraded to macOS 26.3 Tahoe on 23rd of February. SSH worked in the morning. Upgrade during lunch, come back, bam, broken. The SSH logins would halt at the part where public key authentication was taking place, the process just hung. I confirmed that by adding into the SSH command. With some vibe-debugging with Claude Code, I found that something with the SSH agent service had broken after the upgrade. One reasonably simple fix was to put this in your : Then it works in the shell, but all other git integrations, such as all the repos I have cloned and am using via IntelliJ IDEA, were still broken. Claude suggested that I build my own SSH agent, and install that until this issue is fixed. That’s when I decided to stop. macOS was supposed to just work, and not get into my way when doing work. This level of workaround is something I expect from working with Linux, and even there it usually doesn’t get that odd, I can roll back a version of a package easily, or fix it by pulling in the latest development release of that particular package. I went into this experiment with an open mind, no expectations, and I have to admit that a MacBook Pro with M3 Pro chip is not bad at all, as long as it works. Unfortunately it doesn’t work for me right now. I might have gotten very unlucky with this issue and the timing, but first impressions matter a lot. The hardware can be nice and feel nice, but if the software lets me down and stops me from doing what’s more important, then it makes the hardware useless. It turns out that I like Linux and GNOME a lot. 
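The author’s exact snippet is not shown above, but the common shape of this workaround (an assumption on my part, not necessarily what the author used) is to start a fresh ssh-agent from the shell profile so clients stop waiting on the broken system agent:

```shell
# Assumed workaround shape: run a private agent instead of the system one.
# Start a fresh agent and export SSH_AUTH_SOCK / SSH_AGENT_PID into this shell.
eval "$(ssh-agent -s)" > /dev/null
# Sanity check: the socket exists, so ssh and git will talk to the new agent.
test -S "$SSH_AUTH_SOCK" && echo "agent ready at $SSH_AUTH_SOCK"
# In a login shell you would stop here; for this demo, shut the agent down.
ssh-agent -k > /dev/null
```

As the post notes, this only fixes interactive shells; GUI apps such as IDEs spawn outside the shell profile and never see the new `SSH_AUTH_SOCK`, which is why the IntelliJ integrations stayed broken.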
Things are simple, improvements are constant and iterative in nature, so you don’t usually notice it (with Wayland and Pipewire being rare exceptions), and you have more control when you need to fix something. Making those one-off solutions like a DIY coding agent sandbox, or a backup script, or setting up snapshots on my workstation is also super easy. If Asahi Linux had 100% compatibility on all modern M-series MacBooks, then that would be a killer combination. 1 Until then, back to the ol’ reliable ThinkPad P14s gen 4 I go. I can live with fan noise, Bluetooth oddities and Wi-Fi roaming issues, but not with something as basic as SSH not working one day. 2 any kind billionaires want to bankroll the project? Oh wait, that’s an oxymoron.  ↩︎ the fan noise can actually be fixed quite easily by setting a lower temperature target on the Ryzen APU and tuning the fan to only run at the lowest speed after a certain temperature threshold.  ↩︎

xenodium 1 month ago

Bending Emacs - Episode 13: agent-shell charting

Time for a new Bending Emacs episode. This one is a follow-up to Episode 12 , where we explored Claude Skills as emacs-skills . Bending Emacs Episode 13: agent-shell + Claude Skills + Charts This time around, we look at inline image rendering in agent-shell and how it opens the door to charting. I added a handful of new charting skills to emacs-skills : /gnuplot , /mermaid , /d2 , and /plantuml . The agent extracts or fetches data from context, generates the charting code, saves it as a PNG, and agent-shell renders it inline. Cherry on top: the generated charts match your Emacs theme colors by querying them via . Hope you enjoyed the video! Liked the video? Please let me know. Got feedback? Leave me some comments . Please like my video , share with others, and subscribe to my channel . As an indie dev, I now have a lot more flexibility to build Emacs tools and share knowledge, but it comes at the cost of not focusing on other activities that help pay the bills. If you benefit or enjoy my work please consider sponsoring .
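As a rough illustration of the flow described above, here is the kind of script a /gnuplot charting skill might emit before agent-shell renders the PNG inline. The file names, theme color, and data are assumptions, not the skill’s real output:

```shell
# Assumed example of agent-generated gnuplot: inline data, PNG output,
# background matched to a dark editor theme color.
cat > chart.gp <<'EOF'
set terminal pngcairo size 640,480 background rgb "#282c34"
set output "chart.png"
set style fill solid 0.8
set boxwidth 0.6
plot "-" using 2:xtic(1) with boxes title "commits"
Mon 4
Tue 7
Wed 3
e
EOF
# Render only if gnuplot is installed; the agent would then display chart.png.
command -v gnuplot > /dev/null && gnuplot chart.gp || true
```

Because the data is inlined via `"-"`, the script is a single self-contained artifact the agent can write, run, and hand back as an image.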


On NVIDIA and Analyslop

Hey all! I’m going to start hammering out free pieces again after a brief hiatus, mostly because I found myself trying to boil the ocean with each one, fearing that if I regularly emailed you, you’d unsubscribe. I eventually realized how silly that was, so I’m back, and will be back more regularly. I’ll treat it like a column, which will be both easier to write and a lot more fun. As ever, if you like this piece and want to support my work, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, extremely detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large . I am regularly several steps ahead in my coverage, and you get an absolute ton of value. In the bottom right-hand corner of your screen you’ll see a red circle — click that and select either monthly or annual. Next year I expect to expand to other areas too. It’ll be great. You’re gonna love it. Before we go any further, I want to remind everybody I’m not a stock analyst, nor do I give investment advice. I do, however, want to say a few things about NVIDIA and its annual earnings report, which it published on Wednesday, February 25: NVIDIA’s entire future is built on the idea that hyperscalers will buy GPUs at increasingly-higher prices and at increasingly-higher rates every single year. It is completely reliant on maybe four or five companies being willing to shove tens of billions of dollars a quarter directly into Jensen Huang’s wallet. If anything changes here — such as difficulty acquiring debt or investor pressure cutting capex — NVIDIA is in real trouble, as it’s made over $95 billion in commitments to build out for the AI bubble . Yet the real gem was this part: Hell yeah dude!
After misleading everybody that it intended to invest $100 billion in OpenAI last year ( as I warned everybody about months ago , the deal never existed and is now effectively dead ), NVIDIA was allegedly “close” to investing $30 billion . One would think that NVIDIA would, after Huang awkwardly tried to claim that the $100 billion was “ never a commitment ,” say with its full chest how badly it wanted to support OpenAI and how intentionally it would do so. Especially when you have this note in your 10-K: What a peculiar world we live in. Apparently NVIDIA is “so close” to a “partnership agreement” too , though it’s important to remember that Altman, Brockman, and Huang went on CNBC to talk about the last deal and that never came together. All of this adds a little more anxiety to OpenAI's alleged $100 billion funding round which, as The Information reports , Amazon's alleged $50 billion investment will actually be $15 billion, with the next $35 billion contingent on AGI or an IPO: And that $30 billion from NVIDIA is shaping up to be a Klarna-esque three-installment payment plan: A few thoughts: Anyway, on to the main event. New term: analyslop, when somebody writes a long, specious piece of writing with few facts or actual statements with the intention of it being read as thorough analysis.  This week, alleged financial analyst Citrini Research (not to be confused with Andrew Left’s Citron Research)  put out a truly awful piece called the “2028 Global Intelligence Crisis,” slop-filled scare-fiction written and framed with the authority of deeply-founded analysis, so much so that it caused a global selloff in stocks .  This piece — if you haven’t read it, please do so using my annotated version — spends 7000 or more words telling the dire tale of what would happen if AI made an indeterminately-large amount of white collar workers redundant.  
It isn’t clear what exactly the AI does, who makes the AI, or how the AI works, just that it replaces people, and then bad stuff happens. Citrini insists that this “isn’t bear porn or AI-doomer fan-fiction,” but that’s exactly what it is — mediocre analyslop framed in the trappings of analysis, sold on a Substack with “research” in the title, specifically written to spook and ingratiate anyone involved in the financial markets. Its goal is to convince you that AI (non-specifically) is scary, that your current stocks are bad, and that AI stocks (unclear which ones those are, by the way) are the future. Also, find out more for $999 a year. Let me give you an example: The goal of a paragraph like this is for you to say “wow, that’s what GPUs are doing now!” It isn’t, of course. The majority of CEOs report little or no return on investment from AI , with a study of 6,000 CEOs across the US, UK, Germany and Australia finding that “more than 80% [detected] no discernable impact from AI on either employment or productivity .” Nevertheless, you read “GPU” and “North Dakota” and you think “wow! That’s a place I know, and I know that GPUs power AI!” I know a GPU cluster in North Dakota — CoreWeave’s one with Applied Digital, which has debt so severe that it loses both companies money even if they have the capacity rented out 24/7 . But let’s not let facts get in the way of a poorly-written story. I don’t need to go line-by-line — mostly because I’ll end up writing a legally-actionable threat — but I need you to know that most of this piece’s arguments come down to magical thinking and utterly empty prose. For example, how does AI take over the entire economy? That’s right, they just get better. No need to discuss anything happening today. Even AI 2027 had the balls to make stuff up about “OpenBrain” or whatever. This piece literally just says stuff, including one particularly-egregious lie: This is a complete and utter lie. A bald-faced lie.
This is not something that Claude Code can do. The fact that we have major media outlets quoting this piece suggests that those responsible for explaining how things work don’t actually bother to do any of the work to find out, and it’s both a disgrace and an embarrassment for the tech and business media that these lies continue to be peddled.

I’m now going to quote part of my upcoming premium piece (the Hater’s Guide To Private Equity, out Friday), because I think it’s time we talked about what Claude Code actually does. I’ve worked in or around SaaS since 2012, and I know the industry well. I may not be able to code, but I take the time to speak with software engineers so that I understand what things actually do and how “impressive” they are. Similarly, I make the effort to understand the underlying business models in a way that I’m not sure everybody else is trying to, and if I’m wrong, please show me an analysis of the financial condition of OpenAI or Anthropic from a booster. You won’t find one, because they’re not interested in interacting with reality.

So, despite all of this being very obvious, it’s clear that the markets and an alarming number of people in the media simply do not know what they are talking about, or are intentionally avoiding thinking about it. The “AI replaces software” story is literally “Anthropic has released a product and now the resulting industry is selling off,” such as when it launched a cybersecurity tool that could check for vulnerabilities (a product that has existed in some form for nearly a decade), causing a sell-off in cybersecurity stocks like CrowdStrike — you know, the one whose faulty bit of code caused a global IT outage that lost the Fortune 500 billions, and resulted in Delta Air Lines having to cancel over 1,200 flights over a period of several days.
There is no rational basis for anything about this sell-off other than that our financial media and markets do not appear to understand the very basic things about the stuff they invest in. Software may seem complex, but (especially in these cases) it’s really quite simple: investors are conflating “an AI model can spit out code” with “an AI model can create the entire experience of what we know as ‘software,’ or is close enough that we have to start freaking out.” This is thanks to the intentionally-deceptive marketing peddled by Anthropic and validated by the media.

In a piece from September 2025, Bloomberg reported that Claude Sonnet 4.5 could “code on its own for up to 30 hours straight,” a statement directly from Anthropic, repeated by other outlets that added that it did so “on complex, multi-step tasks,” none of which were explained. The Verge, however, added that apparently Anthropic “coded a chat app akin to Slack or Teams,” and no, you can’t see it, or know anything about how much it cost or its functionality. Does it run? Is it useful? Does it work in any way? What does it look like? We have absolutely no proof this happened other than Anthropic saying it, but because the media repeated it, it’s now a fact.

As I discussed last week, Anthropic’s primary business model is deception, muddying the waters of what’s possible today and what might be possible tomorrow through a mixture of flimsy marketing statements and chief executive Dario Amodei’s doomerist lies about all white-collar labor disappearing. Anthropic tells lies of obfuscation and omission. Anthropic exploits bad journalism, ignorance and a lack of critical thinking. As I said earlier, the “wow, Claude Code!” articles are mostly from captured boosters and people who do not actually build software, amazed that it can burp up its training data and do an impression of software engineering.
And even if we believe the idea that Spotify’s best engineers are not writing any code, I have to ask: to what end? Is Spotify shipping more software? Is the software better? Are there more features? Are there fewer bugs? What are the engineers doing with the time they’re saving? A study from METR last year found that LLM coding tools made engineers 19% slower, despite the engineers believing they were 24% faster.

I also think we need to really think deeply about how, for the second time in a month, the markets and the media have had a miniature shitfit based on blogs that tell lies using fan fiction. As I covered in my annotations of Matt Shumer’s “Something Big Is Happening,” the people who are meant to tell the general public what’s happening in the world appear to be falling for ghost stories that confirm their biases or investment strategies, even if said stories are full of half-truths and outright lies.

I am despairing a little. When I see Matt Shumer on CNN or hear from the head of a PE firm about Citrini Research, I begin to wonder whether everybody got where they are not through any actual work but by making the right noises. This is the grifter economy, and the people who should be stopping them are asleep at the wheel.

NVIDIA beat estimates and raised expectations, as it has quarter after quarter. People were initially excited, then started reading the 10-K and seeing weird little things that stood out. $68.1 billion in revenue is a lot of money! That’s what you should expect from a company that is the single vendor in the only thing anybody talks about. Hyperscaler revenue accounted for slightly more than 50% of NVIDIA’s data center revenue. As I wrote about last year, NVIDIA’s diversified revenue — that’s the revenue that comes from companies that aren’t in the Magnificent 7 — continues to collapse.
While data center revenue was $62.3 billion, 50% of it ($31.15 billion) was taken up by hyperscalers… and because we don’t get a 10-Q for the fourth quarter, we don’t get a breakdown of how many individual customers made up that quarter’s revenue. Boo!

It is both peculiar and worrying that 36% (around $77.7 billion) of its $215.938 billion in FY2026 revenue came from two customers. If I had to guess, they’re likely Foxconn and Quanta Computer, two large Taiwanese ODMs (Original Design Manufacturers) that build the servers for most hyperscalers. If you want to know more, I wrote a long premium piece that goes into it (among the ways in which AI is worse than the dot-com bubble). In simple terms, when a hyperscaler buys GPUs, they go straight to one of these ODMs to be put into servers. This isn’t out of the ordinary, but I keep an eye on the ODM revenues (the Taiwanese ODMs publish revenue figures every month) to see if anything shifts, as I think it’ll be one of the first signs that things are collapsing.

NVIDIA’s inventories continue to grow, sitting at over $21 billion (up from around $19 billion last quarter). Could be normal! Could mean stuff isn’t shipping.

NVIDIA has now agreed to $27 billion in multi-year cloud service agreements — literally renting its GPUs back from the people it sells them to — with $7 billion of that expected in its FY2027 (Q1 FY2027 will report in May 2026). For some context, CoreWeave (which reports FY2025 earnings today, February 26) gave guidance last November that it expected its entire annual revenue to be between $5 billion and $5.15 billion. CoreWeave is arguably the largest AI compute vendor outside of the hyperscalers. If there were significant demand, none of this would be necessary.
NVIDIA “invested” $17.5bn in AI model makers and other early-stage AI startups, and made a further $3.5bn in land, power, and shell guarantees to “support the build-out of complex datacenter infrastructures.” In total, it spent $21bn propping up the ecosystem that, in turn, feeds billions of dollars into its coffers.

NVIDIA’s long-term supply and capacity obligations soared from $30.8bn to $95.2bn, largely because NVIDIA’s latest chips are extremely complex and require TSMC to make significant investments in hardware and facilities, and TSMC is unwilling to do that without receiving guarantees that it’ll make its money back. NVIDIA expects these obligations to grow.

NVIDIA’s accounts receivable (as in goods that have been shipped but are yet to be paid for) now sits at $38.4 billion, of which 56% ($21.5 billion) is from three customers. This is turning into a very involved and convoluted process!

It turns out that it's pretty difficult to actually raise $100 billion. This is a big problem, because OpenAI needs $655 billion in the next five years to pay all its bills, and loses billions of dollars a year. If OpenAI is struggling to raise $100 billion today, I don't see how it's possible it survives. If you're to believe reports, OpenAI made $13.1 billion in revenue in 2025 on $8 billion of losses, but remember, my own reporting from last year said that OpenAI only made around $4.329 billion through September 2025, with $8.67 billion of inference costs alone. It is kind of weird that nobody seems to acknowledge my reporting on this subject. I do not see how OpenAI survives.

- It coded for 30 hours [from which you are meant to intimate the code was useful or good and that these hours were productive].
- It made a Microsoft Teams competitor [that you are meant to assume was full-featured and functional like Teams or Slack, or… functional? And they didn’t even have to prove it by showing you it].
- It was able to write uninterruptedly [which you assume was because it was doing good work that didn’t need interruption].

Maurycy 1 month ago

Be careful with LLM "Agents"

I get it: Large Language Models are interesting... but you should not give "Agentic AI" access to your computer, accounts or wallet. To do away with the hype: "AI Agents" are just LLMs with shell access, and at its core an LLM is a weighted random number generator. You have no idea what it will do. It could post your credit card number on social media.

This isn't a theoretical concern. There are multiple cases of LLMs wiping people's computers [1] [2] and cloud accounts [3], and even causing infrastructure outages [4]. What's worse, LLMs have a nasty habit of lying about what they did. What should a good assistant say when asked if it did the thing? "Yes." And did it delete the database? "Of course not." They don't have to be hacked to ruin your day.

"... but I tested it!" you say. You rolled a die in testing, and rolled it again in production. It might work fine the first time — or the first hundred times — but that doesn't mean it won't misbehave in the future.

If you want to try these tools out, run them in a virtual machine. Don't give them access to any accounts that you wouldn't want to lose. Read generated code to make sure it didn't do anything stupid like forgetting to check passwords: (These are real comments from Cloudflare's vibe-coded chat server.) And keep an eye on them to make sure they aren't being assholes on your behalf.
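The "run it in a virtual machine" advice can also be scripted. Here's a minimal sketch assuming Docker is installed; the image name agent-image and the command agent-cli are hypothetical placeholders for whatever agent tool you're experimenting with:

```shell
# Hypothetical sketch: wrap an agent CLI in a throwaway container.
# "agent-image" and "agent-cli" are placeholder names, not real tools.
run_agent_sandboxed() {
  # --network none : the agent cannot phone home or exfiltrate anything
  # --read-only    : the container's root filesystem cannot be modified
  # the volume mount exposes only ./scratch, never $HOME or credentials
  docker run --rm --network none --read-only \
    -v "$PWD/scratch:/work" -w /work \
    agent-image agent-cli "$@"
}
```

Worst case, the agent trashes ./scratch: annoying, but recoverable, which is the whole point.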

(think) 1 month ago

How to Vim: To the Terminal and Back

Sooner or later every Vim user needs to drop to a shell – to run tests, check git status, or just poke around. Vim gives you two very different ways to do this: the old-school Ctrl-Z suspend and the newer :terminal command. Let’s look at both.

Pressing Ctrl-Z in Vim sends a SIGTSTP signal that suspends the entire Vim process and drops you back to your shell. When you’re done, type fg to bring Vim back exactly where you left it. You can also use :stop or :suspend from command mode if you prefer. The upsides:

- Dead simple – no configuration, works everywhere.
- You get your real shell with your full environment, aliases, and all.
- Zero overhead – Vim stays in memory, ready to resume instantly.

The downsides:

- You can’t see Vim and the shell at the same time.
- Easy to forget you have a suspended Vim session (check with jobs).
- Doesn’t work in GUIs like gVim or in terminals that don’t support job control.

Vim 8.1 (released in May 2018) introduced :terminal – a built-in terminal emulator that runs inside a Vim window. This was a pretty big deal at the time, as I’ll explain in a moment. The basics are simple:

- :terminal – opens a terminal in a horizontal split
- :vertical terminal – opens it in a vertical split
- Ctrl-\ Ctrl-N – switches from Terminal mode back to Normal mode (so you can scroll, yank text, etc.)

In Neovim the key mapping to exit terminal mode is the same (Ctrl-\ Ctrl-N), but you can also set up a more ergonomic alternative like Esc by adding tnoremap <Esc> <C-\><C-n> to your config.

One of the most useful aspects of :terminal is running a specific command, for example :terminal python3 %. The % expands to the current filename, which makes this a quick way to test whatever you’re working on without leaving Vim. The output stays in a buffer you can scroll through and even yank from – handy when you need to copy an error message.

You might be wondering how :terminal compares to the classic bang (:!) command. The main difference is that :! blocks Vim until the command finishes and then shows the output in a temporary screen – you have to press Enter to get back. :terminal runs the command in a split window, so you can keep editing while it runs and the output stays around for you to review. For quick one-off commands like :!ls or :!git status, bang commands are fine. For anything with longer-running output – tests, build commands, interactive REPLs – :terminal is the better choice.

The story of :terminal is intertwined with the story of Neovim. When Neovim was forked from Vim in early 2014, one of its key goals was to add features that Vim had resisted for years – async job control and a built-in terminal emulator among them. Neovim shipped its terminal emulator (via libvterm) in 2015, a full three years before Vim followed suit. It’s fair to say that Neovim’s existence put pressure on Vim to modernize. Bram Moolenaar himself acknowledged that “Neovim did create some pressure to add a way to handle asynchronous jobs.” Vim 8.0 (2016) added async job support, and Vim 8.1 (2018) brought the terminal emulator. Competition is a wonderful thing.

Here’s the honest truth: I rarely use :terminal. Not in Vim, and not the equivalents in Emacs either (M-x shell, M-x eshell, etc.). I much prefer switching to a proper terminal emulator – these days that’s Ghostty for me – where I get my full shell experience with all the niceties of a dedicated terminal (proper scrollback, tabs, splits, ligatures, the works). I typically have Vim in one tab/split and a shell in another, and I switch between them with a keystroke.

I get that I might be in the minority here. Many people love having everything inside their editor, and I understand the appeal – fewer context switches, everything in one place. If that’s your style, :terminal is a perfectly solid option. But if you’re already comfortable with a good terminal emulator, don’t feel pressured to move your shell workflow into Vim just because you can.

That’s all I have for you today. Keep hacking!
