Posts in Shell (20 found)
Langur Monkey 2 days ago

Langur Agent

Langur Agent is a simple, open, hackable CLI AI agent for Linux and macOS. It connects to any service providing an OpenAI-compatible endpoint. It features: The source is available in this repository . Langur Agent has been tested on Linux and macOS only. Install the agent with: Run the agent with the default session: If you need an API key to access the endpoint, put it in the file. Langur Agent looks for the file in the following locations, in order: Create the file with the API key: The agent uses to load at startup. The package reads from the environment automatically. You can also set in your shell profile. On first run, the configuration is created in . You can configure the agent interactively with the slash command. The agent works with any OpenAI-compatible endpoint, so LM Studio, Ollama, OpenWebUI, or any other service you configure. Here are the default values: Run the agent, and then you can enter your prompt. You can use the following key bindings during input: During inference, you can cancel the turn and return to the input prompt with Ctrl + c . Use to print information about the available commands, and to configure the agent interactively. Internally, Langur Agent uses sessions to separate different memory histories. Sessions are named by the user. By default, the agent uses the session. You can start in a different session (either create a new one, or restore it if it exists) with the argument: The default session’s name is , so the following two commands are equivalent: You can also list the existing sessions with : Sessions contain: For now, the configuration file is the same for all sessions. Sessions are matched by the directory name in the sessions location ( ). You can rename a session by just renaming the directory! You can enable mode for the current session with the command , or permanently in the configuration . External editor —In mode, exit INSERT mode ( Esc ), then press v to edit your prompt in an external editor (uses your or variable). There are a few commands available to use in the agent loop. You can list them with . Also, use (e.g. ) to show additional help for a command. Persistent memory follows XDG Base Directory spec in : In addition to persistent memory, the agent maintains a chat history of recent user input and assistant output pairs. This provides context that survives beyond the LLM’s context window. Here is how it works: Persistence: Configuration: Langur Agent can be easily customized and extended by adding new tools, commands, and skills. If you create a cool new tool, skill, or slash command, consider contributing it via a pull request! Create a file in or use one of the existing ones. To create a tool, create a method and decorate it with : Tools are auto-discovered on startup. The process is very similar to tools. You need to create your method, preferably in , and decorate it with . A slash command must return, in that order, , , , : Decorated commands are automatically registered, and auto-completed in the input prompt. Add a file in with YAML front matter, following the agentskills.io standard: The front matter and are parsed and shown in the skills list. The body is injected into the system prompt. session management memory management visual candy autocompletion interactive configuration Python 3.13+ for dependency management Current directory, Home directory, Alt + Enter : add a new line Enter : submit the prompt Ctrl + q : quit The input history Chat memory (see chat memory ) Notes (see session memory ) User profile (see session memory ) — user information — persistent notes (added via tool) Memory is loaded into the system prompt each turn tool adds notes during a session tool explicitly persists memory to disk Memory is auto-saved when the agent exits (interactive mode) Each user message and assistant response is stored in memory Reasoning is omitted from chat memory Automatically compacted when exceeding the configured character limit The user can trigger the compaction any time with Chat memory is attached to the system prompt on each turn The agent displays the last 10 exchanges, with long messages truncated Chat history is persisted to Automatically loaded on startup Saved after every exchange (user input or assistant response) Compacted history is also persisted to disk : a indicating if the command succeeded or failed. : an optional short status message. It is printed with or . : an optional with the Python Rich-formatted content, it is printed to the output. : an optional formatted in Markdown, it is printed to the output.

0 views

How my minimal, memory-safe Go rsync steers clear of vulnerabilities

Back in January 2025, multiple different security researchers published a total of 6 security vulnerabilities in rsync , some of which allow arbitrary code execution and file leaks, so naturally I was wondering whether/how my gokrazy/rsync implementation was affected. Did implementing my own (compatible, but minimal) rsync in Go, a modern and memory-safe programming language, really rule out entire classes of security vulnerabilities? This deep dive article was in the making since January 2025, but was delayed because we uncovered more unpublished vulnerabilities in the process! The “Security Vulnerabilities” section now covers all 12 vulnerabilities from the January 2025 batch and the May 2026 batch. If you are running (upstream, samba) rsync in production, upgrade to version 3.4.3 or newer. If you are running gokrazy/rsync in production, upgrade to version v0.3.3 or newer. Feel free to skip over the nitty-gritty security issue details and jump directly to: For context, I blogged about rsync, how I use it, and how it works back in June 2022. See also all posts tagged “rsync” . The original motivation for writing my own rsync (back then only a server, today all directions are supported) was to provide the software packages of distri, my Linux distribution research project for fast package management , which I wanted to host on router7 , my small home Linux+Go internet router, which in turn is built on gokrazy , my Go appliance platform. I am still running multiple gokrazy/rsync servers for this original purpose, and also many others! Having rsync available as a primitive (that you can link into your Go programs!) is really nice. This article covers the following security vulnerabilities: The first batch of the vulnerabilities above was announced on the oss-security mailing list , but note that the original report has more detail compared to the oss-security summaries! The later vulnerabilities were announced via GitHub Security Advisories on the rsync project . When the checksums are read by the daemon, two different checksums are read: Most importantly, note that field is filled with bytes. always has a size of 16: rsync.h is an attacker-controlled value and can have a value up to bytes, as the next snipper shows: The problem here is that can be larger than 16 bytes, depending on the digest support the binary was compiled with: md-defines.h support is common and sets the value to 64. As a result, an attacker can write up to 48 bytes past the buffer limit. Upstream fix: The upstream fix for CVE-2024-12084 changes the field to a dynamically-allocated field, which is allocated with length, and fixes the bounds check to check against the (checksum length for this transfer’s algorithm). Can Go help prevent this? Yes: Missing or incorrect bounds checks will not result in a heap buffer overflow in Go! Instead, attempting to write out of bounds will result in a panic because the Go runtime performs bounds checks. How does gokrazy/rsync fare? gokrazy/rsync also had insufficient validation! Our issue was different, though: It wasn’t size confusion, we just were not doing any validation of the sum header at all — oops! We can confirm that the Go runtime’s bounds check triggers on an attempt to write out of bounds by changing the code like so and running the tests: As expected, the Go runtime panics with the following message: Of course, crashing the entire server is not the best failure mode, so I added the missing bounds checking to turn the panic into an error . Because of the same lack of validation as in the previous CVE-2024-12084 vulnerability, an attacker could select a checksum algorithm with short checksums (e.g. with 8 byte checksums), but then claim they were sending longer checksums (e.g. 9 bytes), making the victim leak one byte of uninitialized stack content in the response. Leaking one byte of stack content may seem benign, but as the Google Security report puts it: The first pair of vulnerabilities are a Heap Buffer Overflow and an Info Leak. When combined, they allow a client to execute arbitrary code on the machine a Rsync server is running on. The client only requires anonymous read-access to the server. The daemon matches checksums of chunks the client sent to the server against the local file contents in . Part of the function prologue is to allocate a buffer on the stack of bytes: The daemon then iterates over the checksums the client sent and generates a digest for each of the chunks and compares them to the remote digest: Notably, the number of bytes that are compared again are bytes. In this case, the comparison does not go out of bounds since can be a maximum of . However, the local buffer, not to be confused with the attacker-controlled , is a buffer on the stack that is not cleared and thus contains uninitialized stack contents. A malicious client can send a (known) checksum for a given chunk of a file, which leads to the daemon writing 8 bytes to the stack buffer . The attacker can then set to 9 bytes. The result of such a setup would be that the first 8 bytes match and an attacker-controlled 9th byte is compared with an unknown value of uninitialized stack data. An attacker can divide a file into 255 chunks and as a result leak one byte per file download. An attacker can incrementally repeat the process, either in the same connection or by resetting the connection. As a result, they can leak bytes of uninitialized stack data, which can contain pointers to Heap objects, Stack cookies, local variables and pointers to global variables and return pointers. With those pointers they can defeat ASLR. Upstream fix: There are two relevant upstream fixes: Can Go help prevent this? Yes: By design, Go initializes all variables to the zero value. Go programmers do not need to remember to explicitly initialize variables. How does gokrazy/rsync fare? gokrazy/rsync is not affected by this vulnerability: Variables are always initialized in Go. Additionally, selecting checksums other than MD4 was only introduced in protocol version 30 (gokrazy/rsync implements protocol version 27). Description: (quoting the Google Security report ) When the syncing of symbolic links is enabled, either through the or ( ) flags, a malicious server can make the client write arbitrary files outside of the destination directory. A malicious server can send the client a file list such as: Symbolic links, by default, can be absolute or contain characters such as . In practice, the client validates the file list and when it sees the entry, it will look for a directory called , otherwise it will error out. If the server sends as [both, a directory and a symbolic link], [the client] will only keep the directory entry, thus the attack requires some more details to work. In mode, which the server can enable for the client, the server sends the client multiple file lists. The deduplication of the entries happens on a per-file-list basis. As a result, a malicious server can send a client multiple file lists, where: As a result, the directory is created first and is considered a valid entry in the file list. Then, the attacker changes the type of to a symbolic link. When the server then instructs the client to create the file, it will follow the symbolic link and thus files can be created outside of the destination directory. Can Go help prevent this? No. This vulnerability is caused by a logic error: when multiple file lists are used, the merged file list needs to be re-verified. But see Defense in depth: Go’s Upstream fix: The upstream fix for CVE-2024-12087 adds the missing validation. How does gokrazy/rsync fare? gokrazy/rsync is not affected by this vulnerability: gokrazy/rsync does not implement the incremental recursion mode ( ). The trade-off here is implementation complexity vs. resource usage: the incremental recursion mode allows working with the file set in a “windowed” way, as opposed to having to scan the entire file set before any transfer can begin. See also my How does rsync work? blog post. Description: (quoting the Google Security report ) The CLI flag makes the client validate any symbolic links it receives from the server. The desired behavior is that symbolic links target can only be 1) relative to the destination directory and 2) never point outside of the destination directory. The function is responsible for validating these symbolic links. The function calculates the traversal depth of a symbolic link target, relative to its position within the destination directory. As an example, the following symbolic link is considered unsafe: As it points outside the destination directory. On the other hand, the following symbolic link is considered safe as it still points within the destination directory: This function can be bypassed as it does not consider if the destination of a symbolic link contains other symbolic links in the path. For example, take the following two symbolic links: In this case, foo would actually point outside the destination directory. However, the function assumes that is a directory and that the symbolic link is safe. Upstream fix: The upstream fix for CVE-2024-12088 makes stricter by not allowing anywhere within the path, except at the very beginning. Can Go help prevent this? No. This vulnerability is caused by a logic error: the validation function was incorrect. We could have implemented that same bug. But see Defense in depth: Go’s How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable: The feature is not yet implemented in gokrazy/rsync. The rsync receiver (in client mode) did not sanitize file names provided by the rsync sender, or otherwise prevent opening files outside the destination tree. A malicious sender could instruct a receiver to compare checksums of arbitrary files outside the destination tree. By observing the receiver’s reaction to a provided one-byte checksum, a malicious sender can leak arbitrary files. When a client connects to a malicious server the server is able to leak the contents of an arbitrary file on the client’s machine. In the client will read type as well as the from the server if the server sets the appropriate flags. The flag will not be set for the client. The caller ( ) then uses the server provided values to determine a file to compare the incoming data with. In the contents of the file specified by are copied into the destination file. This can be achieved by the server sending a negative token. The server sends a checksum to compare. If they don’t match, a 0 is returned. When the return value is 0 the receiver will then send a to the generator. The generator will then write a message to the server. The server can use this as a signal to determine if the checksum they sent was correct. By starting off with a of 1 a malicious server is able to determine the contents of the target file byte by byte. Upstream fix: The upstream fix for CVE-2024-12086 prevents opening files outside the destination tree by verifying the sender-provided path. Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable: the fuzzy matching feature was introduced with rsync protocol version 29, but gokrazy/rsync implements protocol version 27. Description: (quoting the Red Hat Security Advisory ) A flaw was found in rsync. This vulnerability arises from a race condition during rsync’s handling of symbolic links. Rsync’s default behavior when encountering symbolic links is to skip them. If an attacker replaced a regular file with a symbolic link at the right time, it was possible to bypass the default behavior and traverse symbolic links. Depending on the privileges of the rsync process, an attacker could leak sensitive information, potentially leading to privilege escalation. Upstream fix: The upstream fix for CVE-2024-12747 changes calls in the rsync sender to use the option. The paths are not expected to be symlinks at that point in the algorithm (symlinks would be handled with ). Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync was vulnerable before commit , which introduces the same mitigation that upstream rsync uses. To reproduce the issue, use the following steps: Check out gokrazy/rsync v0.2.7: Patch the code as follows to undo the fix and execute the attack: Running the test now shows that the server traversed the symlink: A surprising discovery When I shared a draft of this article with Damien Neil, member of the Go Security Team and the author of the traversal-resistant API , he pointed out: I believe the gokrazy fix for CVE-2024-12747 is insufficient. You’re calling with , but only prevents symlink traversal in the last path component. This is probably still vulnerable to replacing an earlier path component so can be redirected by symlinking to . We reported this to the rsync security contact address in April 2025. In December 2025 I learned that someone else had also independently discovered and reported this issue. Ultimately, this resulted in CVE-2026-29518, published on 2026-05-20. Description: (quoting the rsync 3.4.3 NEWS entry ) TOCTOU symlink race condition allowing local privilege escalation in daemon mode without chroot. An rsync daemon configured with is exposed to a time-of-check / time-of-use race on parent path components. A local attacker with write access to a module can replace a parent directory component with a symlink between the receiver’s check and its open(), redirecting reads (basis-file disclosure) and writes (file overwrite) outside the module. Under elevated daemon privilege this allows privilege escalation. Default is not exposed. Reach: local attacker on the daemon host, write access to a module path, daemon configured with . Upstream fix: The upstream fix for CVE-2026-29518 uses , which is similar to Go’s API. Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync was vulnerable until I switched the sender and the receiver to the traversal-resistant API . Description: (quoting the GitHub Security Advisory ) Description: The receiver’s compressed-token decoder accumulated a 32-bit signed counter without overflow checking. A malicious sender can trigger an overflow that, with careful manipulation, leaks process memory contents to the attacker – environment variables, passwords, heap and library pointers – significantly weakening ASLR and facilitating further exploitation. Reach: authenticated daemon connection with compression enabled (the default for protocols >= 30 when both peers advertise it). Disabling compression on the daemon (“refuse options = compress” in rsyncd.conf) is the available workaround. Upstream fix: The upstream fix for CVE-2026-43618 introduces the missing checks. How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable because it does not implement compression. See gokrazy/rsync issue #35 for details on why compression support sounds simple, but is non-trivial. Description: (quoting the GitHub Security Advisory ) The 2025 fix that added a guard in was not applied to the visually-identical block in . A malicious rsync server can drive any connecting client into a deterministic by setting in the compatibility flags, sending a flist whose first sorted entry is not a leading “.” directory (which causes to set ), then sending a transfer record with and a non- iflag word. The receiver reads and dereferences the result. On glibc x86-64 the dereferenced pointer is mmap chunk metadata that lands at an unmapped address, hence a clean ; non-glibc allocators have not been audited. Reach: any rsync client doing a normal pull from an attacker-controlled URL. Works for both rsync:// URLs and remote-shell pulls. is the protocol-30+ default; no special options are required on the victim. Workaround: on the client. Upstream fix: The upstream fix for CVE-2026-43620 adds the guard to as well. How does gokrazy/rsync fare? Just like for CVE-2024-12087 , gokrazy/rsync is not affected by this vulnerability: gokrazy/rsync does not implement the incremental recursion mode ( ). Description: (quoting the GitHub Security Advisory ) Description: Earlier fixes for symlink races on the receiver’s open() call (CVE-2026-29518) missed the same race class on every other path-based system call: chmod, lchown, utimes, rename, unlink, mkdir, symlink, mknod, link, rmdir, lstat. On rsync daemons with “use chroot = no” a local attacker with filesystem access on the daemon host can swap a symlink into a parent directory component between the receiver’s check and one of these syscalls, redirecting it outside the exported module. The fix routes each affected path-based syscall through a parent dirfd opened under RESOLVE_BENEATH-equivalent kernel-enforced confinement (openat2 on Linux 5.6+, O_RESOLVE_BENEATH on FreeBSD 13+ and macOS 15+, per-component O_NOFOLLOW walk elsewhere). Default “use chroot = yes” is not exposed. Reach: local attacker on the daemon host, write access to a module path, daemon configured with use chroot = no. Upstream fix: The upstream fix for CVE-2026-43619 uses the family of syscalls, just like Go’s . Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync is not affected, because it uses Go’s API throughout. Description: (quoting the GitHub Security Advisory ) On an rsync daemon configured with the global rsyncd.conf setting, the reverse-DNS lookup of the connecting client was performed after the daemon had chrooted into . If did not contain the files glibc needs for resolution ( , , , NSS service modules), the lookup failed and the connecting hostname was set to “UNKNOWN”. Hostname-based deny rules (“hosts deny = *.evil.example”) therefore could not match, and an attacker controlling their PTR record could connect from a hostname the administrator had intended to deny. IP-based ACLs are unaffected. The per-module setting is unrelated to this issue. Reach: rsync daemon configured with AND hostname-based ACLs AND does not include the libc resolver fixtures. Upstream fix: The upstream fix for CVE-2026-43617 moves the DNS lookup to an earlier point in the protocol. How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable because we only implement IP-based allow/deny lists, not hostname-based allow/deny lists. Description: (quoting the GitHub Security Advisory ) The rsync client’s HTTP proxy support contains an off-by-one out-of-bounds stack write in ( ). After issuing the request, rsync reads the proxy’s first response line one byte at a time into a 1024-byte stack buffer with the bound , so the loop only ever writes . If the proxy (or a man-in-the-middle in front of it) returns 1023+ bytes on the first response line without a terminator, the loop exits with — a slot the loop never wrote, so holds stale stack bytes left there by the earlier that formatted the outgoing request. The post-loop code then does: The lands one byte past the end of the on-stack , corrupting whatever lives in the adjacent stack slot. AddressSanitizer reports at in the frame. Upstream fix: The upstream fix for CVE-2026-45232 validates the attacker-supplied data. How does gokrazy/rsync fare? gokrazy/rsync does not implement such proxy support, so it is not vulnerable. Let’s summarize how Go fares: Aside from being written in Go, another key difference between gokrazy/rsync and the official upstream rsync is that the gokrazy implementation is minimal : Let’s have a look at whether gokrazy/rsync was affected by each CVE at the time of publishing: To be clear: all known vulnerabilities are fixed in gokrazy/rsync! The table above documents what the state was at the time when each CVE was published. In other words: When the January 2025 vulnerabilities were published, gokrazy/rsync panicked (CVE-2024-12084) and was vulnerable to a TOCTOU race (CVE-2024-12747). In the process of fixing the TOCTOU issue, we discovered CVE-2026-29518, which was fixed in gokrazy/rsync before the CVE was published. CVE-2026-43619 was discovered even later, but was also already fixed in gokrazy/rsync with the same fix: using Go’s everywhere. As I was reading the vulnerability reports, I noticed that the reports were slightly misleading by their choice of words: most reports just spoke of “server” and “client”. However, in an rsync transfer, both sides, the rsync client and the rsync server can assume either role: sender (upload files) or receiver (download files)! Some setups come with further restrictions that make certain attacks harder or impossible to pull off. For example, when running in daemon mode, file system access can be restricted to the pre-configured module paths (but not in command mode!). Here is a diagram to give you an overview of the 4 different setups and role/protocol layering: In the context of our vulnerability reports, I would say that the Arbitrary File Leak vulnerability (CVE-2024-12086)’s original title “Server leaks arbitrary client files” can easily be misunderstood. Instead, I would say: The rsync receiver will leak arbitrary files to a malicious sender . I have verified that a malicious client sender can make an unpatched remote rsync open files outside the destination tree (e.g. the system password database) when running in command mode, for example over SSH. (But, when running in daemon mode, the server enables additional path sanitization, which prevents this attack.) Similarly, the Symlink Path Traversal vulnerability (CVE-2024-12087) speaks about a “malicious server”, but again, it should be “malicious sender”, which can be either the client or the server. The OpenBSD project is known for its security focus, so how does openrsync compare? openrsync is not affected by the Heap Buffer Overflow (CVE-2024-12084) and Stack Info Leak (CVE-2024-12085) vulnerabilities because it validates the checksum length and only supports one checksum size/algorithm (MD4). openrsync is not affected by CVE-2024-12086, CVE-2024-12087 and CVE-2024-12088 because it does not implement the relevant features (like gokrazy/rsync). Even if it was vulnerable, openrsync’s defense-in-depth measures like using OpenBSD’s and to restrict file system access would have prevented successful exploitation — at least when running on OpenBSD. openrsync is not affected by CVE-2024-12747 because it used from the very moment they implemented symlink support . But, because is not a sufficient fix for this issue, openrsync is affected by CVE-2026-29518! The above covers the January 2025 batch of vulnerabilities; the May 2026 batch is similar in that most features just are not implemented. Overall, I say: Well done, Kristaps and contributors! By diligently implementing validation, restricting the attack surface and employing defense-in-depth measures, openrsync manages to not be affected by almost all of the reported vulnerabilities. Which APIs and environments can we use on Linux for defense-in-depth measures? I’ll go through the ones supports, ordered by traditional to modern. Within a few weeks after starting the project, I added support for dropping privileges and using mount/pid namespaces on Linux to restrict the file system objects that my rsync server could work with. This approach works very well to mitigate path traversal attacks, but requires privileges, meaning we need to run as or in a Linux user namespace (if enabled on your distribution / system). That limitation makes mount namespaces well-suited for server setups, but usually unavailable for interactive one-off transfers that are typically running under a human’s user account. In the same commit that introduced Linux mount/pid namespace support, I also included a systemd service file that restricted file system access to home directories and encouraged folks in the README to further restrict file system access, depending on what their use-case allows. These file system restrictions, if set up correctly, mitigate the File Leak (CVE-2024-12086) and Path Traversal (CVE-2024-12087) vulnerabilities. The Symlink Race Condition (CVE-2024-12747) relies on privilege escalation through the rsync process, but thanks to the DynamicUser feature, our process has fewer privileges than other users. Similarly to mount namespaces, these measures are great for server setups, but too cumbersome to set up for interactive one-off usages. I stumbled upon Justine’s blog post Porting OpenBSD pledge() to Linux (2022) and was reminded that Linux offers the Landlock API for unprivileged, per-process access control, similar to OpenBSD’s system call, which openrsync uses. The basic idea is that once your program knows the directory it works with, it makes a call like and no longer has access to other file system locations. I had previously heard of Landlock at a Go Meetup, so I knew there was Go support for Landlock. Back in 2022, I enabled Landlock support in the gokrazy kernel images. So I gave it a shot in March 2025 and implemented Landlock support to restrict file system access . It took me a few hours, which seems a little longer than one might expect at first. Making Landlock work (and/or skipping it) in our test environment ran into a couple of road blocks: Our tests had defined many functions that get run in the same process, but when repeatedly adding rulesets, we would exceed the limit of 16 (!) policy layers per process. Once I had it set up just right, it is a beautiful solution. Now we can restrict rsync transfers to their sources (read-only) or destination directories (read-write), even for unprivileged invocations of ! 🎉 The downside to Landlock is that Landlock operates at the process level. This means that Landlock policies must include the files that your program needs, e.g. needs to be able to read for user id lookup, so if the attacker is after the file, Landlock does not help. In February 2025, the Go 1.24 release introduced the API, which is resistant against path traversal, see The Go Blog: Traversal-resistant file APIs (by Damien Neil, March 2025). This API allows more fine-grained control (per file system operation) compared to Landlock. Go 1.25 (released in August 2025) added more methods to , making it a convenient choice for most file system usage. I have converted all of ’s file system usage to use , which is a great fit: users configure input/output directories, but the filenames received over the network are untrusted. That’s exactly what was designed for! When I first looked into using , I thought that some system calls could inherently not be made with this API, like for example to create device node files. Damien explained: It won’t support mknod, though. However, you should be able to use it to enable a safe mknod: If you’re curious how that looks in practice, check out ’s usage in , line 15-29 . Another stumbling block was when I realized that unlike with , Linux only implements , but no (as of Linux 7.0)! Luckily, Lennart Poettering pointed out that there’s a trick to skip path resolution without : you can probably bind to in the meantime… And indeed, this works! Path resolution is skipped because we only specify a basename (last component of a path) after the known-safe , not a path (see line 49-56 ). With these two tips, v0.3.1 and newer are fully using , meaning all file system access is traversal-safe! 🥳 Lacking validation causes vulnerabilities It is interesting to note that aside from the TOCTOU vulnerabilities (CVE-2024-12747, CVE-2026-29518 and CVE-2026-43619), all other vulnerabilities were caused by missing or incorrect input validation. In three cases, there was just no validation to begin with. In another case (CVE-2024-12088), the subject matter of file system path resolution is tricky enough that the existing validation did not cover all edge cases. As the Go verdict section explains in more detail, the most valuable structural fixes are to provide bounds checking (= always-on validation) and safe-by-default APIs like Go’s . Too much complexity A few of the vulnerabilities came from evolution of the rsync protocol: The code used to correctly perform sufficient validation, but then new features were added. For example, when checksum algorithm negotiation was added (protocol version 30), the validation was not correctly updated. When incremental recursion was added (also protocol version 30), the validation that made sense for individual file lists was not updated for the new processing approach of merging incremental file lists. Avoiding complexity avoids vulnerabilities! Both gokrazy/rsync and also openrsync were not vulnerable to 8 out of the 12 security vulnerabilities simply because they do not implement the feature with the vulnerability. Of course, these features were added to rsync because they were valuable to someone at some point, and of course I am not saying that we should just… not develop software any further, ever. But, I consider it ideal to use an implementation whose complexity is appropriate for and proportional to the complexity of the use-case . In other words: for simple use-cases, reach for a simple implementation. Only reach for the fully-featured implementation where needed. The verdict on whether using Go has helped . The verdict on whether a minimal re-implementation like gokrazy/rsync helps . My comparison with OpenBSD’s (written in C). Defense in depth mechanisms one can use on Linux. The conclusion . CVE-2024-12084 to 12088 (original report) CVE-2024-12747 (discovered separately by Aleksei Gorban “loqpa”) CVE-2026-29518 (discovered by Damien Neil and myself! and independently by Nullx3D ) CVE-2026-43617 to 43620 CVE-2026-45232 rsync performed insufficient validation: It read the (attacker-controlled) checksum length from the network and compared the length against . However, rsync’s data structures always declared a 16 byte buffer: is always 16 (bytes), which is sufficient to hold an MD4 or MD5 checksum. used to be 16 (bytes), but can be larger when rsync is compiled with SHA256 or SHA512 checksum support. Hence, the bounds check was ineffective! An attacker could write out of bounds. This issue was introduced with commit in September 2022 , which added SHA256/SHA512 checksum support. A 32-bit Adler-CRC32 Checksum A digest of the file chunk. The digest algorithm is determined at the beginning of the protocol negotiation. The corresponding code can be seen below: sender.c : The “Some checksum buffer fixes” commit prevents this attack because the attacker-controlled can no longer be larger than the transfer’s checksum length. The “prevent information leak off the stack” commit initializes the memory to zero, thereby making any stack leak through impossible. Check out gokrazy/rsync v0.2.7: Patch the code as follows to undo the fix and execute the attack: The Go runtime’s bounds checks turn more serious security issues into a panic. A panic is still a denial-of-service risk, but that’s much preferable. Go initializes memory to zero, making info leaks like CVE-2024-12085 impossible. Go’s API prevents most of the remaining vulnerabilities. Only one out of twelve vulnerabilities (CVE-2026-43617) is a proper bug in the application logic that using Go could not have prevented. gokrazy/rsync is unaffected by many vulnerabilities because it does not implement the feature in question, for example . Like all other wire protocol-compatible rsync implementations, gokrazy/rsync targets protocol version 27, because later protocol versions introduce significant complexity. In some cases, features that would be good to implement come with significant blockers, e.g. compression is tricky, see gokrazy/rsync issue #35 for details. os.Root.OpenFile the parent directory of the target, File.Fd to get the file descriptor for that directory, https://pkg.go.dev/golang.org/x/sys/unix#Mknodat to create the file.

0 views

AI Is Too Expensive

If you liked this piece, you should subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large . My Hater's Guides To Private Credit and Private Equity are essential to understanding our current financial system, and my guide to how OpenAI Kills Oracle pairs nicely with my Hater's Guide To Oracle . This week, I’ll publish the second part to my ongoing series (“ What If…We’re In An AI Bubble? ”) about the factors and events that will cause the AI bubble to finally pop.  Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.  AI is, as it stands, not economically viable for anybody involved other than the construction firms, NVIDIA, and the surrounding hardware companies benefitting from the irrational exuberance of a data center buildout that doesn’t appear to be happening at the speed we believed .  Every AI startup loses millions or billions of dollars a year, and nobody appears to have worked out a way to stop hemorrhaging cash. Hyperscalers have invested over $800 billion in the last three years, with plans to add another $700 billion or so in 2026 and another $1 trillion in 2027 , meaning that they need to make at least three trillion dollars in AI specific revenue just to break even , and $6 trillion or more for AI to be anything other than a wash. I went into detail about this (albeit at a lower, pre-2026/2027 capex number) in a premium piece last year .  To give you some context, Microsoft made $281 billion , Meta $200 billion , Amazon $716 billion , and Google $402.8 billion in revenue in their most-recent fiscal years for every single product combined, for a total of $1.599 trillion. None of them will talk about their actual AI revenues. Yes, yes, I know Microsoft said that it had $37 billion in AI revenue run rate ($3.08 billion a month or so) and Amazon had $15 billion, or around $1.25 billion a month , but both of these are snapshots of single months that are meant to make it sound like they’re going to make that much in a year but in the end, you don’t actually know anything about how much money they’ve made from AI. We do, however, now know that Microsoft has spent an approximate $100 billion on its OpenAI partnership after testimony from an executive during the otherwise-dull Musk-OpenAI trial, per Bloomberg : This is a fascinating insight for a few reasons: At the end of 2025, OpenAI claimed that it had 1.9GW of capacity (likely referring to total power draw rather than the actual critical IT of the infrastructure at its disposal), which, per analyst estimates, ( $42 to $44 million per megawatt ) works out to around $79.8 billion. This claim was made around six months before the release of Microsoft’s most recent quarterly results.  In other words, Microsoft has spent 4 years sinking (either through spending or allocating the capex in advance) nearly $300 billion into…building OpenAI? Okay, fine. Microsoft also has 20 million Microsoft 365 Copilot subscribers for an absolute maximum revenue of $7.2 billion…if every single one were paying $30 a month, which they are most assuredly not as Microsoft has been offering discounts on it for years . Based on my reporting from last year , Microsoft made around $7.5 billion from OpenAI’s inference spend and $761 million from its revenue share in Fiscal Year 2025, a year when it invested (either spent or allocated) around $88.2 billion in capital expenditures. I didn’t report it at the time, but I also had the numbers for all of Microsoft’s revenues for the first three quarters of Fiscal Year 2025 — a total of $8.9 billion of total AI revenue, with around $4.35 billion in revenues when you removed OpenAI’s inference. If we assume that Microsoft’s other AI services grew 10% quarter-over-quarter, I estimate that Microsoft likely made around $17.9 billion in AI revenue in FY2025, or a little under a fifth of its capex.  And let’s be clear: none of these numbers include the actual operating expenses. Data centers, after all, need electricity to run, and AI data centers in particular need a lot of electricity. And some — though, admittedly, not many — people to handle the things like maintenance, repairs, and operations. And then there are things like taxes, insurance, and the other day-to-day costs that, when you add them all together, make a big, scary number.  You can argue that “actually GPUs are profitable to run” ( I disagree! ), but for any of this to make sense, four things have to happen: All four must be true. If AI revenues don’t explode, capex can stop, margins can be positive, and your best-case scenario is…you maybe broke even. If capex never stops being invested, you need revenues to explode dramatically — to the tune of effectively doubling Microsoft, Meta and Google’s entire businesses, and tripling Amazon Web Services’ annual revenue ( $128 billion ) — and for said revenues to be margin-positive, because if they’re not, eventually other healthy businesses will slow, leaving AI to tear a hole in overall margins. In all cases, AI revenue must stay consistent because, well, you need to get paid . I also cannot find an economic scenario where this pays itself off.  Let’s assume that Anthropic is actually at $45 billion in annualized revenue ( I believe it’s doing some very worrisome maths to get there ), or around $3.75 billion a month. On an annualized basis, this would not be enough — assuming it had zero operating expenses (rather than losing billions) — to recover a single year of capital expenditures from Microsoft, Google, Meta, or Amazon from 2024 or 2023. Even if OpenAI’s entire cloud spend ( $50 billion ) for 2026 went to Microsoft and it doubled its Microsoft 365 Copilot revenue (at full cost) to $14.4 billion, it estimates it will invest $190 billion in capital expenditures this year. Amazon’s $15 billion AI run rate, even if it doubled, wouldn’t put much of a dent in its $200 billion in investment plans . While we don’t know Google’s AI revenues, it plans to invest $185 billion in capex this year . These AI revenues have to be completely fucking insane and they need to be that way extremely fucking soon , because otherwise the best they’ll be able to say is “our first few years of capex weren’t particularly useful but the stuff we built after it was,” which still works out to a few hundred billion dollars of waste. Things get even worse when you realize that at least 70% of Microsoft, Google, and Amazon’s compute is dedicated to Anthropic and OpenAI , two companies that burn so many billions of dollars that Microsoft, Google and Amazon have already fed them a combined $54 billion in the last three years, with $28 billion of that coming in the last month and Anthropic due another $50 billion from Google and Amazon if certain performance obligations are met. And there’s no real sign, outside of Anthropic and OpenAI’s compute spend (which is reliant on hyperscaler and venture capital money), of any real explosion in AI revenue. Per The Information (in a chart I love to share!), more than 50% of hyperscalers’ revenue backlogs comes from these companies: If massive, incredible demand for AI existed, wouldn’t these remaining performance obligations be near the trillion mark? Wouldn’t there be other Anthropic or OpenAI sized chunks of revenue? There’s allegedly incredible, unstoppable, insatiable demand for compute. Why isn’t it lining up? Let’s take a look at those RPOs! That was a lot of numbers, so let me make it simpler: outside of OpenAI and Anthropic, these three companies do not appear to be significantly increasing their revenues, and the only way to get that revenue is to feed money to one or both of these companies.   Put aside all the theoreticals and hypotheticals and metaphors and imaginary future scenarios and tell me: what, in the next year, are Microsoft, Google and Amazon going to do about this problem? How do they solve it? If we assume the absolute best-case scenario, these companies are making a combined $70 billion in annual revenue on investments that now — including the money invested in the companies themselves — total over $900 billion. Doubling that won’t be enough. Tripling it won’t be enough. In fact, to pay this off, these companies will need to be making over $100 billion each in AI revenue in the next year , because otherwise there is no covering these losses. And it all comes back to a very simple point: AI is too expensive. If the margins were good, they’d be sharing the margins. If the revenues were good, they’d be sharing the revenues (and no, run rates aren’t revenues). If the business was strong, it would be a separate category in their earnings.  But LLMs are too expensive! They cost too much to run, and said costs appear to increase linearly with revenues. The more a user uses a product, the more it costs the company to run it, and the more capacity they can take up. The only way to capture any growth is to buy and install GPUs , which in turn requires you to build somewhere to put them, which takes time and money.  I’m really struggling to see the argument in favor of continued capex investment. You’re more than $800 billion in the hole with, I estimate, less than half of that resulting in operational GPUs and capacity. Said capacity is mostly taken up by OpenAI and Anthropic, two companies that burn billions of dollars and do not appear to have an answer for how they might stop.  The more you build, the more your infrastructure becomes dependent on the continued existence of two perennially-unprofitable ultra-oafs, as your existent AI product lines are, at best, add-ons to products like Google Workspace or Microsoft 365, or further expansion of cloud compute capacity with lower margins and higher up-front costs than anything you’ve ever built.  Every quarter is an opportunity to put yourself another $30 billion or so in the hole, all in the hopes that, I assume, OpenAI or Anthropic will pay you $100 billion or $200 billion over the course of a few years, because nobody else in the entire universe is spending that much on compute. You are not recovering these investments without either a massive new product line that doesn’t exist today or three or four Anthropic or OpenAI-sized compute contracts. Put another way, Amazon needs another AWS ($128 billion a year), Microsoft another Azure ( $75 billion a year, including OpenAI’s 2025 compute spend ) and Google a business line at least half the size of search (around $200 billion a year). These businesses have grown to this size by providing extranormally large amounts of value from the very moment they were created and impenetrable monopolies — and while there are quite literally other cloud providers that can physically provide the infrastructure to OpenAI and Anthropic ( Oracle is trying to compete and may die as a result ), the actual “monopoly” here is “being able to deploy hundreds of billions of dollars.” Anthropic proved this when it took 300MW of compute from Elon Musk .  In Oracle’s case, as I’ve explained at length , it has to successfully build 7.1GW of capacity, have that capacity actually be margin-positive (doubtful!), and then actually get paid for it by the time it’s built in, oh, I dunno, 2032?  Sadly, I have bad news about Oracle, Microsoft, Amazon, and Google’s largest customers.  Here’s a fun game: ask an AI booster how OpenAI or Anthropic becomes profitable! Here’s what they’ll say: I must be abundantly clear that nobody has any proof that anyone is profitable on inference, but we have plenty of proof they’re not. They’ll likely cite known liar Sam Altman saying OpenAI is profitable on inference from a party from August 2025 , or Dario Amodei saying ( in a sentence around “stylized facts” that are “not exact” and are specifically “a toy model” and specifically not about Anthropic ) “the inference has some gross margin that’s more than 50%.”  Here’s a really simple way to dispute this: Coatue said that Anthropic’s revenues were 85% API calls in 2025 . If it’s profitable on inference, how is it still losing money? You’re gonna say “training,” but that doesn’t actually answer the question: if Anthropic’s process of providing tokens to its models is profitable, how is it losing so much money? Why offer a subscription platform at all?  As I’ll get to, Anthropic has companies paying massive amounts for tokens — hundreds of millions a year in some cases — that’s all inference . Why are you bothering with these stinky, nasty monthly subscriptions? The “inference is profitable” argument is a bedtime story told to people that can’t reconcile the logic of a company that allows people to burn between $8 and $13.50 of every dollar of their subscription revenue.   Otherwise, you have to reconcile with the fact that both Anthropic and OpenAI are both incinerating money and have no real path to any kind of sustainability other than, well, not doing that. One very, very specific counter-argument people make is that open source models are cheap, and can somehow be compared to OpenAI and Anthropic’s, despite the fact that we have no idea what the actual parameters of Sonnet, GPT, Opus, or any other of their models actually are.  What we do know is that both of these companies lose billions of dollars. What we do know is that OpenAI, per The Information , plans to burn $852 billion through the end of 2030, and that as of March 6, 2026 (per CFO Krishna Rao’s sworn affidavit), Anthropic made “exceeding” (sigh) $5 billion in revenue and spent $10 billion on inference and training.  Anthropic has done a great deal of work to obfuscate how much it actually makes or spends, but I think it’s likely it burns even more than OpenAI, given the fact that it’s had to raise $75 billion in the last 6 months ( assuming its new $30 billion round closes ), and that’s not including an additional $30 billion from Google and Amazon if certain unknown milestones are hit.  Then there’s the issue of those RPOs. Anthropic is now on the hook for $200 billion to Google, $100 billion to Amazon and $30 billion to Microsoft, I assume over the course of the next three or four years.  So let’s lay this out. Anthropic — based on its own affidavit from March — appears to have spent $3 to make $1 of revenue on a compute basis, and that’s before you include any and all other costs like staff or electricity or the vocal coach that Dario Amodei uses to add that bass to his voice.  Additionally, it needs $330 billion to pay its cloud obligations to Amazon, Google, and Microsoft over the next four years. I’d estimate it needs $5 billion a year for its compute deal xAI (so $20 billion over the total period) and an estimated $30 billion to cover its deal with CoreWeave . That brings us to a total of $380 billion. It’s hard to estimate the actual costs associated with running Anthropic because so much of the reporting no longer makes sense as a result of that affidavit. Nevertheless, I think it’s fair to assume it will need at least $20 billion of operating expenses across that four year period. We don’t even need to play in the realm of “what might Anthropic or OpenAI’s revenues be?” to understand the problem here. Both companies aggressively burn money, and neither of them have any answer as to how they might stop. Numerous reports about how Anthropic will turn “cash flow positive” in either 2027 or 2028 are fantastical, illogical, entirely driven by ridiculous projections, and should never have been reported as anything other than an attempt by companies to mislead their investors. In both cases, reporters should’ve had more asterisks on those numbers than Q*Bert reading Frank’s lines from Blue Velvet . And we have plenty of evidence that they’re losing more money over time. In January 2026, The Information reported that Anthropic’s gross margins were 40% in 2025 — 10% lower than its “optimistic” projections, specifically attributed to “...the costs of running Anthropic models from paying customers, in a process known as inference, on servers from Google and Amazon,” adding that those costs were “23% higher than the company anticipated.” In February, The Information ran another story saying that OpenAI’s gross margins fell from 40% in 2024 to 33% in 2025, a full 13% lower than its projected margins of 46%, all because (and I quote) “...the company having to buy more expensive compute at the last minute in response to higher than expected demand for its chatbots and models.” You know, exactly what Anthropic has had to do. This is what I’ve referred to as the knife-catching problem for compute demand — you either don’t order enough compute and have to rush to buy some last-minute as demand intensifies, or you order too much, and, well, to quote Dario Amodei: And right now, as I’ve covered , there’s not enough compute being built to keep up with Anthropic or OpenAI’s voracious demands, meaning that they will both be bartering to buy whatever’s available at whatever price it’s available at. This naturally will savage their already-negative margins… …and then what? No, really, and then what? One of you fucking AI boosters, answer me, how does this actual reverse course? Because even if Anthropic were making $100 billion in annual revenue, it would probably be losing $300 billion or more to get there. The fact it had to raise $30 billion in February , $15 billion in April, and now $30 billion more in May all while allegedly pulling in more than $3 billion a month in revenue suggests that its COGS are fucking horrendous, and its growth is coming at a terrible financial cost. Let’s say that Anthropic keeps growing and ( as The Information suggests ) hits $100 billion in annualized revenue (around $8.3 billion a month). How, exactly, does it afford to make that much money? Because right now it’s (allegedly) about to hit $45 billion in annualized revenue, and needs so much money that it’s absorbing (along with OpenAI) the majority of venture capital raised this year, and very clearly does not have any path to bring its costs down. The answer is simple: it can’t! There is no mechanism to do so. More compute does not make OpenAI or Anthropic’s services cheaper to offer. There is no magical silicon coming that will make any of this more affordable, and no, Anthropic is not “profitable on inference,” because if it were, that massive revenue growth would have leveled out its margins rather than require it to raise a little less than the combined value of every Major League Baseball team , or more if you add the other $50 billion that Amazon and Google have promised based on secretly-held performance obligations. The same goes for OpenAI, which “raised” $122 billion (around $45 to $50 billion in real cash, with the rest either paid in installments or on it IPOing or reaching (sigh) AGI) in February and is now already considering raising more . Somebody might counter-argue that this is companies raising as a means of boosting their valuations, I think that’s a very convenient way of looking at two extremely problematic companies.  I should also ask why neither of them appear to be seriously considering going public. While both were rumoured earlier in the year to be planning to do so in 2026, both appear poised to raise more private capital. I think the answer is simple: their CFOs know that doing so would reveal their actual margins, which are hot dogshit with sprinkles on top.  Nobody has a sensible or logical response here. Which leads us right to our next point! One important detail to keep in mind here is that as of a month or two ago, Anthropic moved all enterprise customers to token-based-billing, which will begin, I believe, a true stress-test of the true “value” of AI as costs skyrocket. Just last week I ran the first of a two (or three, potentially) part premium series called “What If We’re In An AI Bubble?” and touched on the gruesome subject of whether organizations could afford to pay for AI long-term : Earlier in the week, carnival barker and Salesforce CEO Marc Benioff said his company would spend $300 million on Anthropic tokens in 2026 , and as I discussed in my premium from Friday , unrestrained AI spending is inflating the revenues of Anthropic and OpenAI in a way that isn’t sustainable for anybody involved: The problem is simple: nobody actually knows how much AI is going to cost them in any given quarter. This means that the current token spend you’re seeing is entirely experimental, which is why organizations keep burning through their tokens so fast.  This massive growth in spend is what underpins the “massive” (I have serious questions about its accounting) growth in Anthropic’s revenue. Executives have, across the board, given their engineers free reign to burn as many tokens as they’d like, and while I severely doubt that Anthropic actually hit $50 billion in annualized revenue outside of not-quite-fraudulent non-GAAP measurements, I believe its revenue growth has come from an artificial boost from a tech industry searching for a reason to pay somebody money. To be very clear about what I mean, I think there is currently an AI token binge across both Anthropic and OpenAI. Enterprises do not know the actual value of AI, and do not know how much they should actually be budgeting, which is why Uber and others are running through their token budgets but not, it seems, spending less. We’re currently in an abundance phase — one where nobody is truly thinking about the costs outside of their fear of missing out — but there’s this nasty undercurrent of “wait, how much does that cost?” followed by “oh, fuck, well…you know I love AI but…” Put another way, the current spend on AI tokens is not something that’s indicative of lasting, reliable revenue. In some cases, the pressure to use AI for everything is turning companies’ software stacks into slop. Things are worse elsewhere. Something is wrong at Zillow. Something about LLMs has done something to its technical leadership, something that makes them talk strange and send weird slide decks with confusing, slop-ridden sentences.  The real estate tech firm spent over $1 million on AI services in the first quarter of 2026, and in April it spent $749,000 in tokens across Cursor and Anthropic’s services, as well as through AWS Bedrock. As of the end of the month, it was nearly 75% of the way through its annual Cursor token budget of $1.1 million.  As of the middle of May, its total AI spend had already crested over $300,000, and its Cursor budget sat dangerously close to the edge at 85%. This is particularly-concerning when you consider that Zillow’s net income for Q1 2026 was $46 million , and ranged from $2 million to $10 million each quarter of 2025.  Zillow is currently on course to spend at least $7 million on AI in 2026, and at its current pace might hit as much as $10 million, which would amount to a little less than 50% of its 2025 net income ( $23 million ).  You’re probably wondering how Zillow manages to spend so much on AI, and the answer — as I’ll get into in next week’s free newsletter — is that its technical executives appear to have AI psychosis, saying that the short-term goal is for “software engineers to never open a code editor again.” The reality is chaos. In a slide deck that I’ll discuss later, Zillow revealed that while engineering resources have largely stayed the same, outputs requiring human review have increased by nearly 50%. Meanwhile, code deployments and pull requests increased by 39%, and software reviewer load increased by 29,000 hours each month , creating a massive burden on the 1,500 or so engineers working at the company.  In simpler terms, that’s about 19 hours of extra work per engineer that’s literally just looking at extra code written by LLMs.  On Blind, the anonymous social network for tech workers, Zillow workers complain about Zillow’s code “slowly becoming AI slop,” with “much more code getting approved without guardrails or input due to people not being able to keep up the other’s velocity or just not caring anymore.”  One worker claimed that “the slop is job security,” adding that they “don’t want the output to be good or documentation to be clean [as] management will replace [them] with offshore/nearshore/AI agents at the slightest whiff of evidence that the slop cannon is self sustaining.” Another said that they felt “lost in the agentic world,” and that they “didn’t have full grasp of where we are going or what [their] role is,” with a “lot of overlap in what people are doing.” Another said that “people are burning tokens just to hit internal AI adoption targets,” adding that “this is what happens when leadership ties metrics to usage instead of outcomes,” saying that it “literally subsidized busywork.” This is all part of what an internal slide deck viewed by this publication called “AI-Native Engineering,” promising a “path to an agentic Zillow” and “faster outcomes for customers,” though customers are never mentioned in any other slide.  The deck — pumped full of AI-generated text — talks about “generic AI being a commodity,” saying that “Zillow-aware AI is a competitive advantage,” and at no point explains what that means. It encourages engineers to go from “AI-Assisted” to “AI-Native,” with “systems enabling org-wide leverage,” with engineers moving from being “soloists” — individual developers with AI tools — to “conductors” that orchestrate AI agents, to “composers” that “define systems AI can safely play,” adding that “2026 is the transition from conductor to composer.” Yet the strangest part is named “2027: A Tuesday,” discussing a theoretical day in the office for whoever is left at the company. This theoretical example is, apparently, a process that would take weeks, but now takes under two hours.”  Zillow intends, based on this deck, to sacrifice everything to AI — code review, vulnerability fixes, policy checks, deployments, testing, and basically having agents take over everything , no matter how small, like having an agent do dependency updates and security hotfixes that could be handled with a simple shell script. To quote Zillow: In practice, sources at Zillow tell me that there has been no actual movement toward this vision. Software engineers still open IDEs and review code manually, with one describing Zillow’s “vision” as “nonsense,” adding that “you can’t just throw buzzwords on a slide deck and change how all the engineers do their jobs.”  As for why token burn is so high, sources tell me that engineers are actively encouraged to use AI for everything , as much as possible, writing PRD (product requirement documents) in AI, then using the AI to make stuff based on the PRD, then doing a deck with AI, then writing emails with AI, using AI to brainstorm, or create weird, esoteric automations, with some managers pushing workers to have one personal AI “goal” to aspire to. Zillow’s agentic “vision” is apparently a remit from the C-suite. It’s hard to tell if this is AI psychosis or just classic Business Idiot bullshit.  Perhaps it’s a little of both. Every organization I’ve talked to has exceeded or is nearing the edge of their annual token budget barely five months into the year, which means that everybody has suddenly given themselves an extra few million dollars’ worth of operating expenses for reasons that escape effectively everybody I’ve talked to.  Every engineer tells me the same thing: “I’m being made to do this, I don’t want to do this, my managers do not seem to understand, my bosses seem to understand even less than my managers, and if I don’t use AI somebody is going to fire me.”  Put another way, CEOs and CTOs are screeching at their underlings to “use AI as much as possible” to “find its incredible benefits” without anybody really knowing what those are and how much it’ll cost to get there. This might be because Anthropic obfuscates the data that might tell customers the real costs.  Per Laura Bratton at The Information , Bratton’s article has numerous quotes from executives saying that Anthropic lacks transparency and granularity into the ways that tokens are being burned across an organization, in a way that I think sounds very, very suspicious, particularly when you add the following:  While I’m not accusing Anthropic of anything untoward, massive, multi-million dollar contracts that involve individuals burning thousands or tens of thousands of dollars’ worth of tokens with no service level agreement, transparency or true granularity into the burn is a perfect setup for a company — not saying it’s Anthropic! — to do something dastardly with those numbers.  While an individual might be able to monitor their own personal usage, in an organization of hundreds or thousands of engineers, who’s to know if, say, the particular token burn is consistent across every member of the company, or that those costs are actually matching up with what the user is doing? This is a company ostensibly worth $900 billion dollars acting with disregard for the basic measurement of “how much did this cost, and how did it cost so much?” And in the end, how do you even measure it at scale? Say you’ve got 1,500 engineers, and they’re spending a combined $1 million tokens a month. How the fuck do you actually measure the return on investment for that spend?   How many tokens does it take to do one thing? Is it consistent across every model? Is it consistent across every employee? Are you even measuring how many tokens a task costs? Because if you’re not, that token budget is basically throwing a dart blindfolded.  Okay, now you’ve measured a task, did you make sure to measure it multiple times? Because LLMs can randomly do things differently even with the same prompt and same Claude.MD file and same strictures and same data sources. You’re gonna need at least 10 samples of each task, and you’re gonna need to make sure somebody who actually knows what they’re doing can measure them, because if you get a dimwit, they’re going to say it can do something it can’t. Unless, of course, you can’t actually measure how many tokens a particular task can take with much accuracy, in which case every single AI token budget is bullshit. And each model does things differently depending on many different variables, some of them a result of the user, some of them a result of the AI labs themselves. Alright, well, maybe you just need KPIs — measurements you can aspire toward , and by pursuing them you can start working out how much it costs to do stuff.  Wait, which metric works there exactly?  In fact, it’s pretty hard to measure anything like “efficiency” or “productivity” in any business, because every metric connected to them can be gamed, leaving managers and executives with the problematic situation where they have to start learning how things work so they can see if they’re good. Before AI, this wasn’t as much of a problem, in the sense that inefficiencies and wasted hours weren’t directly connected to a chatbot that is specifically designed to burn money. Managers and executives could come up with whatever deranged, self-gratifying office bullshit they pleased, wasting hours of people’s time in the process, but doing so didn’t immediately connect to a massive, ever-increasing cost. AI is a perfect storm of failed concepts and organizations, and the apex of the Era of the Business Idiot , an epoch where we’re ruled by people so thoroughly disconnected from the actual workforce that it was inevitable that a technology would be created specifically to grift them. LLMs are dangerous for many, many reasons, but the under-discussed one is how well they play to a certain kind of executive imbecile. Generative AI is — to quote Mo Bitar — really good at doing an impression of work, much like most managers and c-suite executives, and even if it’s completely incapable of doing something, it’ll absolutely say it can and tell you you’re amazing for suggesting it. And that’s why Business Idiots love it.  Where regular human beings would say annoying things like “that’s not possible within that timeline” or “we don’t have the resources to do it,” AI will say “of course, right away!” and burn as many tokens as possible.  When it makes mistakes, it’ll apologize — as it should because it failed you — but then promise to do better next time, all while costing so much less, at least in theory , than a regular, stinky human being.  It’ll create a PRD of a theoretical software project with the confident and vigor that you need to take it immediately to a software engineer and say “build this immediately,” and when the software engineer tells you a bunch of bullshit about it not being possible , it’ll spit out several convincing-sounding responses. Fuck, why even bother talking to that engineer at all? Claude Code can mock up a prototype that you can then shove in their fucking face before you fire them for not using AI to do it themselves. Any executive-level fuckwit you’ve met in your life now has a seemingly-powerful tool that can burp up mimicry of open source software and, if you constantly prompt it, eventually get something half-functional onto some sort of web server. When you face bugs, it’ll try and fix them, sometimes also “fixing” (adding or deleting code) from elsewhere to be helpful, like when Cursor using Anthropic’s Claude Opus 4.6 model deleted an entire production database and all its backups . It will never, ever say no, even if it’s incapable, even if it has no thoughts, even if what you are asking is equal parts impossible and unreasonable in both its timescale and scope. A Business Idiot, given his druthers, can sit there and fuck around and make an LLM spit out something that makes him feel like he’s coding, which in turn makes him feel that you, a lazy and stupid engineer , could do even more with the power of AI. It doesn’t matter that it costs an absolute shit-ton of money, or that there’s no way to measure its efficacy. The Lion does not concern himself with things like “efficacy” or “productivity,” and the Lion is increasingly tired of your whining! The Lion doesn’t even understand what it is you do every day other than not doing what The Lion is asking for! You laugh, but this is genuinely how the majority of managers and executives think and act, and now they have a special chatbot that can fart out functional-enough prototypes to convince a Business Idiot they can do anything, because executives and managers do not regularly do much work and thus have no idea what it looks like other than when they look over your shoulder, which is why they wanted you back in the office! Organizations aren’t burning millions or hundreds of millions of dollars a year on AI because it’s good , they’re doing it because they are run by people who do not know what the fuck they’re doing.   In a sane world, randomly adding a massive, ever-expanding operating expense to your business with the express intent of — to quote IT firm Workato’s CIO , “eating the costs while employees experiment” — would have the board blow up your house. In our world, one dominated by disconnected, self-involved and massively-overpaid dullards, many businesses pushing their workers to use AI are doing so because the other guy is doing it, with about as much strategy and forethought as one would expect from somebody who spends 90% of their life reading emails, going to meetings, or going to lunch. The majority of those I see trumpeting the so-called benefits of AI do not appear to do anything of note. I have yet to see one so-called multi-agent orchestrator engineer psychopath ship something remarkable or impressive or even functional. I have yet to see any AI-obsessed boss write or create or author or do anything I can remember. I don’t see any of these fuckwits running a company on their own outside of those who have learned to sell stuff to other AI psychosis victims or executive midwits of varying size.  And why oh why is it always the language of inevitability and possessiveness? Nobody who’s this insistent, aggressive and violative with their language of “it’s here and if you don’t adopt it you’re stupid and dead” has ever been right about anything. Nobody this desperate, insistent and forceful has ever had good intentions, good vibes or brought good omens — they are always bearers of some kind of con.  Most technology is sold on elevating and ascending human beings. AI cheapens every interaction by creating a work-shaped product from a person that doesn’t respect you enough to give you work that’s barely fit for a human because it wasn’t made for one.  You must accept becoming a dogshit dealer that loves accepting and receiving low quality goods. You must celebrate intentionless and decaying slop, and defend it and the machine that made it with your entire being. You must sully yourself — treat its unexceptional, sloppy and unreliable outputs as signs of sentience, or at least the proof that digital sentience is possible. You must defend horrible, abrasive, ugly, loud monoliths of steel full of $50,000 graphics cards. You must say they are necessary, and you must aggressively antagonize those who do not.  Every time you defend generative AI you defend a machine of capital that has burned $1 trillion and created one of the most-wasteful products in history. If people disagree with you, you must attempt to harm them somehow — ostracize them, mock them, attack them, denigrate them. You will justify this as moral, because you have been manipulated by a technology built and sold by two of the greatest grifters of all time — Dario Amodei and Sam Altman.  Anything less is opposition to an industry with all the trappings of authoritarianism down to the media toadies, the propaganda and the seizure of land in the name of a nebulous “greater good.” But man, these men got people good.  Sam Altman helped propagate a technology perfect for conning people with potential, a larger extrapolation of Altman’s own life of taking dogshit — Loopt, for example! — and parlaying it into larger opportunities. It can make a really half-hearted demo of a lot of things, and that’s good enough to sell to Business Idiot.  Dario Amodei took this grift and perfected it. Anthropic is a company purpose-built to con people into giving it by money by making people feel smart. LLMs can do work-shaped stuff, sometimes, as long as you debase yourself to accept mediocre and often-broken stuff that you have to keep a vigilant eye on, and either use a subsided product that loses Anthropic money or pay a shit ton of money as an enterprise to Anthropic and they still lose money.  These companies were only capable of growing in an economy dominated by the gullible and work-shy. Only a capitalist culture dominated by people who don’t actually do or know stuff have let this get so far. Nobody wants this, nobody wanted it since the beginning, it was forced upon everyone, and to pretend otherwise is laughable and offensive. The amount of people who use this shit a bit and become convinced that we’re mere years from it costing over a trillion dollars to somehow making trillions of dollars and being an entirely different and good product should be aware that they are being manipulated. The more you feel compelled to defend AI the more scrutiny you must show it.  I am not your enemy! If you think that I am, you are on the side of a corporation or a product. You can try it, like it, and I don’t really care, but the second I see you trying to be condescending or judgmental or aggressive toward another person for not agreeing with your product choices I immediately feel suspicious. Can’t you see how these people act? Can’t you see how strange it is to defend a thing you pay money for that has terrible economics? If it wasn’t the “in” thing, being an AI person would be considered really weird. I look forward to the day it is. I hope you guys like having the stuff you said since 2022 repeated back to you! I’ve been saving it all. Time is running out for a graceful bow, and you better act quick!  If you feel self conscious while other people dunk on AI, that’s weird! I see people say they don’t like Macs all the time. Who gives a fuck! I’m not going to go to the mat for Tim Cook. People can make their own decisions.  Those comparing AI to AOL mailing CDs to people should feel ashamed of themselves. This is like if every single time you opened a magazine an AOL CD flew at your head, your boss told you he would replace you with a modem if you didn’t go online, and the news constantly ran segments called “I didn’t receive an email: father forgets son forever because he wasn’t online” or panels with “Internet experts” who said “I am on the Internet superhighway right now, and I’m certain that within 10 years AOL Time Warner will be able to email myself to my dad.”  Imagine if Shingy was a billionaire and went on TV every day in 1999 and told you “ the world must get ready, because you’re about to get a ICQ message from The Lord .” Generative AI was purpose-built to grift an economy run by executives and managers who don’t actually do any work. Its success has been driven by a remarkable, society-wide ignorance in the management sect, and its continued proliferation is only possible through the media’s continued trust and faith in the idea that CEOs are busy because they’re actually doing work. Yet even a Business Idiot eventually realizes that too much money is being spent, and the first one of these dimwits to cut their token budget will send the rest of them running for the doors. We should lock them. We should make everybody who obsessed over theoretical ideas about what AI can or will do ashamed for their intellectual deceit or constant ignorance.  At the end of the AI era, the only thing that will change the rot at the heart of our economy is the acceptance that the majority of companies are run by lazy, self-involved and ignorant fuckwits, and accountability for those who refused to scrutinize them. Microsoft has spent a total of $293.8 billion in capex since the beginning of Fiscal Year 2023 (which began in the back half of 2022). This means that around 30% of Microsoft’s capex ($87 billion) went to building OpenAI’s infrastructure. Based on discussions with sources familiar with Azure architecture, this is the vast majority of Microsoft’s operational capacity. AI revenues have to explode. Capex has to stop being invested. GPUs need to be margin positive, including both their cost and the debt associated with operationalizing them. AI revenue has to stay consistent both before and after you stop spending that capex. Microsoft’s RPOs jumped from $392 billion to $625 billion between Q1 and Q2 FY26 (or calendar year Q4 2025 and Q1 2026), driven by the $250 billion in “incremental Azure spend” from OpenAI (including already-existent commitments) locked up in October 2025 and the $30 billion promised as part of its deal with Anthropic from November 2025 . Based on Microsoft’s own disclosures , without Anthropic and OpenAI’s additions, RPO would have been effectively flat, as evidenced by the fact that in Q3FY26, remaining performance obligations sat at $627 billion .  Amazon’s RPOs jumped from $244 billion in Q4 2025 to $364 billion in Q1 2026, driven by its February 2026 $100 billion expansion of its $38 billion compute deal from November, and its extended partnership with Anthropic for 5GW of compute capacity unattached from any kind of dollar number.  Google’s RPOs jumped from $242.8 billion in Q4 2025 to $467.6 billion in Q1 2026, driven by ( per The Information ) $200 billion in committed spend on TPUs and compute from Anthropic, meaning that it has expanded its future revenues by an unremarkable $24.8 billion when you remove Anthropic’s spend, when RPOs had previously jumped $85 billion between Q3 and Q4, likely driven by its compute deal from October 2025 . It’s fair to assume a chunk of the remaining RPOs are from its deal to rent TPUs to Meta , announced in February 2026, which makes it likely that it accounts for the majority of the remaining $24.8 billion. Silicon will get cheaper. They’ll start selling services. They’re profitable on inference. It’s an example of a typical working day. At 8:30AM, the engineer notes that confirmation rates in Dallas dropped 3% overnight.  ‘Dallas inventory spiked; buyers went from 3 showings to 7. The agent shows the pattern: we're hitting the same buyer 7 times in 24 hours with "tour confirmed" pings. They're overwhelmed; they're muting us.’ The line before this says: “I don't open the codebase — I open the spec and eval dashboard.” Half an hour later, the engineer changes the spec, which is then tested against previous data, showing an improvement.  “The PM and I review diffs, check guardrails, approve.” Diffs are “differences” — essentially comparing two versions of the same document to see which lines have been changed.  The code is then rolled out.  At 11AM, the senior engineer mentors a junior engineer:  ‘A junior engineer's rescheduling agent is failing evals. I ask one question: "What happens if the buyer picks a slot the seller just blocked while the agent is negotiating?" We identify the race condition and add a constraint: "Always re-check availability at confirmation time." She updates the spec and evals. The agent passes.’ It is absolutely adorable they’re pretending that they’ll have junior engineers if this hellscape vision comes to life.  You can’t say “burn as many tokens as possible,” because employees will — as happened at both Amazon and Meta — deliberately create ways to burn more tokens using scripts and automations.  You can’t say “use AI every day,” because even if they do so, that doesn’t actually set up a success criteria. You can’t tell software engineers to try and “ship more software,” because that, again, emphasizes doing more, not making good stuff , and leads to an increase in velocity rather than how good the stuff is. You can’t say “pull requests” or any other metric a software engineer can manipulate, because in 100% of the situations where you give a software engineer a number to hit they will focus entirely on hitting that number.

0 views

Back to silence

“All of man’s miseries stem from his inability to sit quietly in a room alone.” — Blaise Pascal When I think about the challenge of living in the modern world, the constant that continually crosses my mind is that we are so unable to revel in silence anymore. We fill the potential silence with noise, continual distraction, the inability to sit with our thoughts. Music, podcasts, audio books, we have the content available to never have to sit with ourselves ever again. I have started to, when driving alone, to turn off all music and noise and just be in the silence as I drive. This time is helpful to re-center, to think about that which all of “this” is, and to find some semblance of peace. I could, instead, listen to my 40th audiobook of the year, or that podcast in which the guests are saying nothing of purpose or value for the 3rd time this week. That is the thing - nothing at all is being said . Nothing is being brought up that would make this existence “more” or “better”. In fact, the more that I consume, the less that I feel that I am anyone at all. With children at home, with a family, there is never a quiet moment. But - I love the noise and fun that comes with it. The difference is that there is nowhere we can seemingly go today without the noise - the notifications and the colors and lights and sounds that are designed to have us give over our attention so as to become better consumers. The “noise” that my children produce is nothing in comparison to the algorithmic noise that is meant to keep me from finding and talking to God. In fact, the noise my children produce bring me closer to God . Let us discuss how to get back to the silence. I see some people’s phones and am horrified by that which I see: Notification blobs everywhere, taking up their whole screen and pinging constantly. It is a wonder that people have any attention span at all these days. These applications are designed to be that little bird in your ear, telling you that they are right there so that you can spend your ever decreasing time on this Earth scrolling mindlessly. People literally watch others livestreaming their life - instead of living their own. On my phone, I have simply disabled all notifications that are not phone calls. Even then, I set daily periods of airplane mode between 9pm - 9am in which I am unreachable and offline People know they can reach me by calling me. When I pick up, I say “hello, what’s wrong” - I don’t want to be disturbed. In the future, I will only have a data plan: no phone number, and only authenticated methods such as xmpp for voice will be allowed. My phone will remain in airplane mode for most of the day. The thinking that we should always be accessible is asanine, it is the affliction of the age that must be fought against with ruthlessness. On my computers, I have turned off the notifications center in noctalia shell, so I literally never get any notifications whatsoever. I never seen anyone pinging me, emails coming in, messages on any applications, etc. I check IRC a few times every other hour, but otherwise have no idea if anyone is trying to reach me. I no longer allow unfettered browser usage, and limit the browser to certain hours of the day. I believe that unmetered and uncontrolled access to the internet is not something that is healthy to the human being, yet it is now commonplace with the little computer in our pocket, you would be considered quite “weird” to do limit connectivity - well, so be it! Email is supposed to be asynchronous by default. Instead of replying immediately, I check emails once at 9am and once at 4pm. No more than this. Again, I include in my emails that if anything urgent needs to be handled, to simply call me. I get so many interesting emails on a weekly basis and want to reply to them all, so this batches my time to give quality responses and encourage interaction. I have never understood how people have notifications across a half dozen applications on their devices, but were I to allow social media on my phone (I don’t), I would recommend turning off all notifications and never going back. The best way to mitigate this point of noise is to not install it in the first place. The more that I use technology, the more I want analogue methods of communication and work. So, I defer to my notebook and pen daily. Once an evening, I then organize these notes in my org-mode setup. Face to face communication is preferred over phone calls or texting. The time has always been ripe to reclaim the peace that we are meant to have. Everyone is moving so quickly without thought of where we are going, and it is detrimental to the human being. So, turn it all off. The world wants you to live on fear, of missing out, of missing the proverbial train. But nobody ever asks “where is the train going?” Perhaps it is better to miss the train than to be driven off the cliff with everyone else. As always, God bless, and until next time. If you enjoyed this post, consider Supporting my work , Checking out my book , Working with me , or sending me an Email to tell me what you think.

0 views

Where Are All The Data Centers?

If you liked this piece, please subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA , Anthropic and OpenAI’s finances , and the AI bubble writ large . My Hater's Guides To Private Credit and Private Equity are essential to understanding our current financial system, and my guide to how OpenAI Kills Oracle pairs nicely with my Hater's Guide To Oracle . My last piece was a detailed commentary on the circular nature of the AI economy — and how the illusion of AI demand is just that, an illusion.  Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.  During every bubble there’s one very obvious thing that keeps happening: things are said, these things are repeated, and are then considered fact. Sam Bankman-Fried was the smiling, friendly, “ self-made billionaire ” face of the crypto industry. NFTs were the future of art, and would change the way people think about the ownership of digital media. The actual evidence, of course, never lined up. NFT trading was dominated by wash trading — market manipulation through two parties deliberately buying and selling an asset to raise the price. Cryptocurrency never took off as anything other than a speculative asset, and altcoins are effectively dead . Sam Bankman-Fried was only a billionaire if you counted his billions of illiquid FTX tokens, but that didn’t stop people from saying he wanted to save the world weeks after the collapse of Terra Luna, a stablecoin that he himself had bet against and may have helped collapse .  Three months before his arrest, a CNBC reporter would fly to the Bahamas to hear SBF tell the story of how he “ survived the market wreckage and still expanded his empire, ” with the answer being that he had “stashed away ample cash, kept overhead low, and avoided lending,” as opposed to the truth, which was “crime.”  The point is that before every scandal is somebody emphatically telling you that everything’s fine. Everything seems real because there’s enough proof, with “enough proof” being a convincing-enough person saying that “most of FTX’s volume comes from customers trading at least $100,000 per day,” when the actual volume was manipulated by FTX itself , and the “$100,000 a day in customer funds” were being used by FTX to prop up its flailing token .  In the end, the “proof” that SBF was rich and that FTX was solvent was that nobody had run out of money and that nothing bad had happened to anybody. SBF was a billionaire sixteen times over because enough people had said that it was true.  Anyway, one of the most commonly-held parts of the AI bubble is that massive amounts — gigawatts’ worth — of data centers have both already been and continue to be built… …but then you look a little closer, and things start getting a little more vague. While Wood Mackenzie’s report said that there was “ 25GW of data center capacity added to the funnel ” in Q4 2025 does not say how much came online. CBRE said back in February that “net absorption of 2497MW” happened in primary markets in 2025 , with other reports saying that somewhere between 700MW and 2GW of capacity was absorbed every quarter of 2025. At the time, I reached out for any clarity about the methodology in question and received no response. Okay, so, I know data centers are getting built and that they exist . I believe some capacity is coming online. But gigawatts? Or even hundreds of megawatts? How much data center capacity is actually coming online?  Why did Anthropic get so desperate it took on a years old data center, xAI’s Colossus-1 , full of even older chips from a competitor — one whose CEO described the company as “evil, ” and that’s currently facing a lawsuit from the NAACP over allegations the facility’s gas turbines are polluting black neighborhoods ?  Remember, Colossus-1 is an odd data center, with around 200,000 H100 and H200 GPUs and an indeterminate amount of Blackwell GB200s, weighing in at around 300MW of total capacity… which isn’t really that much if we’re talking about gigawatts being built every quarter, is it?    So, I have two very simple questions to ask: how long does it take to build a data center, and how much data center capacity is actually coming online? These simple questions are surprisingly difficult to answer. There exists very little reliable information about in-progress data centers, and what information exists is continually muddied by terrible reporting — claiming that incomplete projects are “operational” because some parts of them have turned on , for example — and a lack of any investor demand for the truth. Hyperscalers do not disclose how many data centers they’ve built, nor do they disclose how much capacity they have available.  I find this utterly inexcusable, given the fact that Amazon, Google, Meta and Microsoft have sunk over $800 billion in capex (and more if you count investments into Anthropic and OpenAI) in the last three years . So I went and looked, and what I found was confusing. So, you’re going to hear people say “well Ed , data centers are being built ,” and what I’m talking about is data centers that have been fully constructed and then turned on . It’s really, really easy to find data centers that are under construction , but as I’ve discussed in the past, that can mean everything from a pile of scaffolding to a near-complete data center . Yet finding the latter is very, very difficult. I’ve spent the last week searching for data centers that broke ground in 2023 or 2024 that have actually been finished, and come up surprisingly empty-handed. Some projects are stuck in construction hell, eternally dueling with planning departments over permitting, some are chugging along with no real substantive updates, some, as is the case with Nscale’s Loughton, England data center, have done effectively nothing for the best part of a year , some are perennially adding more capacity to the order as a means of continuing raking in construction bills, and some are claiming their data centers are “operational” as only a single phase has turned on. You should also know that even once construction has finished, the buildings themselves must be fully filled with the necessary cooling, power and compute hardware, at which point it can be configured to meet a client’s specifications (which can take months), at which point the unfortunate soul building the facility can actually start making money. I think it’s also worth revisiting how difficult data center construction is, and how large these new projects are.  This starts with a very simple statement: nobody has actually built a 1GW data center (to be clear, it’s usually a campus of multiple buildings networked together) yet. There are campuses — such as Stargate Abilene — which promise to reach 1.2GW, but nearly two years in sit at two buildings at around 103MW of critical IT load each with, based on discussions with sources with direct knowledge of Abilene’s infrastructure , a third building sitting fully-constructed but with barely any gear inside it. It’s fundamentally insane how many different companies are trying to build these things considering how difficult even the simplest data center is to build. Take, for example, American Tower Corporation’s edge data center in Raleigh, North Carolina, which I’ll mention a little later. This is a 1MW facility — or one-thousandth the size of a gigawatt facility — occupying 4000 sq ft of real estate at first and expanding to 16,000 if ATC actually gets it up to 4MW. That’s about two-and-a-bit times larger than the typical American home . And, from ground-breaking to ribbon-cutting , it took eleven months to complete. And that’s not including all the other necessary time-consuming bits, like finding land, securing permits, and so on.  That’s a simple one. People want to build data center campuses a thousand times larger than that. Look at how difficult it is. In fact, it’s so difficult that the companies can’t build all of it at once. Larger data center campuses are almost always divided into “phases,” in part because that’s the smartest way to build them, and in part with the express intention of convincing you that they’re “fully operational.”  For example, CNBC’s MacKenzie Sigalos reported in October 2025 that Amazon’s Indiana-based (allegedly) 2.2GW Project Rainier data center was “operational,” but only seven out of a planned 30 buildings were actually operational, and her comment of “with two more campuses [of indeterminate capacity] underway.” This comment was buried two videos and 600 words into a piece that declared the data center was “now operational,” with the express intent of making you think the whole thing was operational. To give her credit, at least she didn’t copy-paste the outright lie from Amazon, which claimed that Rainier was “ fully operational ” in a press release the same day. You’ll also note that Amazon never provides any clarity about the actual capacity of Rainier. Sigalos did exactly the same thing when the first (of eight) buildings of Stargate Abilene opened, declaring that “OpenAI’s first data center in $500 billion Stargate project is open in Texas,” burying the comment that only one was operational with another nearly complete several hundred words earlier.  These are intentionally attempts to obfuscate the actual progress of the data center buildout, and if I’m honest, I’ve spent months trying to work out why big companies that were supposedly building large swaths of data centers would be trying to do so. Unless, of course, things weren’t going to plan. In its last (Q3 FY26) quarterly earnings call , Microsoft CEO Satya Nadella claimed that “[Microsoft] added another gigawatt of capacity this quarter, and [remained] on track to double [its] overall footprint in two years.” A quarter earlier , he claimed to have added “nearly one gigawatt of total capacity,”  with Karl Keirstead of UBS saying that he “...thought the one gigawatt added in the December quarter was extraordinary and hints that the capacity adds are accelerating.” As I’ll discuss below, I can find no evidence of anything more than a few hundred megawatts of Microsoft’s data center capacity coming online. While I’ll humour the idea that it doesn’t announce every new data center, and that there may be colocation and neocloud counterparties ( 67% of CoreWeave’s revenue comes from Microsoft, for example ) that make up the capacity, as I’ll also discuss, I don’t know where the hell that might be. So, to be aggressively fair, I asked Microsoft to answer the following questions on May 4, 2026: A Microsoft representative from WE Communications promised to "circle back" by 5PM ET on Monday May 4th, but did not return further requests for comment via text and email, which is incredibly strange considering the simple and straightforward nature of my questions. That’s probably because the vast majority of its publicly-announced or documented data center capacity doesn’t appear to be getting finished. In September 2025, CEO Satya Nadella claimed that Microsoft had added 2GW of capacity “in the last year,” and acted as if Fairwater, a project with two actively-constructed data centers with one in Wisconsin that broke ground in September 2023 and another in Atlanta that broke ground in July 2024 , was something to be “announced” rather than “a very expensive project that has taken forever.” Nadella also claimed that there are “multiple identical Fairwater datacenters under construction,“ though he neglected to name them. To be clear, “Fairwater” refers to a project where multiple data centers are linked with high-speed networking to make one larger cluster, a project that sounds ambitious because it is , and also unlikely because it’s yet to have been built.  Fairwater Atlanta — the latter of the Fairwaters — was “launched” in November 2025 and it’s unclear how much capacity it has. Cleanview claims it’s at 350MW of capacity , and Microsoft’s own community outreach page claims construction would be completed by the beginning of October 2025 , but, as I’ll get to, it’s unclear whether this is just one phase, given that reporting shows multiple other buildings still under construction . I have serious doubts that Microsoft stood up a 350MW data center in less than a year, given everything else I’m about to explain. Fairwater Wisconsin is also a data center of indeterminate size, but Cleanview claims Phase 1 is 400MW , quoting a story from FOX6 News Milwaukee from September 2025 that said that Microsoft was “investing an additional $4 billion to expand the campus,” featuring a video of a very much in construction data center saying the following: So, $3.3 billion — at a rate of around $14 million per megawatt per analyst Jerome Darling of TD Cowen — is about 235MW of capacity, which is a lot lower than 400MW.   Seven months later, Satya Nadella said that the Fairwater datacenter in Wisconsin was “going live, ahead of schedule,” a sentence written in the present tense, but also said that it “ will bring together hundreds of thousands of GB200s in a single seamless cluster,” which is in the future tense.  It’s a great time to remind you that Microsoft claims that it brought online roughly eight times that capacity (around 2GW) in the past six months.  To make matters worse, it doesn’t appear that Fairwater Wisconsin is actually operational. Ricardo Torres of the Milwaukee Journal-Sentinel reports that Microsoft has said it isn’t actually online , and that while there “...is equipment inside the data center conducting start-up opportunities…the company anticipates [they] will continue to happen for the next several weeks.”  Epoch AI’s satellite footage of Fairwater Wisconsin — which mentions  a completely wrong capacity because it’s uniquely terrible at calculating it ( it claimed Colossus-1 has 425MW capacity, for example) — notes that as of April 2026, one building appeared to be operational, with a second under construction. So, that’s one building in Wisconsin that might be complete, and based on the permitting application from August 2023 dug up by Epoch, the project is designed to have 117MW of capacity, which is a lot lower than 235MW. While Epoch didn’t have permitting for building two, it did for three and four, which are designed to have around 719MW of capacity , and as of April 2026 still appear to be slabs of concrete.  In simpler terms, there’s at most around 117MW of capacity running at Fairwater Wisconsin. The Fairwater data centers are Microsoft’s most-publicized data centers, yet they’re shrouded in secrecy, with the Atlanta Journal-Constitution having to file an open records request to find the site being developed by QTS, a data center developer owned by Blackstone . Videos of Fairwater Atlanta from last November show a giant campus with two large buildings and a patch of yet-to-be-developed dirt. DataCenterMap refers to it as “ under construction .” Epoch AI’s satellite footage notes that as of February 2026, building four’s roof was complete and “all mechanical equipment appears to be installed,” but “there is still a lot of construction activity around the building.”  Based on air permits filed as part of the project (that Epoch found), it appears that each building is powered by a number of Caterpillar 3516C Generator Sets at around 2.5MW each, with building one having 47 (117.5MW), building two having 13 (32.5MW), building three having 30 (75MW), and building four having 35 (87.5MW). If we’re very generous and assume that three buildings are complete, that means that Fairwater Atlanta is at around 225MW of capacity (not IT load!). So, that’s about 342MW of data center capacity being built by one of the largest companies in the world, in its most-publicized and written-about data centers. Put another way, for Microsoft to come remotely close to its so-called 2GW of capacity in the last six months, it will have had to bring online a little under six times that capacity. I’m calling bullshit. I really did want Microsoft to give me some answers, but I’m very confused as to how it can remotely claim it brought even a gigawatt of capacity online in the last year. I also question whether Microsoft is actually building multiple other “identical” Fairwater data centers, as I can’t find any announcements or pronouncements or mentions or hints as to where they might be. In fact, I’m having a little trouble finding where else Microsoft has been building data centers, and those I can find are extremely suspicious. In Microsoft’s announcement of its Wisconsin data center , it mentioned two other projects — one in Narvik Norway that had already been announced months beforehand by OpenAI , and another with Nscale in Loughton, England that was also announced by OpenAI that very same day as part of the entirely fictional Stargate project . If you’re wondering how those are going, Microsoft had to take over the entire Narvik project (which does not appear to have started construction) from OpenAI , and the Loughton data center ( which OpenAI also backed out of ) is currently a pile of scaffolding . For two straight quarters , Microsoft has said it’s brought on an entire gigwatt of capacity,and I have to ask: where?  Because when you actually look at the projects it’s announced, very little appears to have been built, and that which has is nowhere near its theoretical capacity. To be specific about what Microsoft is claiming, it’s saying it’s brought around 4GW of capacity online in the space of two years, and at a 1.35 PUE, that’s about 2.96GW of critical IT load, which works out to the power equivalent of around 284,600 H100 GPUs, which may be possible — after all, Microsoft apparently bought 450,000 H100 GPUs in 2024 — but I can’t find much evidence of data centers that could house that many GPUs, nor that might be in construction.  Let’s dig in. Microsoft broke ground on three data centers in Catawba County North Carolina in 2024 — one in Hickory, another in Lyle Creek, and another in Boyd Farms: Alright, maybe I’m being unfair! Maybe it’s just a North Carolina problem. There must be another that broke ground and got built…right?  Microsoft also broke ground on a data center in Quebec City, Canada in September 2024 , and as of April 2026 , “generator testing has been completed,” and “civil works will continue until Autumn 2026.”  Okay, well, maybe it’s a Canada problem. What about Microsoft’s New Albany, Ohio data center that broke ground in October 2024 ? Well, as of March 2026, “spring activity would resume,” and “beginning soon, soil will be delivered to the site via a designated truck route. I’ll note that Microsoft specifically says that Ames Construction is currently leading it, and that it will “resume the lead role in project communications” once the final phase of construction is done at some unknown time. Alright, well, how about the August 2025 ground breaking in Cheyenne, Wyoming that was allegedly “ due to launch in 2026 ”?  Well, Microsoft hasn’t updated its community page since it said there’d be a community meeting planned for November 2025 and that “neighbors within the vicinity will be notified ahead of construction,” which sounds like construction is yet to commence. Not to worry though, it announced on April 14, 2026 that it planned to expand it to “ accelerate innovation and economic growth ” How about that 2023-announced Southwest Hortolândia Brazil data center ? That’s right, the last update was in September 2025 , and the update was “construction activities continue to progress in alignment with local regulations.” A piece from Folha De S.Paulo from March 2026 mentioned that Microsoft “had begun operating its first artificial intelligence data centers in Brazil,” but satellite footage shows that it’s barely finished. What about the Newport, Wales data center it announced in 2022 ? Well, as of November 2025, a politician was standing on a concrete slab saying how many jobs it’ll theoretically bring in , which it won’t. What about Microsoft’s four data centers in Irving, Texas, announced December 2024 ? The best I’ve got for you is a news report about a data center in Irving Texas breaking ground in January 2025 . Its San Antonio data center, announced in July 2024 ? Well, construction was underway as of December 2025 , and it appears that construction will begin in the summer of 2026 on another one in the area. How about the two data centers outside of Cologne, Germany , announced in November 2024? Well, as of September 2025, Microsoft has… plans to build one of them ? …what about the 900 acres of land it bought in June 2024 in Granger, Indiana ? Great news! According to 16NewsNow , Microsoft officials “could break ground on a proposed data center…in late April or early May [2026].” How about Project Ginger West, a data center planned in Des Moines. Iowa since March 2021 ? Hope you like waiting , because Microsoft itself says that it’s estimated to finish construction in Summer 2028 . Ginger East , announced a few months later? Mid-2028 . Project Ruthenium ( announced 2023 )? I don’t have shit for you I’m afraid. Rutheniumkanda Forever! This company claims it’s built four fucking gigawatts of capacity , but when I go and look to see what it’s actually built I’ve failed to find a single announced data center from the last three years that got turned on outside of its Fairwater Atlanta and Wisconsin sites. To be clear, all of these sites are somewhere in the 200MW to 300MW range. For Microsoft to have brought online 4000MW of data center capacity in the last two years would require it to have completed thirteen or more of these projects, all while choosing not to promote them, with every project operating in such a veil of secrecy that no local or national news outlet reported a single one of them.  I truly cannot work out how Microsoft has brought on any more than 500MW of capacity in the last year based on my research, and think Microsoft is deliberately obfuscating whether said capacity was contracted rather than actively in-use , much like CoreWeave refers to itself having 3.1GW of “ total contracted power ” but only added 260MW of active power capacity in a single quarter at the end of 2025.  However, the exact verbiage used in Microsoft’s earnings transcripts is that it “added another gigawatt of capacity,” which sounds far more like it’s saying it brought them online… …but it didn’t, right? It obviously hasn’t. Where are all the data centers, Satya? Where are they? Why are your PR people too scared to tell me?  No, really, where are they?  So, to be fair, analyst Ben Bajarin, one of the more friendly pro-AI posters, argues that actually all of that capacity is secretly behind-the-scenes , something I’d humour if there was any kind of paper trail to a bunch of Microsoft data centers that were secretly being built.  I’d also be more willing to humour it if any of the data centers that have been publicized as “breaking ground” had actually been finished, or if both Fairwater Atlanta and Wisconsin weren’t so deceptively-marketed. My only devil’s advocate is that Microsoft could, in theory , be working with colocation partners to stand up several gigawatts of capacity through shell corporations and SPVs, but even then , not a single one has any sort of trail to Microsoft? All of that capacity?  It’s really, really weird, and the only answers I get are smug statements about how “Fairwater is ahead of schedule.” But if I’m honest, I’m having trouble even making these numbers add up. Considering how loud, offensive and conspicuous the AI bubble has become, it feels like we should have a far, far better understanding of how much actual capacity has been built. I also think it’s time to start being realistic about how long these things are taking to build. For example, I was only able to find a few data centers that for sure, categorically, definitively opened, and for the most part, it appears that a data center takes around 18 months to go from groundbreaking to opening. And these, I add, are all facilities that are relatively modest — at least, when compared to the kinds of gigawatt-scale campuses that are reportedly in active development.  Digging deeper, I found a lot of projects stuck in development Hell: While there are absolutely data centers under construction , and some, somewhere , are actually being completed , the vast majority of projects I’ve found are either in a mysterious limbo state or, in most cases, under construction years after breaking ground. Across the board, the message seems to be fairly simple: it takes about 18 to 24 months to build any kind of data center, and the bigger they are, the less likely they are to get completed on schedule. Those that actually “come online” aren’t actually fully constructed, but have brought on a single phase — something I wouldn’t begrudge them if they were anything close to honest about it. In reality, data center companies actively deceive the media and customers about the actual status of projects, most likely because it’s really, really difficult to build a data center. In any case, what I’ve found amounts to a total mismatch between the so-called “rapid buildout” of AI data centers and reality.  It also doesn’t make much sense when you factor in how many GPUs NVIDIA sold. In October last year, NVIDIA CEO Jensen Huang told reporters that it had shipped six million Blackwell GPUs in the last four quarters , though it eventually came out that he was counting two cores for every GPU , making the real number three million. I disagree with the framing, I think it’s incoherent and dishonest, but I’ve confirmed this is what NVIDIA meant. In any case, if we assume two cores per GPU, a B200 GPU has a power draw of around 1200W, for around 3.6GW of IT load for 3 million of them. I realize that NVIDIA also sells B100 and B300 GPUs (similar power draw) and NVL72 racks of 72 GB200 GPUs and 36 CPUs, but bear with me. Blackwell GPUs only started shipping with any real seriousness in the first quarter of 2025, which means that a good chunk of these data centers were built with H100 and H200 GPUs in mind. Nevertheless, I can find no compelling evidence that significant amounts — anything over 500,000 GPUs — of Blackwell-based data centers have been successfully brought online.  When I say I struggled to find data centers that had been both announced and brought online, I mean that I spent hours looking, hours and hours and hours, and came up short-handed.  I want to be clear that I know that there is Blackwell capacity actually being built , and believe that the majority of that capacity is retrofits of previous data centers, such as Microsoft’s extension to its Goodyear Arizona campus which it began building in 2018 that likely houses Blackwell GPUs. But I no longer believe that the majority of Blackwell GPUs are doing anything other than collecting dust in a warehouse. Blackwell GPUs require distinct cooling, a great deal more power than an H100, and cost an absolute shit-ton of money, making it unlikely that a 2023 or early-2024 era data center could handle them without significant modifications. I fundamentally do not believe more than a million — if that! — Blackwell GPUs are actually in service.  If that’s the case, NVIDIA is likely pre-selling GPUs years in advance — experimenting with the dark arts of “ bill-and-hold ” — and helping certain partners like Microsoft install the latest generation to create the illusion of utility, availability and viability that does not actually exist. If I’m honest, I also have serious questions about the current status of many H100 and H200 GPUs. Based on what I’ve found, I’d be surprised if more than 3GW of actual capacity was turned on in the last two years, which means that NVIDIA has sold anywhere from double to triple the amount of GPUs that the world can hold. While the Anthropic-Musk compute deal is an obvious sign about xAI’s lack of demand for compute, it’s also, as I mentioned earlier, a clear sign that AI data centers are mostly not getting finished, and those that do get finished are taking two or three years even for smaller builds. While it sounds a little wild, I think in reality only a few hundred megawatts — if that — of actual, usable AI compute capacity is being spun up every quarter. If I was wrong, there’d be significantly more progress on, well, anything I could find.  Why can’t Microsoft offer up a data center that isn’t called Fairwater, and why are its Fairwater data centers taking so long? How much actual capacity has Microsoft brought online? Because it certainly isn’t fucking 2GW in six months. I’m willing to believe that Microsoft has a number of collocation agreements with parties that don’t disclose their involvement. I’m also willing to believe that Microsoft doesn’t publicize every single data center it’s building or has built.  2GW of capacity is a lot. It’s nearly ten times the (likely) existing capacity of Fairwater Atlanta. If Microsoft is bringing so much capacity online, why can’t we find it, and why won’t they tell us? And no, this isn’t some super secret squirrel “they’re building secret data centers for the government” thing, it’s very clearly a case where “capacity” refers to “something other than data centers that actually got brought online. Despite their ubiquity in the media, AI data centers are relatively new concepts that are barely five years old. They are significantly more power-intensive than a regular data center, requiring massive amounts of cooling and access to water to the point that the surrounding infrastructure of said data center is often a massive construction project unto itself.  For example, OpenAI and Oracle’s Stargate Abilene data center is (in theory) made up of two massive electrical substations , a giant gas power plant and eight distinct data center buildings, each with around 50,000 GB200 GPUs, at least in theory. Every data center requires that power exists — as in it’s being generated in both the manner and capacity necessary to turn it on, either through external or grid-based power — and is accessible at the data center site. This means that every single data center, no matter how big, is its own construction nightmare. You’ve got the power, the labor, the permits, the planning, the construction firm, the power company, the specialist gear, the temporary power (because on-site power is slow ), the backup power (because you can’t just rely on the grid for something you’re charging millions for!), the cooling, the uninterruptible power supplies — endless lists of shit that needs to go very well or else the bloody thing won’t work. These are very difficult and large projects to complete. Edged Computing’s (theoretically) 96MW data center in Illinois is 200,000 square feet in effectively two large squares. For comparison, every single inch of gambling space in Caesar’s Casino Vegas is around 130,000 square feet . These things are fucking huge, fucking difficult, and fucking expensive, and all signs point to capacity not coming online.  Let’s go back to Anthropic mopping up Musk’s fallow data center capacity, which stinks of desperation for both companies. If there were modern data centers full of GB200s being turned on and available anywhere in the next month or two, wouldn’t it be more financially prudent to wait for it, even if it’s just on an efficiency level? A franken-center made up of H100s and H200s with some GB200s stapled onto the side feels like a stopgap solution. I have similar questions about the results of adding this capacity — that “...Anthropic plans to use [it] to directly improve capacity for Claude Pro and Claude Max subscribers ,” “doubling” (whatever that means) the 5-hour rate limit and removing the recently-added peak rate limits.  What’s the plan here, exactly? Less than a month ago Anthropic’s Head of Growth, Amol Avasare , said that Anthropic was “looking at different options to keep delivering a great experience for users” because Max accounts were created before the era of Claude Code and Cowork . How does adding 300MW of capacity magically resolve that problem? Was that always the plan?  Or was this a knee-jerk reaction to the surging popularity of OpenAI’s Codex ? Because the original justification for peak hours was that Anthropic needed to manage “ growing demand for Claude ,” demand that I bet Anthropic claims hasn’t gone anywhere. It’s also important to remember that last year, OpenAI’s margins (which are already non-GAAP), per The Information , were worse than expected because (and I quote) it had to “..to buy more expensive compute at the last minute in response to higher than expected demand for its chatbots and models.”  In other words, Anthropic has deliberately tanked its already-negative 2026 gross margins by desperately buying the fallow compute from a company whose CEO threw up the nazi salute , called the company “ misanthropic and evil ,” and has the “right to reclaim the compute” if Anthropic “engages in actions that harm humanity.” Surely you’d wait a few months for some new, less tainted source of compute, right? And surely it wouldn’t be such a big deal, because new data centers get switched on every day, right?  So, let’s get to brass tacks. Anthropic and OpenAI have now committed to spending $748 billion across Amazon Web Services, Google Cloud, and Microsoft Azure , accounting for more than 50% of their remaining performance obligations. The very future of hyperscaler revenue depends both on Anthropic and OpenAI’s continued ability to pay and both of them having something to actually pay for.  I also think it’s fair to ask why Microsoft’s theoretical gigawatts of new compute aren’t producing tens of billions of dollars of new revenue.  Microsoft’s $37 billion in annualized AI run rate (sigh) is mostly taken up by OpenAI’s voracious demands for its :compute , and only ever seems to expand based on OpenAI’s compute demands and the now 20 million lost souls paying for Microsoft 365 Copilot . There’s supposedly incredible, unstoppable demand for AI compute, and Microsoft is apparently sitting on gigawatts’ worth , but somehow those gigawatts don’t seem to be translating into gigabillions , likely because they don’t fucking exist. All of this makes me wonder what Google infrastructure head Amin Vahdat meant last November when he said that Google needed to double its capacity every six months to meet demand . Many took this to mean “Google is doubling its capacity every six months,” but I think it’s far more likely that Google is taking on capacity requests from Anthropic that are making said capacity demands necessary. Similarly, I think CEO Sundar Pichai’s comment that it would have made more money had it had more capacity to sell was a manifestation of a distinct lack of new capacity rather than a result of bringing on swaths of new data centers that immediately got filled. I also need to be blunt on two things: Look, I know it sounds crazy, but I’m telling you: I don’t think very many data centers are coming online! While I keep wanting to hedge my bets and say “I bet a few gigawatts came online,” I cannot actually find any compelling literature that backs up that statement. I’ve spent hours and hours looking, and I’ve come up with a few hundred megawatts delivered in the past two years. Every major project is stuck in the mud, a phase or two in, or facing mounting opposition from locals that don’t want a Godzilla-sized cube making a constant screaming sound 24/7 so that somebody can generate increasingly-bustier Garfields.  I’m not even being a hater! It’s just genuinely difficult to find actual data centers that have been announced that have also been fully turned on.   So, humour me for a second: if hyperscalers are bringing on hundreds of megawatts of capacity a year, then that means that the ever-growing quarterly chunks of depreciation ripped out of their net income are just a taste of what’s to come. Last quarter, Google’s depreciation jumped $400 million to $6.482 billion, with Microsoft’s jumping nearly a billion dollars from $9.198 billion to $10.167 billion, and Meta’s from $5.41 billion to $5.99 billion. While Amazon’s technically dropped quarter-over-quarter, it still sat at an astonishing $18.94 billion. Remember: depreciation only increases when an item is actually put into service. If Microsoft, Google, Amazon and Meta are sitting on tens of billions of yet-to-be-installed GPUs, and said GPUs are only being installed at a snail’s pace every quarter, that means that these depreciation figures are set to grow dramatically. In fact, year-over-year, Google’s depreciation has jumped 30.7%, Amazon’s 24.7%, Microsoft’s 23.9%, and Meta’s an astonishing 34.9% .  And that’s with an extremely slow pace of deployment.  I do kind of see why the hyperscalers are sinking capex into these big AI infrastructure gigaprojects now, though. Shareholders are currently tolerating the capex because they think stuff is coming online, and that’s where the “incredible value” is. When a $20 billion or $30 billion a quarter depreciation bill first rears its head — as I said, Amazon is close, reporting $18.945bn in depreciation and amortization expenses in the most recent quarter — it’ll become obvious that the only people seeing value from AI are Jensen Huang and one of the massive construction firms slowly building these projects.  Actually, it’s probably important to state that I don’t think the majority of these projects are doing anything untoward I just don’t think any of them realized how difficult it is to build a data center, and unlike basically any other problem the tech industry has ever faced, simply throwing as much money as possible at it doesn’t really change the limits of physical construction.  I think every one of these data center projects is its own individual construction nightmare, and thanks to the general market psychosis around the AI bubble, nobody has thought to question the core assumption that these things are actually getting built. With all that being said , I’m not sure that anyone building these things is moving with much urgency either. Perhaps they don’t need to — perhaps hyperscalers are happy, because they can continually string out both the AI narrative and put off those massive blobs of depreciation. But we really do need to reckon with the fact that nearly two years in, Stargate Abilene has only two buildings’ worth of actual, operational, revenue generating capacity, and nobody has given me an answer as to how it doesn’t have even a quarter of the 1.7GW of power it’ll need to turn everything on , if it ever gets fully built. Maybe they can really pick up the pace, but as of early April, barely any actual gear was in the third building.  And then we get to the other problem: Oracle. As I’ve discussed before, Oracle is building 7.1GW of total capacity for OpenAI , and keeps — laughably! — saying 2027 or 2028, when at this rate, Stargate Abilene won’t be done until mid-2027, and the rest either never get finished or are done in 2030 or later.  This is setting up a horrifying situation where Oracle desperately needs OpenAI to pay it for capacity that doesn’t exist, and if it ever gets built, it’s likely to be years after OpenAI has run out of money, which is the same problem that Microsoft, Google, and Amazon have with their $748 billion of deals with Anthropic and OpenAI, though thanks to the $340 billion or more necessary to build the Stargate data centers, Oracle’s problems are far more existential. I’ve repeatedly — and correctly! — said that the problem is that these companies didn’t have the money to pay for their capacity, but Oracle lacks Microsoft or Google’s existing profitable businesses to fall back on if these data centers are delayed, with its existing business lines plateauing and its only real growth coming from theoretical deals with OpenAI and GPU compute with negative 100% margins .  Anthropic’s desperation for new sources of  compute also suggests that it’s bonking its head against the limits of its capacity, and will continue to do so as long as it continues to subsidize its users . I also think that the slow pace of construction will eventually lead to OpenAI facing similar problems. These companies need to continue growing to continue to raise the hundreds of billions of dollars in funding necessary to pay Oracle, Google, Microsoft, and Amazon their respective pounds of flesh.  It’s now very clear that the whole “inference is profitable” and “most compute is being used for training” myths are dead, because if they weren’t, Anthropic would either need way more compute or way higher-quality compute. Colossus-1 was specifically built as a training cluster, yet its current use is “reduce rate limits for our subsidized AI subscriptions,” which is most decidedly inference provided by three-year-old hardware . Despite writing over 9000 words and driving myself slightly insane trying to find out, I still haven’t got an answer as to how much actual data center capacity has come online. Hyperscalers have clearly been retrofitting old data centers to fit their new chips, and based on my research, I can find no compelling evidence that they’ve added more than a few hundred megawatts a piece since 2023.  What I do know is that, across the board, a data center of anything above 50MW (or lower, in some cases) takes anywhere from 18 to 36 months to complete, and nobody has actually built a gigawatt data center despite how many people discuss them. For example, Kevin O’Leary — known as “Mr. Dogshit” to his friends — is allegedly building a 9GW data center in Utah , but he may as well say that he’s building a unicorn that shits Toyota Tacomas, as doing so is far more realistic than a project that will likely cost $396 billion, assuming that locals and bankers don’t drag him to The Other Side like Dr. Facilier .  Nobody has built a 1GW data center, so I severely doubt Mr. Dogshit will be able to do anything other than create another scandal and lose a bunch of people’s money. In other words, any time you hear about a “new data center project,” add a year or two to whatever projection they give. If it’s 2027, assume 2029, or that it never gets built. Anything being discussed as “finished in 2030” may as well not exist. In any case, what I’m suggesting is that very, very few data centers are actually getting finished, and if that’s true,  NVIDIA has sold years worth of chips that are yet to be digested.  And if that’s true, somebody is sitting on piles of them.  I’m trying to be fair, so I’ll assume that an unknown amount of data centers got retrofitted to fit Blackwell GPUs. But I also refuse to believe that even half of the three million Blackwell GPUs that got shipped have actually been installed. Where would they go? You can’t use the same racks for them that you would with an H100 or H200, because Blackwell requires so much god damn cooling. Another sign that these things aren’t actually getting installed is Supermicro’s $1.4 billion or so of B200 GPUs left in inventory from a canceled order from Oracle .  Why not? Isn’t this meant to be a chip that’s extremely valuable? Isn’t there infinite demand? Is there not a place to put them? Apparently Oracle wanted to use faster GB200 GPUs from Dell , but why aren’t there other customers lining up to buy these things?  Also… how was Oracle able to cancel an order of over a billion dollars’ worth of GPUs?  Can anybody do that? Because if they can, one has to wonder if this doesn’t start happening as people realize these data centers aren’t getting built. Pick a data center. It’s probably barely under construction, or if it’s “finished” it’s actually “partly done” with no real guide as to when the rest will finish.  Remember that $17 billion deal with Microsoft and Nebius signed ? The one that’s a key reason why Nebius’ stock is on a tear? Well, its existence is based on the continued construction of a data center out in Vineland, New Jersey facing massive local opposition, and multiple sources now confirm that construction has been halted due to local planning issues. The data center is horribly behind schedule already, and Microsoft has the option to cancel its entire contract if Nebius fails to meet milestones . That data center is a major reason that people value Nebius’ stock! It cannot make a dollar of revenue without its existence! It has the funds and blessing of Redmond’s finest — the Mandate of Heaven! — and it can’t get things done! This is bad, and indicative of a larger problem in the industry — that it’s really difficult to build data centers, and for the most part, they’re not being fully built! You’ve heard plenty about data centers getting opposed and canceled — how about ones that fully opened? No, really, if you’ve heard about them please get in touch, because it’s really difficult to find them. Why don’t we know? This is apparently the single most important technology movement since whatever the last justification somebody made up was, shouldn’t we have a tangible grasp? Because the way I see it, if these things aren’t coming online at the rate that people think, we have to start asking for fundamental clarity from NVIDIA about where the GPUs are, and when they’re coming online.  NVIDIA’s continually-growing valuation is based on the conceit that there is always more demand for GPUs, and perhaps that’s true, but if this demand is based on functionally selling chips two years in advance. That makes NVIDIA’s yearly upgrade cadence utterly deranged. Buy today’s GPUs! They’re the best, for now, at least. By the time you plug them in they’re gonna be old and nasty. But don’t worry, it’ll take two years for you to install the next one too! To be clear, Blackwell GPUs are absolutely being installed! But three million of them?  People love to use “enough to power two cities” to illustrate these points, but I actually think it’s better to illustrate in real data center terms.  Stargate Abilene has taken two years to build two buildings of around 103MW of critical IT load. 3 million B200 GPUs works out to about 3.6GW of IT load. Do you really think that nearly thirty five Stargate Abilene-scale buildings were built in 2025? If so, where are they, exactly? You may argue that other data centers are smaller, and thus it would be easier to build. So why can’t I find any examples of where they’ve done so?  By all means prove me wrong! It’s so easy! Just show me a data center announced or that broke ground in 2023 and find obvious proof it turned on. I’ll even give you credit if it’s partially open! The problem is that I keep finding examples of “partially complete” and those are the only examples of “finished” data centers.  Isn’t this a little insane? This is all we’ve heard about for years, everybody is ACTING like these things exist at a scale that I’m not sure is actually true!  I expect a fair amount of huffing and “well of course they’re coming online” from the peanut gallery, but come on guys, isn’t this all kind of weird? Even if you want to marry Sandisk and name your children “Western” and “Digital,” why can’t you say with your whole chest several data centers that got finished? We have macro level “proof” but when you try and look at even a shred of the micro you find a bunch of guys with their hands on their hips saying “sorry mate that’ll be another $4 million.”  Something doesn’t line up, and it’s exactly the kind of misalignment that happens in a bubble — when infrastructural reality disconnects from the financials. NVIDIA is making hundreds of billions of dollars and it’s unclear how much of it is from GPUs installed in operational data centers. It feels like Jensen Huang might have run the largest preorder campaign of all time.  This has massive downstream consequences. Sandisk, Samsung, SK Hynix, Broadcom, AMD, Microsoft, Google, Oracle, and Amazon’s remaining performance obligations total [find] and are dependent on being *able* to sell gigawatts worth of computing gear or compute access. If data centers are not getting built in anything approaching a reasonable timeline, that makes the future of these companies only as viable as the construction projects themselves. Even if you truly believe Anthropic will be a $2 trillion company and a $200 billion customer of Google, the compute capacity has to exist to be bought, and it does not appear to be built or, in many cases, anywhere further than the earliest stages of construction.  If they don’t get built in the next few years, there’s no space for that solid state storage or those instinct GPUs. There’s no reason for NVIDIA to have reserved most of TSMC’s capacity , either. There’s also no reason to get excited about Bloom Energy, as it’s not making real revenue on those until Oracle finishes its data centers sometime between the next two years and never .  And if they don’t get built, hundreds of billions of dollars have been wasted, with large swaths of those billions funded by private credit, which in turn is funded by pensions, retirements and insurance funds . I’ve got a bad feeling about this.  Microsoft claims to have brought around 4GW of data center capacity online in the last two years, but it’s unclear how much actually got built. In an analysis of all announced groundbreakings and land acquisitions, it appears that Microsoft has only finished the first phase of its Atlanta and Wisconsin data centers.  It is unclear where this capacity could be. When Mr. Nadella said on his most-recent earnings call that Microsoft had (and I quote) "added another gigawatt of capacity this quarter," did he mean active, revenue-generating capacity?  In the event he did not, what did he mean? How much active, revenue-generating capacity has Microsoft brought online in FY2026 so far? Outside of Fairwater Wisconsin and Atlanta, where has that capacity been built?  Microsoft’s latest update on the Hickory/Stover site is that it “will” begin “initial site setup and earthwork activities” as of February 2026, and it appears the contractor has changed from Ames Construction to Clayco. The latest Microsoft update on the Boyd Farms site is that it started construction on April 1, 2024. A February 2026 piece from the Charlotte Observer claimed it had started construction again after a 10 month (!) delay. The latest Microsoft update on the Lyle Creek site — which it adds began construction in March 2024 — is that its contractor, Whiting-Turner, “will begin initial site preparation once weather conditions allow” as of February 2026.  A press release from a Canadian satellite firm from February 2026 said that it had “identified renewed construction activity at all three of Microsoft’s permitted data center campuses in Catawba County North Carolina.” Novva’s 60MW data center in Reno, Nevada. Announced in May 2023, operational as of July 2025 , or around 26 months. Edged Energy’s 36MW Phoenix, Arizona data center that broke ground in August 2024 and opened in April 2026 , or around 20 months. Duos Edge AI’s 450KW (lol) data center in Corpus Christi, Texas that was announced in July 2025 and opened in May 2026 , or around 10 months. Edge Energy’s 24MW, Columbus, Ohio-based data center that broke ground in August 2024 and opened in September 2025 , or around 13 months. American Tower’s 1MW (scalable to 4MW!) Raleigh, North Carolina data center that broke ground in June 2024 and came online in May 2025 , or around 11 months. EdgeCore’s 36MW Santa Clara, California data center campus that broke ground in January 2023, said it would be “energized in Q1 2024,” and opened in September 2025 , or around 32 months . Edged Energy’s “180MW” data center in Atlanta broke ground in July 2023 , and around 33 months later in April 2026 ,  it managed to top off a single 42MW building . EdgeCore’s two-building, 216MW campus that broke ground in August 2023 with plans to complete “as early as late 2025” is, as of March 2026, still under construction. Edged Energy broke ground on a 100MW data center in Aurora, Illinois in May 2023 , and has, as of February 2025, successfully opened (per DataCenterDynamics) “phase 1” — 24MW of capacity — but in its own press release from the same day referred to it as 96MW , choosing not to refer to any phases or separate buildings, something it has done since before the 24MW phase was complete.  CyrusOne’s 40MW Aurora, Illinois data center broke ground in October 2024 , which was apparently so significant that CyrusOne would announce that it had broken ground a second time on January 28 2025 . Confusingly, CyrusOne has another campus it’s linking to the Bilter Road one on Diehl Road, which may or may not be the same one, and as of May 2026 is still very much under construction . As of March 2026, locals were still opposing the data centers , slowing down the process further. Vantage’s “192MW” OH1 data center in New Albany Ohio broke ground in October 2024 , with its first phase to be due live sometime in 2025. As of August 2025, Vantage had topped off the second building , and per its own website about OH1 , the first building was meant to be operational in December 2025, but it’s unclear whether it actually opened. PowerHouse’s 65MW data center campus in Reno, Nevada broke ground in October 2024 , and its website states that “delivery” will happen in April 2026, with “construction/delivery” due “Q3 2024 to Q2 2026.” Oppidan’s Carol Stream, Illinois data center broke ground in November 2024 , with the “first phase” due live in 2026. Per Clearview, it is still “ planned .” Databank’s 20MW Ashburn, Virginia “IAD4” data center that broke ground in July 2024 was “set to go live in Q1 2026,” and as of May 2026 is still referred to in the future tense on Databank’s website . Aligned’s 96MW “NEO-01” Ohio-based data center that broke ground in May 2024 was “scheduled to be opened by end of this year” as of March 2026 . Aligned’s 72MW Hillsboro. Oregon data center campus broke ground in October 2023 , topped off the first building in July 2024 (Aligned also plans a separate building, too!), and as of May 2026, Cleanview still marks the first one as “planned.” Flexennial broke ground on a Denver-based 22.5MW data center in October 2024 , and as of April 8. 2026, a local Facebook group has said that it will be operational by January 2027 .   Flexennial, on the other hand, has been referring to it as “ the new build ” — in terms that make it sound like it was built — as far back as February 2025. If hyperscalers are truly not bringing on that much capacity, they cannot make those hundreds of billions of dollars from Anthropic and OpenAI. The current “AI compute demand is insatiable” narrative is utterly false , and a direct result of a lack of capacity coming online.

0 views

Long Running Agent Engineering

What does it take for an agent to keep working after you leave? Not "answer a long question." Not "use a big context window." I mean actually keep working. Hours. Days. Maybe weeks. Wake up in a fresh session, understand what happened before, choose the next useful thing, make progress, verify it, leave the workspace cleaner than it found it, and do it again. For the last few years we have mostly talked about agents as if the hard thing was autonomy inside one conversation. Give the model tools. Put it in a loop. Let it call bash, edit files, search the web, open a browser, run tests. That loop is real, and it is already enough to change how software gets built. But long running agents expose a different problem. The agent loop is not the product. The harness is. The model does not naturally persist across turns, context windows, sandboxes, process crashes, or days of work. A fresh session is born with amnesia. It has no idea what the last session tried, which tests failed, which files were half edited, which plan is stale, which shortcut was tempting but wrong, or whether the thing it is about to mark done was already marked done three runs ago and later discovered broken. That is the real long running agent problem: handoff across amnesia. The answer emerging across Anthropic, Cursor, OpenAI, Claude Code, Addy Osmani's survey of long running agents , and the Ralph Wiggum community is surprisingly consistent. It is not one magical always awake model. It is not stuffing the whole history into a bigger window. It is a harness that externalizes state into the workspace, restarts agents with fresh context, uses machine verifiable checks as backpressure, and assigns completion judgment to something other than the worker that wants to be done. Here is the punchline up front: Long running agents are not long conversations. They are recoverable workflows. The model is one worker inside that workflow. The durable artifacts are the real continuity layer. It also helps to separate three ideas people collapse into one phrase: long horizon reasoning, long running execution, and persistent agency. A model can reason through a deep task without running for days. A process can run for days without remembering anything useful. An agent can remember the user without owning one large task. Production systems blur the three, but the engineering problems are different. Here's what I'll cover: The naive version of a long running agent is a single agent in a single conversation with a very large context window. This works for small tasks. It fails exactly where long running agents are supposed to matter. The failure is not just that the context window fills. A 200K or 1M token window still becomes a junk drawer if you keep pushing tool outputs, diffs, plans, screenshots, stack traces, and half obsolete reasoning into it. The model does not get a clean working memory. It gets an archaeological site. Anthropic's effective harnesses post frames this cleanly: complex tasks span multiple context windows, but each new agent session begins with no memory unless the environment itself tells the story. They describe two predictable failures. First, the agent tries to one shot too much, runs out of context, and leaves a half implemented mess. Second, a later session looks around, sees progress, and decides the whole project is done. That second failure is the one I keep seeing. The agent is not lazy. It is locally rational. It sees a repo with code, some tests, maybe a UI that loads, maybe a checklist with many items checked. In the absence of a crisp external completion contract, "looks basically done" becomes an attractive stopping point. Long running work makes this worse because every session inherits ambiguity from the previous one. Compaction helps, but compaction is not continuity. A summary can preserve some facts, but it cannot replace a workspace that is structured for recovery. This is the same lesson as agent memory engineering, just at task scale. Memory that lives only in the context window dies when the window dies. Work that lives only in the agent's chain of thought dies when the session dies. If you want continuity, put it somewhere the next worker can read. The architecture that keeps recurring looks like this: There are variations, but the spine is stable. Anthropic uses an initializer agent plus repeated coding agents. The initializer creates the environment future agents need: an , a progress file, a feature list, and a first git commit. Subsequent agents read the state, pick one not yet passing feature, implement it, test it end to end, update the progress log, and commit. The community Ralph Wiggum pattern is the minimal version: The important thing is not the loop. The important thing is what the loop forces. Every iteration starts with fresh context. Every iteration rehydrates from disk. Every iteration must leave disk in a state the next iteration can understand. Blake Crosley's Ralph Loop writeup describes the same pattern through stop hooks: intercept exit attempts, persist state to the filesystem, and restart with a fresh context window until machine verifiable completion criteria are met. Geoffrey Huntley's community guide reduces it to a beautiful primitive: a shell loop feeding a prompt file to the agent, with the implementation plan on disk acting as shared state between otherwise isolated runs. That is the thing people keep underestimating. The loop can be dumb if the workspace is smart. No blackboard server. No bespoke orchestration database. No vector store. No "agent society" with vibes based coordination. Markdown files, git, tests, and a process supervisor. Annoyingly simple. Annoyingly effective. The Ralph loop works because it replaces one degrading conversation with many clean attempts. The agent is not continuous. The workspace is. This flips the unit of autonomy. You stop asking, "Can this one conversation survive for ten hours?" You ask, "Can each session leave enough evidence that the next session can continue without asking me?" That means the agent's job is not only to build. It has to maintain the run state. A good Ralph prompt usually contains four contracts: This is not glamorous. It is project management for an amnesiac coworker. The loop also gives you a natural escape hatch. If the agent goes off track, you edit the plan. If the prompt is too loose, you add a guardrail. If the tests are weak, you strengthen the oracle. If the agent keeps duplicating work, you make completed work more visible. If it keeps touching unrelated files, you narrow the write scope. The prompts you start with are never the prompts you end with. Long running harnesses are tuned by watching failure patterns. That is why Ralph is more than a meme. It is the first pattern that made the correct abstraction obvious: the human sits outside the loop and engineers the environment, not inside the loop approving every step. The roles keep converging: Sometimes these are separate prompts. Sometimes separate models. Sometimes separate processes. Sometimes the judge is a test suite. Sometimes it is a small evaluator model. But the roles are conceptually different, and mixing them is where harnesses get mushy. The initializer is the first agent that touches the task. Its job is not to implement the product. Its job is to make implementation possible across many future sessions. Anthropic's initializer writes a comprehensive feature list. In their clone example, the feature list expanded the user's high level prompt into hundreds of end to end feature requirements, all initially marked failing. This prevents the later worker from inventing a tiny definition of done. A good initializer creates: The initializer is where you spend tokens to save tokens later. Every future worker starts faster because the workspace already has a map. The worker should not be asked to "finish the project." That is how you get giant diffs, brittle code, and fake completion. The worker should be asked to make one bounded unit of progress. The stop matters. A worker that never stops slowly turns into the bad single session architecture. Fresh starts are not overhead. Fresh starts are the mechanism that keeps drift from compounding. The worker should not be the final judge of completion. Workers want to be done. Not emotionally, obviously, but statistically. The completion token is attractive. The model has a strong prior toward wrapping up once the output looks coherent. On long horizon tasks this creates false positives. Claude Code's productizes this separation. You give Claude a completion condition. After each turn, a separate evaluator model checks whether the condition has been met. If the answer is no, the evaluator's reason becomes guidance for the next turn. The worker model is not the only judge of its own success. That one design detail is huge. OpenAI's harness engineering post describes a similar review loop: Codex writes code, reviews its own changes, requests additional agent reviews locally and in the cloud, responds to feedback, and iterates until reviewers are satisfied. They explicitly call this a Ralph Wiggum loop. The pattern generalizes: The judge does not have to be smarter than the worker. It just has to be fresh, narrower, and less invested in the worker's local narrative. Long running agents need durable state, but not all state is the same. If this state lives only in the transcript, the next session has to reconstruct it. If it lives on disk, the next session can read it. Anthropic's scientific computing post is the cleanest non web app example. Claude worked over multiple days on a differentiable cosmological Boltzmann solver and reached sub percent agreement with the reference CLASS implementation. The interesting part is not that the model wrote numerical code. The interesting part is the harness discipline around it: reference implementation, test oracles, persistent notes, git history, and quantifiable progress. Scientific computing makes the verification problem unusually crisp. You can compare your solver to CLASS or CAMB. You can plot error over time. You can watch the agent get closer to a reference implementation. That gives the run a real gradient. Most coding tasks have weaker oracles, so you have to build them. Long running agents magnify weak specs. A human can carry fuzzy intent across a week because humans have common sense, memory, and the ability to ask clarifying questions. An unattended agent will happily optimize the wrong proxy for hours. The more autonomy you grant, the more literal the state layer has to become. A long running agent without verification is just a text generator with file permissions. Verification is what turns motion into progress. This is why end to end tests matter so much. Anthropic observed that Claude would often mark features complete after shallow checks. Once explicitly prompted to use browser automation and test as a human user would, performance improved. That matches my experience. Unit tests are useful, but they are often too close to the implementation. Browser tests force the agent to confront the product surface. The right verification depends on the domain: The best verification is machine checkable and hard to game. The worst verification is asking the same model, in the same context, "are you sure?" That does not mean model judges are useless. They are useful when they judge surfaced evidence against a narrow condition. Claude Code's docs are careful about this: the evaluator does not run commands or read files independently. It judges what Claude has surfaced in the conversation. So the completion condition has to include how the worker should prove it. The judge cannot save you from a vague goal. It can enforce a crisp one. Single worker loops are enough for many tasks. But the moment you want to run hundreds of agents on one codebase for weeks, coordination becomes the whole game. Cursor's scaling agents post is useful because it talks about what failed. Their first approach let agents coordinate as peers through a shared file. Agents would check what others were doing, claim a task, update status, and use locks to prevent duplicate claims. This sounds reasonable. It is also exactly the kind of distributed system that gets weird fast. The problem is not that agents cannot coordinate. The problem is that peer to peer coordination asks every worker to think about the global project while also doing local implementation. That is too much. Cursor moved toward a planner worker judge hierarchy: This is the same role separation again, just scaled out. Workers should not coordinate with other workers if you can avoid it. They should receive a task with a bounded write scope, complete it, and report back. The planner should own the global dependency graph. The judge should decide whether the current state is good enough to continue, merge, or stop. This has a strong human engineering analogue. You do not ask every engineer on a large project to constantly negotiate the whole roadmap with every other engineer. You create ownership boundaries. You run reviews. You integrate. You keep the shared state legible. The hard part is choosing the grain size. Cursor's product follow up, Expanding our long running agents research preview , says long running agents produced substantially larger PRs while keeping merge rates comparable to other agents. That is the product significance. The harness lets agents take on work that previously exceeded the practical size of a single agent session. But "larger PRs with comparable merge rates" is not magic model dust. It is the result of better state, better delegation, better judges, and better recovery. Long running agents need a computer. That computer should be disposable. An agent that can run commands, install packages, edit files, open browsers, and call APIs is powerful enough to be useful and powerful enough to be dangerous. If you run it on your laptop with all your cookies, SSH keys, cloud credentials, and private files, the blast radius is ugly. The long running version makes this worse. A five minute agent can do damage. A five day agent can do creative damage. So the production architecture increasingly separates durable harness state from disposable compute. OpenAI's Agents SDK update points in this direction: model native harnesses, sandbox execution, filesystem tools, memory, manifests, and state rehydration. The key idea is that the agent gets a controlled workspace with the files, tools, and dependencies it needs, while credentials and durable orchestration live outside the sandbox. If the sandbox dies, the run should not die. The harness should rehydrate a fresh sandbox from the last checkpoint, mount the workspace, hand the worker the current state, and continue. This is the same principle again: state must outlive the worker. Sandboxing also changes how you think about tools. In a local interactive agent, giving bash broad access is convenient. In a long running cloud agent, every tool is a capability grant. Network, filesystem, credentials, browser profile, package installation, deploy keys, issue tracker access, email access. Each one needs scope. The Ralph community guide makes this point bluntly: assume the agent environment will be popped at some point, then ask what the blast radius is. That is the right mental model. The best long running harnesses will feel boring operationally: Boring is good. Boring means the agent can be weird without the system becoming weird. There are two product directions converging. The first is the practitioner loop: prompt files, plans, hooks, shell scripts, git commits. This is how power users run agents overnight today. It is messy, flexible, and close to the metal. The second is the productized loop: , cloud agents, background tasks, research previews, SDK harnesses, managed sandboxes. This turns the same patterns into a UX that normal teams can use. The underlying mechanics are more similar than they look. Claude Code's is basically a session scoped Ralph loop with a model judge. Cursor's long running agents are a cloud product built from planner worker judge orchestration. OpenAI's Agents SDK is standardizing the sandbox and filesystem substrate. Anthropic's harness posts are turning the workflow into repeatable environment design. The abstraction is moving up the stack. In 2024, you wrote your own while loop. In 2025, you wrote prompt files and hooks. In 2026, the loop is becoming a product primitive. But the product primitive still has to answer the same questions: The UI can hide the loop. It cannot remove the harness. Long running agents fail differently from short running agents. Short running agents fail by making a bad tool call, hallucinating an answer, editing the wrong file, or stopping too soon. Long running agents fail by accumulating drift. Each failure suggests a harness feature. This is why long running agent engineering looks less like prompt hacking and more like operating a tiny software organization. You need task intake, planning, execution, QA, review, release, rollback, observability, and security. The agent is the worker. The harness is the company. Here are the questions every long running agent system has to answer. My current bias: Fresh sessions beat giant sessions. A fresh context window that reads good state from disk is better than a stale context window carrying ten hours of tool output. Restarting is not giving up. Restarting is garbage collection. The workspace is the memory bus. Plans, progress logs, feature lists, tests, screenshots, git commits, and benchmark outputs are not side effects. They are the continuity layer. If the next worker cannot understand the run from disk, the harness is broken. Judges should be separate from workers. The worker can propose done. Something else should decide done. Ideally tests. Sometimes a model evaluator. Often both. The judge should inspect evidence, not vibes. External verification matters more than longer reasoning. A mediocre plan with a strong oracle will often beat an elegant plan with no backpressure. The agent needs reality to push back. Keep worker scope small. A long running system does not require each worker to do a long task. It requires the whole system to sustain progress across many bounded tasks. Make state disposable and regenerable. Plans rot. Progress logs bloat. Specs change. A good harness can regenerate the plan from the current repo and goal. Treat planning artifacts as useful scaffolding, not sacred truth. Sandbox by default. Long running agents should assume hostile inputs, accidental exfiltration, bad generated code, and runaway loops. Least privilege is not paranoia. It is table stakes. The human's job moves up a level. You stop micromanaging tool calls and start designing the environment: better specs, better evals, better prompts, better ownership boundaries, better recovery points. That last point is the real mindset shift. When code was scarce, the human wrote code. When code became cheap, the human reviewed code. When agents became persistent, the human designs the system in which code keeps getting written after they leave. OpenAI calls this harness engineering, and I think that phrase is going to stick. Harness engineering is the work around the model that makes the model useful over time: This is different from traditional software engineering. You are not only writing deterministic code paths. You are designing an environment that a non deterministic worker can repeatedly enter, understand, act inside, and leave in a better state. That is why the best long running agent harnesses feel weirdly old fashioned. Git. Markdown. Shell scripts. JSON checklists. Test suites. Logs. Small commits. Clear ownership. These are not legacy habits. They are the primitives that survive context death. The future of long running agents is not one immortal session thinking forever. It is many mortal sessions, each with a clean context window, waking up inside a workspace that remembers. So back to the original question: what does it take for an agent to keep working after you leave? Not a bigger prompt. Not just a better model. A durable state layer. A crisp goal. A fresh worker loop. A judge that is not the worker. Tests that push back. Git history that tells the story. Sandboxes that can die without killing the run. Logs that let the human tune the system when it fails. The model is the engine. The harness is the vehicle. And the companies that get this right will not merely have "agents that run longer." They will have agents that can be trusted with larger units of work because the work is recoverable, inspectable, and verifiable. That is the threshold that matters. Not autonomy as theater. Autonomy with a receipt. Why Long Sessions Fail - Context windows rot, agents declare victory early, and half finished work becomes invisible The Architecture That Won - Fresh worker sessions plus durable workspace artifacts The Ralph Loop - Why a dumb restart loop beats a single heroic conversation Initializer, Worker, Judge - The three roles that keep showing up State Outside the Model - Feature lists, progress logs, plans, git history, tests, and notes Verification As Backpressure - Why test oracles matter more than better pep talks Multi Agent Coordination - Why peer to peer locks break and planner worker hierarchies survive Sandboxing and Rehydration - Why long running execution needs disposable compute and durable state What This Means For Agent Design - The checklist every long running harness has to answer Where does state live? What does a new worker read first? How does it choose work? How does it prove progress? Who decides it is done? How do you recover from a bad turn? What happens when the sandbox dies? What is the budget? What is the blast radius?

0 views
Ludicity 3 weeks ago

The Worlds Left To Conquer

It has been a year and a half since I quit my job to start a consultancy. It took me years to build up to quitting, and I had not only a chip on my shoulder, but to quote Seth Sentry, “the guac and the dip and the salsa.” The people that read this blog probably understand what I’m talking about. I looked around at how organizations are run, at the people that told me what to do, and thought “Surely I could do a better job than this.” This feels like a dangerous train of thought. On one hand, that arrogance is precisely one of the mechanisms that makes someone incompetent. If you’ve learned everything, there’s no real reason to open up another book, and even that is rather generously assuming that the person providing a service to you has bothered to crack the spine on even one . On the other hand, how else are we to make sense of the world? If you walk out the door, you will be immediately clotheslined by institutions failing to achieve the most basic of tasks with any reliability. Almost every office I’ve walked into as an employee has been a decrepit nest populated by the beaten-down working class, a sickly ooze of self-important managers amongst whom a Gladwell reader ranks as a towering intellect, and executives that are feverishly muttering the word “AI” to credulous journalists as they blindly cut headcount. So many of these institutions seem to be held together by either regulatory capture or writhing clients bound by enterprise contracts like so much barbed wire. I’ve lost track of the number of times that someone has looked at work from a company like KPMG and gone “Ha ha ha, maybe we should all be consulting – then we can do terrible work and bill at two thousand dollars a day.” This joke is so overused that you can see the person saying it is reluctantly dispensing the cliché. So when I kicked off the company, some traitorous part of me was hoping that it would be difficult, as horrible as that would be for me personally. If it was hard, yes, perhaps I’d have to go back to some miserable office and be beset on all sides by smiling imbeciles talking about innovation, but it would make sense . It simply can’t be that easy to be free of those structures. Surely there’s a reason for it that isn’t simply “Wow, we’re systematically producing people that are terrible at their jobs and they can’t even see it.” Unfortunately, that really is most of the explanation. In late 2025, I said I’d write more after admitting how awkward it is to say the business is going well. I haven’t written anything for five months, and there’s no delicate way to put this, I drastically understated how well we’re doing. I'm ripping off the bandaid: in February 2026, I realised that we had already generated enough revenue to last us until 2027. On some engagements, I split my income several ways with teammates that weren’t on the job and still exceeded my corporate salary. For forty hours in 2025, I broke a thousand dollars an hour on tasks with measurable success metrics, an amount of money that would have seemed like some sort of sick joke two years ago, and both customers asked for a repeat engagement because the service quality was higher than what specialised firms were doing – I had spent about ten hours thinking about the engagement model. And we still have seven months left in the year. All of this is to say two things. The first, I’m not going to pretend that everyone would find it as easy as I do 1 , but it’s easy enough that basically anyone that can read both a book in software and the humanities will be fine. 2 The other is that this was all so easy that I’m going mad with boredom. Crept to their door, opened it slowly and tip-toed but, shit Somebody set the bar too low and I tripped over it Whoops, jumped up, tried to throw in a quick ultimate Just hopin' to scare 'em but, oh, it just killed both of 'em Bodies with slit throats on the linoleum I just throw 'em in dumpsters, the shit's appropriate Blue Shell , Seth Sentry I wish that I could say it was difficult to make things work. It would make sense of the world. I could have fun talking about going extremely overboard with machinations . The reality is that all of it, from service delivery to sales, has been more-or-less trivial. Closing and delivering a deal for twenty thousand dollars takes less time and energy than one sprint in a regular office. Nothing even feels high stakes – the global economy is so large that, for an efficient team, you can roll the sales conversation dice over and over until it turns up a 20. I personally blundered hundreds of thousands of dollars in sales over our first six months, and we’re fine . As a company, there are many things that I'd like to improve – it might sound silly given that we’re doing well and all our customers are happy (or lying to me), but the places where we're falling short of my expectations are extremely visible to me. By virtue of having a sizable following on this blog, I have extensive exposure to programmers that are better than me and people that are smarter than me. Every Thursday, I have a call with Efron Licht , and frankly I can scarcely grasp why someone that competent spends time talking to me 3 . The problem is that I’m not competing with Efron. If I was, I'd either have to study for five hours every day for the rest of my life, or shut the company down tomorrow. I’m competing with people that don’t have functional literacy. And it’s not just incompetence at programming, it’s everything. The world has phoned it in, leaving us with no pressure to push for excellence. Last year, I was unable to put clients on both Evidence and Prefect because the former failed to attend a sales meeting booked through their website and the latter failed to book a meeting after the ex-real estate agent they hired failed to actually schedule a meeting following outreach also through their website . Our (excellent) accounting team is Hales Redden , who managed my co-founder Jordan Andersen’s old physiotherapy business… because the people I tried in Melbourne don’t check their sales inbox. Our lawyer is reader Iain McLaren 4 because the firms I initially tried also don’t respond to their sales inbox . I cannot state this clearly enough – the bar is so low that it is hard to give people money . There are competent actors on the market, but at least in software, there are simply so few of them that you’re more likely to be allies than enemies. This was infuriating at first, comical later, and has now lapsed into depressing. As an employee, these people were an unending source of frustration, the same six-figure delinquents that would forget to renew my contracts when I was on a temporary visa. As an independent operator, they’re babies that have yet to develop executive function and I’m taking their candy. I’ll do it – candy is delicious and babies are weak – but it's hard to feel good about it after the thrill of being right wore off. Some days, I get to 5PM after pitching to fix a competitor's work, put my head in my hands, and go “There is no way you dumb motherfuckers can’t stand up a database. We’ve been on the moon. We’ve been on the fucking moon . There’s no way you dipshits cannot operate Google.” Nonetheless, there is money in my bank account and I’m in a house with three bedrooms, and we must all reckon with this dreadful portent. Is this it? I’m just going to stand up data platforms for the next forty years, a task so easy for us that we could do it drunk out of our minds, then die? As much as I enjoy having free time, the whole affair has been oddly unsatisfying. Every day, I wake up and feel like I’ve opted out of society. I don’t have the same problems as my peers anymore. Daily stand-up is a hazy memory that I remember with faint queasiness. And the very nature of consulting, even though we make the majority of our money on technical delivery rather than pure advice, is that we’re simply adding efficiency to clients. We’ve had the luxury of firing a few for bad vibes 5 , leaving us only with clients that we’re very happy to work with – but at the end of the day, they‘re doing the thing worth being proud of, and we’re simply an instrument. They do the admirable thing, and we make them better at it. It’s better than continuing to be an ultra-coward and getting paid to let people Do Scrum at me, but I dunno. Part of the reason that we’ve done so well to begin with is that we haven’t worried about scaling at all. I still think that is the obviously correct decision when you’re starting off and don’t want to take on debt. But at the same time, when a reader asks me if I’m hiring, my answer is essentially, “The whole business is designed for the team to be comfortable, and we didn’t build in the leeway to take care of other people.” My largest expenses outside of housing over the past year have been donations to a local writer’s group, Meridian Australis , and various bits to other causes, but this amounts to a few thousand dollars per year. I’m probably supposed to be content with that, but I’ve already quit my job, so what’s a bit more risk? Why am I always reading about unreflective narcissists and tedious bootlickers funding things? Why can’t the causes I care about have resources thrown at them without them having to contort their value systems for the money? At any rate, the passage is crystal clear in both cases: Alexander is not weeping in sorrow that there are no more throats to cut. This is not a picture of a man at the end of a career of world conquest; he’s at the beginning. “Look at all these throats—and I haven’t even cut one!” And Alexander Wept , Anthony Madrid We still run into problems all the time that aren’t solvable by simple efficiency – perverse incentives from sloppy legislation, places where buyers can’t understand enough to avoid exploitation, gambling companies run by vile degenerates, things that make me want to throw up. I am fully engaged with capitalism every day, and despite the fact that I’m winning for some definition of winning, much of it is grotesque. Sometimes I wonder whether I should have gone into medicine, like most of my family, but at the same time someone has to keep the databases running. So here’s what’s going to happen for now. We have seven months left in the year. Around the start of June, we’ll be done with our most complex work, and ready to try something new, where by “something new” I mean we’re going to pick some nerds (pejorative) and cut their throats. The area that we’ve picked out specifically is technical recruiting, if only because it is the most accessible area that is most densely populated with easy prey. It should take us a little bit to knock out a small platform 6 , then I’ll broadcast that here for readers to sign up. We’ve done some work in the space, and all I can say is that software recruiters are defenseless money piñatas incapable of serving the competent sectors of the market, and I am going to beat them with a large stick and then loot the wallets from their corpses. Is this it? I’m just going to stand up data platforms for the next forty years, a task so easy for us that we could do it smashed out of our fucking minds, then die? At a rough estimate, every time we place someone that would otherwise have had to go through the hellish experience of conventional recruiting, we could plausibly knock one individual recruiter out of the market because of their slim margins (due to all the incompetence), which will temporarily satisfy my never-ending lust for blood. Then we’re going to take that money and use it to knife someone else that's causing negligent misery, and funnel some of the excess into things we care about. If we do a really good job, I really believe we can meaningfully distort some section of the market, even if that’s just “Ugh, everyone knows you can't recruit software engineers in the A$180K band in Melbourne. Those Hermit Tech folks have destroyed all the margin and established themselves as supreme dictators, and also their CEO will bully you online if you do a bad job.” I’m going to commit economic violence for the next forty years, and get so good at it that we can do that smashed out of our minds, teach other people how to do it, then die, and some of you will pick up the work where we left off. I’ve had a sale for $100,000 fall through, and twenty minutes later said “Easy come, easy go” and moved on with my life. I’m sure this is trainable, but I can’t take credit for this because I think I’m just a weirdo.  ↩ It is unbelievable how much of a competitive advantage “Responds to emails from paying clients within 24 hours” is. The bar is subterranean.  ↩ Incidentally, the two largest influences on my company’s culture are Jesse Alford and Efron Licht, on team culture and programming fundamentals respectively. I don’t think Jesse has written anything particularly friendly for mass-consumption, but Efron has an amazing series called Starting Systems Programming that has been transformative for my practice. It might seem obvious to some of the most talented programmers in the audience, but I cannot recommend it highly enough for everyone else. If you enjoy it, I’m sure he’d get a huge kick out of an email, as I don’t think he has analytics. I’ll do a writeup on all my influences at some point, as the list is long and they all write quite a bit.  ↩ Certified Cool Dude, by the way.  ↩ To no one’s surprise, they’re mostly startups.  ↩ Think “limited window for candidate signups and extreme pickiness about employers, no CVs, and a hard limit on interview stages, and so on”, not Seek. I don’t think Seek has done anything wrong , they’re just the inevitable result of the state of letting the entire market use their service.  ↩ I’ve had a sale for $100,000 fall through, and twenty minutes later said “Easy come, easy go” and moved on with my life. I’m sure this is trainable, but I can’t take credit for this because I think I’m just a weirdo.  ↩ It is unbelievable how much of a competitive advantage “Responds to emails from paying clients within 24 hours” is. The bar is subterranean.  ↩ Incidentally, the two largest influences on my company’s culture are Jesse Alford and Efron Licht, on team culture and programming fundamentals respectively. I don’t think Jesse has written anything particularly friendly for mass-consumption, but Efron has an amazing series called Starting Systems Programming that has been transformative for my practice. It might seem obvious to some of the most talented programmers in the audience, but I cannot recommend it highly enough for everyone else. If you enjoy it, I’m sure he’d get a huge kick out of an email, as I don’t think he has analytics. I’ll do a writeup on all my influences at some point, as the list is long and they all write quite a bit.  ↩ Certified Cool Dude, by the way.  ↩ To no one’s surprise, they’re mostly startups.  ↩ Think “limited window for candidate signups and extreme pickiness about employers, no CVs, and a hard limit on interview stages, and so on”, not Seek. I don’t think Seek has done anything wrong , they’re just the inevitable result of the state of letting the entire market use their service.  ↩

0 views
Kaushik Gopal 3 weeks ago

Agents are the new compilers. Specs are the new code.

Linus Torvalds recently said 1 AI will be to code what compilers were to assembly — freeing us from writing it by hand. Around the same time, I talked with Jesse Vincent (creator of one of the most popular agent skills out there — superpowers ). Something he said stuck with me: Specs are going to be the new code . I realize those two ideas snap together a little too neatly. Agents are compilers 2 and specs will become code. Software engineering is moving up another level of abstraction and we’ve seen this play out before. I saw this first-hand with my tiny USB-C cable checker — . It started as a shell command over macOS’s , then became Go when I wanted a proper binary, then Rust because I wanted to practice Rust, and later a version. The code kept changing. The thing I cared about did not: parse the USB tree, identify the attached devices, report the speed, and make bad cables obvious. , my voice track sync program, followed the same pattern. It started in Python because the audio libraries were there. Then I moved it to Rust because I didn’t want to ship a Python runtime or care which Python version happened to be on a machine. Again, the implementation changed. The behavior stayed boringly stable: take a master track and local tracks, find the offset, pad or trim each file, and drop aligned audio into the DAW. Compilers freed us from writing assembly. Agents may free us from writing code because it becomes an artifact the spec produces. The somewhat recent push around detailed exec plans could be an early signal of the looming shift at bigger scale. Push that thought further. We might get comfortable rebuilding whole modules instead of patching and refactoring them. We preserved the old shape of a system because throwing it away cost too much. Even when you know the module is wrong, you sand it down: extract an interface, migrate one caller at a time, add tests around behavior nobody fully understands. You keep moving because the alternative is a rewrite, and rewrites have a well-earned reputation for eating companies alive. But agents change that cost curve. If an agent can read the spec, understand the tests, inspect production traces, and rebuild a module in an afternoon, the sensible move may be to replace the entire module altogether. Push that even further and the unit of work changes. You stop asking an agent to patch one function or file. You ask it to rebuild the entire payment module against the tweaked spec. Heck, swap out the auth layer with a new library. Or regenerate the API boundary, now that the domain model is clearer. This is the part I cannot stop thinking about. Each rebuild can start from what we now understand about the whole module, not from what we believed the first time someone shipped it. Tech debt the old code carried (because it grew one patch at a time) can finally come off. The spec can absorb what we learned from the old implementation: the weird edge case in billing, the migration path nobody wrote down, the customer whose workflow depends on a “bug”, the batch job that only fails on the first day of the month. Specs become the place where the system’s memory lives. Once those lessons move into the spec, the implementation becomes replaceable. We are becoming Spec Writers. starts at the 1:48 mark  ↩︎ Yes, agents aren’t deterministic the way compilers are — same prompt tomorrow may give different code. But that may be the wrong bar moving forward. What has to stay stable is behavior under the spec; the code can vary. Also my dude, are you seriously nitpicking with Linus Torvalds?  ↩︎ Each rebuild can start from what we now understand about the whole module, not from what we believed the first time someone shipped it. Tech debt the old code carried (because it grew one patch at a time) can finally come off. starts at the 1:48 mark  ↩︎ Yes, agents aren’t deterministic the way compilers are — same prompt tomorrow may give different code. But that may be the wrong bar moving forward. What has to stay stable is behavior under the spec; the code can vary. Also my dude, are you seriously nitpicking with Linus Torvalds?  ↩︎

0 views
Susam Pal 3 weeks ago

From RSS to Atom

Yesterday, I switched my website from RSS feeds to Atom feeds. In case you are wondering whether you have somehow landed on an ancient post from 2010, no, you have not. Yes, this is the year 2026, and I have finally switched from RSS feeds to Atom feeds. Yes, I am fifteen, or perhaps twenty, years too late. I have always wanted to do this but could never make the time for it. Finally, it happened while I was giving my brain some rest from my ongoing algebraic graph theory studies. That's when I felt like spending a little time on my website and doing a little Lisp to change the feeds from RSS to Atom. I suppose this was impulse coding , a bit like impulse buying, except that I ended up with an Atom feed instead of a new book. I find it quite surprising that when I have plenty of time, it usually does not occur to me to do these things, but when I am too busy and really short of time, these little ideas possess me during the short breaks I take. My personal website is one of my passion projects. Common Lisp is one of my favourite programming languages. So any time spent on this passion project using my favourite programming language is a very relaxing experience for me. It serves as an ideal break between intense study sessions. It took about an hour to implement the changes needed to make the switch from RSS to Atom. In the end, I could go back to my studies reinvigorated. In case you are curious, here is the Git commit where I implemented the change from RSS to Atom: 596e1dd . As you might notice, a large portion of the change consists of replacing the attribute in each post with the attribute. The attribute value was used as the value of the element in the RSS feeds. While an arbitrary short string could serve as the element for the items in an RSS feed, the element of the entries in an Atom feed needs to be a URI. It turns out UUID URNs are a common choice for such a URI. I ran the following shell command to replace all occurrences of the attribute with : The rest of the changes went into the feed templates and the Common Lisp program that statically generates the feeds along with the website. For examples of the resulting feeds, see feed.xml and absurd.xml . The first is the main website feed and the second is an example of a tag-specific feed. Yes, the aforementioned Common Lisp program generates a feed for each tag . As of today, the main feed at feed.xml contains only two entries even though this website has over 200 pages . I explain the reason later in Temporary Workaround . Here is an example Atom entry from my feeds: The ellipsis ( ) denotes content I have omitted for the sake of brevity. I like how each entry in the feed now has its own UUIDv4. I also like that timestamps in an Atom feed are in the format specified in RFC 3339 , which also happens to be a profile of ISO 8601. Further, I like that I can explicitly declare the content type to be HTML. Commonly used values for the content type attribute are , and . If it is , the content should be escaped HTML. If it is , the content should be an XHTML element containing valid XHTML. Explicit content type support is likely the biggest advantage of Atom over RSS. In comparison, RSS 2.0 does not specify any way to declare the content type. So feed readers have to inspect the content and guess what the content type might be. As I mentioned before, as of today, the main feed contains only two entries. That's because only new posts published since the migration to Atom are now included in the feed. This was done to avoid spamming subscribers. The Atom specification's requirement that each entry's ID must be a URI has caused the IDs of every entry to change. If I were to include the older posts from before the change in the feed, then those posts would appear as new unread items. Subscribers can find this quite annoying. In fact, I have received a few complaints about this in the past. So I was careful this time. I have a little one-liner workaround in my site generator to exclude posts published before this change from the feed. That was the only workaround I had to implement. Fortunately, my feed file had a neutral name like , rather than a format-specific name like , so I could avoid a URL change and the subsequent overhead of setting up redirects. Does any of this matter today? I think it does. Contrary to the recurring claim that RSS and Atom are dead, most of the traffic to my personal website still comes from web feeds, even in 2026. Every time I publish a new post, I can see a good number of visitors arriving from feed readers. From the referrer data in my web server logs (which is not completely reliable but still offers some insight), the three largest sources of traffic to my website are web feeds, newsletters and search engines, in that order. On the topic of newsletters, I was surprised to discover just how many technology newsletters there are on the Web and how active their user bases are. Once in a while, a newsletter picks up one of my silly or quirky posts, which then brings a large number of visits from its followers. Back to the topic of web feeds, there is indeed a decent user base around RSS and Atom feeds. A good number of visitors to my website arrive by clicking a feed entry that shows up in their feed reader. I know this with some confidence by looking at the (sic) headers of visits to my HTML pages and the subsequent browsing of the website, as opposed to the isolated and automated fetches of the XML feeds. So there must be a reasonably active base of users around web feeds. It is a bit like being part of an invisible social network that we know exists and that we can measure through indirect evidence. I found these three resources useful while switching to Atom feeds: Read on website | #web | #technology Impulse Coding Atom Entries Temporary Workaround Does It Matter? W3C Introduction to Atom W3C Feed Validation Service RFC 4287 : The Atom Syndication Format

0 views
Manuel Moreale 1 months ago

11 down, 33 more to go. Plus a cave.

We had another lovely, sunny weekend last week, and that means I walked the second of the ten segments of the 44 votive churches loop. This time around, I didn’t have to mess with the route in order to hit all the churches in one go because there were no variants. And, like last time, I was not alone. I had a friend coming with me, which is always nice. Don’t get me wrong, I enjoy walking solo, but I also enjoy walking in good company. The plan was the same: meeting at the arrive, leaving my car there, driving back to the starting point and take off from there. And that’s exactly what we did. The last time we parked some 600 meters away from the actual end—because there was no parking there—so the first chunk of today’s walk is the final part of segment number 1. Clearly visible on the left, up on the hills, is the small village of Antro where we’re headed. One of the six churches we’ll visit on this walk is waiting for us right there, and it’s a good one. But first, without even realising it, we’re already at the site of the church of San Luca Evangelista (7/44). I’ll be honest with you, this is quite an uninspiring one. It’s also not in a nice location, very close to the street. I’d have completely missed it if it weren’t for my watch. And this post is sponsored by Suunto… just kidding. It is quite handy to have the whole route planned on the watch though, because it vibrates when I’m near one of the churches since are stored as POIs. No pictures of the inside since the windows were boarded and the door was locked. All of them are locked, quite annoying if you ask me. But that’s modern society for you. The church was likely first built around the year 1250, but it was for sure consecrated in 1568 by the Bishop of Cattaro, also governor of the Patriarchate of Aqui leia . We leave the first church behind us, we turn left, we cross the Natisone, and we start climbing up, heading towards Antro. The first part of this walk is not super inspiring since it’s on paved roads, but it is what it is. One day, I might attempt to make a modified version where I only walk on asphalt when absolutely necessary. Could be fun. We pass through Biacis and next to the Antro Bank Slab , an old artefact symbol of the self-government of the Friulian Slavia, developed around the end of the XI century. The path takes us behind the stone and out of the village, and we’re headed in the direction of the church of San Giacomo Apostolo (8/44) next to the “castle” of Ahrensperg. I put it in quotes because it’s more like a nice cottage with a tower than an actual castle, but the whole place is lovely, I have to say. Dual bells, like most of these churches, and I had to resist the temptation to make them ring since the ropes were dangling right there, out in the open. I can be quite the mischief, but I also don’t like to bother people, so we didn’t touch anything. Also no way to take pictures of the inside, it was way too sunny. The church dates back to the mid-12th century, and the stone we saw earlier was kept under the outside portico. Church behind us, the trail is taking us around it and the castle and up through the woods. Two unexpected sights, one after the other, are awaiting us. The first is this concrete monstrosity, which I have absolutely no clue about what it actually is. It’s a very odd-looking structure, quite tall, I’d say 15 or 20 meters tall, with three tunnels going through underneath. It’s clearly something industrial, but I have never seen something similar in my life. Plus, it’s now covered in vegetation, which makes it even harder to get a sense of what it actually is. Reminded me of Horizon Zero Dawn, if you played that game, you know what I’m talking about. The next unexpected sight was a shrine. Very neglected, it’s quite literally falling apart, with a tarp on its roof put there just to prevent water from doing even more damage. As always, it’s dedicated to Mary, which is not unusual here since the iconography of Mary is way more presente than Jesus for some reason. There are Marys everywhere in the valleys if you start paying attention to them. Up the forest we go, and we have finally reached Antro. If you suffer from OCD, don’t look at its bell tower with the off-centre clock face. It’s driving me nuts. We have some time to wait here because we have booked a tour of the caves for 11 am, and we’re way too early. So we spend some time chilling in the shade of the trees with a nice view of the village. It’s all very relaxing, and there’s a small number of people who are also waiting to go see the church and the cave. It’s now time to go, so off the path we go to reach the ticket stand. The ticket to visit the church is 8€, and there’s an app you can download that serves as a guide. But to visit the cave, you need to book a visit with a guide for 10€. On the app, you’re asked to use headphones, and yet some people were obviously blasting it on their speakers. Again, that’s society in 2026 and the main reason why I want to go live into the woods. Up the 86 steps of the old stairs we go, and we have reached the very unique church of San Giovanni Battista (9/44) nested inside the cave. The current church got rebuilt in the mid 1500 after the quakes of the beginning of the century—like many of the 44 churches—and it’s quite unique. It’s also sometimes used as a venue for events. The most fun part is that right behind the altar, you can see the cave unfolding. And it’s right behind the altar that the guided tour starts. Sadly, only the first 300 or so meters of the cave is accessible to the public, and the rest is only accessible if you’re a speleologist. The whole cave is quite big, some 4 or 5kms and there are apparently rooms that are bigger than the opening one, where the church is located. I’d love to visit it, but I think I’m too tall for this type of stuff. One fun aspect of this cave is that apparently twenty-thousands years ago it was inhabited by the ursus spelaeus , the cave bear. One less cool aspect was all the writings on the walls of the cave. Why are people so fucking obsessed with writing on everything? Also, why can’t we have nice things? Anyway, the guided visit is done, it’s now time to get back on track since we have most of the walk still in front of us. So out the cave we go and down the stair, to then take a sharp right turn and walk below the entrance of the cave. There’s a nice view of the whole area from down here. Definitely worth visiting if you’re ever in this corner of the world for some random reason. We’re almost 3 hours into this walk (even though we have spent most of the time either waiting or inside the cave), and it’s now time to gain some elevation since most of it is spread on this next chunk that will take us pretty much to the highest point of the walk and also the next church. Unsurprisingly, after some twists and turns, what do we find? Another random Virgin Mary, this time in a shell. After some more walking inside the forest, we are back on paved road for a little while. We are high enough to have a nice view of Mount Matajur, the peak that dominates the area. That is also gonna be the target of the next hike since the third chunk of this walk goes from down the valley up to that mountain. Not to the very top, but come on, there’s no way I get all the way up there, and I also don’t reach the summit. So you’ll get to see it up close soon enough. We’re now almost at the site of the church of Santo Spirito (9/44), but before we walk up the final 50 or so meters, we need to cross path with guess what? You’re right, another Virgin Mary. We’re roughly 4 hours into this walk, and the location of the church of Santo Spirito is perfect to take a break and eat something. I mean, just look how relaxing this place feels: So far, this might be my favourite location, even though the church itself is probably the ugliest one. And also the youngest. The original one was built probably before the year 1000, but then everything got destroyed during bombardments in WW2 and the current building dates back to 1949. So it’s not even a century old, and it’s in rough shape already. It’s nice to take a break and relax for a bit. It’s a lovely day, perfect weather, and there’s no rush. Plus, we have company! Ok, lunch is done, shirt is dry, it’s mostly downhill from now on, so off we go through the forest again. After a little while, we pass next to the ruins of the old Church of San Nicolò, which, if it wasn’t for my watch vibrating, I’d have completely missed because this thing is barely visible even if you are paying attention. We also stumble across whatever—or whoever—this guy is. I had to take a picture and send it to my brother since that’s his name. Through the forest, across the fields, back into the forest again, out of the forest yet again we’re now almost at the point where we can see the new location of the church of San Nicolò Vescovo (10/44). I have to say, it’s a lot easier to spot compared to the old one, which is completely covered by vegetation and in total ruin. But it’s also quite big, and I don’t know, I guess I’m more of a fan of the tiny ones hidden inside the forest. This one feels like a normal church to me. Only one church is left, and then the final descent to the end of this hike. But first, I need to stop and take a picture of something, and by now you might have an idea of what it is. And here we are, we have reached the location of the final church of today’s hike, the church of San Donato, hidden inside the forest, with its missing bell and its lovely appearance. Now, fun fact: the door has a hole in it with a cover you can swipe aside. Is this a glory hole? We’ll never know. What we do know is what’s inside it because I did peek inside that hole. What a fun experience this was! The only thing left for us to do now is to walk down the forest, take a wrong turn because the GPS messed up, do some bushwhacking, find the correct trail again, walk some more, pass next to a bunch of other Marys—there are always more Marys—cross the Natisone once again and reach our final destination. And here we are, arrived at the park where we left my car, some 7 hours and 16kms later . This was a very relaxing walk, it can easily be done in probably 3 and a half hours. But why rush when you can spend some time outside and enjoy nature? I did update the iCloud album with the new pictures, so if you want to see more from this walk, click that link. You love the outdoors and RSS. You're one of the special ones.

0 views
Evan Schwartz 1 months ago

Your Clippy Config Should Be Stricter

“If it compiles, it works.” This feeling is one of the main things Rust engineers love most about Rust, and a reason why using it with coding agents is especially nice. After debugging some code that compiled but mysteriously stopped in production, I realized that it’s useful to enable more Clippy lints to catch bugs that the compiler won't prevent by itself. It's especially useful as guardrails for coding agents, but stricter linting can make your code safer, whether or not you’re coding with LLMs. Scour is the personalized content feed that I work on. Every Friday, Scour sends an email digest to each user with the top posts that matched their interests. On a recent Friday, the email sending job mysteriously stopped. This was puzzling because I had already put in place multiple type system-level safeguards and tests to ensure that it would continue with a log on all types of errors. After digging into the logs, I found the culprit to be . A function naively truncated article summaries without checking for UTF-8 character boundaries, which caused a panic and stopped the Tokio worker thread running the email sending loop. The solution for this particular bug was a safer method for truncating article summaries that respects UTF-8 character boundaries. However, this problem was reminiscent enough of the 2025 Cloudflare bug that "broke the internet" that I wanted some more general solution. Rust's compiler prevents many types of bugs but there are still production problems it can't catch. Panics will either crash your program or quietly kill Tokio worker threads. Deadlocks and dropped futures can make work silently stop. And plenty of numeric operations can silently cause incorrect behavior. We can stave off many of these types of bugs by making Clippy even stricter than it already is. This is especially relevant in the age of coding agents. A seasoned Rust engineer might naturally avoid patterns that could cause problems. An agent or a junior colleague might not. Stricter Clippy rules make it easier to rely on code you didn't personally write. Also, enabling new lints on an existing codebase is tedious, and exactly the kind of task that is good to hand to a coding agent. Clippy ships with hundreds of lints that are disabled by default. Some are disabled because they might have false positives and some are style choices which you might reasonably not want. Which lints should we enable to help us get back the "if it compiles [and passes Clippy], it works" feeling? Clippy's lints are grouped into categories : Correctness, Suspicious, Complexity, Perf, Style, Pedantic, Restriction, Cargo, Nursery, and Deprecated. Unfortunately, none of these categories cleanly map onto "don't let this panic or do the wrong thing in production". In fact, the Clippy docs say that "The category should, emphatically , not be enabled as a whole." Clippy even includes a dedicated lint, , to discourage you from enabling this category. While the category includes many useful lints, it also includes some that directly contradict one another. For example, it contains lints to enforce both and . The docs say "Lints should be considered on a case-by-case basis before enabling". Of course, you can enable whole categories like and and then specific ones you want to disable, but I'm outlining a selective opt-in here. Even if you don't use a certain pattern in your code base today, it's not bad to enable the lint anyway. Inapplicable lints serve as cheap tripwires in case the given pattern is ever added later, whether by you, a colleague, or a coding agent. Every project is different and you should look through the available lints to see which ones make sense for your project. Also, check when lints landed in stable if your Minimum Supported Rust Version predates 1.95, as some of these may have been added after your MSRV. With those caveats out of the way, here are the lints I enabled, roughly categorized by what kind of behavior they prevent. You can skip to the bottom if you just want to copy my config . This group prevents panics from unwraps and unsafe slicing or indexing into arrays and strings. Note that some of these, like and may produce many warnings throughout your code base. That may be annoying to fix. However, using safe methods like and iterators instead of slicing prevents pretty severe footguns, so I would argue that it's worth it. You might or might not want to enable . Calling on an or can result in a panic. However, the message you pass to should already document why that thing shouldn't happen. Enabling the lint and then selectively disabling it throughout your code with may end up duplicating the same rationale for using it in the first place. Another lint that is a real judgement call is . This can prevent overflows and division by zero. However, it will cause Clippy to warn you about every place you use math operators: , , , , , and . I tried enabling it in my code base and would estimate that around 15% of the warnings caught real issues and 85% was just noise. These prevent various concurrency bugs and deadlocks: The lints , , effectively force you to document invariants when doing lossy casts between numeric types. You might or might not find that useful. These two are especially useful if you're using a coding agent. Instead of letting the agent write , it should provide a reason wherever it's disabling a lint. If you're using a Cargo workspace, you'll want to enable these lints in the workspace Cargo.toml. Unfortunately, each workspace crate needs to opt in to inheriting lints with , rather than inheriting the lints by default. On nightly, there's a lint that specifically checks for this. If you're using stable Rust, you can use or a simple shell script run on CI to make sure you don't forget to make a workspace crate inherit the lints. When enabling lints, you can either set Clippy to or them. Either works but I personally prefer setting these to and running Clippy with before committing and on CI. This makes local iteration marginally easier because you can compile your code initially without fixing all the lints right away. Ultimately, as Clippy's docs say, "You can choose how much Clippy is supposed to annoy help you." But especially in the age of coding agents, I think it's worth tightening the guardrails so you end up with even fewer mysterious bugs in production and more code where you can say "if it compiles and lints, it should work." Discuss on r/rust , Lobsters , or Hacker News . - on (UTF-8 boundary panic). This would have caught my initial bug. / / - placeholder-panic macros - inside functions that return a - panics if the second is larger - / inside a function that returns a - drops without awaiting - swallows errors - silently drops - loses source error - discards the error message (only relevant if you're using an earlier edition than 2024) - deadlock pattern. The scoping was fixed in the 2024 edition so this is no longer an issue. - a that is too large can cause a stack overflow - every needs a comment - one unsafe op per block (one comment per op) / - only document safety where it belongs - on floats - stricter, also flags comparisons against constants - silently-rounded float literals ( ) - wraps to - always false - ( is single-threaded) - differs in debug vs release - method named returning non- - manual impl that disagrees with - impl whose error is should be - calls should be removed after debugging - every becomes - every requires a reason

0 views
Langur Monkey 1 months ago

Local TTS is getting very capable and accessible

Around 2007 I spent half a year in the University of Aberdeen working on my final year project involving NLP . The project consisted of an interactive game that was controlled by language input. It also had to produce speech. At that time, we managed to partner with a group at La Salle University that were working on a TTS system for Catalan. It was a closed system that was accessible via a web API, but it was far too slow for real time use. I ended up preprocessing the audio of all dialog in the project. At that time, I was amazed that a computer could so easily convert text to an understandable audio file. The voice was very robotic, and the results were hit or miss, but it worked . Fast forward to today, TTS systems are everywhere. Several groups have released low-parameter TTS models that run very well on consumer hardware. I have been using the lightweight Kitten TTS for a while with fantastic results. The models are so lightweight that some websites are heavier than entire Kitten TTS models: Projects like streamline and trivialize Kitten TTS inference. I have a shell script in one of my directories that does everything in a single command: This clones the project, pulls dependencies and models, and plays the audio. It is quite fast, especially when using cached data. Kitten TTS produces acceptable results, though the output usually lacks emotion and nuance. For simple use cases (reading notifications, generating voiceovers for scripts) it’s more than sufficient. Qwen3-TTS , which I’ve been recently testing, represents a step-up in quality. It’s extremely good, and local inference is practical even on modest hardware given the model sizes. It offers three interesting variants: The voice design models are particularly clever: you describe the voice you want alongside the text to convert. Want a deep, gravelly voice with a Scottish accent? Or an excited teenager talking about a video game? Just describe it. It’s remarkable that you can run this locally so easily. However, as far as I know there’s no off-the-shelf CLI tool that handles dependencies, downloads the model, and runs inference out of the box. That’s why I created QwenSay . With it, you can clone the repository and convert text to speech locally from your terminal without wrestling with dependencies or writing any code. Here’s how it works. First, set it up: Now, you are ready to convert your text to speech with Qwen3-TTS: This uses the default 1.7B voice design model. You can also specify the model with . There are many other CLI arguments that you can use to tune your output. Check out the repository documentation for more details. Whether you’re building accessibility features, creating voiceovers for projects, or just experimenting, this is worth a try. I’ve made QwenSay my go-to TTS tool because it produces high-quality results and is genuinely fast.

0 views
Brain Baking 1 months ago

Nostalgia Always Includes a Temporal Context

Last year, Forrest wrote a long and thoughtful commentary on the mysterium called nostalgia . In a desperate attempt to recreate the experience of playing The Elder Scrolls: Oblivion for the first time, he spent rebuying original Xbox 360 hardware expecting to be propelled back into his childhood: I spent $236.74 to go back to 2006, to my old room, with the Easy Mac and the Diet Cherry Coke and the perfect lighting. I wanted to boot up the old 360 again, feel the smoothness of that perfectly curved controller in my hands while I navigated that glorious Blades user interface. I wanted to feel that little hit of dopamine whenever that green-and-gray badge popped up for unlocking an achievement or whenever a friend logged in or whatever. It worked. For a while. But then the effect started to wear off. Even more desperate methods were employed to try and relive those golden past moments, but after the 100th time of reliving it, the soft edges disappeared and even the best memories got blurry. It’s funny what our minds do: recreate past events by putting together piece by piece—often in the wrong order, or by picking the wrong piece. We don’t replay the exact past event in our heads: we re-create it, including a big margin for error. I’ve been reading Alex Custodio’s Who Are You? Nintendo’s Game Boy Advance Platform where Custodio argues that the GBA is not just the GBA—that is, the piece of hardware that you hold in your hand with a Game Pak cartridge. Instead, it is the GBA, the cart, Nintendo’s story, the players, the modding community, and also something very important: the unique time period in which all of this unfolds. Trying to recreate sweet childhood memories of a robust Nintendo handheld clutched in hands, fighting for the best spot outside with indirect sunlight, will always end in disappointment. It’s no longer that time . Friends no longer show up with their Link Cables. Some software no longer works. Regardless of whether you did: the world has moved on. Any modern retro-inspired Nintendo knock-off handheld is now capable of rendering more than four shades of grey. A screen that lights up apparently is a thing. The electrical devices we love to carry around now apparently are focused on locking us in and stealing our privacy instead of just doing one thing and doing it right. My dad often says that every era has its own charm. My response to that usually is frustration: frustration of being unable to get out of the current stressful time and frustration of being unable to go back to that time when things were still joyful and I was still oblivious to the meaning and effects of capitalism on this world. I often want to flee from the current situation and world I’m stuck in, and the number one place I want to flee to is that place this younger fellow inhabited: Hello there, Younger Self. Watcha Doing? Doesn’t he look happy? Game Boy pouch nearby, four spare AA batteries and a Turtles II: Back from the Sewers cartrdige no doubt inside, he found the easiest way to ignore his sisters and parents: flip that switch, hear *PLING*, and just play. Ever since I got a Game Boy, we were best friends. The above photo is taken somewhere in Spain on a summer holiday where I met locals on the poolside that were curious as to what I was playing. We didn’t understand each other. But we did. We did. Donde esta Link Cable? Mortal Kombat? Si? and to my sisters, foolish enough to engage in the act of swimming: Hey! No splashing here! I’ve been exploring falling block puzzle games on the GB that I might have missed in my youth such as Yoshi’s Cookie . Even though I miss the physical flip switch, I nowadays play these on my Analogue Pocket for obvious reasons: my eyes aren’t what they used to be, and that backlit screen estate and resolution is just amazing. No perfect childhood memory recreation here but I would be lying if I wrote “I don’t care at all”. Otherwise I would just play a few of them on an emulator on my laptop—or just completely ignore the GB library. In a way, I too nearly spent to relive my childhood: the Pocket is ridiculously expensive, especially if you take shipping to Europe into account. Additionally, the Pocket requires physical carts: which poor soul still scours flea markets for GB carts when any GB(C)(A) ROM can be downloaded in a whim? But in 2026, I can’t really play Yoshi’s Cookie the way Nintendo intended it to be played in 1993. I don’t have any friends who still own Game Boy cartridges, let alone a Link Cable. Yoshi’s Cookie ’s single player mode is dull: the game begs for a local competitive play. That time window has passed: the temporal context has changed drastically. I have fond memories of afternoon GB sessions with friends, but I’m afraid they will stay just that: a memory. A nostalgic one with the potential of being frustrating if I don’t force myself to look at the now and into the future instead of only at the past. More similar fond memories include blindly swapping Game Boy carts with a random local kid’s mom in the hope of getting a new Kirby game without having to ask my parents for it, exchanging fruit with American strangers at night in bed with my DS Lite by visiting their Animal Crossing: Wild World village, playing Mario Kart DS/7 with a (3)DS during lunch break with colleagues, … If everybody—especially time—has moved on, then why haven’t I? Why do I call myself a retro gamer? I do play new games now and then but let’s not fool ourselves: every single one of these “new” titles is either a remaster/remake or a game that heavily draws from its nostalgic ancestors. The ones I love the most are the pixelated gritty ones with gameplay mechanics firmly rooted into the past seasoned with a bit of modern ease-of-use features, and that last one has a practical reason since as a parent of young kids I’m often pressed for time. Custodio’s Who Are You and Forrest’s commentary on nostalgia made me realise a few things. First, I am not alone in longing to go back to a simpler time when things weren’t enshittified as badly as they are now and when we weren’t yet made aware of how dark this world can be. Second, the temporal aspect that plays a critical role in all this, disabling the ability to perfectly reenact a happy memory, acts as a pressing reminder for me: that I more often need to look to the here and now. Yesterday afternoon, our family made a short trip to a nearby town looking for a nice place to walk. Our eldest of course discovered a playground in the process. In twenty years, her childlike joy in discovering ants, playgrounds, waving at strangers, and blowing dandelion seeds into the air will become just like my current nostalgic yearning. We brought along her tricycle and a passerby smiled and said to us Enjoy It! Undoubtedly, most people who throw heartwarming smiles were young parents once. I guess they mean we should “capture the moment” because before you know it, the moment is gone? But how do you enjoy the moment when you’re exhausted (there’s that word again), the toddler won’t shut up for a single second, the youngest is yelling again because he lost his pacifier, at home more chores like sweeping, mopping, cooking, folding clothes, … are only piling up? These current hectic and stressful moments now will become nostalgic moments later. Without a doubt, my wife and I will look back as these moments and sigh: wasn’t it great when they were little and only uttered oh and ah! instead of shut up dad, I’m not doing that ? Without a doubt, our mind will have filtered out the most stressful and depressing moments. We’ll have blissfully forgotten these. Just as I now blissfully forgot the many frustrating moments of a boy and teenager living with their parents and sisters, not truly being free. I do not regret moving out as soon as I could. If I had the choice to go back in time with some of my current wisdom still intact, I would deliberately Enjoy It more often. Caress the durable grey plastic shell. Slide in and out the carts a few more times just to hear that unique clicking sound again. Bug my friends way more about bringing their GB and Link Cables. A growing body of research on nostalgia identifies two major types: reflective nostalgia and restorative nostalgia . The former is the kind you are grateful for, the bittersweet moments you’ve enriched your live with. The latter is the desperate and longing part that increases the feeling of loss. I might just be on the wrong side of nostalgia. Attempts to relive that nostalgic moment—and thus moving from reflective to restorative—always end in a failure. The games I play together with the DOS Game Club are another example of this effect: Jazz Jackrabbit suddenly feels like a cheap Sonic rip-off that mechanically barely holds together, while back in 1994 it was the most mesmerising thing I ever saw on my granddad’s 486. Fortunately the soundtrack still slaps, but perhaps that’s probably because of how auditory stimuli recreate these nostalgic moments differently? In addition to trying to escape the bigger responsibility and wanting to go back to that ignorance-is-a-bliss state of mind, my second biggest reason for being nostalgic is mourning the loss of my previous selves. I often go back to the past to try and understand the current me but perhaps that’s more a bad than a good thing. Some of the things a past version of myself liked or did no longer fits with my current self. I don’t fully understand the reasoning behind this yet, but I hope I will. In the meantime, I’ve added a few nostalgic-oriented works on my reading list to gain a bigger insight into how all this theoretically is supposed to work. Perhaps then I can start to save myself from my past self. Related topics: / nostalgia / By Wouter Groeneveld on 27 April 2026.  Reply via email .

0 views

Premium: How OpenAI Kills Oracle

Soundtrack — Brass Against — Karma Police   It was January 21, 2025. Per The Information , Larry Ellison, CEO of Oracle, had just flown to Washington DC from Florida, and had to borrow a coat “...so he wouldn’t freeze during an interview he did on the White House lawn, according to two people who were involved in the event.” He was there to announce a very big — some might even say huge — new project standing next to SoftBank CEO Masayoshi Son and OpenAI CEO Sam Altman. “Together, these world-leading technology giants are announcing the formation of Stargate, so put that name down in your books, because I think you’re gonna hear a lot about it in the future. A new American company that will invest $500 billion at least in AI infrastructure in the United States and very, very quickly, moving very rapidly, creating over 100,000 American jobs almost immediately,” said President Donald Trump . After he was done, Ellison stepped to the podium. “The data centers are actually under construction, the first of them are under construction in Texas. Each building’s a half a million square feet, there are ten buildings currently being built, but that will expand to 20.” Following Ellison, SoftBank’s Masayoshi Son added that Stargate would “...immediately start deploying $100 billion dollars, with the goal of making $500 billion dollars within [the] next four years, within your town!” turning to Donald Trump with his hands extended. It was unclear what town he was referring to. Altman added that it would be “an exciting project” and that “...we’ll be able to do all the wonderful things that these guys talked about, but the fact that we get to do this in the United States is I think wonderful,” though it’s unclear what “the wonderful things” or “this” refers to. It’s been 15 months, and Stargate LLC has never been formed. SoftBank and OpenAI have contributed no capital to the project, other than SoftBank’s own acquisition of a former electric vehicle manufacturing plant in Lordstown, Ohio that it intends to turn into a data center parts manufacturing plant with Foxconn, which is best known for effectively abandoning a $10 billion factory in Wisconsin back in 2021 . Oh, and Project Freebird, a SoftBank-built project that exists to funnel money to its subsidiary SB Energy , though I can’t imagine how SoftBank actually funds it. No government money was ever involved, no funding ever left anyone’s bank account, no "initiative" ever existed, and OpenAI, Oracle and SoftBank have, in my opinion, conspired to mislead the general public about the existence and validity of a project for marketing purposes.  The “data centers actually under construction” referred to a 1.2GW project in Abilene Texas that had been under construction since the middle of 2024 , and had originally been earmarked by Elon Musk and xAI, except Musk pulled out because he felt that Oracle was moving too slow . While Ellison said that there were ten buildings under construction with plans to expand to twenty, only eight were actually being built ( each holding around 50,000 GB200 GPUs across NVL72 racks ), with the extension up in the air until March 2026, when Microsoft agreed to lease 700MW — so another seven buildings — that were meant to go to OpenAI. These buildings will not make Oracle any money, as Oracle is, despite spending so much money, leasing whatever land it uses from Crusoe. As far as those eight buildings go, only two are actually online and generating revenue, though sources with direct knowledge of Oracle’s infrastructure have informed me that work is still being done on both buildings despite CNBC reporting that they were “ operational ” in September 2025.  Let’s break this down. Based on a presentation by landowner Lancium from May 2025 , the Stargate Abilene campus was meant to have 1.2GW of AI data centers online by year-end 2025. Based on reporting from DatacenterDynamics, the first 200MW of power was meant to be energized “ in 2025 .” As time dragged on, occupancy was meant to begin in the first half of 2025 , had “ potential to reach 1GW by 2025 ,” complete all 1.2GW of capacity by mid-2026 , be energized by mid-2026 , have 64,000 GPUs by the end of 2026 , as of September 30, 2025 had “ two buildings live ,” and as of December 12, 2025, Oracle co-CEO Clay Magouyurk said that Abilene was “on track” with “more than 96,000 NVIDIA Grace Blackwell GB200 delivered,” otherwise known as two buildings’ worth of GPUs.  Four months later on April 22, 2026, Oracle tweeted that “...in Abilene, 200MW is already operational, and delivery of the eight-building campus remains on schedule.” It is unclear if that’s 200MW of critical IT capacity or the total available power at the Abilene campus, and in any case, this is only enough power for two buildings, which means that Oracle is most decidedly not “on schedule.”  Sources familiar with Oracle infrastructure have confirmed that while construction has finished on building three, barely any actual tech has been installed. It also appears that while construction has begun on a power plant of some sort, it’s unclear whether it’s the 360.5MW gas power plant or 1GW substation. In any case, Abilene needs both to turn on the GPUs, if they ever get installed. Abilene is, for the most part, the only part of the Stargate project that’s anywhere near complete. I say that because the other data centers — Shackelford, Texas, Port Washington, Wisconsin, Doña Ana County, New Mexico, Saline, Michigan, and Milam County, Texas — are patches of land with a few steel beams, if that . To be explicit, every single Stargate data center is funded by Oracle and its respective financial backers. Oracle is taking on a massive amount of debt to build these data centers, working with a labyrinthine network of financiers and construction partners to pull together the capacity necessary to get paid for its five-year-long $300 billion compute deal with OpenAI .  Oracle has also, per Bloomberg , deliberately raised money using “ project financing ” loans that are repaid using the projected cashflow, allowing it to keep the massive amount of debt off of its balance sheet. This is remarkable — and offensive! — because it’s borrowing over $38 billion to fund construction of its Wisconsin and Shackelford data centers (the largest debt deal of its kind on record) and said debt will now effectively not exist despite its massive drag on Oracle’s cashflow, which sat at negative $24.7 billion in its last quarterly earnings . Based on estimates ($30 million in critical IT and $14 million in construction per megawatt) from TD Cowen’s Jerome Darling, the total cost of Oracle’s 7.1GW of data center capacity will be somewhere in the region of $340 billion to build. All of these data centers are being built for a single tenant — OpenAI — which expects, per The Information , to lose over $167 billion (assuming it hits annual revenues of over $100 billion) by the end of 2028, and as a result does not actually have the money to pay Oracle for its compute on an ongoing basis. In addition to its commitments to Oracle, OpenAI has also made commitments to spend $138 billion on Amazon over eight years , $250 billion on Microsoft Azure over an unspecific period , $20 billion with Cerebras over three years , $22.4 billion with CoreWeave over five years , and a non-specific amount with Google Cloud .  All of this is happening as Oracle’s core businesses plateau, even after Oracle reshuffled them in Q3 FY25 to represent Cloud, Software, Hardware and Services segments, the latter three of which have barely moved in the last 9 months as low-to-negative-margin cloud compute revenue grows.  In other words, Oracle’s only growth comes from a segment requiring hundreds of billions of dollars of compute.  To make matters worse, every single one of these data centers is behind schedule. Stargate Abilene was meant to be done at the beginning, middle, and now the end of this year, yet sources tell me there’s no way it’s finished before April 2027. Bloomberg also reported late last year that Oracle had delayed several data centers from 2027 to 2028 , but here in reality , every other Stargate data center is somewhere between a patch of dirt, a single steel beam , multiple steel beams , or less than half of a shell of a single building . Considering it’s taken two years for Stargate Abilene to build two buildings, I don’t see how it’s possible that these are built before the beginning of 2029. And at that point, where exactly will we be in the AI bubble? What GPUs will be available? What other kinds of silicon will exist? What will the demand be for AI compute? I don’t think that OpenAI exists for that long, and even if it does, it will have to raise at least $200 billion in the space of three years to possibly keep up with its commitments. I’m surprised that nobody ( outside of JustDario , at least) has raised the seriousness of this situation. Stargate, as it stands, will kill Oracle, outside of OpenAI becoming the literal most-profitable and highest-revenue-generating company of all time within the next two years. Even then, by the time that Abilene is built, its 450,000 GB200 GPUs will be two-years-old, and entirely obsolete far before its debts are repaid. A similar fate awaits whatever GPUs are put in the other Stargate data centers. Today’s newsletter is a thorough review and analysis of the ruinous excess of Stargate, a name that only really means “data centers being built for OpenAI in the hopes that OpenAI will pay for them.” Oracle is mortgaging its entire future on their construction, and even if it gets paid, I see no way that the cashflow from OpenAI’s compute spend can recover the cost before its GPU capex is rendered obsolete, let alone whether it can cover the debt associated with the buildout. I’m Larry Ellison — Welcome To Jackass. Welcome to the end of Oracle, or Sell The Compute To Who, Larry? Fucking Aquaman ? The total estimated cost of Oracle’s Stargate capacity is around $340 billion. OpenAI needs to make, in total, $852 billion in both revenue and funding through the end of 2030 to keep up with its compute costs with Oracle, Amazon, Google, CoreWeave and Microsoft. Oracle cannot afford to pay for the cost of construction and equipment out of cashflow, and has had to take on over $100 billion in debt and sell $20 billion in shares . Across a potential 7.1GW of planned Stargate capacity, Oracle stands to make around $75 billion in annual revenue. Abilene is expected to generate around $10 billion a year in revenue on completion for a project that will likely cost in excess of $58 billion. Stargate Abilene is extremely behind schedule, and likely won’t be finished until Q2 2027. Oracle estimated in 2024 that Abilene would cost it $2.14 billion a year in colocation and electricity fees. Oracle has spent over $5 billion in construction costs on the first two buildings of Abilene, with sources saying that it will likely spend over $10 billion to finish them, suggesting an overall cost of around $48-per-megawatt. Oracle’s remaining Stargate sites are barely under construction, and will likely not be finished before the end of 2028. Even if Oracle builds the data centers and OpenAI pays for them, the incredible upfront cost and NVIDIA’s yearly upgrade cycle will render much of the GPU capacity worthless within the next ten years.  And if OpenAI fails to pay, Larry Ellison likely has over $20 billion in personal loans collateralized by over $60 billion in Oracle shares, meaning that margin calls will follow with the collapse of Oracle's stock.

0 views
Brain Baking 1 months ago

Hello Again, SuSE Linux

It’s good to see you again, old friend. It’s been a while. Twenty-three years, you say? How come we managed to drift apart that far? I know, I know, I betrayed you. But my room was cold at night and Gentoo offered me the ability to keep on compiling. And then I betrayed GNU/Linux for FreeBSD. And then I switched the demon for the apple. I’ve been on an apple diet for so long now, I can barely remember the tux. What is it you say? Oh, it’s openSUSE now. Sure, you’re a chameleon, you can take on any colour you’d like. Great to see it’s still green. I like green. How’s YaST doing these days? ? What’s that, no more ? That’s cool, it looks like you’ve made some progress! Let’s make a screenshot the proper nerdy way and do some in a terminal! Oh, that’s no longer cool? ? So first and then that command? Let’s try that: openSUSE Tumbleweed running on the HP work laptop. My last experience with the Linux desktop was indeed about twenty years ago. Since 2012, I’ve been a macOS user. I’m no longer proud of it: I miss Linux and I think macOS is boring and full of bloat . Yet the rise of the Apple silicon made me buy another one in 2020, which is still the one I’m using right now. The hardware is amazing, the screen is amazing, and the weight and fanless features are amazing. But I still miss customisation features—the ability to truly make the desktop mine—and I stopped updating the OS as a protest to ever increasing bloatware. This laptop is still running Sonoma which is bad enough as it is. I have no intentions to go out and buy another machine any time soon; this one’s still doing fine; but I did start to wonder. What if… I got a ThinkPad and installed Linux back onto it? Would the hardware match the high standards I’m accustomed to now? Would I still be able to make my way around the OS? I ran GNOME 2 and KDE 3 (and then got nerdy with Fvwm). I compiled my own Linux 2.6 kernel patches back when that was brand new. I have no idea what’s happening now. That’s not to say that I don’t touch Linux: I use it daily to host this website, to run the NAS at home, and in virtual containers. But that’s not the Linux Desktop Experience . My main motivation for moving away from Linux was my frustration with endless configuration and compilation. Back in the day, hibernate didn’t just work out of the box, the fan speed had to be configured depending on the type of the laptop, nVidia drivers were a pain (still are), etc. Work and life started getting in the way: I no longer had endless seas of time on my hand to go nuts with Gentoo. With two young kids, that times has dwindled even more, so NixOS or even Arch is out of the question. Being fed up with the crappy Windows 11 installation on my work laptop, I wiped that partition and remembered my old friend The European Chameleon. So here I am, testing the waters yet again. Thanks to Valve, Lutris is amazing . KDE Plasma feels mature (even though some configuration settings seem sluggish). I don’t want to dive into the rabbit hole of AwesomeWM (but I do). I don’t want to try and live without systemd or have to hurt my brain about X11 vs Wayland. I want the thing to “just work”. I want my chameleon to be an apple. A proper one, like a “back in the day” apple one. I haven’t had the time to give openSUSE a proper trail. I’m mainly fighting my muscle memory with versus that strange location which is somewhat diminished by Toshy that then doesn’t work well in combination with my Emacs configuration. What I did notice is that hibernation/suspend is still ugly: if I close the lid for a night without putting the laptop in true hibernate mode (with its dedicated swap partition), the battery drain is ridiculous, especially coming from a MacBook Air that I just jam shut and open up again a hundred times a day. This made me realise I will probably have to give up on the hardware quality part if my next laptop is going to be a non-Apple one. Which I don’t really want to? Seb and I discussed which laptop to get when ours would break down. The Framework is an obvious one as are the System76 ones that specifically support Linux. Alex White’s everyday carry post made me realise the build quality of these is average at best. It’s going to be a painful experience migrating from that. I know Kev is happy with his Framework , but I’m not yet fully convinced. The fact that this HP EliteBook 6 G1a 16 work laptop’s screen and overall build quality is terrible is not helping either. The touchpad palm detection experience is horrible on KDE. Let’s first give the chameleon another chance to see if on an OS level I could live without macOS and my usual mac-exclusive power tools. The ones I’ll miss the most might be Alfred and DEVONthink . My recent migration to do-everything-in-Emacs does make the transition a lot easier. I also moved from iTerm2 to Ghostty last year and am now trying out Kitty with the Fish shell. My RSS feed now lives inside my FreshRSS server making me less dependent on NetNewsWire. Software-wise, I’m getting there. I’m sure I’ll get there. But what about hardware-wise? Related topics: / linux / By Wouter Groeneveld on 22 April 2026.  Reply via email .

0 views
Alex White's Blog 1 months ago

Linux Apps Starter Kit (Gnome Edition)

I find beautiful, well-designed, native applications to be a source of inspiration when using my computer. I've posted on Mastodon about the native Mac applications that were hard to leave when switching to Linux. Now that I've fully made the switch, I figure it's only fitting to do the reverse post on the Linux applications that I've fallen in love with. For this post, I'll be focusing on Gnome/GTK/Adwaita applications. Why? Two reasons. First off, I use Fedora with Gnome 49 so I'm most familiar with this territory. Second, Gnome has a very well defined HIG (Human Interface Guidelines), resulting in a strong visual identity. Applications enhance the operating system in a consistent, fluid way, rather than serving a jarring experience (ie an electron app with radically different UI/UX). This is key to me for finding inspiration and joy when using an application. With all that said, let's dig into the apps I consider essential! Internet radio is awesome and Shortwave is the best application I've found on any platform for listening to it. Search for stations, add to your library and jam! It also has a DVR-like function (okay I get it, I'm getting old) so you can download tracks you've listened to. Finally, there's an amazing skeuomorphic mini-player (I'm a sucker for skeuomorphic design). 📦 Shortwave on Flathub ♥️ Support Shortwave 👤 Meet the Developer "Plays music, and nothing else" is the tagline of this beautiful audio player. For those of us still rocking local media collections, Amberol is the way to go. I mean, just look at it! Point it at a folder, play the music inside, easy! I have my NAS mounted as a bookmark in Nautilus, so I just point Amberol to my network music folder. Who needs streaming?!? 📦 Amberol on Flathub ♥️ Support Amberol 👤 Meet the Developer I've been using Blanket longer than most apps on this list, long before I made the full switch to Linux (and heck, it's probably one of the reasons I eventually made the switch). It's a no-frills ambient noise machine. Comes with a large selection of high-quality samples that can individually be toggled and adjusted. You can save preset configurations (ie coffee shop in a thunder storm), and add your own audio samples. On any other platform this would cost $15 or more, but here it is on Linux, free and open-source. 📦 Blanket on Flathub ♥️ Support Blanket 👤 Meet the Developer Need to quickly edit an image or make a thumbnail? Pinta to the rescue! It's fast and has a familiar UX. Sure, it's not as powerful as GIMP, but I find myself reaching to it more often. 📦 Pinta on Flathub 👤 Meet the Developers This app right here should be a default Gnome app, it's that good! Hands down the most powerful and user friendly screenshot tool I've used (and yeah, I've tried the popular Mac OS ones). Bind Gradia to a shortcut (I use Super + Shift + S) and it'll open after you take a screenshot. Gradia lets you add arrows, drawings, blur text, perform OCR, crop, add backdrops and more. It's honestly an essential application, and performs better than apps I paid $15+ for on Mac. 📦 Gradia on Flathub ♥️ Support Gradia 👤 Meet the Developer There's a lot of single purpose, well-built applications for Gnome, and Switcheroo is a great one I use daily. It takes an image in, and outputs in a different format. You can add on compression, resizing, strip metadata and replace transparency. I use it to optimize images for web. 📦 Switcheroo on Flathub ♥️ Support Switcheroo 👤 Meet the Developer I don't use social media beyond Mastodon, but Tuba makes me glad I'm at least on that platform. Tuba is well designed, fast and filled with thoughtful features (like a custom emoji picker and the ability to schedule posts). I've tried the best on Mac (Ice Cubes), and it doesn't get close to comparing with Tuba. 📦 Tuba on Flathub ♥️ Support Tuba 👤 Meet the Developer Mmmm RSS, my favorite (and probably how you're reading this article)! Newsflash is a great excellent, way to stay on top of your feeds. It's got categories, tags, OPML import/export, themes, and more. My favorite feature is the "Today" tab filtered by unread, great to catch up on what's new. 📦 Newsflash on Flathub 👤 Meet the Developer Here it is, my top pick. You don't even need to read this, just go download Planify, it's incredible. Alain took todos and added a bucketload of thoughtfully designed microinteractions. Labels, scheduling, today view, sections, kanban board, natural text to date parsing, the list goes on. When you hover the "Add" button, it does a little animation. When you complete a task, it gives a little sound. There's so many thoughtfully designed pieces in here! 📦 Planify on Flathub ♥️ Support Planify 👤 Meet the Developer Markdown based note taking, done very well. Notes are organized into notebooks and paired with a pleasant, minimalist markdown editor. 📦 Folio on Flathub 👤 Meet the Developer Distraction free markdown editor for writing long form content. Basically, the Linux alternative to iA Writer on Mac. It's beautiful, fast and has just enough features. I use it to write most of my blog posts! 📦 Apostrophe on Flathub ♥️ Support Apostrophe 👤 Meet the Developer Another excellent, single-purpose application that I use on a daily basis. Sessions is an egg/pomodoro timer that beeps when time's up. You just drag the slider and the timer starts. Great for keeping yourself focused! 📦 Sessions on Flathub ♥️ Support Sessions 👤 Meet the Developer Holy crap this app looks good! John did an incredible job building the best ebook reader on Linux. You can bring your own books, or use the catalogs feature to discover public domain literature. There's support for annotations (with import/export), bookmarks, text to voice and theming. 📦 Foliate on Flathub ♥️ Support Foliate 👤 Meet the Developer Got a sqlite database and want to know what's inside? Bobby to the rescue! Drag and drop your database file in and see the data. Simple, well designed and useful! 📦 Bobby on Flathub ♥️ Support Bobby 👤 Meet the Developer There's so much value packed into this app! Replace random sketchy websites you found on Google by using Dev Toolbox to generate a QR code, check contrast ratios, parse CRON strings and so much more. There's too much in here to cover, but it's become an essential part of my toolkit. 📦 Dev Toolbox on Flathub 👤 Meet the Developer Bazaar is a faster, more reliable and visually more appealing alternative to the default Gnome Software application. It's one of my first installs on a new system and another application that should be a default Gnome app. 📦 Bazaar on Flathub ♥️ Support Bazaar 👤 Meet the Developer The absolute best way to discover, install and update Gnome shell extensions! 📦 Extension Manager on Flathub ♥️ Support Extension Manager 👤 Meet the Developer Copyous is a shell extension, and it's the best clipboard manager out there. Visually browse and search your clipboard history. Supports image previews, syntax highlighting, color previews (ie copy a hex code and it shows the color) and so much more! 📦 Copyous on Gnome Extensions There's so many amazing applications on Linux that I definitely missed some! Feel free to shoot me an email at [email protected] with recommendations. I'll do a separate post in the future for KDE applications! I mentioned a few times in this article that some applications on Linux provide better value than alternatives I paid for on Mac OS. There's not a single paid application on this list, but that does not mean you shouldn't support the developers! These developers work hard to design, build, test and support the software that makes Linux great. If you like their work, show them some love!

0 views
Martin Alderson 1 months ago

Has Mythos just broken the deal that kept the internet safe?

For nearly 20 years the deal has been simple: you click a link, arbitrary code runs on your device, and a stack of sandboxes keeps that code from doing anything nasty. Browser sandboxes for untrusted JavaScript, VM sandboxes for multi-tenant cloud, ad iframes so banner creatives can't take over your phone or laptop - the modern internet is built on the assumption that those sandboxes hold. Anthropic just shipped a research preview that generates working exploits for one of them 72.4% of the time, up from under 1% a few months ago. That deal might be breaking. From what I've read Mythos is a very large model. Rumours have pointed to it being similar in size to the short lived (and very underwhelming) GPT4.5 . As such I'm with a lot of commentators in thinking that a primary reason this hasn't been rolled out further is compute. Anthropic is probably the most compute starved major AI lab right now and I strongly suspect they do not have the compute to roll this out even if they wanted more broadly. From leaked pricing, it's expensive as well - at $125/MTok output (5x more than Opus, which is itself the most expensive model out there). One thing that has really been overlooked with all the focus on frontier scale models is how quickly improvements in the huge models are being achieved on far smaller models. I've spent a lot of time with Gemma 4 open weights model, and it is incredibly impressive for a model that is ~50x smaller than the frontier models. So I have no doubt that whatever capabilities Mythos has will relatively quickly be available in smaller, and thus easier to serve, models. And even if Mythos' huge size somehow is intrinsic to the abilities (I very much doubt this, given current progress in scaling smaller models) it has, it's only a matter of time before newer chips [1] are able to serve it en masse. It's important to look to where the puck is going. As I've written before, LLMs in my opinion pose an extremely serious cybersecurity risk. Fundamentally we are seeing a radical change in how easy it is to find (and thus exploit) serious flaws and bugs in software for nefarious purposes. To back up a step, it's important to understand how modern cybersecurity is currently achieved. One of the most important concepts is that of a sandbox . Nearly every electronic device you touch day to day has one (or many) layers of these to protect the system. In short, a sandbox is a so called 'virtualised' environment where software can execute on the system, but with limited permissions, segregated from other software, with a very strong boundary that protects the software 'breaking out' of the sandbox. If you're reading this on a modern smartphone, you have at least 3 layers of sandboxing between this page and your phone's operating system. First, your browser has (at least) two levels of sandboxing. One is for the JavaScript execution environment (which runs the interactive code on websites). This is then sandboxed by the browser sandbox, which limits what the site as a whole can do. Finally, iOS or Android then has an app sandbox which limits what the browser as a whole can do. This defence in depth is absolutely fundamental to modern information security, especially allowing users to browse "untrusted" websites with any level of security. For a malicious website to gain control over your device, it needs to chain together multiple vulnerabilities, all at the same time. In reality this is extremely hard to do (and these kinds of chains fetch millions of dollars on the grey market ). Guess what? According to Anthropic, Mythos Preview successfully generates a working exploit for Firefox's JS shell in 72.4% of trials. Opus 4.6 managed this in under 1% of trials in a previous evaluation: Worth flagging a couple of caveats. The JS shell here is Firefox's standalone SpiderMonkey - so this is escaping the innermost sandbox layer, not the full browser chain (the renderer process and OS app sandbox still sit on top). And it's Anthropic's own benchmark, not an independent one. But even hedging both of those, the trajectory is what matters - we're going from "effectively zero" to "72.4% of the time" in one model generation, on a real-world target rather than a toy CTF. This is pretty terrifying if you understand the implications of this. If an LLM can find exploits in sandboxes - which are some of the most well secured pieces of software on the planet - then suddenly every website you aimlessly browse through could contain malicious code which can 'escape' the sandbox and theoretically take control of your device - and all the data on your phone could be sent to someone nasty. These attacks are so dangerous because the internet is built around sandboxes being safe. For example, each banner ad your browser loads is loaded in a separate sandboxed environment. This means they can run a huge amount of (mostly) untested code, with everyone relying on the browser sandbox to protect them. If that sandbox falls, then suddenly a malicious ad campaign can take over millions of devices in hours. Equally, sandboxes (and virtualisation) are fundamental to allowing cloud computing to operate at scale. Most servers these days are not running code against the actual server they are on. Instead, AWS et al take the physical hardware and "slice" it up into so called "virtual" servers, selling each slice to different customers. This allows many more applications to run on a single server - and enables some pretty nice profit margins for the companies involved. This operates on roughly the same model as your phone, with various layers to protect customers from accessing each other's data and (more importantly) from accessing the control plane of AWS. So, we have a very, very big problem if these sandboxes fail, and all fingers point towards this being the case this year. I should tone down the disaster porn slightly - there have been many sandbox escapes before that haven't caused chaos, but I have a strong feeling that this is going to be difficult. And to be clear, when just AWS us-east-1 goes down (which it has done many , many , times ) it is front page news globally and tends to cause significant disruption to day to day life. This is just one of AWS's data centre zones - if a malicious actor was able to take control of the AWS control plane it's likely they'd be able to take all regions simultaneously, and it would likely be infinitely harder to restore when a bad actor was in charge, as opposed to the internal problems that have caused previous problems - and been extremely difficult to restore from in a timely way. Given all this it's understandable that Anthropic are being cautious about releasing this in the wild. The issue though, is that the cat is out of the bag. Even if Anthropic pulled a Miles Dyson and lowered their model code into a pit of molten lava, someone else is going to scale an RL model and release it. The incentives are far, far too high and the prisoner's dilemma strikes again. The current status quo seems to be that these next generation models will be released to a select group of cybersecurity professionals and related organisations, so they can fix things as much as possible to give them a head start. Perhaps this is the best that can be done, but this seems to me to be a repeat of the famous "obscurity is not security" approach which has become a meme in itself in the information security world. It also seems far fetched to me that these organisations who do have access are going to find even most of the critical problems in a limited time window. And that brings me to my final point. While Anthropic are providing $100m of credit and $4m of 'direct cash donations' to open source projects, it's not all open source projects. There are a lot of open source projects that everyone relies on without realising. While the obvious ones like the Linux kernel are getting this "access" ahead of time, there are literally millions of pieces of open source software (nevermind commercial software) that are essential for a substantial minority of systems operation. I'm not quite sure where the plan leaves these ones. Perhaps this is just another round in the cat and mouse cycle that reaches a mostly stable equilibrium, and at worst we have some short term disruption. But if I step back and look how fast the industry has moved over the past few years - I'm not so sure. And one thing I think is for certain, it looks like we do now have the fabled superhuman ability in at least one domain. I don't think it's the last. Albeit at the cost of adding yet more pressure onto the compute crunch the AI industry is experiencing ↩︎ Albeit at the cost of adding yet more pressure onto the compute crunch the AI industry is experiencing ↩︎

0 views
Simon Willison 1 months ago

Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Anthropic didn't release their latest model, Claude Mythos ( system card PDF ), today. They have instead made it available to a very restricted set of preview partners under their newly announced Project Glasswing . The model is a general purpose model, similar to Claude Opus 4.6, but Anthropic claim that its cyber-security research abilities are strong enough that they need to give the software industry as a whole time to prepare. Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser . Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems—systems that represent a very large portion of the world’s shared cyberattack surface. We anticipate this work will focus on tasks like local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems. There's a great deal more technical detail in Assessing Claude Mythos Preview’s cybersecurity capabilities on the Anthropic Red Team blog: In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex  JIT heap spray  that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets. Plus this comparison with Claude 4.6 Opus: Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more. Saying "our model is too dangerous to release" is a great way to build buzz around a new model, but in this case I expect their caution is warranted. Just a few days ( last Friday ) ago I started a new ai-security-research tag on this blog to acknowledge an uptick in credible security professionals pulling the alarm on how good modern LLMs have got at vulnerability research. Greg Kroah-Hartman of the Linux kernel: Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real. Daniel Stenberg of : The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense. And Thomas Ptacek published Vulnerability Research Is Cooked , a post inspired by his podcast conversation with Anthropic's Nicholas Carlini. Anthropic have a 5 minute talking heads video describing the Glasswing project. Nicholas Carlini appears as one of those talking heads, where he said (highlights mine): It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently. But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome. [...] I've found more bugs in the last couple of weeks than I found in the rest of my life combined . We've used the model to scan a bunch of open source code, and the thing that we went for first was operating systems, because this is the code that underlies the entire internet infrastructure. For OpenBSD, we found a bug that's been present for 27 years, where I can send a couple of pieces of data to any OpenBSD server and crash it . On Linux, we found a number of vulnerabilities where as a user with no permissions, I can elevate myself to the administrator by just running some binary on my machine. For each of these bugs, we told the maintainers who actually run the software about them, and they went and fixed them and have deployed the patches patches so that anyone who runs the software is no longer vulnerable to these attacks. I found this on the OpenBSD 7.8 errata page : 025: RELIABILITY FIX: March 25, 2026 All architectures TCP packets with invalid SACK options could crash the kernel. A source code patch exists which remedies this problem. I tracked that change down in the GitHub mirror of the OpenBSD CVS repo (apparently they still use CVS!) and found it using git blame : Sure enough, the surrounding code is from 27 years ago. I'm not sure which Linux vulnerability Nicholas was describing, but it may have been this NFS one recently covered by Michael Lynch . There's enough smoke here that I believe there's a fire. It's not surprising to find vulnerabilities in decades-old software, especially given that they're mostly written in C, but what's new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues. I actually thought to myself on Friday that this sounded like an industry-wide reckoning in the making, and that it might warrant a huge investment of time and money to get ahead of the inevitable barrage of vulnerabilities. Project Glasswing incorporates "$100M in usage credits ... as well as $4M in direct donations to open-source security organizations". Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation. It would be great to see OpenAI involved as well - GPT-5.4 already has a strong reputation for finding security vulnerabilities and they have stronger models on the near horizon. The bad news for those of us who are not trusted partners is this: We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview. I can live with that. I think the security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options .

0 views
Ahead of AI 1 months ago

Components of A Coding Agent

In this article, I want to cover the overall design of coding agents and agent harnesses: what they are, how they work, and how the different pieces fit together in practice. Readers of my Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books often ask about agents, so I thought it would be useful to write a reference I can point to. More generally, agents have become an important topic because much of the recent progress in practical LLM systems is not just about better models, but about how we use them. In many real-world applications, the surrounding system, such as tool use, context management, and memory, plays as much of a role as the model itself. This also helps explain why systems like Claude Code or Codex can feel significantly more capable than the same models used in a plain chat interface. In this article, I lay out six of the main building blocks of a coding agent. You are probably familiar with Claude Code or the Codex CLI, but just to set the stage, they are essentially agentic coding tools that wrap an LLM in an application layer, a so-called agentic harness, to be more convenient and better-performing for coding tasks. Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent . Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity. That distinction matters because when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding agent specifics, let me briefly provide a bit more context on the difference between the broader concepts, the LLMs, reasoning models, and agents. An LLM is the core next-token model. A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, and when to stop, etc. Roughly, we can think about the relationship as this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness helps us the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point. Figure 2: The relationship between conventional LLM, reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness. In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this: LLM: the raw model Reasoning model : an LLM optimized to output intermediate reasoning traces and to verify itself more Agent: a loop that uses a model plus tools, memory, and environment feedback Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback As listed above, in the context of agents and coding tools, we also have the two popular terms agent harness and (agentic) coding harness . A coding harness is the software scaffold around a model that helps it write and edit code effectively. And an agent harness is a bit broader and not specific to coding (e.g., think of OpenClaw). Codex and Claude Code can be considered coding harnesses. Anyways, A better LLM provides a better foundation for a reasoning model (which involves additional training), and a harness gets more out of this reasoning model. Sure, LLMs and reasoning models are also capable of solving coding tasks by themselves (without a harness), but coding work is only partly about next-token generation. A lot of it is about repo navigation, search, function lookup, diff application, test execution, error inspection, and keeping all the relevant information in context. (Coders may know that this is hard mental work, which is why we don’t like to be disrupted during coding sessions :)). Figure 3. A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it. The takeaway here is that a good coding harness can make a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box, because it helps with context management and more. As mentioned in the previous section, when we say harness , we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and many more. Today, when using LLMs, this layer shapes most of the user experience compared to prompting the model directly or using web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of LLMs nowadays have very similar capabilities (e.g., the vanilla versions of GPT-5.4, Opus 4.6, and GLM-5 or so), the harness can often be the distinguishing factor that makes one LLM work better than another. This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants. In the next section, I want to go more into the specifics and discuss the core components of a coding harness using my Mini Coding Agent : https://github.com/rasbt/mini-coding-agent . Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections. By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.) Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) Anyways, below are six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python), for more concrete code examples. The code annotates the six components discussed below via code comments: This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often change or affect what the correct action is. For example, “Fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run, etc. If it knows the repo root and layout, it can look in the right places instead of guessing. Also, the git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus. Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context. The takeaway is that the coding agent collects info (”stable facts” as a workspace summary) upfront before doing any work, so that it’s is not starting from zero, without context, on every prompt. Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. I.e., coding sessions are repetitive, and the agent rules usually stay the same. The tool descriptions usually stay the same, too. And even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below. Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model. The main difference from section 1 is that section 1 was about gathering repo facts. Here, we are now interested in packaging and caching those facts efficiently for repeated model calls. The “stable” “Stable prompt prefix” means that the information contained there doesn’t change too much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn). This includes short-term memory, the recent transcript, and the newest user request. In short, the caching aspect for the “Stable prompt prefix” is simply that a smart runtime tries to reuse that part. Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful and be actually able to execute the command and retrieve the results (versus us calling the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed and named tools with clear inputs and clear boundaries. (But of course, something like Python can be part of this so that the agent could also execute an arbitrary wide list of shell commands.) The tool-use flow is illustrated in the figure below. Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop. To illustrate this, below is an example of how this usually looks to the user using my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.) Figure 9: Illustration of a tool call approval request in the Mini Coding Agent. Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?” “Is the requested path even inside the workspace?” Only after those checks pass does anything actually run. While running coding agents, of course, carries some risk, the harness checks also improve reliability because the model doesn’t execute totally arbitrary commands. Also, besides rejecting malformed actions and approval gating, file access can be kept inside the repo by checking file paths. In a sense, the harness is giving the model less freedom, but it also improves the usability at the same time. Context bloat is not a unique problem of coding agents but an issue for LLMs in general. Sure, LLMs are supporting longer and longer contexts these days (and I recently wrote about the attention variants that make it computationally more feasible), but long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info). Coding agents are even more susceptible to context bloat than regular LLMs during multi-turn chats, because of repeated file reads, lengthy tool outputs, logs, etc. If the runtime keeps all of that at full fidelity, it will run out of available context tokens pretty quickly. So, a good coding harness is usually pretty sophisticated about handling context bloat beyond just cutting our summarizing information like regular chat UIs. Conceptually, the context compaction in coding agents might work as summarized in the figure below. Specifically, we are zooming a bit further into the clip (step 6) part of Figure 8 in the previous section. Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt. A minimal harness uses at least two compaction strategies to manage that problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose. The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer because they are more likely to matter for the current step. And we compress older events more aggressively because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session. Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality. In practice, all these 6 core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. In the previous section, we covered prompt-time use of history and how we build a compact transcript. The question there is: how much of the past should go back into the model on the next turn? So the emphasis is compression, clipping, deduplication, and recency. Now, this section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? So the emphasis is that the runtime keeps a fuller transcript as a durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to. To summarize, a coding agent separates state into (at least) two layers: working memory: the small, distilled state the agent keeps explicitly a full transcript: this covers all the user requests, tool outputs, and LLM responses Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files. The figure above illustrates the two main session files, the full transcript and the working memory, that usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction. Its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is more meant for task continuity. Its job is to keep a small, explicitly maintained summary of what matters across turns, things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and working memory, in the next round, which is not shown to reduce clutter in the figure above. Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer, for example, which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.) A subagent is only useful if it inherits enough context to do real work. But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :). Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent. The trick here is that the subagent inherits enough context to be useful, but also has it constrained (for example, read-only and restricted in recursion depth) Claude Code has supported subagents for a long time, and Codex added them more recently. Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth. The section above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make the LLM more useful compared to simple multi-turn chats. Figure 13: Six main features of a coding harness discussed in previous sections. If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent . OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than being a specialized (terminal) coding assistant. There are still several overlaps with a coding harness: it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md it keeps JSONL session files and includes transcript compaction and session management it can spawn helper sessions and subagents However, as mentioned above, the emphasis is different. Coding agents are optimized for a person working in a repository and asking a coding assistant to inspect files, edit code, and run local tools efficiently. OpenClaw is more optimized for running many long-lived local agents across chats, channels, and workspaces, with coding as one important workload among several others. I am excited to share that I finished writing Build A Reasoning Model (From Scratch) and all chapters are in early access yet. The publisher is currently working on the layouts, and it should be available this summer. This is probably my most ambitious book so far. I spent about 1.5 years writing it, and a large number of experiments went into it. It is also probably the book I worked hardest on in terms of time, effort, and polish, and I hope you’ll enjoy it. Build a Reasoning Model (From Scratch) on Manning and Amazon . The main topics are evaluating reasoning models inference-time scaling self-refinement reinforcement learning distillation There is a lot of discussion around “reasoning” in LLMs, and I think the best way to understand what it really means in the context of LLMs is to implement one from scratch! Amazon (pre-order) Manning (complete book in early access , pre-final layout, 528 pages) Figure 1: Claude Code CLI, Codex CLI, and my Mini Coding Agent . Coding agents are engineered for software work where the notable parts are not only the model choice but the surrounding system, including repo context, tool design, prompt-cache stability, memory, and long-session continuity. That distinction matters because when we talk about the coding capabilities of LLMs, people often collapse the model, the reasoning behavior, and the agent product into one thing. But before getting into the coding agent specifics, let me briefly provide a bit more context on the difference between the broader concepts, the LLMs, reasoning models, and agents. On The Relationship Between LLMs, Reasoning Models, and Agents An LLM is the core next-token model. A reasoning model is still an LLM, but usually one that was trained and/or prompted to spend more inference-time compute on intermediate reasoning, verification, or search over candidate answers. An agent is a layer on top, which can be understood as a control loop around the model. Typically, given a goal, the agent layer (or harness) decides what to inspect next, which tools to call, how to update its state, and when to stop, etc. Roughly, we can think about the relationship as this: the LLM is the engine, a reasoning model is a beefed-up engine (more powerful, but more expensive to use), and an agent harness helps us the model. The analogy is not perfect, because we can also use conventional and reasoning LLMs as standalone models (in a chat UI or Python session), but I hope it conveys the main point. Figure 2: The relationship between conventional LLM, reasoning LLM (or reasoning model), and an LLM wrapped in an agent harness. In other words, the agent is the system that repeatedly calls the model inside an environment. So, in short, we can summarize it like this: LLM: the raw model Reasoning model : an LLM optimized to output intermediate reasoning traces and to verify itself more Agent: a loop that uses a model plus tools, memory, and environment feedback Agent harness: the software scaffold around an agent that manages context, tool use, prompts, state, and control flow Coding harness: a special case of an agent harness; i.e., a task-specific harness for software engineering that manages code context, tools, execution, and iterative feedback Figure 3. A coding harness combines three layers: the model family, an agent loop, and runtime supports. The model provides the “engine”, the agent loop drives iterative problem solving, and the runtime supports provide the plumbing. Within the loop, “observe” collects information from the environment, “inspect” analyzes that information, “choose” selects the next step, and “act” executes it. The takeaway here is that a good coding harness can make a reasoning and a non-reasoning model feel much stronger than it does in a plain chat box, because it helps with context management and more. The Coding Harness As mentioned in the previous section, when we say harness , we typically mean the software layer around the model that assembles prompts, exposes tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prefixes, stores memory, and many more. Today, when using LLMs, this layer shapes most of the user experience compared to prompting the model directly or using web chat UI (which is closer to “chat with uploaded files”). Since, in my view, the vanilla versions of LLMs nowadays have very similar capabilities (e.g., the vanilla versions of GPT-5.4, Opus 4.6, and GLM-5 or so), the harness can often be the distinguishing factor that makes one LLM work better than another. This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code. That said, some harness-specific post-training is usually beneficial. For example, OpenAI historically maintained separate GPT-5.3 and GPT-5.3-Codex variants. In the next section, I want to go more into the specifics and discuss the core components of a coding harness using my Mini Coding Agent : https://github.com/rasbt/mini-coding-agent . Figure 4: Main harness features of a coding agent / coding harness that will be discussed in the following sections. By the way, in this article, I use the terms “coding agent” and “coding harness” somewhat interchangeably for simplicity. (Strictly speaking, the agent is the model-driven decision-making loop, while the harness is the surrounding software scaffold that provides context, tools, and execution support.) Figure 5: Minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python) Anyways, below are six main components of coding agents. You can check out the source code of my minimal but fully working, from-scratch Mini Coding Agent (implemented in pure Python), for more concrete code examples. The code annotates the six components discussed below via code comments: 1. Live Repo Context This is maybe the most obvious component, but it is also one of the most important ones. When a user says “fix the tests” or “implement xyz,” the model should know whether it is inside a Git repo, what branch it is on, which project documents might contain instructions, and so on. That’s because those details often change or affect what the correct action is. For example, “Fix the tests” is not a self-contained instruction. If the agent sees AGENTS.md or a project README, it may learn which test command to run, etc. If it knows the repo root and layout, it can look in the right places instead of guessing. Also, the git branch, status, and commits can help provide more context about what changes are currently in progress and where to focus. Figure 6: The agent harness first builds a small workspace summary that gets combined with the user request for additional project context. The takeaway is that the coding agent collects info (”stable facts” as a workspace summary) upfront before doing any work, so that it’s is not starting from zero, without context, on every prompt. 2. Prompt Shape And Cache Reuse Once the agent has a repo view, the next question is how to feed that information to the model. The previous figure showed a simplified view of this (“Combined prompt: prefix + request”), but in practice, it would be relatively wasteful to combine and re-process the workspace summary on every user query. I.e., coding sessions are repetitive, and the agent rules usually stay the same. The tool descriptions usually stay the same, too. And even the workspace summary usually stays (mostly) the same. The main changes are usually the latest user request, the recent transcript, and maybe the short-term memory. “Smart” runtimes don’t rebuild everything as one giant undifferentiated prompt on every turn, as illustrated in the figure below. Figure 7: The agent harness builds a stable prompt prefix, adds the changing session state, and then feeds that combined prompt to the model. The main difference from section 1 is that section 1 was about gathering repo facts. Here, we are now interested in packaging and caching those facts efficiently for repeated model calls. The “stable” “Stable prompt prefix” means that the information contained there doesn’t change too much. It usually contains the general instructions, tool descriptions, and the workspace summary. We don’t want to waste compute on rebuilding it from scratch in each interaction if nothing important has changed. The other components are updated more frequently (usually each turn). This includes short-term memory, the recent transcript, and the newest user request. In short, the caching aspect for the “Stable prompt prefix” is simply that a smart runtime tries to reuse that part. 3. Tool Access and Use Tool access and tool use are where it starts to feel less like chat and more like an agent. A plain model can suggest commands in prose, but an LLM in a coding harness should do something narrower and more useful and be actually able to execute the command and retrieve the results (versus us calling the command manually and pasting the results back into the chat). But instead of letting the model improvise arbitrary syntax, the harness usually provides a pre-defined list of allowed and named tools with clear inputs and clear boundaries. (But of course, something like Python can be part of this so that the agent could also execute an arbitrary wide list of shell commands.) The tool-use flow is illustrated in the figure below. Figure 8: The model emits a structured action, the harness validates it, optionally asks for approval, executes it, and feeds the bounded result back into the loop. To illustrate this, below is an example of how this usually looks to the user using my Mini Coding Agent. (This is not as pretty as Claude Code or Codex because it is very minimal and uses plain Python without any external dependencies.) Figure 9: Illustration of a tool call approval request in the Mini Coding Agent. Here, the model has to choose an action that the harness recognizes, like list files, read a file, search, run a shell command, write a file, etc. It also has to provide arguments in a shape that the harness can check. So when the model asks to do something, the runtime can stop and run programmatic checks like “Is this a known tool?”, “Are the arguments valid?”, “Does this need user approval?” “Is the requested path even inside the workspace?” Figure 10: Large outputs are clipped, older reads are deduplicated, and the transcript is compressed before it goes back into the prompt. A minimal harness uses at least two compaction strategies to manage that problem. The first is clipping, which shortens long document snippets, large tool outputs, memory notes, and transcript entries. In other words, it prevents any one piece of text from taking over the prompt budget just because it happened to be verbose. The second strategy is transcript reduction or summarization, which turns the full session history (more on that in the next section) into a smaller promptable summary. A key trick here is to keep recent events richer because they are more likely to matter for the current step. And we compress older events more aggressively because they are likely less relevant. Additionally, we also deduplicate older file reads so the model does not keep seeing the same file content over and over again just because it was read multiple times earlier in the session. Overall, I think this is one of the underrated, boring parts of good coding-agent design. A lot of apparent “model quality” is really context quality. 5. Structured Session Memory In practice, all these 6 core concepts covered here are highly intertwined, and the different sections and figures cover them with different focuses or zoom levels. In the previous section, we covered prompt-time use of history and how we build a compact transcript. The question there is: how much of the past should go back into the model on the next turn? So the emphasis is compression, clipping, deduplication, and recency. Now, this section, structured session memory, is about the storage-time structure of history. The question here is: what does the agent keep over time as a permanent record? So the emphasis is that the runtime keeps a fuller transcript as a durable state, alongside a lighter memory layer that is smaller and gets modified and compacted rather than just appended to. To summarize, a coding agent separates state into (at least) two layers: working memory: the small, distilled state the agent keeps explicitly a full transcript: this covers all the user requests, tool outputs, and LLM responses Figure 11: New events get appended to a full transcript and summarized in a working memory. The session files on disk are usually stored as JSON files. The figure above illustrates the two main session files, the full transcript and the working memory, that usually get stored as JSON files on disk. As mentioned before, the full transcript stores the whole history, and it’s resumable if we close the agent. The working memory is more of a distilled version with the currently most important info, which is somewhat related to the compact transcript. But the compact transcript and working memory have slightly different jobs. The compact transcript is for prompt reconstruction. Its job is to give the model a compressed view of recent history so it can continue the conversation without seeing the full transcript every turn. The working memory is more meant for task continuity. Its job is to keep a small, explicitly maintained summary of what matters across turns, things like the current task, important files, and recent notes. Following step 4 in the figure above, the latest user request, together with the LLM response and tool output, would then be recorded as a “new event” in both the full transcript and working memory, in the next round, which is not shown to reduce clutter in the figure above. 6. Delegation With (Bounded) Subagents Once an agent has tools and state, one of the next useful capabilities is delegation. The reason is that it allows us to parallelize certain work into subtasks via subagents and speed up the main task. For example, the main agent may be in the middle of one task and still need a side answer, for example, which file defines a symbol, what a config says, or why a test is failing. It is useful to split that off into a bounded subtask instead of forcing one loop to carry every thread of work at once. (In my mini coding agent, the implementation is simpler, and the child still runs synchronously, but the underlying idea is the same.) A subagent is only useful if it inherits enough context to do real work. But if we don’t restrict it, we now have multiple agents duplicating work, touching the same files, or spawning more subagents, and so on. So the tricky design problem is not just how to spawn a subagent but also how to bind one :). Figure 12: The subagent inherits enough context to be useful, but it runs inside tighter boundaries than the main agent. The trick here is that the subagent inherits enough context to be useful, but also has it constrained (for example, read-only and restricted in recursion depth) Claude Code has supported subagents for a long time, and Codex added them more recently. Codex does not generally force subagents into read-only mode. Instead, they usually inherit much of the main agent’s sandbox and approval setup. So, the boundary is more about task scoping, context, and depth. Components Summary The section above tried to cover the main components of coding agents. As mentioned before, they are more or less deeply intertwined in their implementation. However, I hope that covering them one by one helps with the overall mental model of how coding harnesses work, and why they can make the LLM more useful compared to simple multi-turn chats. Figure 13: Six main features of a coding harness discussed in previous sections. If you are interested in seeing these implemented in clean, minimalist Python code, you may like my Mini Coding Agent . How Does This Compare To OpenClaw? OpenClaw may be an interesting comparison, but it is not quite the same kind of system. OpenClaw is more like a local, general agent platform that can also code, rather than being a specialized (terminal) coding assistant. There are still several overlaps with a coding harness: it uses prompt and instruction files in the workspace, such as AGENTS.md, SOUL.md, and TOOLS.md it keeps JSONL session files and includes transcript compaction and session management it can spawn helper sessions and subagents Build a Reasoning Model (From Scratch) on Manning and Amazon . The main topics are evaluating reasoning models inference-time scaling self-refinement reinforcement learning distillation Amazon (pre-order) Manning (complete book in early access , pre-final layout, 528 pages)

0 views
マリウス 2 months ago

Updates 2026/Q1

This post includes personal updates and some open source project updates. 안녕하세요 and greetings from Asia! Right now I’m in Seoul, Korea. I’ll start this update with a few IRL experiences regarding my time here and some mechanical keyboard related things. If you’re primarily here for the technical stuff, you can skip forward or even skip all of the personal things and jump straight to the open source projects . With that said, let’s dive straight into it. Seoul has been one of the few places that I genuinely love coming back to. I cannot pinpoint why that is, but there’s a particular rhythm to the capital that’s hard to explain until you’ve lived in it for a while. Not the tourist rhythm, where you tick off palaces and night markets to “complete your bucket list” but the deeper, slower one that makes the city truly enjoyable. The rhythm of picking a neighborhood, learning its backstreets, finding your morning coffee spot, and then finding a different one the following week. I spent my time here doing exactly that, and what follows are some honest reflections on a city that continues to surprise me. As some of you might know by now, I’m basically the Mark Wiens of coffee, because I travel for coffee , except that I don’t film myself and put it online. But I’ve surely had a lot of coffee, in a lot of cities. However, Seoul’s coffee scene operates on a completely different level. The sheer density of independently run coffee shops is staggering. Within a fifteen-minute walk in neighborhoods like Mangwon , Hapjeong , or Sangsu , you can pass dozens of places where someone is carefully dialing in their espresso, roasting their own beans, and serving a beautifully made Americano for usually around three or four thousand KRW . That’s roughly two to three US dollars for a genuinely excellent cup of coffee, which is a pretty solid value proposition. I’ve been in Seoul before, multiple times actually, and I had the chance to find genuinely great cafes which I kept on my list of places to revisit whenever I would happen to come back. And so I did. But as life moves forward, places change or, in more unfortunate circumstances, even close down for good. das ist PROBAT is one of the places that sadly closed just a few days before I arrived. In its spot is now a new Ramen restaurant that seemed fairly popular. A few other places I’d loved on previous visits and that are still operating left me genuinely disappointed this time around. Compile Coffee was one of the sharper letdowns. Two years ago, it was a highlight. This time, however, the experience felt rushed and careless. The barista hurried through the ordering process, despite no one else waiting in line, and the cappuccino that followed was a spectacle for all the wrong reasons. The milk was frothed to an almost comical extreme, the liquid poured in first, then the foam scooped in one spoonful at a time, and finally a thick layer of chocolate powder on top that I hadn’t asked for. It felt like watching a car accident happening slowly enough for every detail to remain stuck in one’s head, yet too fast to articulate anything about it. I gave the place another try a few weeks after this incident only to experience a similarly rushed and somewhat unloving execution. Another change that I hadn’t seen coming was Bean Brothers in Hapjeong . The coffee house converted from their old industrial-style space to a noticeably more polished and… well, “posh” one. The new spot is nice enough, but the vibe has shifted towards a more upscale, less alternative one. In addition, they also opened up a new location in Sangsu , which leans further in that direction, with wait times for walk-ins that suggest a clientele they’re specifically courting. Bean Brothers seems to be evolving into a streamlined, upscale chain, and while that’s not inherently bad, it’s a different thing from what originally made it special. And last but not least, there’s Anthracite Coffee Roasters , specifically the Seogyo location , which had been one of my absolute favorite spots back in 2023. It pains me to say this, but the place has become a ripoff, with this specific location charging eight thousand KRW for a hot (drip coffee) Americano to go. For context, the healthy food chain Preppers serves a full meal consisting of a big portion of rice and a protein, as well as some greenery, for 8,900 KRW. The cup of drip coffee at Anthracite is only halfway full, and most of the time it arrives already lukewarm, which makes it essentially useless as a to-go option, unless all you want is to gulp down around 120ml of coffee. You’d think a place charging premium prices would at least discount a thousand Won for takeaways, as many Seoul cafes do. The Seogyo location’s commitment to drip coffee not only makes it feel somewhat pretentious considering the prices, but also adds a whole other layer of issues. During peak hours, the wait is considerable, and the coffee menu is limited to a small rotation of options that, more often than not, skew toward the acidic side of the spectrum. If that’s your preference, there’s nothing wrong with that. But when combined with the pricing, the lukewarm temperatures, and the half-filled cups, the experience increasingly feels like you’re paying for a brand name rather than a good cup of coffee. However, the beautiful thing about Seoul’s coffee culture is that for every established spot that drifts toward becoming another Starbucks experience, ten new places pop up that more than make up for it. The ecosystem is relentlessly self-renewing. In the same neighborhood as Anthracite ’s Seogyo location, I discovered a handful of places that are not only better in the cup, but dramatically more affordable: These are only a handful of places that I think of off the top of my head, but rest assured that there are plenty more. The quiet confidence of people who care about the craft without needing to perform it is what makes these places special. No gimmicks, no inflated prices justified by whatever interior design. Just friendly people and good coffee that’s made well and respects the customer. The time in Seoul reinforced what I already knew from past visits. This city is one of the best places in the world to simply be in. The neighborhoods are endlessly walkable, the infrastructure works beautifully (with the exception of traffic lights and escalators, but more on that in a bit), and the coffee culture, despite the occasional disappointment from places that have lost their way, remains one of the richest and most dynamic I’ve encountered anywhere. The disappointments, if anything, make the discoveries sweeter. The food also deserves a mention. Seoul is one of those cities where even a quick, unremarkable lunch tends to be delicious and more often than not at a sane price, judging from a global perspective. Compared to other capital cities like London or, worse, Madrid , in which food prices are frankly absurd, especially when taking the generally low quality into account, the cost of food in Seoul still strikes me as overall reasonable. Unlike for example Madrid , which is an almost homogenous food scene, Seoul offers incredibly diverse options, ranging from traditional Korean food, all the way to Japanese, Thai, Vietnamese and even European and Latin American food. And while the Italian pasta in many places in Seoul might not convince an actual Italian gourmet, it suddenly becomes a very high bar to complain about dishes that originate as far as twelve thousand kilometers/seven thousand miles away and that have almost no local cultural influence . Another beautiful thing about Seoul, at least for keyboard -enthusiasts like I am, is the availability of actual brick-and-mortar keyboard stores. Seoul is home to three enthusiast keyboard shops: Funkeys , SwagKeys , and NuPhy . The first two are local vendors that have physical locations across Seoul, the latter is a Hong Kong-based manufacturer of entry-level enthusiast boards that just opened a showroom in Mapo-gu . I took the time to try to visit each of them and I even scooped up some new hardware. The Funkeys store is located in the Tongsan district, on the second floor of a commercial space. The store is relatively big and stocks primarily FL-Esports , AULA , and 80Retros boards, keycaps and switches, but you can also find a few more exclusive items like the Angry Miao CyberBoard . I seized the opportunity to test (and snap up) some 80Retros switches, but more on that further down below. SwagKeys is probably a name that many people in the keyboard enthusiast community have stumbled upon at least once. They are located in the Bucheon area and they used to have a showroom, which I tried to visit. Sadly, it wasn’t clear to me that the showroom was temporarily (permanently?) closed, so I basically ended up standing in front of locked doors of an otherwise empty space. Luckily, however, SwagKeys have popup stores in different malls, which I have visited as well. Unfortunately in those popup stores they only seem to offer entry-level items; Enthusiast products are solely available through their web shop and cannot be ordered and picked-up at any of their pop-up locations. I was curious to test and maybe get the PBS Modern Abacus , which SwagKeys had in stock at that time, but none of the pop-ups had it available. Exclusive SwagKeys pop-up. This is a shared space with plenty of other brands to choose from. The NuPhy showroom in the Mapo-gu area is a small space packed with almost all the products the brand offers, from keyboards, over switches and keycaps all the way to accessories and folios /bags. However, the showroom is exactly that: A showroom. There’s no way to purchase any of the hardware. As with almost everything in Seoul, your best bet is to order it from NuPhy’s official Korean store, which accepts Naver Pay . Apart from Funkeys , SwagKeys and NuPhy , there are various brands (like Keychron , Razer and Logitech ) that can be found across in-store pop-ups in different malls. It’s interesting to see a society like the one in Seoul, that has largely moved away from offline-shopping for almost everything but fashion (more on this in a moment) having that many shops and pop-ups selling entry-level mechanical keyboards. I guess with keyboards being something in which haptics and personal preference play a big role, it makes sense to have places for people to test the various boards and switches, even if most of them will ultimately only sell the traditional Cherry profiles. Speaking of mechanical keyboards, I happened to be in the right place at the right time this year to visit the Seoul Mechanical Keyboard Expo 2026 at the Seoul Trade Exhibition Center ( SETEC ) in the Gangnam area. It was an interesting experience despite being less of a traditional enthusiast community event and more of a manufacturer trade fair targeting average users. Because yes, the average user in Korea does indeed seem to have a soft-spot for mechanical keyboards. This, however, meant that most vendors would primarily showcase the typical mainstream products, like Cherry profile keycaps and boards that are more affordable. For example while Angry Miao were around, their Hatsu board was nowhere to be seen. And it made sense: Every vendor had little signs with QR codes that would lead to their store’s product page for people to purchase it right away. Clearly, the event was geared more toward the average consumer than the curious enthusiast. It was nevertheless interesting to see an event like this happening in the wild . Getting around is different in Seoul than it is in other cities. If you’re navigating Seoul with Google Maps , you’re doing it wrong. Naver Map is simply superior in every way that matters for daily life here, although this might soon change . Not only does Naver show you where the crosswalks are, something you don’t realize you need until you’ve jaywalked across six lanes of traffic because Google told you the entrance was “right there” , but it also shows last order times for restaurants and cafes, saving you from going to places only to find out they’re not serving anymore. And public transit arrival times? Accurate to a degree that feels almost unsettling. You trust Naver , because it earns that trust. Clearly, however, me being me , I only used Naver without an account and on a separate profile on my GrapheneOS phone . Also, I mostly use it for finding places and public transit; For everything else CoMaps works perfectly fine, and I take care to contribute to OSM whenever I can. Note: The jaywalking example isn’t too far-fetched. You’re very tempted to cross at red lights simply because traffic light intervals in Seoul are frankly terrible. As a pedestrian you age significantly waiting for the stoplight to finally turn green. If you’re unlucky, you’re at a large crossing that is followed by smaller crossings, which for reasons I cannot comprehend turn green for pedestrians at the exact same time. Unless you are Usain Bolt there is no way to make it across multiple crossings in one go, leading you to have to stop at every crossing for around three minutes. That doesn’t sound like much, until you’re out at -15°C/5°F. Seoul has too many pedestrian crossings with traffic lights, and too few simple marked crosswalks. This is however probably due to drivers often not giving a damn about traffic rules and almost running over people trying to cross at regular marked crossings. My gut feeling tells me that, because of the indifference of drivers, the government decided to punish every traffic participant by building traffic lights at almost every corner. However, this didn’t have the (supposedly) intended effect, as especially scooters, but also regular cars often couldn’t care less about their bright red stop light. Considering the amount of CCTVs (more on this in just a second) one could assume that traffic violations are being enforced strictly. However, judging by the negligence of drivers towards traffic rules I would guess that this is probably not happening. Circling back to the painfully long waiting times at crossings, that are only outrivalled by painfully slow escalators literally everywhere, a route for which CoMaps estimates 10 minutes can hence easily become a 20 minute walk. Naver , however, appears to be making time estimations based on average waiting times at crossings, leading to it being more accurate than CoMaps in many cases. With Naver being independent of Google , it works without any of the Google Play Services bs that apps often require for anything related to location. And don’t get me wrong, Naver is just as much of an E Corp as Google , but there’s something worth appreciating on a broader level here. Korea built and maintains its own mapping platform rather than ceding that ground to US big tech, and it shows. Naver Map is designed by people who actually navigate Korean cities, and that local knowledge is baked into every interaction. I would love to see more countries doing the same, especially European ones. While there is Nokia HERE Maps HERE WeGo in Europe, it’s as bad for public transport as you might expect from a joint venture between Audi , BMW and Mercedes-Benz , and it is not at all comparable to Naver Maps , let alone Naver as a whole. One big caveat with Naver , however, is that it will drain your battery like a Mojito on a hot summer evening, so it’s essential to carry a power bank . Even on a Pixel 8 , the app feels terribly clunky and slow. In addition, the swiping recognition more often than not mistakes horizontal swipes (for scrolling through photos of a place) for vertical swiping, making it really cumbersome to use. I assume that on more modern Samsung and Apple devices the app probably works significantly better, as the Korean market appears to be absolutely dominated by these two brands. As a matter of fact, the Google Pixel is not even being sold in Korea, which brings me to one important aspect of life in Seoul that might be interesting for the average reader of this site. As much as I enjoy Seoul, it is an absolute privacy disaster. CCTV cameras in Seoul are everywhere and the city government actively expands and upgrades them as part of its public-safety and smart city initiatives. The systems are “AI” -enabled and can automatically detect unusual behavior or safety risks . It’s hard to find a definitive number, but it’s estimated that Seoul is covered with around 110,000 to 160,000 surveillance cameras, with an ongoing expansion of the network. This makes Seoul one of the most surveilled major cities in the world. In addition to CCTV surveillance, Seoul is also almost completely cashless. Most places only accept card/NFC payments with cash payments being a highly unusual thing to do. While there are still ATMs around, getting banknotes is almost pointless. You can top up your transit card using cash, and you might be thinking that at least this way nobody knows who owns the card and you cannot be tracked, but with the amount of “AI” cameras everywhere, there’s no need to track people using an identifier as primitive as a transit card. Speaking of which, mobile connectivity is another thing. In Korea SIM cards are registered using an ID/Passport. From what I have found, there’s no way to get even just a pre-paid SIM without handing over your ID. In addition, with everything being cashless, your payment details are also connected to the SIM card. You could of course try to only use the publicly available WiFi to get around and spare yourself the need for a SIM card. However, the moment you’d want to order something online, you will need a (preferably Korean) phone number that can retrieve verification SMS and you might even need to verify your account with an ID. You might think that this doesn’t really matter because online shopping isn’t something vital that you have to do. But with Seoul being almost completely online in terms of shopping you cannot find even the most basic things easily in brick-and-mortar stores. For example, I was looking to upgrade my power brick from the UGREEN X757 15202 Nexode Pro GaN 100W 3-Port charger that I’ve been using for the past year to the vastly more powerful UGREEN 55474 Nexode 300W GaN 5-Port charger. I bought the 3-Port Nexode last year during my time in Japan , in a Bic Camera . However, in Seoul it was impossible to find any UGREEN product. In fact, I could not find any household name products, like Anker or Belkin , regardless of where I looked. Everyone kept telling me to look online, on Naver or Coupang . Short story long, to be able to live a normal life in Seoul you will unfortunately have to hand over your details at every corner. Note : Only one day before publishing this update, the popular Canadian YouTuber Linus Tech Tips uploaded a video titled “Shopping in Korea’s Abandoned Tech Mall” , which perfectly captures the sad state of offline tech stores in Seoul. What I found more shocking than this, however, is that it doesn’t seem like privacy concerns are part of the public discourse. The dystopian picture that people in the Western hemisphere paint in literature and movies, in which conglomerates run large parts of society and the general population are merely an efficient workforce and consumers isn’t far off from how society here appears to be working. At the end of February I ran into an issue that I had seen before : Back then, I attributed it to either alpha particles or cosmic rays, as I was unable to reproduce the issue nor reliably find bad regions in the RAM. This time, however, my laptop was crashing periodically, for seemingly no reason at all. After running the whole playbook of and to verify the filesystem, as well as multiple rounds of the , I found several RAM addresses that were reported faulty. I decided to seize the opportunity and publish a post on BadRAM . At this point, I removed one of the two 32GB RAM sticks and it appears to have helped at least somewhat: The device now only crashes every few hours rather than every twenty or so minutes. But with RAM and SSD prices being what they are, I’m not even going to attempt to actually fix the issue. After all, it might well be that whatever is causing the buzzing sound I’ve been hearing on my Star Labs StarBook has also had an impact on the RAM modules or even the logic board. I’m going to hold on to this hardware for as long as possible, but I’ve also realized that the StarBook has aged quicker than I anticipated. I have therefore been glancing at alternatives for quite a while now. I love what Star Labs has done with the StarBook Mk VI AMD in terms of form factor and Linux support. Back when I bought it , the Zen 3 Ryzen 7 5800U had already been on the market for almost 4 years and wasn’t exactly modern anymore. However, its maturity gave me hope that Linux support would be flawless (which is the case) and that Star Labs would eventually be able to deliver on their promises. When I purchased the device, Star Labs had advertised an upcoming upgrade from its American Megatrends EFI (“BIOS”) to Coreboot , an open-source alternative. Years later, however, this upgrade is still nowhere to be seen . At this point it is highly unlikely, that Coreboot on the AMD StarBook will ever materialize. As already hinted exactly one year ago I’m done waiting for Star Labs and I am definitely not going to look into any of their other (largely obsolete) AMD offerings, especially considering the outrageous prices. I’m also not going to consider any of their StarBook iterations, whether it’s the regular version, or the Horizon , given that none of them come with AMD CPUs any longer, and, more importantly, that their Intel processors are far too outdated for their price tags. Let alone all the quirks the Star Labs hardware appears to be having, and the firmware features that sometimes make me wonder what the actual f… the Star Labs people are smoking. Note : The firmware update lists the following update: * Remove the power button debounce (double press is no longer required) “Power button debounce” is what Star Labs calls the requirement to double-press the power button in order to power on the laptop when it is not connected to power. It is mind-boggling that this feature made it into the firmware to begin with. Who in their right mind thought “Hey, how about we introduce a new feature with the coming firmware update which we won’t communicate anywhere, which requires the user to press the power button quickly twice in a row for their device to power on, but only when no power cable is connected? And how about if they only press it once when no power cord is attached the device simply won’t boot, but it will nevertheless produce a short audible sound to make it seem like it tried to boot, but in reality it won’t boot?” …? Because this is exactly what the “power button debounce” was about. I believe it got introduced sometime around , but I can’t really tell, because Star Labs didn’t mention it anywhere. Short story long, instead of spending more money on obsolete and quirky Star Labs hardware, I have identified the ASUS ExpertBook Ultra as a potential successor. The ExpertBook Ultra is supposed to be released in Q2 in its highest performance variant, featuring the Intel Core Ultra X9 Series 3 388H “Panther Lake” processor, running at 50W TDP and sporting up to 64 GB LPDDR5x memory, which is the model that I’m interested in. I will wait out the reviews, specifically for Linux, but unless major issues are to be expected I’ll likely upgrade to it. “Wait, aren’t you Team Red?” , you might be wondering. And, yes, for the past decade I’ve been solely purchasing AMD CPUs and GPUs, with one exception that was a MacBook with Intel CPU. However, at this point I’m giving up on ever finding an AMD-based laptop that fits my specs, because sadly with AMD laptops it’s always something : Either the port selection sucks, or there’s no USB4 port at all, or if there is it’s only on one specific side, or the display and/or display resolution sucks, or the battery life is bad, or you can only get some low-TDP U variant, or the device is an absolute chonker, or or or. It feels like with an AMD laptop I always have to make compromises at a price point at which I simply don’t want to have to make these compromises anymore. So unless AMD and the manufacturers – looking specifically at you, Lenovo! – finally get their sh#t together to build hardware that doesn’t feel like it’s artificially choked, I’m going back to Team Blue . “Panther Lake” seems to have made enough of a splash, TDP-performance-wise, that it is worth considering Intel again, despite the company’s history of monopolistic business tactics, its anti-consumer behavior, its major security flaws, its quality control issues, and its general douchebag attitude towards everything and everyone. The ASUS ExpertBook Ultra appears to feature the performance that I want, with all the connectivity that I need, packaged in a form factor that I find aesthetically pleasing and lightweight enough to travel with. If the Intel Core Ultra X9 388H notably exceeds the preliminary benchmarks and reviews of the Intel Core Ultra X7 358H version of the ExpertBook Ultra , then I’m “happy” to pay the current market premium for a device that will hopefully hold up for much longer and with fewer quirks than I’ve experienced with the StarBook . With a Speedometer 3.1 rating of around 30 and reporting 11:25:05 hours for on my current device, however, I’m fairly certain that even the X7 358H will be a significant improvement. “Did you hear about the latest XPS 14 & 16 from Dell? They also come with Panther Lake!” , I hear you say. See here and there on why those are seemingly disappointing options. The tl;dr is that Dell only feeds them 25W (14") / 35W (16"), instead of the 45W that ASUS runs the CPU at. I can’t tell for sure how long I’ll be able to continue working on the StarBook . While I can do the most critical things, the looming threat of data-corruption and -loss is frightening. The continuous crashes also introduce unnecessary overhead. I’m hoping for ASUS to make the ExpertBook Ultra available rather sooner than later, but if there’s no clarity on availability soon I might have to go with a different option. Ultrabook Review luckily has a full list of Panther Lake laptops to help with finding alternatives. What’s the second best thing that can happen when your computer starts failing? Exactly: Your phone (slowly) dying. It appears that the infamous Pixel 8 green-screen-of-death hit my GrapheneOS device, making it almost impossible to use it. Not only does the display glitch terribly, but it appears that the lower bottom part of the phone gets abnormally hot. When the glitching began, it would be sufficient to literally slap the bottom part of the phone and it would temporarily stop glitching. Sadly, the effectiveness of this workaround has decreased so much over time that now I basically need to squeeze the bottom part of the phone for the glitching to stop. The moment I decrease force, the screen starts glitching again. My plan was to keep the Google Pixel 8 for the next few years and eventually move to a postmarketOS /Linux phone as soon as there will be a viable option. Sadly it seems that I’m going to have to spend more money on Google’s bs hardware to get another GrapheneOS device for the time being. Unfortunately Google is not selling the Pixel devices across Asia, making it hard to find an adequate replacement for the phone right now. I might just have to suck it up and wait until I’ll pass by a region in which Pixel devices are more widely available. Of course, I luckily brought backups , although those run malware and are hence less than ideal options. My Anker Soundcore Space Q45 have died on me during a flight, for absolutely no reason at all. I purchased them back at the end of May 2024 and now, after not even 2 years it appears that the electronics inside of them broke in a way in which the headphones cannot be turned off or on again. They seem to be in a sort-of odd state in between, in which pressing e.g. the ANC button does something and makes the LED light up, but there’s no Bluetooth connectivity whatsoever. When connecting them via USB-C to power or to another device, the LED changes dozens of times per second between white and red. Holding the power button makes the LED turn on (white) but nothing else. The moment the power button is let go, the LED turns back off. This is yet another Anker product that broke only shortly after its warranty expired and I’m starting to see a common theme here. Hence, I will avoid Anker products going forward, especially given the tedious support that I had experienced in the past with one of their faulty power banks. I still use the Soundcore headphones via audio jack, as this luckily works independently of the other electronics. To avoid anything bad happening, especially during flights, I opened the left earcup and removed the integrated battery. The USI 2.0 stylus that I had bought back in mid September of 2024 from the brand Renaisser is another hardware item that has pretty much died. It seems like the integrated battery is done, hence the pen doesn’t turn on anymore unless a USB-C cable is connected to it to power it externally. While I’m still using it, it is slightly inconvenient to have a relatively stiff USB-C cable pull on the upper end of the pen while writing or editing photos, which is what I use the pen primarily for. As mentioned in the Seoul part, I picked up a handful of mechanical keyboard-related items, namely MX switches for my keyboard(s) . KTT x 80Retros GAME 1989 Orange , 40g (22mm KOS single-stage extended, bag lubed with Krytox 105 ), lubed with Krytox 205G0 . 80Retros x HMX Monochrome , 42g (48g bottom out), LY stem, PA12 top housing, HMX P2 bottom housing, 22mm spring, factory lubed, 2mm pre-travel, 3.5mm total. I invested quite some time in pursuing my open source projects in the past quarter, hence there are a few updates to share. This quarter I have finally found the time to also update my feature and make it work with the latest version of Ghostty , the cross-platform terminal emulator written in Zig. You can use this commit if you want to patch your version of Ghostty with this feature. It is unlikely that the Ghostty team is ever going to include this feature in their official release, yet I’m happy to keep maintaining it as it’s not a lot of code. I have updated and it now supports a new flag (that does not support), which makes it possible to build a complete power management policy directly through command-line arguments. I have documented it in detail in the repository , but the idea is that the flag allows executing arbitrary shell commands when the battery reaches a specific percentage, either by charging or discharging. The flag takes three arguments: For , the command fires when the battery percentage drops to or below the given value. For , it fires when the percentage reaches or exceeds it. The command fires once when the condition is met and will only fire again after the condition has cleared and been met again. Additionally, the flag can be specified multiple times to define different rules. This makes it possible to build a complete power management policy, from low-battery warnings to automatic shutdown, without any external scripts or configuration files. The benefit this has over, let’s say, rules, is that script execution as the current user is significantly easier, less hacky and poses fewer overall security risks, as does not need to (read: should not ) be run in privileged mode. Another one of my Zig tools that got a major update is , the command line tool for getting answers to everyday questions like or more importantly . The new version has received an update to work with Zig 0.15.0+ and its command line arguments parser logic was rewritten from scratch to be able to handle more complex cases. In addition, is now able to do a handful of velocity conversions, e.g. . As a quick side note, alongside the Breadth-first search implementation that it is using, , has also been updated to support Zig 0.15.0+. I had some fun a while ago building an XMPP bot that’s connected to any OpenAI API (e.g. ) and is able to respond when mentioned and respond to private messages. It preserves a single context across all messages, which might not be ideal in terms of privacy, but it is definitely fun in a multi-user chat – hey, btw, come join ours! The code is relatively crude and simple. Again, this was a just a two-evening fun thing, but you can easily run the bot yourself, check the README and the example configuration for more info. The work on my new project, ▓▓▓▓▓▓▓▓▓▓▓, which I had announced in my previous status update sadly didn’t progress as quickly as I was expecting it to, due to (amongst other things) the RAM issues that I’ve had to deal with. It also turns out that when writing software in 2026, everyone seems to expect instant results, given all the Codexes and Claudes that are usually being employed these days to allow even inexperienced developers to vibe code full-blown Discord alternatives within shorts periods of time. However, because I don’t intend to go down that path with ▓▓▓▓▓▓▓▓▓▓▓, it will sadly take some more time for me to have a first alpha ready. To everyone who reached out to offer their help with alpha testing: You will be the first ones to get access as soon as it’s ready. Kauf Roasters : A roastery with a clear focus on simplicity and quality without pretension. Identity Coffee Lab : This one stunned me. A hot Americano to go for 3,000 KRW. That’s almost a third of what Anthracite charges. And the coffee isn’t just cheaper, it is significantly better! It’s a bigger cup, it’s notably less acidic, and, here’s the part that really got me, it comes out steaming hot and stays that way for a good twenty minutes. You can actually walk around and sip it casually, even in freezing cold temperatures, just the way a to-go coffee is meant to be enjoyed, instead of gulping it down before it turns into cold brew. Oscar Coffee Booth : This became a personal favorite. Another spot where the coffee is serious, the price is fair, and nobody is trying to impress you with anything other than a well-made drink. On top of that the owner is a genuinely kind person. : Either (aliases: , ) or (aliases: , ) : The battery level (number from 0 to 100) : The shell command to execute

0 views