Posts in C (20 found)
<antirez> 6 days ago

Distributing LLM inference in DwarfStar

High end NVIDIA cards, and the server and power needed to run them, cost a lot of money, especially if you plan to reach enough VRAM to run massive models. The alternative, so far, has been Apple hardware, or the DGX Spark that, even if severely limited because of memory bandwidth, still allows to run LLMs prompt processing (prefill) fast enough. The Mac Studio provided up to 512GB unified memory, a solution with modest memory bandwidth (but much better than the Spark) and compute at a price that was, after all, given the current situation, relatively fair. For instance, with DwarfStar the Mac Studio M3 Ultra 512GB can run DeepSeek v4 PRO at 150 t/s prefill and ~10-13 t/s decoding, not great but at a level that is usable for certain use cases. Even 2-bit quantized, DeepSeek v4 PRO resists very well, like Flash at the same quantization (today I made PRO write a C compiler, I'll publish the video soon). I would not consider a trivial fact to run a frontier model at home, with a ~12k total spending. One could expect this to get better and better, but the situation at the horizon appears cloudy. There is almost zero hope that NVIDIA setups will get less expensive, and even a small company can’t afford to easily purchase and handle a small data center for local inference. At the same time the RAM shortage is making it not exactly likely that we will see a Mac Studio with an M5 Ultra, maybe 1.2T/s memory bandwidth and more compute (the M5 Max is already faster, compute wise, and has the Neural Accelerators inside each GPU core that help with certain models). So the current situation for local inference is that the best machine is probably a laptop. The M5 Max 128GB can run DeepSeek v4 Flash and Mimo V2.5, 2-bit quantized, at very decent prefill and decoding speeds. We are talking of ~500 t/s prefill and ~35-40t/s decoding speed, with a performance slope as the context size increases which is very acceptable. At the cost of 6-7k depending on the configuration, this is currently one of the best deals. If this is the situation, for local inference projects in general, and for DwarfStart in particular, looking at distributed inference starts to be interesting. What we can do if we have two, three, four MacBook M5 Max systems? Or two M3 Ultra with 512 GB of RAM? Traditionally there are two main systems to run distributed inference. One is to duplicate memory by loading 50% of the transformer layers in computer A, the remaining 50% on computer B, and running the inference in a sequential way. In this case there is to send just the activations around, that’s very simple conceptually, and with some micro-batching magic it is possible to not just duplicate the memory but even in theory to increase substantially the prompt processing speed (but not the decoding: for a single token generation you have to wait the first layers on machine A, the remaining layers on machine B, and so forth — but at least less heat will be produced so it is possible to use a sustained load), which is not bad at all. This means, for example, that the lucky ones that have two Mac Studio 512GB machines could run full size DeepSeek v4 PRO (even if even the 2-bit quants are running very, very well) and with micro-batching even enjoy a faster prefill. Another approach is, using Apple RDMA, to parallelize the execution across the two machines, a vertical split basically. For instance one could try to load the same 2 bit quants on machine A and B, so that both fit, and each side has *all* the routed experts. Then for each layer we could try to do the coordination needed in order to execute half the experts in machine A, half in machine B, and so forth (note that both machines have all the experts, so whatever the router says, we can send 50% of the computation to the other machine, and the activations are tiny). This is more viable for the PRO that has much larger routed experts, so the communication penalty is less sensible. But if this could be made to work well, is all to be seen. There is also tensor parallelism, you are thinking, right? But I bet this is not viable at all with the communication speed we have among two Apple computers, two DGX Spark and so forth (go read the speed of NVLink). The magic about the above two models is that you have to send very little data. Ok, so far I bet you are thinking, this is the same shit everybody knows about running LLMs in a parallel fashion, and indeed this is true. But this post was conceived to reach this exact point. What about if we could, instead, parallelize two Mac or DGX in a completely different way? Open weights models are now in a golden age, we have plenty and many are very powerful. In the 128GB 2-bit quants classes there are many interesting: Minimax M2.7, Mimo V2.5, DeepSeek v4 Flash, and a few more. At the same time it was recently noted that LLMs ensemble (https://arxiv.org/abs/2502.18036) is an understudied possibility that allows two models to run in a completely shared-nothing way in two different machines, to only combine the logits or select the best continuation at the end. There are different ways to do that, and it works even if the two models have different vocabularies: you can pick the continuation where the perplexity is lower (that is, pick the model which is more sure: it’s like a two experts MoE where the routing is implicit), and it is even possible to combine the logits (with some complexities given by the different vocabularies) and sample from there. More recent papers suggest that mixing the two techniques is the best approach. Anyway: these techniques seem to really work, models appear to do better than alone. It’s like if their knowledge is improved because each one brings his POV on what to say next. Maybe this is one of the most logical third approach to try, other than the first two. I really hope to find the time to play more with all that, in the next months. Comments

0 views

How my minimal, memory-safe Go rsync steers clear of vulnerabilities

Back in January 2025, multiple different security researchers published a total of 6 security vulnerabilities in rsync , some of which allow arbitrary code execution and file leaks, so naturally I was wondering whether/how my gokrazy/rsync implementation was affected. Did implementing my own (compatible, but minimal) rsync in Go, a modern and memory-safe programming language, really rule out entire classes of security vulnerabilities? This deep dive article was in the making since January 2025, but was delayed because we uncovered more unpublished vulnerabilities in the process! The “Security Vulnerabilities” section now covers all 12 vulnerabilities from the January 2025 batch and the May 2026 batch. If you are running (upstream, samba) rsync in production, upgrade to version 3.4.3 or newer. If you are running gokrazy/rsync in production, upgrade to version v0.3.3 or newer. Feel free to skip over the nitty-gritty security issue details and jump directly to: For context, I blogged about rsync, how I use it, and how it works back in June 2022. See also all posts tagged “rsync” . The original motivation for writing my own rsync (back then only a server, today all directions are supported) was to provide the software packages of distri, my Linux distribution research project for fast package management , which I wanted to host on router7 , my small home Linux+Go internet router, which in turn is built on gokrazy , my Go appliance platform. I am still running multiple gokrazy/rsync servers for this original purpose, and also many others! Having rsync available as a primitive (that you can link into your Go programs!) is really nice. This article covers the following security vulnerabilities: The first batch of the vulnerabilities above was announced on the oss-security mailing list , but note that the original report has more detail compared to the oss-security summaries! The later vulnerabilities were announced via GitHub Security Advisories on the rsync project . When the checksums are read by the daemon, two different checksums are read: Most importantly, note that field is filled with bytes. always has a size of 16: rsync.h is an attacker-controlled value and can have a value up to bytes, as the next snipper shows: The problem here is that can be larger than 16 bytes, depending on the digest support the binary was compiled with: md-defines.h support is common and sets the value to 64. As a result, an attacker can write up to 48 bytes past the buffer limit. Upstream fix: The upstream fix for CVE-2024-12084 changes the field to a dynamically-allocated field, which is allocated with length, and fixes the bounds check to check against the (checksum length for this transfer’s algorithm). Can Go help prevent this? Yes: Missing or incorrect bounds checks will not result in a heap buffer overflow in Go! Instead, attempting to write out of bounds will result in a panic because the Go runtime performs bounds checks. How does gokrazy/rsync fare? gokrazy/rsync also had insufficient validation! Our issue was different, though: It wasn’t size confusion, we just were not doing any validation of the sum header at all — oops! We can confirm that the Go runtime’s bounds check triggers on an attempt to write out of bounds by changing the code like so and running the tests: As expected, the Go runtime panics with the following message: Of course, crashing the entire server is not the best failure mode, so I added the missing bounds checking to turn the panic into an error . Because of the same lack of validation as in the previous CVE-2024-12084 vulnerability, an attacker could select a checksum algorithm with short checksums (e.g. with 8 byte checksums), but then claim they were sending longer checksums (e.g. 9 bytes), making the victim leak one byte of uninitialized stack content in the response. Leaking one byte of stack content may seem benign, but as the Google Security report puts it: The first pair of vulnerabilities are a Heap Buffer Overflow and an Info Leak. When combined, they allow a client to execute arbitrary code on the machine a Rsync server is running on. The client only requires anonymous read-access to the server. The daemon matches checksums of chunks the client sent to the server against the local file contents in . Part of the function prologue is to allocate a buffer on the stack of bytes: The daemon then iterates over the checksums the client sent and generates a digest for each of the chunks and compares them to the remote digest: Notably, the number of bytes that are compared again are bytes. In this case, the comparison does not go out of bounds since can be a maximum of . However, the local buffer, not to be confused with the attacker-controlled , is a buffer on the stack that is not cleared and thus contains uninitialized stack contents. A malicious client can send a (known) checksum for a given chunk of a file, which leads to the daemon writing 8 bytes to the stack buffer . The attacker can then set to 9 bytes. The result of such a setup would be that the first 8 bytes match and an attacker-controlled 9th byte is compared with an unknown value of uninitialized stack data. An attacker can divide a file into 255 chunks and as a result leak one byte per file download. An attacker can incrementally repeat the process, either in the same connection or by resetting the connection. As a result, they can leak bytes of uninitialized stack data, which can contain pointers to Heap objects, Stack cookies, local variables and pointers to global variables and return pointers. With those pointers they can defeat ASLR. Upstream fix: There are two relevant upstream fixes: Can Go help prevent this? Yes: By design, Go initializes all variables to the zero value. Go programmers do not need to remember to explicitly initialize variables. How does gokrazy/rsync fare? gokrazy/rsync is not affected by this vulnerability: Variables are always initialized in Go. Additionally, selecting checksums other than MD4 was only introduced in protocol version 30 (gokrazy/rsync implements protocol version 27). Description: (quoting the Google Security report ) When the syncing of symbolic links is enabled, either through the or ( ) flags, a malicious server can make the client write arbitrary files outside of the destination directory. A malicious server can send the client a file list such as: Symbolic links, by default, can be absolute or contain characters such as . In practice, the client validates the file list and when it sees the entry, it will look for a directory called , otherwise it will error out. If the server sends as [both, a directory and a symbolic link], [the client] will only keep the directory entry, thus the attack requires some more details to work. In mode, which the server can enable for the client, the server sends the client multiple file lists. The deduplication of the entries happens on a per-file-list basis. As a result, a malicious server can send a client multiple file lists, where: As a result, the directory is created first and is considered a valid entry in the file list. Then, the attacker changes the type of to a symbolic link. When the server then instructs the client to create the file, it will follow the symbolic link and thus files can be created outside of the destination directory. Can Go help prevent this? No. This vulnerability is caused by a logic error: when multiple file lists are used, the merged file list needs to be re-verified. But see Defense in depth: Go’s Upstream fix: The upstream fix for CVE-2024-12087 adds the missing validation. How does gokrazy/rsync fare? gokrazy/rsync is not affected by this vulnerability: gokrazy/rsync does not implement the incremental recursion mode ( ). The trade-off here is implementation complexity vs. resource usage: the incremental recursion mode allows working with the file set in a “windowed” way, as opposed to having to scan the entire file set before any transfer can begin. See also my How does rsync work? blog post. Description: (quoting the Google Security report ) The CLI flag makes the client validate any symbolic links it receives from the server. The desired behavior is that symbolic links target can only be 1) relative to the destination directory and 2) never point outside of the destination directory. The function is responsible for validating these symbolic links. The function calculates the traversal depth of a symbolic link target, relative to its position within the destination directory. As an example, the following symbolic link is considered unsafe: As it points outside the destination directory. On the other hand, the following symbolic link is considered safe as it still points within the destination directory: This function can be bypassed as it does not consider if the destination of a symbolic link contains other symbolic links in the path. For example, take the following two symbolic links: In this case, foo would actually point outside the destination directory. However, the function assumes that is a directory and that the symbolic link is safe. Upstream fix: The upstream fix for CVE-2024-12088 makes stricter by not allowing anywhere within the path, except at the very beginning. Can Go help prevent this? No. This vulnerability is caused by a logic error: the validation function was incorrect. We could have implemented that same bug. But see Defense in depth: Go’s How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable: The feature is not yet implemented in gokrazy/rsync. The rsync receiver (in client mode) did not sanitize file names provided by the rsync sender, or otherwise prevent opening files outside the destination tree. A malicious sender could instruct a receiver to compare checksums of arbitrary files outside the destination tree. By observing the receiver’s reaction to a provided one-byte checksum, a malicious sender can leak arbitrary files. When a client connects to a malicious server the server is able to leak the contents of an arbitrary file on the client’s machine. In the client will read type as well as the from the server if the server sets the appropriate flags. The flag will not be set for the client. The caller ( ) then uses the server provided values to determine a file to compare the incoming data with. In the contents of the file specified by are copied into the destination file. This can be achieved by the server sending a negative token. The server sends a checksum to compare. If they don’t match, a 0 is returned. When the return value is 0 the receiver will then send a to the generator. The generator will then write a message to the server. The server can use this as a signal to determine if the checksum they sent was correct. By starting off with a of 1 a malicious server is able to determine the contents of the target file byte by byte. Upstream fix: The upstream fix for CVE-2024-12086 prevents opening files outside the destination tree by verifying the sender-provided path. Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable: the fuzzy matching feature was introduced with rsync protocol version 29, but gokrazy/rsync implements protocol version 27. Description: (quoting the Red Hat Security Advisory ) A flaw was found in rsync. This vulnerability arises from a race condition during rsync’s handling of symbolic links. Rsync’s default behavior when encountering symbolic links is to skip them. If an attacker replaced a regular file with a symbolic link at the right time, it was possible to bypass the default behavior and traverse symbolic links. Depending on the privileges of the rsync process, an attacker could leak sensitive information, potentially leading to privilege escalation. Upstream fix: The upstream fix for CVE-2024-12747 changes calls in the rsync sender to use the option. The paths are not expected to be symlinks at that point in the algorithm (symlinks would be handled with ). Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync was vulnerable before commit , which introduces the same mitigation that upstream rsync uses. To reproduce the issue, use the following steps: Check out gokrazy/rsync v0.2.7: Patch the code as follows to undo the fix and execute the attack: Running the test now shows that the server traversed the symlink: A surprising discovery When I shared a draft of this article with Damien Neil, member of the Go Security Team and the author of the traversal-resistant API , he pointed out: I believe the gokrazy fix for CVE-2024-12747 is insufficient. You’re calling with , but only prevents symlink traversal in the last path component. This is probably still vulnerable to replacing an earlier path component so can be redirected by symlinking to . We reported this to the rsync security contact address in April 2025. In December 2025 I learned that someone else had also independently discovered and reported this issue. Ultimately, this resulted in CVE-2026-29518, published on 2026-05-20. Description: (quoting the rsync 3.4.3 NEWS entry ) TOCTOU symlink race condition allowing local privilege escalation in daemon mode without chroot. An rsync daemon configured with is exposed to a time-of-check / time-of-use race on parent path components. A local attacker with write access to a module can replace a parent directory component with a symlink between the receiver’s check and its open(), redirecting reads (basis-file disclosure) and writes (file overwrite) outside the module. Under elevated daemon privilege this allows privilege escalation. Default is not exposed. Reach: local attacker on the daemon host, write access to a module path, daemon configured with . Upstream fix: The upstream fix for CVE-2026-29518 uses , which is similar to Go’s API. Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync was vulnerable until I switched the sender and the receiver to the traversal-resistant API . Description: (quoting the GitHub Security Advisory ) Description: The receiver’s compressed-token decoder accumulated a 32-bit signed counter without overflow checking. A malicious sender can trigger an overflow that, with careful manipulation, leaks process memory contents to the attacker – environment variables, passwords, heap and library pointers – significantly weakening ASLR and facilitating further exploitation. Reach: authenticated daemon connection with compression enabled (the default for protocols >= 30 when both peers advertise it). Disabling compression on the daemon (“refuse options = compress” in rsyncd.conf) is the available workaround. Upstream fix: The upstream fix for CVE-2026-43618 introduces the missing checks. How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable because it does not implement compression. See gokrazy/rsync issue #35 for details on why compression support sounds simple, but is non-trivial. Description: (quoting the GitHub Security Advisory ) The 2025 fix that added a guard in was not applied to the visually-identical block in . A malicious rsync server can drive any connecting client into a deterministic by setting in the compatibility flags, sending a flist whose first sorted entry is not a leading “.” directory (which causes to set ), then sending a transfer record with and a non- iflag word. The receiver reads and dereferences the result. On glibc x86-64 the dereferenced pointer is mmap chunk metadata that lands at an unmapped address, hence a clean ; non-glibc allocators have not been audited. Reach: any rsync client doing a normal pull from an attacker-controlled URL. Works for both rsync:// URLs and remote-shell pulls. is the protocol-30+ default; no special options are required on the victim. Workaround: on the client. Upstream fix: The upstream fix for CVE-2026-43620 adds the guard to as well. How does gokrazy/rsync fare? Just like for CVE-2024-12087 , gokrazy/rsync is not affected by this vulnerability: gokrazy/rsync does not implement the incremental recursion mode ( ). Description: (quoting the GitHub Security Advisory ) Description: Earlier fixes for symlink races on the receiver’s open() call (CVE-2026-29518) missed the same race class on every other path-based system call: chmod, lchown, utimes, rename, unlink, mkdir, symlink, mknod, link, rmdir, lstat. On rsync daemons with “use chroot = no” a local attacker with filesystem access on the daemon host can swap a symlink into a parent directory component between the receiver’s check and one of these syscalls, redirecting it outside the exported module. The fix routes each affected path-based syscall through a parent dirfd opened under RESOLVE_BENEATH-equivalent kernel-enforced confinement (openat2 on Linux 5.6+, O_RESOLVE_BENEATH on FreeBSD 13+ and macOS 15+, per-component O_NOFOLLOW walk elsewhere). Default “use chroot = yes” is not exposed. Reach: local attacker on the daemon host, write access to a module path, daemon configured with use chroot = no. Upstream fix: The upstream fix for CVE-2026-43619 uses the family of syscalls, just like Go’s . Can Go help prevent this? Yes, Go offers an API to prevent this, see Defense in depth: Go’s . How does gokrazy/rsync fare? gokrazy/rsync is not affected, because it uses Go’s API throughout. Description: (quoting the GitHub Security Advisory ) On an rsync daemon configured with the global rsyncd.conf setting, the reverse-DNS lookup of the connecting client was performed after the daemon had chrooted into . If did not contain the files glibc needs for resolution ( , , , NSS service modules), the lookup failed and the connecting hostname was set to “UNKNOWN”. Hostname-based deny rules (“hosts deny = *.evil.example”) therefore could not match, and an attacker controlling their PTR record could connect from a hostname the administrator had intended to deny. IP-based ACLs are unaffected. The per-module setting is unrelated to this issue. Reach: rsync daemon configured with AND hostname-based ACLs AND does not include the libc resolver fixtures. Upstream fix: The upstream fix for CVE-2026-43617 moves the DNS lookup to an earlier point in the protocol. How does gokrazy/rsync fare? gokrazy/rsync is not vulnerable because we only implement IP-based allow/deny lists, not hostname-based allow/deny lists. Description: (quoting the GitHub Security Advisory ) The rsync client’s HTTP proxy support contains an off-by-one out-of-bounds stack write in ( ). After issuing the request, rsync reads the proxy’s first response line one byte at a time into a 1024-byte stack buffer with the bound , so the loop only ever writes . If the proxy (or a man-in-the-middle in front of it) returns 1023+ bytes on the first response line without a terminator, the loop exits with — a slot the loop never wrote, so holds stale stack bytes left there by the earlier that formatted the outgoing request. The post-loop code then does: The lands one byte past the end of the on-stack , corrupting whatever lives in the adjacent stack slot. AddressSanitizer reports at in the frame. Upstream fix: The upstream fix for CVE-2026-45232 validates the attacker-supplied data. How does gokrazy/rsync fare? gokrazy/rsync does not implement such proxy support, so it is not vulnerable. Let’s summarize how Go fares: Aside from being written in Go, another key difference between gokrazy/rsync and the official upstream rsync is that the gokrazy implementation is minimal : Let’s have a look at whether gokrazy/rsync was affected by each CVE at the time of publishing: To be clear: all known vulnerabilities are fixed in gokrazy/rsync! The table above documents what the state was at the time when each CVE was published. In other words: When the January 2025 vulnerabilities were published, gokrazy/rsync panicked (CVE-2024-12084) and was vulnerable to a TOCTOU race (CVE-2024-12747). In the process of fixing the TOCTOU issue, we discovered CVE-2026-29518, which was fixed in gokrazy/rsync before the CVE was published. CVE-2026-43619 was discovered even later, but was also already fixed in gokrazy/rsync with the same fix: using Go’s everywhere. As I was reading the vulnerability reports, I noticed that the reports were slightly misleading by their choice of words: most reports just spoke of “server” and “client”. However, in an rsync transfer, both sides, the rsync client and the rsync server can assume either role: sender (upload files) or receiver (download files)! Some setups come with further restrictions that make certain attacks harder or impossible to pull off. For example, when running in daemon mode, file system access can be restricted to the pre-configured module paths (but not in command mode!). Here is a diagram to give you an overview of the 4 different setups and role/protocol layering: In the context of our vulnerability reports, I would say that the Arbitrary File Leak vulnerability (CVE-2024-12086)’s original title “Server leaks arbitrary client files” can easily be misunderstood. Instead, I would say: The rsync receiver will leak arbitrary files to a malicious sender . I have verified that a malicious client sender can make an unpatched remote rsync open files outside the destination tree (e.g. the system password database) when running in command mode, for example over SSH. (But, when running in daemon mode, the server enables additional path sanitization, which prevents this attack.) Similarly, the Symlink Path Traversal vulnerability (CVE-2024-12087) speaks about a “malicious server”, but again, it should be “malicious sender”, which can be either the client or the server. The OpenBSD project is known for its security focus, so how does openrsync compare? openrsync is not affected by the Heap Buffer Overflow (CVE-2024-12084) and Stack Info Leak (CVE-2024-12085) vulnerabilities because it validates the checksum length and only supports one checksum size/algorithm (MD4). openrsync is not affected by CVE-2024-12086, CVE-2024-12087 and CVE-2024-12088 because it does not implement the relevant features (like gokrazy/rsync). Even if it was vulnerable, openrsync’s defense-in-depth measures like using OpenBSD’s and to restrict file system access would have prevented successful exploitation — at least when running on OpenBSD. openrsync is not affected by CVE-2024-12747 because it used from the very moment they implemented symlink support . But, because is not a sufficient fix for this issue, openrsync is affected by CVE-2026-29518! The above covers the January 2025 batch of vulnerabilities; the May 2026 batch is similar in that most features just are not implemented. Overall, I say: Well done, Kristaps and contributors! By diligently implementing validation, restricting the attack surface and employing defense-in-depth measures, openrsync manages to not be affected by almost all of the reported vulnerabilities. Which APIs and environments can we use on Linux for defense-in-depth measures? I’ll go through the ones supports, ordered by traditional to modern. Within a few weeks after starting the project, I added support for dropping privileges and using mount/pid namespaces on Linux to restrict the file system objects that my rsync server could work with. This approach works very well to mitigate path traversal attacks, but requires privileges, meaning we need to run as or in a Linux user namespace (if enabled on your distribution / system). That limitation makes mount namespaces well-suited for server setups, but usually unavailable for interactive one-off transfers that are typically running under a human’s user account. In the same commit that introduced Linux mount/pid namespace support, I also included a systemd service file that restricted file system access to home directories and encouraged folks in the README to further restrict file system access, depending on what their use-case allows. These file system restrictions, if set up correctly, mitigate the File Leak (CVE-2024-12086) and Path Traversal (CVE-2024-12087) vulnerabilities. The Symlink Race Condition (CVE-2024-12747) relies on privilege escalation through the rsync process, but thanks to the DynamicUser feature, our process has fewer privileges than other users. Similarly to mount namespaces, these measures are great for server setups, but too cumbersome to set up for interactive one-off usages. I stumbled upon Justine’s blog post Porting OpenBSD pledge() to Linux (2022) and was reminded that Linux offers the Landlock API for unprivileged, per-process access control, similar to OpenBSD’s system call, which openrsync uses. The basic idea is that once your program knows the directory it works with, it makes a call like and no longer has access to other file system locations. I had previously heard of Landlock at a Go Meetup, so I knew there was Go support for Landlock. Back in 2022, I enabled Landlock support in the gokrazy kernel images. So I gave it a shot in March 2025 and implemented Landlock support to restrict file system access . It took me a few hours, which seems a little longer than one might expect at first. Making Landlock work (and/or skipping it) in our test environment ran into a couple of road blocks: Our tests had defined many functions that get run in the same process, but when repeatedly adding rulesets, we would exceed the limit of 16 (!) policy layers per process. Once I had it set up just right, it is a beautiful solution. Now we can restrict rsync transfers to their sources (read-only) or destination directories (read-write), even for unprivileged invocations of ! 🎉 The downside to Landlock is that Landlock operates at the process level. This means that Landlock policies must include the files that your program needs, e.g. needs to be able to read for user id lookup, so if the attacker is after the file, Landlock does not help. In February 2025, the Go 1.24 release introduced the API, which is resistant against path traversal, see The Go Blog: Traversal-resistant file APIs (by Damien Neil, March 2025). This API allows more fine-grained control (per file system operation) compared to Landlock. Go 1.25 (released in August 2025) added more methods to , making it a convenient choice for most file system usage. I have converted all of ’s file system usage to use , which is a great fit: users configure input/output directories, but the filenames received over the network are untrusted. That’s exactly what was designed for! When I first looked into using , I thought that some system calls could inherently not be made with this API, like for example to create device node files. Damien explained: It won’t support mknod, though. However, you should be able to use it to enable a safe mknod: If you’re curious how that looks in practice, check out ’s usage in , line 15-29 . Another stumbling block was when I realized that unlike with , Linux only implements , but no (as of Linux 7.0)! Luckily, Lennart Poettering pointed out that there’s a trick to skip path resolution without : you can probably bind to in the meantime… And indeed, this works! Path resolution is skipped because we only specify a basename (last component of a path) after the known-safe , not a path (see line 49-56 ). With these two tips, v0.3.1 and newer are fully using , meaning all file system access is traversal-safe! 🥳 Lacking validation causes vulnerabilities It is interesting to note that aside from the TOCTOU vulnerabilities (CVE-2024-12747, CVE-2026-29518 and CVE-2026-43619), all other vulnerabilities were caused by missing or incorrect input validation. In three cases, there was just no validation to begin with. In another case (CVE-2024-12088), the subject matter of file system path resolution is tricky enough that the existing validation did not cover all edge cases. As the Go verdict section explains in more detail, the most valuable structural fixes are to provide bounds checking (= always-on validation) and safe-by-default APIs like Go’s . Too much complexity A few of the vulnerabilities came from evolution of the rsync protocol: The code used to correctly perform sufficient validation, but then new features were added. For example, when checksum algorithm negotiation was added (protocol version 30), the validation was not correctly updated. When incremental recursion was added (also protocol version 30), the validation that made sense for individual file lists was not updated for the new processing approach of merging incremental file lists. Avoiding complexity avoids vulnerabilities! Both gokrazy/rsync and also openrsync were not vulnerable to 8 out of the 12 security vulnerabilities simply because they do not implement the feature with the vulnerability. Of course, these features were added to rsync because they were valuable to someone at some point, and of course I am not saying that we should just… not develop software any further, ever. But, I consider it ideal to use an implementation whose complexity is appropriate for and proportional to the complexity of the use-case . In other words: for simple use-cases, reach for a simple implementation. Only reach for the fully-featured implementation where needed. The verdict on whether using Go has helped . The verdict on whether a minimal re-implementation like gokrazy/rsync helps . My comparison with OpenBSD’s (written in C). Defense in depth mechanisms one can use on Linux. The conclusion . CVE-2024-12084 to 12088 (original report) CVE-2024-12747 (discovered separately by Aleksei Gorban “loqpa”) CVE-2026-29518 (discovered by Damien Neil and myself! and independently by Nullx3D ) CVE-2026-43617 to 43620 CVE-2026-45232 rsync performed insufficient validation: It read the (attacker-controlled) checksum length from the network and compared the length against . However, rsync’s data structures always declared a 16 byte buffer: is always 16 (bytes), which is sufficient to hold an MD4 or MD5 checksum. used to be 16 (bytes), but can be larger when rsync is compiled with SHA256 or SHA512 checksum support. Hence, the bounds check was ineffective! An attacker could write out of bounds. This issue was introduced with commit in September 2022 , which added SHA256/SHA512 checksum support. A 32-bit Adler-CRC32 Checksum A digest of the file chunk. The digest algorithm is determined at the beginning of the protocol negotiation. The corresponding code can be seen below: sender.c : The “Some checksum buffer fixes” commit prevents this attack because the attacker-controlled can no longer be larger than the transfer’s checksum length. The “prevent information leak off the stack” commit initializes the memory to zero, thereby making any stack leak through impossible. Check out gokrazy/rsync v0.2.7: Patch the code as follows to undo the fix and execute the attack: The Go runtime’s bounds checks turn more serious security issues into a panic. A panic is still a denial-of-service risk, but that’s much preferable. Go initializes memory to zero, making info leaks like CVE-2024-12085 impossible. Go’s API prevents most of the remaining vulnerabilities. Only one out of twelve vulnerabilities (CVE-2026-43617) is a proper bug in the application logic that using Go could not have prevented. gokrazy/rsync is unaffected by many vulnerabilities because it does not implement the feature in question, for example . Like all other wire protocol-compatible rsync implementations, gokrazy/rsync targets protocol version 27, because later protocol versions introduce significant complexity. In some cases, features that would be good to implement come with significant blockers, e.g. compression is tricky, see gokrazy/rsync issue #35 for details. os.Root.OpenFile the parent directory of the target, File.Fd to get the file descriptor for that directory, https://pkg.go.dev/golang.org/x/sys/unix#Mknodat to create the file.

0 views
Xe Iaso 1 weeks ago

"No way to prevent this" say users of only language where this regularly happens

In the hours following the release of CVE-2026-45584 for the project Microsoft Windows , site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a memory safety vulnerability resulting in arbitrary code execution inside the virus scanner Windows Defender. This is due to the affected components being written in C++, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Dr. Annabelle Connelly, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

0 views
Blargh 1 weeks ago

Everything in C is undefined behavior

If he had been a programmer, Cardinal Richelieu would have said “Give me six lines written by the hand of the most expert C programmer in the world, and I will find enough in them to trigger undefined behavior”. Nobody can write correct C, or C++. And I say that as someone who’s written C and C++ on an almost daily basis for about 30 years. I listen to C++ podcasts. I watch C++ conference talks. I enjoy reading and writing C++. C++ has served us well, but it’s 2026, and the environment of 1985 (C++) or 1972 (C) is not the environment of today. I’m definitely not the first to say this. I remember reading a post by someone prominent about a decade ago saying that a good case can be made that use of C++ is a SOX violation. And while I was not onboard with the rest of their rant (nor their confusion about “its” vs “it’s”), I never disagreed about that point. With time I found it to be more and more true. WAY more things are undefined behavior (UB) than you’d expect. Everyone knows that double-free, use after free, accessing outside the bounds of an object (e.g. array), and accessing uninitialized memory is UB. After all, C/C++ is not a memory safe language. And yet we as an industry seem to be unable to stop making even those mistakes over and over. But there’s more. More subtle. More illogical. Some people seem to think that as long as they don’t compile with optimizations turned on, undefined behavior can’t hurt them. They believe that the compiler is somehow being deliberately hostile, going “AHA! UB! I can do whatever I want here!”, and without optimizations turned on it won’t. This is incorrect. UB doesn’t mean that the compiler can take advantage of your sloppiness. UB means that the compiler can assume that your code is valid. It means that the intention of your code that’s oh so obvious when read by a human, doesn’t even have a way to be expressed between compiler stages or modules. UB means that the compiler doesn’t even have to implement some special cases in its code generation, because they “can’t happen”. The compiler, and really the underlying hardware too, is playing a game of telephone with your UB intentions. It may end up with what you wanted, but there’s no guarantee for now or in the future. The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C/C++ code has UB. As an example of this, take this code: If this function is called with a pointer not correctly aligned (probably meaning on an address that’s a multiple of , but who knows), this is UB. C23 6.3.2.3. On Linux Alpha, in some cases this would merely trap to the kernel, which would software emulate what you intended. In other cases it would (probably) crash your program with a SIGBUS. On SPARC it would cause a SIGBUS. Sure, on x86/amd64 (henceforth just “x86”) this is likely fine. Hell, it’s probably even an atomic read. x86 is famously extremely forgiving about cache coherency subtleties. So here we have three cases: What about ARM, RISC-V, and others? What about future architectures? A future architecture could even have special that do not populate the lowest bits, because such pointers cannot exist. Even if it works, maybe the compiler one day changes from using one load instruction to another, and suddenly that’s no longer fixed up by the kernel. Because the compiler is not obligated to generate assembly instructions that work on unaligned pointers . Because it’s UB. Or how about this: Is this operation atomic when the object is not correctly aligned? That’s the wrong question to ask. Mu , unask the question. It’s UB. (but also yes, in practice this can easily be an atomicity problem) If you want to get even more convinced, you can try thinking about what happens if an object you thought you were reading atomically spans pages . But don’t think too much about it, or you may conclude that “it’s fine”. It’s not. It’s UB. Don’t blame the function, above. The act of dereferencing the pointer wasn’t the problem. Merely creating the pointer was enough to be a problem. That cast is the problem, not . It’s perfectly valid for the compiler to assign specific meaning, such as garbage collection or security tagging bits, to the lower bits of an . is a simple function that takes a character and returns if it’s a hex digit. 0-9 or a-f. It can also take the value . Uh, ok. What value is ? Per C23 7.4p1 we know it’s an , and we can infer that it’s not representable by . therefore takes an , not a . All values of fit inside , so we should be fine. Casting from to fits, so per section 6.3.1.3 we’re fine, right? No. Because if is called with a value other than 0-127, and on your architecture is (implementation defined, per 6.2.5, paragraph 20 in C23 ), then the integer value ends up negative. And the following is a valid implementation of , that would cause a read of who-knows-what memory. It could even be I/O mapped memory, triggering things to happen that is more than merely getting a random value or crash. It could cause the motor to start. Less likely in an application running in a desktop operating system than in an embedded system, sure. But there are user space network drivers (for performance), so even user space won’t protect you. And, by omission, it’s also UB if the float is a non-finite value. So how do you compare a float to ? Do you cast the float to ? No, that’s the UB you want to avoid. So you cast to float? How do you know it can be represented exactly? Maybe casting to rounds to a value not representable in , and your comparison becomes non-representative? Maybe the following works? You’ll miss out on representing some really high values, but maybe that’s OK? I just wanted to convert a float to an int. :-( I bet there’s lots of code out there that take a value in seconds, and convert it to integer milliseconds, by just multiplying and casting. Most programmers won’t have to deal with this, but I don’t think there’s any C standards compliant way in practice to put an object at address zero. This can come up in OS kernel and embedded coding. By 6.3.2.3 an integer constant zero (which is convertible to a pointer) and are the “null pointer constant” (which I’ll just call ). C doesn’t specify that the actual pointer points addr machine address zero , because the C standard only talks of the C abstract machine, not about hardware. All C guarantees is that if you compare to zero you’ll see them equal. But for all you know that’s because the zero is converted to the native platform’s , which happens to be . It also explicitly says that dereferencing a null pointer, no matter what the value, is undefined behavior. It’s the example of UB under 3.4.3. This also means that you can’t assume that will create a pointer! You cannot initialize your structs this way and assume member pointers are ! And this does apply to most programmers. And yes, some historic machines used non-zero NULL pointers . But let’s say you have a modern machine, where is a pointer to address zero, and you actually have an object there. Again, C 6.3.2.3 says that compares unequal to “any object or function”. So this is UB: C says “there is no function there”. For all you know the compiler has no internal way to even express your intention here. You may argue that “but surely it’ll just emit a call instruction to the bit pattern of all zeroes? Nothing else seems reasonable. What is “all zeroes”, though? On 16bit x86, is it ? Is it ? This is UB: This is not: Because the argument needs to be a pointer, and the macro may be misinterpreted as an integer zero. Similarly, this is UB: It needs to be: So how do you print an ? Well, you could cast them to and print them using . But is even unsigned? Oh well, worst case you get a nonsense value printed instead of , I guess. Sure, you probably knew this. But did you consider the security aspects of it? It’s not rare for the denominator to come from untrusted input. And there’s so much more. The C23 standard contains 283 uses of the word “undefined”. And that’s not even including the things that are undefined by omission. Nobody can find integer promotion rules and code skimming speeds. Nobody . This post is already long enough, but as a start: Point an LLM at ANY C code, asking it to find UB, and it will. And it’ll be right almost all the time, nowadays. I felt a bit bad after it correctly found ones in my code, so I thought I’d point it at the mature and pedantically written OpenBSD. I just picked the first tool I could think of, , and it spit out a bunch. I sent the project a patch for an out of bounds write (and also for a non-UB logic bug ). I didn’t send them patches for the UB that was left and right, partly because the OpenBSD project has not been very receptive in the past for bug reports, my sense of “this is probably fine, in practice”, and that if OpenBSD wants to weed out UB from their code base, then that’s a major project that should be done in a better way than me just being the middle man between the LLM and them for a patch here and there. We can’t just throw away our C/C++ code bases. But leaving them inherently broken is also not an option. We need some way of fixing UB at scale, without committing AI slop nor overwhelming human reviewers. This too is not a new opinion, nor a great revelation. But yes, writing C/C++ in 2026 without an LLM supervising you for UB should probably be seen as a SOX violation, and just plain irresponsible. If OpenBSD people can’t find these problems gives 30+ years, what chance do the rest of us have? It may not scale to large code bases, but for my own projects I’ve asked the LLM to find UB, if necessary explain it, and fix it. And then stare at the output until I can confirm the issue and the fix. A problem with this is that in order to confirm the findings, you’ll need an expert human. But generally expert humans are busy doing other things. This is janitor work, but too subtle to leave to the junior programmers who have traditionally been assigned janitor work. kernel gave a helping hand (Alpha for some loads) crash (other Alpha loads, and SPARC) not a problem (x86) No way to parse integers in C Integer handling is broken UB in the Linux kernel Integer promotion

0 views
Max Bernstein 2 weeks ago

Partial static single information form

In compilers, static single information form (SSI) is a common extension to static single assignment form (SSA). It was introduced by C. Scott Ananian in 1999 in his MS thesis (PDF) 1 . SSI extends your existing SSA intermediate representation by discovering facts from your existing program and reifying them as path-dependent/flow-sensitive IR nodes. That might sound complicated, but at least the basic idea is pretty natural. I talk a little bit about it in What I talk about when I talk about IRs and I’ll rehash here in more depth, starting with some motivating examples. Consider this admittedly contrived example: We should be able to learn from the comparison that in some branches in the IR, is positive. In that region, we can add a new IR instruction that attaches that knowledge right in the instruction’s type field (yay, sparseness!) and then rewrite uses of to now use . Because we’ve done that, our (imaginary) optimization rule that gets rid of on known-positive integers can kick in, and we can delete the invocation of . Yay, optimization! But a couple of questions remain, at least for me: We’ll go through them, starting with the compiler pipeline. The original SSI paper starts with (I think?) SSA form and places some number of new refinement nodes based on conditionals. I have admittedly not tried very hard, but the into-SSI algorithms look complicated and kind of heavyweight. As a reward, you get “linear” into-SSI time complexity. But I am a humble compiler engineer, and I don’t have the time to go through and load all of this into my head. Instead what I have seen done and have been doing is to take a shortcut: build partial SSI during SSA construction 2 . Most of the time this is from bytecode, but it could also be from some other non-SSA IR. In any case, this is an excellent shortcut for two reasons: This is pretty compelling. We can learn from the bytecode with a very small amount of marginal new complexity. See my implementation in ZJIT , for example. All it really does is modify the abstract interpreter state when building SSA out of , , and bytecode instructions to take into account the new refined values. This is fine for branches that are already in the user’s source program but sometimes optimization, especially of dynamic languages, adds new branches that were not there before. And sometimes these branches get added much later, long after SSA construction. What then? Can we do something similar and rely on existing infrastructure? Implicit in this “can we do it” is the assumption that your IR tracks data dependencies from use to corresponding def, but not from def to uses. Sea of Nodes (at least the Simple implementation), is an IR that tracks both directions all the time for easier rewriting. Many IRs do not do this, so we will continue assuming that there’s no “easy way out”. JIT optimization of dynamic language compilers often adds synthetic instructions to the IR that enforce pre-conditions. These guards allow optimizing happy/fast path cases in JIT code while leaving the interpreter as a fallback. For example, we might be able to optimize two back-to-back instructions (a very dynamic operation in the world of ideas, but fast when concretely implemented using object shapes) from: which is very generic and involves calling into C code that might raise an exception, to something more like: which is much faster (assuming shape stability at run-time). There’s an irritating problem, though, which is that we have a bunch of duplicate instructions littered around the IR now because our optimizer worked on each instruction individually. Kind of a “template optimizer” situation. Now we need some pass to clean up the detritus. Global value numbering (GVN) will do a good job of de-duplicating instructions. It should notice that we already have an instruction that looks like called and rewrite into . That’s great because we have de-duplicated the guard. GVN may not get everything, though; if some instructions later use , they will not get rewritten to instead use the output of these new guard instructions. To do that, we need to add some kind of pass or augment GVN with some canonicalization feature. That canonicalization would handle rewriting operands to use the “latest version” of some value, so to speak. See the canonicalization section of Chris Fallin’s excellent aegraphs blog post for more (and of course the (currently block-local) implementation in ZJIT ). Where I’m going with all of this, though, is that you may already have some dominance-based instruction rewriting mechanism in your compiler, either as part of GVN or separately! And you can use this to do a very low code into-partial-SSI in the middle of your optimizer. This means you could very well get away with inserting instructions in successor blocks of conditionals and get the into-SSI “for free”. That’s up to you. There’s a trade-off between compile-time and run-time, especially in JITs. Inserting more instructions and rewriting more times may slow down your compiler. It’s a cheap lunch, not a free one. I don’t know. I don’t have a good grasp of how this “partial SSI” compares to the “full SSI”. I don’t plan on implementing full SSI in the near future. I will note that this partial SSI approach doesn’t do two things: I can’t tell what impact this has. Like Simple, TruffleRuby is built on a Sea of Nodes IR (Graal). Chris Seaton has an excellent blog post about TruffleRuby’s use of “stamp nodes” (“Pi nodes” 3 ). The function does a lot of heavy lifting, I think because Graal tracks uses. Cinder mostly inserts instructions in the HIR builder, before into-SSA, and then lets the SSA construction take care of things. That’s where I learned this trick, actually. Here is one example of refining the type of the matched operand when building IR for pattern matching. Luau is working on something like this, but for their type checker. Chatting with someone on their team is actually part of the reason I got motivated to write this post. Android ART looks like it has HBoundType and inserts them in reference type propagation . This handles class checks, null checks, and instanceof checks. Last, I want to talk a little bit about some interesting reasoning you can do when you have two implementations of something that you can switch between. For example, JIT (+ interpreter), or aliasing and non-aliasing cases in C code, or the weirdo NULL-UB reasoning LLVM can do to C code, things like that. In ZJIT, we currently insert s opportunistically in “easy” cases when building our HIR from the interpreter bytecode. For example, if in the bytecode there is a branch that compares some value with , it will have two outgoing control-flow edges: one block where is definitely , and one block where is definitely not . In each of these control-flow edges, we can insert corresponding type refinement hints. That’s pretty standard. But we can also do weirder stuff. CRuby has a notion of heap objects vs immediate objects. Many (most?) objects are heap objects. However, integer , for example is not allocated on the heap but instead represented by a tagged bit pattern that pretends to be an address: the whole value is encoded in the pointer itself. We encode this knowledge in the HIR’s type system: “heapness” and “immediateness” each get a bit in the type lattice . We use this in the optimizer to reason about effects , among other things. We can’t know a lot of the time what type a thing is, so we pessimistically type most objects flowing through bytecode as . This type encapsulates the entire world of possible values that could go on the stack or in a local variable. On most heap objects, with only a few exceptions, you can write instance variables (fields, attributes, whatever you want to call them). You can never write an instance variable to an immediate. This means that if we observe the following pattern in the bytecode: Then after building and emitting HIR for the opcode, we can upgrade the type of from a to a . We can do this because if it weren’t a heap-allocated object, we would have left the compiled code and entered the interpreter. This is another SSI-type thing you can do in your compiler. Uhh I guess the conclusion is that you don’t have to do full SSI and partial SSI is available and not too scary? Does your compiler do this? Reader, please write in. …and optimized in 2002 (PDF), revisited in 2009 (PDF), implemented in LLVM in 2010 (PDF), investigated in 2017 for abstract compilation (PDF), and probably more. The 2009 paper by Boissinot, Brisk, Darte, and Rastello even shows that both Ananian and Singer’s papers have bugs, while perhaps unintentionally also making an excellent pun about the literature being “sparse”.  ↩ This blog post is different than the what the LLVM paper (PDF) calls partial SSI. Partial for different reasons. Maybe it’s not even single information anymore.  ↩ Today I learned that this terminology comes from the ABCD paper (PDF).  ↩ Where/when in the compiler pipeline do we insert and remove these type refinements? Do we need to refine after every conditional? Do we need to implement the whole into-SSI and out-of-SSI algorithms from all the complicated-looking papers? It lets me cleanly separate adding the type refinements (pretty straightforward) from the hard part of doing all of the operand rewriting and phi placement and marking and all manner of other nonsense. In addition to separating the concerns, the hard part is already done by SSA construction. We can actually just skip it! SSA construction handles phi placement, operand rewriting, all of it. It probably fits neatly into a naive or a Braun-style (PDF) construction. It doesn’t split variables with a new sigma node, and it generally inserts the refine node within the target block rather than above the branch (For only) It doesn’t insert new phi nodes; it just leaves both IR nodes available and, instead of re-merging, drops them …and optimized in 2002 (PDF), revisited in 2009 (PDF), implemented in LLVM in 2010 (PDF), investigated in 2017 for abstract compilation (PDF), and probably more. The 2009 paper by Boissinot, Brisk, Darte, and Rastello even shows that both Ananian and Singer’s papers have bugs, while perhaps unintentionally also making an excellent pun about the literature being “sparse”.  ↩ This blog post is different than the what the LLVM paper (PDF) calls partial SSI. Partial for different reasons. Maybe it’s not even single information anymore.  ↩ Today I learned that this terminology comes from the ABCD paper (PDF).  ↩

0 views
daniel.haxx.se 2 weeks ago

Mythos finds a curl vulnerability

yes, as in singular one . Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code. Apparently Mythos was so good at this that Anthropic would not release this model to the public yet but instead trickle it out to a selected few companies for a while to allow a few good ones(?) to get a head start and fix the most pressing problems first, before the general populace would get their hands on it. The whole world seemed to lose its marbles. Is this the end of the world as we know it? An amazingly successful marketing stunt for sure. Part of the deal with project Glasswing was that Anthropic also offered access to their latest AI model to “Open Source projects” via Linux Foundation . Linux Foundation let their project Alpha Omega handle this part, and I was contacted by their representatives. As lead developer of curl I was offered access to the magic model and I graciously accepted the offer. Sure, I’d like to see what it can find in curl. I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed. Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important. It’s not that I would have a lot of time to explore lots of different prompts and doing deep dive adventures anyway. Getting the tool to generate a first proper scan and analysis would be great, whoever did it. I happily accepted this offer. (I am purposely leaving out the identity of the individual(s) involved in getting the curl analysis done as it is not the point of this blog post.) Before this first Mythos report, we had already scanned curl with several different very capable AI powered tools (I mean in addition to running a number of “normal” static code analyzers all the time, using the pickiest compiler options and doing fuzzing on it for years etc). Primarily AISLE , Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl through-out the recent 8-10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more. Nowadays we also use tools like GitHub’s Copilot and Augment code to review pull requests, and their remarks and complaints help us to land better code and avoid merging new bugs. I mean, we still merge bugs of course but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews. They help us, they don’t replace us. We also see a high volume of high quality security reports flooding in : security researchers now use AI extensively and effectively. Security is a top priority for us in the curl project. We follow every guideline and we do software engineering properly, to reduce the number of flaws in code. Scanning for flaws is just one of many steps to keep this ship safe. You need to search long and hard to find another software project that makes as much or goes further than curl, for software security. Steps involved in keeping curl secure May 6, 2026 It was with great anticipation we received the first source code analysis report generated with Mythos. Another chance for us to find areas to improve and bugs to fix. To make an even better curl. This initial scan was made on curl’s git repository and its master branch of a certain recent commit . It counted 178K lines of code analyzed in the src/ and lib/ subdirectories. The analysis details several different approaches and methods it has performed the search, and how it has focused on trying to find which flaws. A fun note in the top of the report says: curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely. … and it correctly found no problems in those areas. Completely unscientific poll on Mastodon about people’s expectations for Mythos scanning curl The size of curl curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace. On average, every single production source code line of curl has been written (and then rewritten) 4.14 times. We have polished on this. Right now, the existing production code in git master that still remains, has been authored by 573 separate individuals. Over time, a total of 1,465 individuals have so far had their proposed changes merged into curl’s git repository. We have published 188 CVEs for curl up until now. curl is installed in over twenty billion instances . It runs on over 110 operating systems and 28 CPU architectures . It runs in every smart phone, tablet, car, TV, game console and server on earth. The report concluded it found five “Confirmed security vulnerabilities”. I think using the term confirmed is a little amusing when the AI says it confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take. Five issues felt like nothing as we had expected an extensive list. Once my curl security team fellows and I had poked on the this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability. The other four were three false positives (they highlighted shortcomings that are documented in API documentation) and the fourth we deemed “just a bug”. The single confirmed vulnerability is going to end up a severity low CVE planned to get published in sync with our pending next curl release 8.21.0 in late June. The flaw is not going to make anyone grasp for breath. All details of that vulnerability will of course not get public before then, so you need to hold out for details on that. The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities, much like any new code analyzer does when you run it on hundreds of thousands of lines of code. All the bugs in the report are being investigated and one by one we are fixing those that we agree with. All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty. curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts. This is only natural of course since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones are slowly becoming harder. Additionally, a bug can be small or big so it’s not always fair to just compare numbers My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing. This is just one source code repository and maybe it is much better on other things. I can only tell and comment on what it found here. But allow me to highlight and reiterate what I have said before: AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers did in the past. All modern AI models are good at this now. Anyone with time and some experimental spirits can find security problems now. The high quality chaos is real. Any project that has not scanned their source code with AI powered tooling will likely find huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will, and so will many of the others. Not using AI code analyzers in your project means that you leave adversaries and attackers time and opportunity to find and exploit the flaws you don’t find. Zero memory-safety vulnerabilities found. Methodology note: this review is hand-driven analysis using LLM subagents for parallel file reads, with every candidate finding re-verified by direct source inspection in the main session before being recorded. The CVE to variant-hunt mapping was built from curl’s own vuln.json. No automated SAST tooling was used. This outcome is consistent with curl’s status as one of the most heavily fuzzed and audited C codebases. The defensive infrastructure (capped dynbufs everywhere, with explicit max on every numeric parse, overflow guard, CURL_PRINTF format-string enforcement, per-protocol response-size caps, pingpong 64KB line cap) systematically closes the bug classes that would normally be productive in a codebase this size. Coverage now includes: all minor protocols, all file parsers, all TLS backends’ verify paths, http/1/2/3, ftp full depth, mprintf, x509asn1, doh, all auth mechanisms, content encoding, connection reuse, session cache, CLI tool, platform-specific code, and CI/build supply chain. It should be noted that the AI tools find the usual and established kind of errors we already know about. It just finds new instances of them. We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before. These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more. We have not reached the end of this yet. I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems. Thanks to Anthropic and Alpha Omega for providing the model, the tools and doing the scan for us. Thanks also to the individual who did the scan for us. Much appreciated! Top image by Jin Kim from Pixabay Thanks for flying curl. It’s never dull. They can spot when the comment says something about the code and then conclude that the code does not work as the comment says. It can check code for platforms and configurations we otherwise cannot run analyzers for It “knows” details about 3rd party libraries and their APIs so it can detect abuse or bad assumptions. It “knows” details about protocols curl implements and can question details in the code that seem to violate or contradict protocol specifications They are typically good at summarizing and explaining the flaw, something which can be rather tedious and difficult with old style analyzers. They can often generate and offer a patch for its found issue (even if the patch usually is not a 100% fix).

0 views
Blargh 2 weeks ago

Quantum safe amateur radio secure shell

I’ve previously pointed out that the AX.25 implementation in the kernel is pretty poor . It’s not really being maintained, and even when it gets fixes after I reported it , with people running LTS OSs it can take like 5 years before before the fix actually reaches users, if ever. So when writing applications, you still have to work around kernel bugs from a decade ago. This makes it kind of pointless to upstream patches. The exception is security patches, and reading between the lines of why the AX.25 code is now being removed from the kernel , it sounds like maybe some LLM (like the looming “Mythos” and the related Glasswing ) may have found some severe problems. But even if there aren’t any known security problems yet, having code is now more of a liability than ever. Code needs to be removed, or taken responsibility of. (tangent about ffmpeg at the bottom of this post) With the kernel code removed, say goodbye to the old walkthrough . Well, not “new”, per se, but “replacement”. With the socket based API about to be gone, we need some other way for applications to send packets and manage connections. For sending raw packets to and from the modem there’s KISS . I have no real complaints about it. Not much to get wrong about sending frames. It’s implemented by most modems, like the software modem Direwolf and by some radios like the Kenwood TH-D75 , so it’s not going anywhere. For connected mode (streams of in order data, like with TCP) the biggest contender seems to be AGW . Direwolf implements it, and I’ve made a messy implementation of an AGW client in Rust . The Rust API works, as we’ll see, but the code needs some refactoring and cleanup due to it being written exploratorily while I was deciding what it should even do, and how. The AGW protocol is not super amazing, but it gets the job done. One can build a connection API on top of it, as I have , and never have to think about the AGW protocol ever again. There’s another protocol called RHP, specified here and here . It came out of the XRouter project. Since XRouter is closed source, I have a strong aversion to it. It seems both counter to how I see amateur radio, and anachronistic, for it to be closed source. It’s bad enough that VARA and [Winlink][winlkn] are closed source. And people are definitely working on replacing VARA with various other modes because of it. tl;dr: I’m going with AGW for now. If someone writes a Rust crate for RHP exposing a compatible API, I certainly wouldn’t mind adding that dependency to optionally use. I have not yet implemented AGW (or RHP) in my own AX.25 stack , but I plan to. For now that means I’ll use Direwolf. My previous axsh implementation, since deleted , had some problems: So with everything but terminal management needing a rewrite, this is a reason to rewrite the whole thing. Non-requirement: Encrypt — This would violate the amateur radio license. And then, why not just use SSH? If you have an AGW server, such as Direwolf, then it’s easy to run axsh. Just start a server: Then log in: Then wait like 30-40 seconds for the handshake to complete. The reason for the wait is the large ML-DSA signatures used in the handshake. It can’t be the same direwolf instance, since Direwolf only shuffles packets between the radio and AGW clients, not from one AGW client to another. In my case I had one Direwolf connected to an ICom 9700, and another to a Baofeng UV5R using an AIOC (all in one cable) . AIOC is highly recommended for experimentation over the air. So yeah my test is between just about the cheapest VHF/UHF radio that exists, and maybe the most expensive one. In addition to running : With KISS providing packet support (and AGW providing a higher level API on top, if preferred), why not just run TCP/IP, and let the very stable OS TCP implementation take care of everything? TCP is definitely more modern, stable, and maintained, but it doesn’t scale down to slow speeds very well. A TCP+IPv4 header is at least 40 bytes, and if you don’t want to be some sort of caveman, IPv6 is another 20 bytes. At 1200bps that would be 267-400ms overhead for every packet 1 . Checking a random TCP data packet on my laptop I see that with TCP options TCP/IPv4 is actually 52 bytes, or 350ms. Counting the air time (milliseconds, not just bytes) makes this overhead problem more obvious. And because of amateur radio license reasons TCP would still need to identify the callsign, you probably have to add 17 bytes (113ms) as a surrounding header. That leaves TCP with 69 or 89 bytes overhead per packet, meaning 460ms or 593ms. And since you don’t want to tie up the RF channel for too long (only for the whole packet to be dropped due to interference), you won’t want to send packets that are too large. Of course it’s 4x as slow if you want to do something like Bell 103 on HF. AX.25 connected mode takes that down to 19 bytes (126ms) overhead (if using Mod 128 mode) per data packet. Because of the AX.25 segmenter, for bulk data TCP is not as bad as it may have sounded. For a 1500 byte TCP segment, fitting in just under 8 200 byte AX.25 frames (totalling bytes of overhead), this means 1367ms overhead instead of plain AX.25 (at bytes) 1013ms. A 1500 byte payload takes 10 seconds to send, so that’s an overhead of 13.7% instead of 10.1%. But for interactive use cases, worst case a single payload packet, it’s 467ms vs 133ms. And that’s only counting the data frames, not the acknowledgments. A TCP ACK is at a minimum bytes, or 380ms. An AX.25 RR is 18-19 bytes, or 120-127ms. That makes TCP about three times less efficient, compared to AX.25. A bigger problem with TCP, especially untweaked, is resend timers and window sizes. At 1200bps you don’t actually want too big a window size, since you don’t want to tie up the RF channel for several minutes if the other end has gone away. So a bunch of airtime tweaks are needed. And at best you’ll end up with the numbers above. Maybe you could tweak TCP to be more friendly to lower speeds, and find the other overhead acceptable. If so, then you’ll be happy to hear that axsh supports running on TCP as well. Well first, it inherits the same problems from TCP/IP. Sure, the UDP header is smaller than the TCP header, but then on top of that there’s the QUIC header. The second problem is that QUIC is meant to be encrypted. Ripping out encryption, while staying secure, seems more dangerous that keeping it simple and just working from the requirements. Probably the whole handshake would have to be redesigned. AX.25 being removed from the Linux kernel reminds me of LLM finding that bug in ffmpeg , causing all that drama. I have no dog in this fight, but in my opinion ffmpeg is in the wrong, here. Their argument seems to be all about how this particular encoder is rarely used, is just a hobby project, etc.. Ok, but it’s in your code base. Even if disabled by default, why would you want to ship a security footgun? Maybe some hobbyists out there build ffmpeg with all encoders enabled. Do you want them to be vulnerable to someone’s virus? So Google should either keep quiet, or give a patch? Well, keeping quiet because the codec is rarely used is not really an option. That’s borderline negligent and morally culpable, for when someone eventually gets hacked. So Google “should” always provide a patch in these cases? Perhaps, depending on the meaning of the word “should”. Google is rich, so “should” be morally forced to contribute to your software, just because Google (presumably, via youtube) is a heavy user of ffmpeg? Well, that just sounds like the the (non-)problem with open source software (or free software) in general. The license permits use and profit without contribution. If you wanted a tithe then you should have put that in the license. Sounds like you want everyone to be free only to do what you want. That’s not how that works. This is also why I don’t like the AGPL license . It’s not free software if it binds me in your serfdom. Actually, it’s a tiny bit more, because of the occasional bit stuffing ↩ it was implemented in C++, and not only do I prefer Rust, how could I even call something written in C++ “secure”? (a blog post for another day) used the kernel API, so that needs rewriting, used , which proved to be a bit “weird” when interoperating with some other APIs, and used crypto primitives vulnerable to quantum computers. Don’t use kernel AX.25 sockets — this means use AGW. Also work on TCP (mainly for debugging) — This means using an internal framing protocol. Be quantum safe — Use ML-DSA+ed25519 dual signed for authentication of server and client. Be efficient — This means don’t use ML-DSA for per packet signatures (they are huge), at the cost of some quantum safety (see [the README][axsh]). Actually, it’s a tiny bit more, because of the occasional bit stuffing ↩

0 views
Susam Pal 3 weeks ago

I Will Not Add Query Strings to Your URLs

Last evening, a short blog post appeared in my feed reader that felt as if it spoke directly to me. It is Chris Morgan's excellent post I've banned query strings . Chris is someone whose Internet comments I have been reading for about half a decade now. I first stumbled upon his comments on Hacker News, where he left very detailed feedback on a small collection of boilerplate CSS rules I had shared there. I am by no means a web developer. I have spent most of my professional life doing systems programming in C and C++. However, developing websites and writing small HTML tools has been a long-time hobby for me. I have learnt most of my web development skills as a hobbyist by studying what other people do: first by viewing the source of websites I liked in the early 2000s, and later by occasionally getting possessed by the urge to implement a new game or tool and searching MDN Web Docs to learn whatever I needed to make it work. One problem with learning a skill this way is that you sometimes pick up habits and practices that are fashionable but not necessarily optimal or correct. So it was really valuable to me when Chris commented on my collection of boilerplate CSS rules. It helped me improve my CSS a lot. In fact, a few of the lessons from his comment have really stuck with me; I keep them in mind whenever I make a hobby HTML project: always retain underlines in links and retain purple for visited links. I have been following Chris's posts and comments on web-related topics since then. He often posts great feedback on web-related projects. Whenever I come across one, I make sure to read them carefully, even when the project isn't mine. I always end up learning something nice and useful from his comments. Here is one such recent example from the Lobsters story Adding author context to RSS . A couple of months ago, I created a new project called Wander Console . It is a small, decentralised, self-hosted web console that lets visitors to your website explore interesting websites and pages recommended by a community of independent personal website owners. For example, my console is here: susam.net/wander/ . If you click the 'Wander' button there, the tool loads a random personal web page recommended by the Wander community. The tool consists of one HTML file that implements the console and one JavaScript file where the website owner defines a list of neighbouring consoles along with a list of web pages they recommend. If you copy these two files to your web server, you instantly have a Wander console live on the Web. You don't need any server-side logic or server-side software beyond a basic web server to run Wander Console. You can even host it in constrained environments like Codeberg Pages or GitHub Pages. When you click the 'Wander' button, the console connects to other remote consoles, fetches web page recommendations, picks one randomly and loads it in your web browser. It is a bit like the now defunct StumbleUpon but it is completely decentralised. It is also a bit like web rings except that the community network is not restricted to being a cycle; it is a graph that can take any shape. There are currently over 50 websites hosting this tool. Together, they recommend over 1500 web pages. You can find a recent snapshot of the list of known consoles and the pages they recommend at susam.codeberg.page/wcn/ . To learn more about this tool or to set it up on your website, please see codeberg.org/susam/wander . In case you were wondering why I suddenly plugged my project into this post in the previous section, it is because I recently added a dubious feature to that project that I myself was not entirely convinced about. That misfeature is relevant to this post. In version 0.4.0 of Wander Console, I added support for a query parameter while loading web pages. For example, if you encountered midnight.pub while using the console at susam.net/wander/ , the console loaded the page using the following URL: This allowed the owner of the recommended website to see, via their access logs, that the visit originated from a Wander Console. Chris's recent blog post is critical of features like this. He writes: I don't like people adding tracking stuff to URLs. Still less do I like people adding tracking stuff to my URLs. ? Did I ask? If I wanted to know I'd look at the header; and if it isn't there, it's probably for a good reason. You abuse your users by adding that to the link. I mentioned earlier that I was not entirely convinced that adding a referral query string was a good thing to do. Why did I add it anyway? I succumbed to popular demand. Let me briefly describe my frame of mind when I considered and implemented that feature. When I first saw the feature request on Codeberg, my initial reaction was reluctance. I wasn't convinced it was a good feature. But I was too busy with some ongoing algebraic graph theory research, another recent hobby, with a looming deadline, so I didn't have a lot of time to think about it clearly. In fact, everything about Wander Console has been made in very little time during the short breaks I used to take from my research. I made the first version of the console in about one and a half hours one early morning when my brain was too tired to read more algebraic graph theory literature and I really needed a break. During another such break, I revisited that feature request and, despite my reservations, decided to implement it anyway. During yet another such break, I am writing this post. Normally, I don't like adding too many new features to my little projects. I want them to have a limited scope. I also want them to become stable over time. After a project has fulfilled some essential requirements I had, I just want to call it feature complete and never add another feature to it again. I'll fix bugs, of course. But I don't like to keep adding new features endlessly. That's my style of maintaining my hobby projects. So it should have been very easy for me to ignore the feature request for adding a referral query string to URLs loaded by the console tool. But I think a tired body and mind, worn down by long and intense research work, took a toll on me. Although my gut feeling was telling me that it was not a good feature, I couldn't articulate to myself exactly why. So I implemented the referral query string feature anyway. While doing so, I added an opt-out mechanism to the configuration, so that if someone else didn't like the feature, they could disable it for themselves. This was another mistake. A questionable feature like this should be implemented as an opt-in feature, not an opt-out feature, if implemented at all. The fact that I didn't have a lot of time to reason through the implications of this feature meant that I just went ahead and implemented it without thinking about it critically. As the famous quote from Jurassic Park goes: It soon turned out that my gut feeling was correct. After I implemented that feature, a page from one of my favourite websites refused to load in the console. To illustrate the problem, here are a few similar but slightly different URLs for that page: The first and second URLs load fine, but the third URL returns an HTTP 404 error page. The website uses the query string to determine which one of its several font collections to show. So when we add an arbitrary query string to the URL, the website tries to interpret it as a font collection identifier and the page fails to load. That is why, when my tool added the query parameter to the first URL, the page failed to load. Later, with a little time to breathe and some hindsight, I could articulate why adding referral query strings to a working URL was such a bad idea. Altering a URL gives you a new URL. The new URL could point to a completely different resource, or to no resource at all, even if the alteration is as small as adding a seemingly harmless query string. By adding the referral query string, I had effectively broken a working URL from a website I am very fond of. It is also worth asking whether an HTML tool should concern itself with referral query strings at all when web browsers already have a mechanism for this: the HTTP Referer header, governed by Referrer-Policy . That policy can be set at the server level, the document level or even on individual links. The Web standards already provide deliberate controls to decide how much referrer information should be sent. Appending referral query strings to URLs bypasses those controls. It moves a privacy and attribution concern out of the referrer mechanism and embeds it into the destination URL instead. I don't think an HTML tool should do that. There is also a moral question here about whether it is okay to modify a given URL on behalf of the user in order to insert a referral query string into it. I think it isn't. In the end, I decided to remove the referral query string feature from Wander Console. One might wonder why I couldn't simply leave the feature in as an opt-in. Well, the answer is that once I had deemed the feature misguided, I no longer wanted it to be part of my software in any form. The project is still new and we are still in the days of 0.x releases, so if there is a good time to remove features, this is it. But my ongoing research work left me with no time to do it. Finally, when the post I've banned query strings appeared in my feed reader last evening, it nudged me just enough to take a little time away from my academic hobby and devote it to removing that ill-considered feature. The feature is now gone. See commit b26d77c for details. The latest release, version 0.6.0, does not have it anymore. This is a lesson I'll remember for any new hobby projects I happen to make in the future. If I ever load URLs again, I'll load them exactly as the website's author intended. I will never add query strings to your URLs. Read on website | #web | #technology Wisdom on the Web Wander on the Web Broken URLs https://int10h.org/oldschool-pc-fonts/fontlist/ https://int10h.org/oldschool-pc-fonts/fontlist/?2 https://int10h.org/oldschool-pc-fonts/fontlist/?foo

0 views
Stone Tools 3 weeks ago

PipeDream on the Acorn Archimedes

During the "throw everything at the wall and see what sticks" years of home computing, up to around 1995, a lot was thrown and a lot failed to stick. Sometimes clumps would form that appeared to have the combined friction necessary to maintain wall grip, each holding the other up. But, like Mitch Hedberg's observation of belts and belt loops, it was difficult to discern who was helping who stick to what. Take for example, our focus today. We have a completely novel CPU, built by a tiny team of engineers who had never designed a processor before, running a bespoke operating system squeezed out in a rush to meet the shipping deadline of a computer that wanted to carry on the legacy of a system beloved by British schoolchildren, hosting a productivity suite that completely rethought what the term "productivity suite" even meant. Together, they formed a complete computing dead-end. Yet separately, they each achieved life beyond expectations, given their shaky beginnings. Let's start with the hardware, Acorn Computer Ltd.'s follow-up to the famous 8-bit BBC Micro, the Archimedes. Feeling the 16-bit processors of the day didn't deliver enough bang-for-the-quid, they began an investigation into 32-bit processor options. After reading a U.C. Berkeley paper extolling the virtues of the RISC architecture, and seeing firsthand the ease with which chips could be designed, in 1983 Acorn launched the Acorn RISC Machine project to develop the 32-bit brain of their next system. The fruit of that labor, the ARM processor, defined the Archimedes line. Try as they might, Acorn could never crack the home market the way they did education. Still, those ARM CPUs had longevity well beyond the life of the company that commissioned it. Your smartphone likely has ARM in it right now, and Apple's entire current hardware ecosystem is built on its spec. That powerful hardware needed a preemptive multitasking operating system that befit its computing prowess. That was to be ARX , whose troubled development missed the product launch window. In the meantime, so the computer could have something driving it at launch, a stop-gap operating system called Arthur was shipped. It was similar to Acorn's previous BBC Micro MOS (Machine Operating System), with a graphical layer grafted on top; hit F12 and that text interface will peek out from behind the curtain. Over time it was decided that Arthur was doing a bang-up job and ARX was cancelled. Thus was born RISC OS, a cooperative multitasking WIMP (windows, icons, menu, pointer) with possibly the first application "dock" on a home computer. Its mandatory three-button mouse summons an application's current context menu at the pointer location; there are no menu bars whatsoever. Drag-and-drop is embraced as a central file management metaphor, even to save documents. On top of all that, it was the first to offer scalable, anti-aliased font rendering, even if its fonts were a little "off brand." On top of this unique foundation, we have PipeDream . Developer Mark Colton was convinced that the boundaries between word processor, spreadsheet, and database were artificial and could be eliminated. A document should be able to do any of those functions at any time, anywhere on the page, he posited. One might think, "Oh, like Google Sheets ." but PipeDream handles word processing more elegantly. Another might think, "Oh, like Apple Pages " but the spreadsheet and database functions are more robust in PipeDream . This particular balance of the three productivity functions feels unique amongst even its modern peers. Does a productivity suite work better when it's just a single app? Did Colton successfully execute his vision? And where is the Homerton documentary we deserve? (I didn't know Ghost blogging platform forces images to 2000px max; I've revised my design workflow to mitigate this in the future. To make amends for this timeline's illegibility at 2000px, please accept this PDF version) Testing Rig RPCEmu v371 on Windows 11 RISC OS v3.7 1024 x 768 15-bit color 64MB RAM PipeDream v4.13 Let's Get to Work My process when first examining unfamiliar systems is as follows: I do that across a variety of emulators to see which gives me the least grief; I need to be sure I can trust a basic productivity loop. I usually try to give it a go without research, to see how far I can get on pure skillz (with a Z). It's unusual to sit down at what appears to be a computer I understand and be baffled every step of the way. I've heard this system described as "elegant" and "easy to learn." This has me questioning if maybe I'm actually a very dumb person because my impression is "uncomfortable." You know that modern horror story, aka "creepypasta", The Backrooms ? It's a hidden world that co-exists with our own, which can be entered only by clipping through a seam of reality which separates the two. In there, buzzing fluorescents light an infinite maze of featureless, yellow-wallpapered office-style floor layouts. If one were to find a running computer there, I suspect RISC OS would drive it. It's just common enough in its GUI metaphors to feel familiar, and just off-kilter enough to turn that familiarity against you. Liam Proven wrote in The Register , "You will find it very disorienting, especially if all you know is post-1990s OSes." My dude, I've been computing since the 1970s and I find it disorienting. Nothing is unlearnable (I'm dumb, not incompetent), but I genuinely had to work through its manual to acclimate myself. To be clear, I enjoyed the thrill of venturing into the unknown. After all, one of the goals of this blog is to investigate the less-trodden paths in software history. Still, there are times when I feel RISC OS is " having me on." (trying to ingratiate myself with British readers in today's post) I'll start with the three-button mouse. From left to right the buttons are "Select", "Menu", and "Adjust." After weeks working with the system, I still can't figure out what problem the "Adjust" button solves. It's semi-analogous to on modern systems, as when clicking to add/remove elements to/from a set of selected items. Then, sometimes it does something unexpected like, "drag a window by its title bar without bringing that window to the front." Other times it is baffling. a file icon to a new folder location doesn't move the file to the new location. It copies the file. If you want to move the file, you must . Why are we "SHIFT" dragging anything when we have a perfectly good "Adjust" button? Sometimes the "Adjust" button does "opposite" actions. Click a "down" scroll arrow with "Adjust" and it will to scroll up instead. Is that an "adjustment?" What does it even mean, to "Adjust" a mouse click? It seems like it could mean anything , and that's kind of my point. It's unguessable and unintuitive. An interesting UI element (which predates NeXT and Windows 95) is the Icon Tray, an important tool inexplicably not described at all in the RISC OS 3 manual. Situated along the bottom of the screen, currently running applications and directory icons sit on a little shelf. Double-click "Select" on an application icon to launch it and... nothing. Its icon displays in the Icon Tray, and that's it. We must now Single-click "Select" on that icon to actually bring the application to the forefront and activate it. I don't know what that's all about, but that's how it works. Menus are fascinating in both the positive and negative meanings of the word. There are no menus on screen whatsoever, they are only made visible by the middle "Menu" mouse button. "Menu" clicking opens a given menu at the current mouse pointer location. Icons in the Icon Tray can be "Menu" clicked to get application-level menus, like "Make a new document." Within a document, "Menu" click will give us document-level options. Conceptually, I like the "Menu" button a lot. Within a menu, any choices which open dialog boxes or control panels tend to open in-menu. It's kind of cool, being able to type, or flip switches and radio buttons, directly inside the menu itself, rather than popping up a modal window. However, it is jarring to have large panels suddenly lunge out like a xenomorph's inner jaws when scrolling through menus. These can obscure the root menu, depending on screen position. 0:00 / 0:08 1× The last point to get our collective heads around is file saving. When saving a new document, simply typing in a file name is not sufficient. Save dialog boxes expect and require the full path to your save destination; no assumptions or default folder locations are provided. You can manually type in the full path to your desired save location like this: While you type, the system will not assist you in navigating the directory structure; no autocompletion here. You must know the path by heart. The other option, as described in manuals, is to drag-and-drop your document to its save location. Drag-and-drop really seems to be the RISC OS idiomatic way to manipulate files. In a Save dialog box there is a little icon for the application. It looks like decoration, but it physically represents your document. Type a name into the text field, then drag that icon to your desired save folder. 0:00 / 0:13 1× I don't want to get bogged down enumerating RISC OS's idiosyncrasies, but a few more things need mentioning. There is a kind of "programmer's art" ugliness to the user interface; those folder icons are terrible. There are graphical glitches, as when scrolling a window too quickly (though moving windows around shows full contents, which wasn't typical during that period). Everything you set up to customize the system, like desktop icons, window positions, desktop resolution, and other settings is reset every boot unless you manually tell the system to save the current state as the "boot file." The list goes on like that. Sheesh, what a journey just to understand the basics. I expect that kind of learning curve for the text-based systems, as those DOS-like commands are unknown to me. For a GUI system to throw this "spanner in the works" (continuing my pandering) is unexpected, but a fun challenge. I can't feel myself growing to love it, but the initial feeling of discombobulation is receding. A spreadsheet is an ordered matrix of cells, each of which can hold text or math. Cells with text are typically used as labels for columns and rows of numbers, and the math cells do the work of calculating relationships between those numbers. It's all very simple. No, wait, I mean it's "easy-peasy." (commitment to the bit) Lotus 1-2-3 felt "columns and rows" could also be useful for textual data. They said the line between spreadsheets and databases is pretty fuzzy, and even today spreadsheets are used to hold and manipulate simple databases. Then racecar driver Mark Colton pierced the veil entirely. It wasn't just spreadsheets and databases that had a fuzzy separation. If we can type arbitrary text into a cell in a spreadsheet, why couldn't we type an entire book? What if all applications were really just one application, in the end? He fired his first shot at uniting everything in View Professional . This was released as PipeDream on the Cambridge Z88, a portable Z80 machine by Sir Clive Sinclair's Cambridge Computer. Built into the ROM itself, it was insta-boot, insta-launch right into a multi-purpose integrated document suite. Jerry Pournelle, in BYTE Magazine 's February 1989 issue, was moderately enamored with the hardware, but PipeDream was, "disappointingly hard to use." With Acorn evolving their BBC Micro via the Archimedes, Colton continued to support their hardware line. In interviews, he seemed to really be leaning toward Windows for the future of his company. However, since he switched development to C and there was a C compiler for the Archimedes, he said it wasn't hard to provide his product to the Acorn crowd. Running on Arthur, the precursor to RISC OS, he embraced and extended the "one document, many forms" approach. Much like today's Google Sheets, we can add arbitrarily long sections of text, insert images, set up database information, perform spreadsheet calculations, run spellcheck, and generate inline graphs. However, try typing a chapter of a book into Google Sheets if you want to drive yourself "mental." (there's no stopping me) In PipeDream , that's frictionless (within a certain definition of "friction"). Like RISC OS itself, PipeDream also requires certain shifts in thinking to not lose a finger to its sharp edges. I suppose that when a developer offers a truly new paradigm, it is fair to ask users to meet it halfway. I'm not convinced the advertising (see "Historical Record" at the end) gave customers a full understanding of how drastic that shift was. "Menu" click the Icon Tray icon (i.e. the application-level menu) for PipeDream to start up a new "Text" file and begin typing into cell A1. You'll find that text overflows, across cell boundaries, until it hits the "row wrap marker" seen in the rightmost column header (shown as a "down arrow" icon). Every line of text is its own row, in spreadsheet terms. As you type, PipeDream fills the current row, then silently inserts a new row to catch overflow. Until a paragraph break, these rows are internally associated as a logical unit. Edits which alter or disrupt text flow across rows within a paragraph are not reflected immediately in the UI. Or maybe they are? It's hard to tell with the graphic glitches in the screen redraw, a constant source of frustration while working on this article. PipeDream concedes the reflow point itself. When in doubt about the current visual structure of your text, , a manual action, will force PipeDream to recalculate text wrapping and line spacing. This can be mitigated a bit through a hidden toggle in the "Options" screen, the confusingly named "Insert on Return." This reduces the need to force a manual reflow, but can still leave visual chaos. 0:00 / 0:44 1× I've altered the text flow and initiated a recalculation of the lines. It does the work, but visually shows no change until I trigger a graphics refresh in some way. Selecting the text works, but then leaves its own graphic artifacts behind. I've "gone nutter!" (yes, these are in the captions as well!) Interestingly, I saw similar redraw issues in View Professional on the BBC Micro. It would appear this is, to some extent, part of the software's DNA. Honestly, this is all "a bit of a shambles." (the hits keep coming) Have you ever wanted a word processor that won't indent paragraphs? PipeDream being a chimera, navigation idioms are forced to choose which parent they love most. An examination of the key demonstrates this. In a word processor, we usually have a horizontal page ruler with tab stops. Tab over to a tab stop and type to align text at that indentation point on the page. In a spreadsheet, navigates us to the next cell to the right. In PipeDream , the spreadsheet idiom wins TAB's love. In a text cell, sets an invisible indicator at paragraph start which forces every subsequent line of that paragraph to begin at that same column. For example, by default every line of text is added to column A, the leftmost. If we to column B, the text will start there but when it wraps to the next line, that will also begin in column B. "Indentation" is at the paragraph level, not the line level. How do we indent the first line of a paragraph? The manual has a solution. In looking back through the history of Colton's software on the Acorn line, I found this note in a review of View 2.1, his standalone word processor for the BBC Micro. "Why is there no numerical information on the rulers or cursors to assist formatting?" asked Acorn User , January 1985. It seems Colton had it in for rulers for a decade, and to my thinking this points to a disconnect between what a programmer thinks users need, versus what users actually need. A stubborn rejection of norms doesn't always mean we're on the right track. We can use the cell-based layout engine of the program to pull off a fun party trick. Under "Options" there is a toggle between Row and Column text wrap. "Row" behaves like a typical word processor. "Column" lets us divide the page into columns, like a newspaper. Tab between columns and the column width will be respected by the word wrap. Kind of cool, and could be useful in a "I need to make a newsletter, stat!" pinch. Like a spreadsheet, column widths are document-wide, so no mix-and-match. Someone very clever with the tools could probably coax complex layouts out of it, but that would require an ungodly amount of pre-planning, design, and patience before starting a document. You really have to try to get it right the first time, because I don't find PipeDream particularly adept at handling large structural changes after the fact. The column-based formatting gets frustrating, but in other ways the word processing is "bog-standard." (How many will I squeeze in? Place your bets!) We have a built-in spell check, user-definable dictionary, word count, text alignment, font choices, and an anagram/subgram maker. Bank Street Writer Plus had an anagram maker as well. Why was that such a thing back then? Have I forgotten some fad of the 80s and 90s? That's all fine and dandy, but I'll tell you what isn't: there's no simple cut/copy/paste, at least not as a modern audience may understand those tools. In the document, we are restricted to cell-level selection, meaning I can't select individual words inside cell A1. I can only select the entire cell A1, which in PipeDream means an entire line of text. We can ask PipeDream to edit a cell in its own window, where it pops out for surgical editing. "Edit Formula in Window" highjacks the spreadsheet formula editor in order to get character-level selection control. In this pop-out window, we can highlight individual words and do typical cut/copy/paste actions. Notice, though, we're still restricted to only the text within the cell, which means only that line (row) of text. It's highly likely any given row will contain the tail-end of the previous sentence and the first part of the next sentence. If we want to cut out a specific sentence which doesn't align neatly to the row structure, there is no way to do so. I will repeat that. There is no way to cut/copy/paste an arbitrary string of characters. Now I feel PipeDream's vision working against itself for anything but simple correspondence. Remember, this is version 4 of PipeDream, Colson's fifth software release to pursue this unified application dream, and this is where we're at. I can't imagine writing anything substantial within these frustrating limitations. As a spreadsheet, PipeDream performs far more admirably, even if certain conventions have been eschewed in favor of its new vision. Hey, if you're gonna quirk it up, might as well go for broke. Unlike its spreadsheet ancestors, there is no menu, nor is there a simple way to tell PipeDream that we want to enter a formula into a cell, as with to denote a function call, or to indicate we want to do math. Many of Lotus 1-2-3's innovations have been utterly ignored. The global "Options" allows us to set default behavior for cell entry. Setting it to "numbers" will put us into the right context for easy formula entry, or we can click into the ever-present formula entry line at the top of the window. Turn on the "Grid" overlay to draw cell boundaries and before you know it what was a word processing document is now a spreadsheet with "the full Monty." (TIL it doesn't mean "full-frontal nudity") The functions available to number crunchers are plentiful and robust. Trigonometric functions are a given, but its inclusion of matrix math may come as a surprise. Even complex functions like , which computes "the complex hyperbolic arc cosecant of as a complex number," are present and accounted for, so hardcore math nerds can breathe a sigh of relief. A wide number of financial functions, statistical functions, lookup tables, string manipulations, and date handling are all here. So too are flow control tools, like , , and more. There are even GUI controls available for showing error dialog boxes and prompts for user input, though those are only available from within custom functions. Yes, if you're missing a function, you can make your own. In a new worksheet, start a formula with (which can accept typed parameters) and end it with . In between, do the work. PipeDream will check syntax and accept or reject each line of your function. If accepted, it will prefix a line with In your real working worksheet, access the formula by . That file reference implies PipeDream can access data from other worksheets, and that is true. Even a cell reference in a formula can be pulled from a completely different worksheet. I find the syntax for custom functions opaque, and the manual does a poor job of explaining what is possible and how to use the tool. There are a handful of examples provided with the software installation, with bugs, that reveal secrets only upon very close inspection. For example, notice in the screenshot above that the parameters to the function are later referenced by prefix, but local variables, as set by the function are not prefixed when used in calculations. It's those subtle little things that tripped me up. The same with having the return value called . Or how the program has a selection of "Strings" functions, but when passing a string as a parameter its type is "Text." I stared at that syntax for a LONG TIME before finally realizing my various little misunderstandings. Customization doesn't stop there. Individual keys can be defined as shortcuts to longer string sequences, F-Keys (plain and modified) can be defined to trigger commands, and command sequences (triggered by the CTRL key) can be redefined to your liking (which risks overwriting built-in command shortcuts). You really can make PipeDream your own, though you're in for a struggle compared to Lotus 1-2-3 and the thousands of books available to help learn its principles. I found no actual books for PipeDream , just publishing announcements in old magazines. Something must exist, but the internet at large appears bereft. On the scorecard of "this amalgamation approach to productivity software is working," I'd say we're 1 and 1. The spreadsheet tools are fiddly, but robust. The word processing has me very underwhelmed. Time for the tie-breaker: databases. Using the supplied Lotus 1-2-3 conversion tool, I was able to bring in the data I originally created in CP/M dBASE II and had subsequently converted to DOS Lotus 1-2-3. Now it lives on in RISC OS PipeDream . This data has more passport stamps than Indiana Jones. Let's consider some of the basic things one might want to do with data. PipeDream beats out Lotus in sorting, giving us a five-stage, multi-row, sort with ascension. Not too shabby for the time, all things considered. Search and replace does what it "says on the tin" (in for a penny, in for a pound), and can also accept regex-like tokens and patterns. More interestingly, cells can be set up to directly perform queries on table data. There are a small handful of prefixed database functions to calculate averages, min/max, counts of things, and more. One last feature of note is how to use the query tools to extract a result into a new database. This is interesting as it utilizes RISC OS's drag-and-drop Save functionality in a clever way. 0:00 / 0:15 1× Note how the query for data extraction is much longer than the tiny little text field in the contextual menu can handle elegantly. This is one of those usability tradeoffs for the RISC OS way of doing things. I was initially ready to write off the database functionality as being underwhelming, until I reminded myself of the stated goal for PipeDream . Its core proposition is that there is no difference between the various aspects of the software. The word processor is the spreadsheet is the database. We're not limited to the "database" functions when manipulating our database data. We have access to everything the program has to offer, at all times. Let's clip through the inverted UV plane separating database and spreadsheet, and see what kind of trouble we can get into. I'm thinking back to the Lotus 1-2-3 article and how database information was queried there. With a table of data, we had to use the built-in query forms, define areas on the sheet to hold query parameters, and designate another section of the sheet into which query results would display. It was an obtuse Rube Goldberg machine that I couldn't understand until I drew a diagram of the process. In PipeDream , we just write a formula, the same as if it were a spreadsheet. Let's get the average rating of all adventure games in the database published before 1985. "Bob's your uncle!" (I was hoping to work that one in) Let's mix it up a little and get the same average, but only for titles which begin with "Zork." We can use wildcards, but let's leverage PipeDream's word processing string tools. The most awesome part about this is that, like any spreadsheet formula, it updates in real time. Change the ratings, or add a new Zork game to the mix, and get the new average instantly. The database is the spreadsheet is the database, so that calculation can then be referenced as a value for another cell's formula, perhaps adding sales tax to the average unit price. While we're at it, might as well throw in some fancier text formatting to make it look pretty. In the Lotus 1-2-3 investigation, I wanted a pie chart showing a breakdown by game categories. Lotus had a handy function which removed duplicates from lists, making it possible to extract the full list of unique game categories, which could then be used as the query parameters for generating a chart. PipeDream can't do that, but it does have other string parsing routines, variables, cross-file data referencing, and the ability to write custom functions and macros. I don't doubt it would be possible to homebrew a workaround to this missing function. In fact, let's "have a bash at it." (swish!) 0:00 / 0:12 1× Note the real-time update of the chart as I modify an external database. Ultimately, I couldn't achieve an elegant solution, but I could achieve my goal. I sorted the original data by genre, then created a column that checks if the genre for each row matches the one above it. If so, it's a otherwise a . Then, I extracted all rows with in the column. Last, I did (count any items in a list), where the source list is contained in the original database document. With the documents thus linked, I get real-time graph updates when I alter the core database, thanks to external reference handling. Everything's "tickety-boo!" (I'm trusting The Independent on this one) OK, PipeDream , you're winning me over a little more now. Time to take this to its logical conclusion. We haven't yet pushed it as the multi-purpose document creation tool it promises to be. We've done a little dabbling, with text formatting and data extraction, but I want to see everything come together. I want the borders to crumble . The approach I'm finding to be least troublesome is to begin with a "text" document, then decorate that with spreadsheet/database elements. 0:00 / 0:16 1× As I scroll, text will disappear until I trigger a redraw event in the window. (pay no attention to the content of the letter) In building that document, here's what I learned. We have a unique confluence of interesting technologies coming together to form a strangely flawed jewel. It sparkles and shines when the light hits it just right , and in those sparkles we may catch a fleeting glimpse of a world that might have been. Might have been, but wasn't . Let's see where each of the underlying technologies wound up and those in the know can feign shock with the rest of us when we learn that ARM isn't the only thing that survives to this day. We'll start with the obvious truth: ARM won. It's in everything, everywhere, all at once. If it isn't in your computer, it's in your phone, or your Newton, or your Palm Pilot, or your Canon camera, or your Nintendo DS, or your Nintendo 3DS, or your Nintendo Wii, or your Nintendo Switch, or your Nintendo Switch 2, or your Raspberry Pi, or maybe you're sidetalking on your N-Gage. Its combination of low power consumption with high performance makes it ideal for mobile devices, of which we are in abundance. But why ARM specifically? Others have swung for the RISC fences and stumbled, yet Acorn set two engineers to the task of designing their first ever microprocessor and somehow achieved a ubiquity that has remained (mostly) unchallenged. Apple/IBM/Motorola gathered their forces and developed their own RISC architecture, which debuted in Apple's Power Macintosh 6100. PowerPC doesn't mean much to a Windows/Intel crowd, but the Mac faithful remember all too well Apple's investment in that as the successor to the x68000. Frustrated by delays in the evolution of the chip line, Apple wound up ditching it for Intel x86 , even if they eventually rediscovered the joys of RISC. PowerPC went on to be adopted by a number of game consoles, notably the Nintendo Wii, XBox 360, and PS3 simultaneously. The line continues today, and heck, Mars rovers Curiosity and Perseverance both have PPC inside. Hard to call such a history a "failure," but who outside hardcore Amiga faithful today is clamoring for a PowerPC chip? The SPARC RISC architecture, of "Sun SPARC Workstation" fame, chugged along until as late as 2017, when Oracle purchased Sun. A notable achievement, in pop culture circles, is this is the hardware Pixar's first Toy Story was rendered on . Though Oracle disbanded the design team keeping the architecture alive, the architecture itself is free and open source. There's nothing stopping an intrepid reader from carrying on the lineage, I suppose. Fujitsu, the last of the production line for the series, has abandoned SPARC for ARM. I'll be honest, I can't figure out what ARM does so much better than other attempts, like SPARC, at making a great RISC processor. Reading through the Ars Technica story , it seems to be less about the underlying tech and more about the savvy promotional work of Robin Saxby and his absolute unwillingness to lose the RISC wars. Where others were building RISC for the server-side, ARM committed themselves to the mobile side, skating to where the puck would be . Whatever the case, whatever the magic, ARM makes it available to anyone who wants it, through their licensing partnerships. Ultimately, this really seems to be what has given ARM its staying power; a low barrier to entry to quickly join in on high-performance, low-power draw, ARM fun. It's important to note that ARM doesn't make processors; they only license their IP. <<record_scratch.mp3>> OK, be that as it may, it is still substantially correct to say that IP licenses are their bread and butter. A "core license" allows a company to manufacture a specific ARM-designed CPU, a popular choice for system-on-a-chip designs. Alternatively, an "architectural license" permits a company to design and build its own custom CPU around the ARM instruction set. That's what Apple does with their A- and M-series chips. In recent years, ARM is feeling light competitive pressure from the RISC-V architecture. Born in the same UC Berkeley labs that birthed the original RISC design reports that inspired Acorn to take a chance on RISC, its architecture, unlike ARM, is free and open source. Consumer-level devices running on RISC-V have already started shipping. A new race has begun. Acorn's Archimedes line ultimately never sold particularly well. It's hard to nail down specific sales figures , but a 1991 Acorn shareholder report said, "Acorn is now the UK number one supplier of 32-bit RISC machines with an installed base of over 150,000 units." For context, the Amiga line had sold some 2 million units by 1991. We can't say Acorn didn't put in the effort, releasing some 13 model variations in under a decade. The general consensus seems to be that they "cost a bomb." (that's a new one on me) Schools adopted them, as a natural evolution of Acorn's prior BBC Micro installations, but at US$3,000 to $9,000 (in 2026 money) families just couldn't afford to put one in the home. In the mid-90s, Acorn dropped the Archimedes line, switching tracks to the more business-like Risc PC line, and produced a handful of systems around the StrongARM CPU. However, while the CPU spirit was willing, the motherboard flesh was weak, leaving the CPU underutilized . The lineup ended concurrently with the end of Acorn around 1998. Castle Technology tried to keep the Risc PC line going, post Acorn, but called it quits shortly thereafter, in 2003. Open-sourced in 2018, RISC OS Open keeps it running and up to date for modern RISC-based hardware platforms, especially the Raspberry Pi. Currently at v5.30 at the time of this writing, it is still a 32-bit operating system with " moonshot" aspirations of 64-bit someday. * checks watch* Time is ticking to pull that together before fading into 32-bit irrelevance. Did I mention how tiny this thing is? The latest version for Raspberry Pi is a 155MB download. Version 3.7, which I used for this article, downloaded as a pre-configured emulator with OS and apps pre-installed, was a mere 129MB. Even the most up-to-date pre-configured package tops out at a "massive" 1GB, apps and emulator inclusive. How big is macOS on ARM? Leading in with his View lineup of productivity apps on the BBC Micro, Mark Colton was the man with the all-in-one vision. With View Professional , he took his first stab at providing an uber-app for that 8-bit workhorse. It's primitive and clunky to use, but the spark is present. He would then expand on his ideas through the PipeDream lineup, taking it all the way to version 4.5. Every version refined the vision, but ultimately its character-based layout engine roots became a limiting factor to its growth. One rewrite later, he had a true GUI-based implementation, for both the Archimedes and Windows, in Fireworkz released in 1993. Having created standalone products Wordz and Resultz, Fireworkz combined those back into one. By mid 1995, Fireworkz Pro added in the database functionality, merging the new Recordz into the product, and that's where Colton's involvement ended. Besides asking "What even is a spreadsheet anyway?" Colton's other passion was race car driving. In August 1995, an engineering defect in the front wing of his Pilbeam M72 caused it to fold under his car while he was at top speed. He lost control, crashing headlong into a telegraph pole, and was killed. Most shockingly, both PipeDream and Fireworkz continue to be maintained to this day. Mark's father, Richard, generously open-sourced both PipeDream and Fireworkz just before his own untimely death in 2015. Fireworkz Pro, the version that includes database functionality, is not open-sourced and is still for sale . The PipeDream package available for installation in RISC OS package manager is not the version I'm using for this article. That is the modern update, which adds a bunch of niceties, including a GUI toolbar for formatting text, expanded spreadsheet functions, and a mind-boggling number of bug fixes. This is all maintained by lone developer Stewart Swales , someone intimately involved in the RISC OS and PipeDream history. He worked at Acorn and helped develop Arthur, the OS that became RISC OS. Later, he joined Colton Software as lead developer, working on PipeDream and Fireworkz. There's really nobody better to carry on the legacy. Where, precisely, Colton's continuation of that legacy would have gone, we can't say with certainty. However, we do have a little insight into his thinking. In an interview with Acorn User , December 1994, he said, "Over the next few years...we won’t be writing spreadsheets either; we'll be writing a totally different style of program. I expect spreadsheets, word processors and so on to be provided as part of the operating system in the future." Let me start by making it clear that I appreciate the effort. I say that with all sincerity and for everyone involved. From the machine, to the OS, to the productivity suite, all katamari'd up into a unique star. It was a lot of fun feeling like a beginner again. I had moments of true learning, shedding expectations of "how things should be" and experiencing fresh, alternate ways to approach work. I said at the beginning, the question that needs answering is, "Did Colton successfully execute his vision?" and here I must waffle. From View Professional , through five major releases of PipeDream , and two Fireworkz releases, he held fast to a very particular line of exploration. That he never wavered in his pursuit of that vision, says to me that he must have felt he had achieved his goal to some degree. In that regard, we can say he successfully executed his vision. As an end-user, it is hard to align myself to that vision. I get what he's after, especially when trying to make sure documents always reflect the latest data. After using PipeDream for a number of weeks, I remain unconvinced that the solution is to graft all software into one uber-application. If we follow that thinking to its logical conclusion, then why not include paint features? Why not include robust desktop publishing features? Where would it stop? Had the amalgamation of these productivity apps birthed something uniquely unachievable by other means, or unlocked some latent potential in the individual apps, I'd be very willing to adapt to this "skew-whiff" (last one, I promise!) approach to application design. As it stands, I ultimately don't see what it does that wouldn't be equally well-served, perhaps better-served, by intelligent file link management with robust publish/subscribe functionality. In fairness, a deep implementation of that would work best as an OS-level feature, and Colton could only control his own works. Paradoxically, the most frustrating aspect in removing the barriers between applications is how we wind up with a slate of new barriers forged in that alliance. Colton said of View Professional that even when the apps are combined, none should feel like a compromised version of that app. Yet, compromises are what I feel with every document I build. Is it worth giving up easy text formatting and basic cut/copy/paste for the off-chance I might need to insert a little spreadsheet table? There's an 80/20 rule being almost willfully ignored here. I love that Colton had a unique vision and stuck to it. I love that someone tried to forge a new path in productivity application design. I love that PipeDream exists, but I don't love it . Ways to improve the experience, notable deficiencies, workarounds, and notes about incorporating the software into modern workflows (if possible). Testing Rig RPCEmu v371 on Windows 11 RISC OS v3.7 1024 x 768 15-bit color PipeDream v4.13 boot the system launch my application of interest make a dummy document quit the emulator entirely and reboot load my saved document Because rows and columns are shared throughout the document, insertions and deletions, or moving things around, creates difficult-to-resolve layout issues. If a spreadsheet sits to the right of a block of text, and we want to insert a row into only the spreadsheet part, that's not possible. Doing so will also insert an empty row into the paragraph, leaving a gap. PipeDream has a strange concept of "global font" vs. "local font". Local fonts can't be changed until the global font is set to something other than the system font. The global font controls value cells, which cannot be styled individually. Local fonts will style a cell from wherever the cursor is currently located, and it is very easy to target a cell and style its font, but miss the first character or two, even though the entire cell is highlighted as a selection. "What will be the result of my action?" is not always crystal clear. The controls for styling charts are difficult to understand, and messing up is hard to reverse out. I accidentally added "New Text" to the chart and it took a long time to figure out how to delete it; selecting it and hitting "delete" doesn't work. There is no way to modify the legend. There's no facility for selecting elements for inclusion/exclusion from the graph. In my case, formatting to look good on the printed page meant adding empty columns which wound up in the pie chart. This is very representative of the struggles the layout engine introduces. Making data look good in one context risks "making a shambles of it" (are these working? have I won you over?) in another. Page layout settings are cryptic. Margins can only be set to the top and left (?!?!) and only in unspecified numeric units. I used the template default values, and the page wound up shifted down and to the left. Getting beautiful output is a challenge. How could I forget? There's no UNDO! Some programs, like !Draw (vector illustration) and !Edit (text editor) have undo, and others like !Paint and !PipeDream do not. Getting started with RPCEmu , using a pre-built package, was as dead simple to use as you'd imagine. I experienced no crashes of the emulator, operating system, or PipeDream . It was a very solid experience in that regard. PipeDream itself, at least the version I used, had a ton of annoying bugs and the graphical glitches were even noted in a review by Micro User , February 1992 . But emulator-wise, everything was smooth. I recommend first-time users grab a pre-built image for quickly jumping in and seeing what the fuss is all about. I also do recommend going through the RISC OS Manual. The operating system is almost unusable until you learn its little tricks and nuances of operation. Pre-built images: https://www.marutan.net/rpcemu/easystart.html v3 Manual: https://archive.org/details/ro-3-user-guide v5 Manual: https://archive.org/details/risc-os-5.28-user-guide Technically, I am cheating a bit in this review. RPCEmu doesn't emulate an Archimedes but rather Acorn's later Risc PC. I ran PipeDream from floppy in Arculator, which explicitly emulates Archimedes systems, to compare the experiences. Except for RPCEmu's snappier performance (which I want anyway), RISC OS itself abstracts away the hardware layer so much it didn't seem to matter one emulator over the other. The emulator itself expects some specific keyboard, with the key situated between and . I don't have that, and nothing on my extended keyboard would send the right code to the emulator. is used for logical in PipeDream data queries; I had to use Windows ALT keycodes. I mentioned earlier, but I'll make it explicit here: there is no undo. Fireworkz is available as a native Win32 app. It launches without issue on Windows 11 64-bit, and even in Wine on macOS. It looks and feels exactly like Fireworkz on RISC OS, which looks and feels a lot like the latest version of PipeDream (minus the database parts). The list of bug fixes and quality of life enhancements is vast. Scrolling through all changes since Colton passed is kind of pointless due to its scope. I'll say, "a lot has improved" and leave it at that. As a local-only alternative to the Google/Apple/Microsoft hegemony, it's worth checking out. It's free, open source, actively maintained, a mere 2.5MB download, and for God's sake at least it's trying to do something different. Getting documents out of RISC OS into a modern system is easy, but has its caveats. RPCEmu can directly save to the host operating system, so getting files out is a non-issue. PipeDream's options for saving documents will strip the document's uniqueness, however. Saving as ASCII will try to keep text precisely as shown in PipeDream, inserting line breaks at the end of every line of text. Tables are just tab-indented. Any text formatting, fonts, graphs, etc. are stripped, of course. Saving as "Paragraph" is like ASCII, but will keep text together as logical paragraphs. This is much better for pasting the text into new documents. We still lose anything done to make the document look pretty. PDF printing is an option in RISC OS, and proved to be the best way I could find to get PipeDream documents into the real world. This required two parts: activating the PDF printer and running a separate !PrintPDF application. With both active, PipeDream generated PDFs without issue.

0 views
Corrode 1 months ago

Bugs Rust Won't Catch

In April 2026, Canonical disclosed 44 CVEs in uutils, the Rust reimplementation of GNU coreutils that ships by default since 25.10. Most of them came out of an external audit commissioned ahead of the 26.04 LTS. I read through the list and thought there’s a lot to learn from it. What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing, and none of them were caught by the borrow checker, clippy lints , or cargo audit . I’m not writing this to criticize the uutils team. Quite the contrary; I actually want to thank them for sharing the audit results in such detail so that we can all learn from them. We also had Jon Seager, VP Engineering for Ubuntu, on our ‘Rust in Production’ podcast recently and a lot of listeners appreciated his honesty about the state of Rust at Canonical. If you write systems code in Rust, this is the most concentrated look at where Rust’s safety ends that you’ll likely find anywhere right now. This is the largest cluster of bugs in the audit. It’s also the reason , , and are still GNU in Ubuntu 26.04 LTS. :( The pattern is always the same. You do one syscall to check something about a path, then another syscall to act on the same path. Between those two calls, an attacker with write access to a parent directory can swap the path component for a symbolic link. The kernel re-resolves the path from scratch on the second call, and the privileged action lands on the attacker’s chosen target. Rust’s standard library makes this easy to get wrong. The ergonomic APIs you reach for first ( , , , ) all take a path and re-resolve it every time, rather than taking a file descriptor and operating relative to that. That’s fine for a normal program, but if you’re writing a privileged tool that needs to be secure against local attackers, you have to be careful. Here’s the bug, simplified from . Between step 1 and step 2, anyone with write access to the parent directory can plant as a symlink to, say, . Then follows the symlink and the privileged process happily overwrites with whatever happened to contain. The fix uses : The docs for say (emphasis mine): No file is allowed to exist at the target location, also no (dangling) symlink . In this way, if the call succeeds, the file returned is guaranteed to be new. A in Rust looks like a value, but remember that to the kernel it’s just a name. That name can point to different things from one syscall to the next. Anchor your operations on a file descriptor instead. only helps with that when you’re creating a new file. For everything else, open the parent directory once and work relative to that handle . If you act on the same path twice, assume it’s a TOCTOU (Time Of Check To Time Of Use) bug until you’ve proven otherwise. This is a close relative of TOCTOU. You want a directory with restrictive permissions, so you write something like this. For a brief moment, exists with the default permissions. Any other user on the system can it during that window. Once they have a file descriptor, the later doesn’t take it away from them. Reach for and so the file or directory is born with the permissions you want. The kernel will apply your on top, so set that explicitly too if you really care. The original check in was literally this: That comparison is bypassed by anything that resolves to but isn’t spelled . So , , , or a symlink that points to . Run and see it rip right past your check and lock down the whole system. Here’s the fix : resolves , , and symlinks into a real absolute path. That’s a lot better than string comparison. Oh and if you were wondering about this line: I think that’s just a fancy way of saying In the specific case of , this works because has no parent directory, so there’s nothing for an attacker to swap from underneath you. In the more general case of comparing two arbitrary paths for filesystem identity, however, you’d want to open both and compare their pairs, the way GNU coreutils does. (Think identity, not string equality.) By the way, my favorite bug in this group is CVE-2026-35363: It refused and but happily accepted and , then deleted the current directory while printing . 😅 Rust’s and are always UTF-8. That’s a great choice in 99% of all cases, but Unix paths, environment variables, arguments, and the inputs flowing through tools like , , and live in the messy world of bytes. Every time a Rust program bridges that gap, it has three options. The audit found bugs in both of the first two categories. Here’s an example. This is the original code, from . GNU works on binary files because it just shuffles bytes around. The uutils version replaced anything that wasn’t valid UTF-8 with , which silently corrupted the output. Here’s the fix: stay in bytes. forces a UTF-8 round-trip through . does not. It writes the raw bytes directly to . For Unix-flavored systems code, use and for filesystem paths, for environment variables, and or for stream contents. It’s tempting to round-trip them through for easier formatting, but that’s where the corruption creeps in. UTF-8 is a great default for application strings, but it’s absolutely, positively the wrong default for the raw byte stuff Unix tools work with. In a CLI, every , every , every slice index, every unchecked arithmetic operation, every is a potential denial of service if an attacker can shape the input. That’s because a unwinds the stack and aborts the process. If your tool is running in a cron job, a CI pipeline, or a shell script, that means the whole thing just stops working. Even worse, you could find yourself in a crash loop that paralyzes the entire system. A canonical case from the audit was ( CVE-2026-35348 ). The flag reads a NUL-separated list of filenames from a file, but the parser called on a UTF-8 conversion of each name: GNU treats filenames as raw bytes, the way the kernel does. The uutils version required UTF-8 and aborted the whole process on the first non-UTF-8 path: (I reproduced this against on macOS. The Python one-liner is there because most modern shells refuse to create a non-UTF-8 filename for you.) Your nightly cron job is dead and there goes your weekend. In code that processes untrusted input, treat every , , indexing, or cast as a CVE waiting to be filed. Use , , , , and surface a real error. Push back on the boundary of your application and let the caller deal with the fallout. A good lint baseline to catch this in CI: These are noisy in test code where panicking on bad data is exactly what you want. The cleanest way to scope them to non-test code is to put at the top of each crate root, or to gate on the individual modules. Closely related to the previous point, a few CVEs come from ignoring or losing error information. and returned the exit code of the last file processed instead of the worst one. So could fail on half the files and still exit . Your script thinks everything is fine. called on its call to mimic GNU’s behavior on . The intent was reasonable, but that same code ran for regular files too, so a full disk silently produced a half-written destination. The reason was that someone wanted to throw away a and reached for , , or . Here’s a very simple pattern to avoid that: Also, if you write to discard a , leave a comment that explains why this specific failure is safe to ignore. A surprising number of these CVEs aren’t “the code does something unsafe” but “the code does something different from GNU, and a shell script somewhere relied on the GNU behavior.” The clearest example is (CVE-2026-35369). GNU reads as “signal 1” and asks for a PID. uutils read it as “send the default signal to PID -1”, which on Linux means every process you can see . Yikes! A typo becomes a system-wide kill switch. If you reimplement a battle-tested tool, bug-for-bug compatibility on exit codes, error messages, edge cases, and option semantics is a security feature. (Hello, Hyrum’s Law – and obligatory XKCD 1172 !) Anywhere your behavior diverges from the original, somebody’s shell script is making a wrong decision. uutils now runs the upstream GNU coreutils test suite against itself in CI. That’s the right scale of defense for this class of bug. CVE-2026-35368 is the worst single bug in the audit. It’s local root code execution in . The bug is visible if you know what to look for (a followed by a function call that loads a dynamic library), but it’s the kind of thing that doesn’t jump out on a first read. Here’s the pattern, simplified from the utility. Huh. Looks innocent. The trap is that ends up loading shared libraries from the new root filesystem to resolve the username. An attacker who can plant a file in the chroot gets to run code as uid 0. GNU resolves the user before calling . Same fix here. Once you’re across, every library call might run the attacker’s code. And no, static compilation doesn’t help here, because goes through NSS, which s modules at runtime regardless of whether your binary is statically linked. You might have made it this far and thought “Wow, that’s a lot of bugs! Maybe Rust isn’t as safe as I thought?” That would be the wrong conclusion. Keep in mind that none of the following bad things happened: That means, even if the tools were (and probably still are) buggy, they never had a bug that could be exploited to read arbitrary memory. GNU coreutils has shipped CVEs in every single one of those categories. Take a peek at the last few years of the GNU file: …the list goes on and on. The Rust rewrite has shipped zero of these, over a comparable window of activity. 1 That’s most of what historically goes wrong in a C codebase. What’s left is, frankly, a more interesting class of bug. It lives at the boundary between our controlled Rust environment and the messy, chaotic outside world, where paths, bytes, strings, and syscalls are all tangled up in one eternal ball of sadness. That’s the new security boundary of modern systems code. 2 If you write systems code in Rust, treat this CVE list as a checklist. Grep your own codebase for , stray calls, discarded s, , and string comparisons against . I also wrote a companion post, titled Patterns for Defensive Programming in Rust . When I think of “ idiomatic Rust ”, correctness is not the first thing that comes to mind. After all, isn’t that the compiler’s job? Instead, I think of elegant iterator patterns , ergonomic method signatures, immutability , or clever use of expressions . But none of that matters if the code doesn’t do the right thing, and the compiler is far from perfect at enforcing correctness. That’s why we don’t only have idioms for writing more elegant code; we also have idioms for writing correct code. They are the distilled experience of a community that has learned, often painfully, which shapes of code survive contact with reality and which ones do not. Reality is rarely as tidy as the abstractions we would like to impose on it. The mark of robust systems, in any language, is the willingness to reflect that untidiness rather than paper over it. Rust gives us extraordinary tools to do so, and the compiler will hold a great deal for us. But the part it cannot hold, the boundary between our program and everything else, is still ours to get right. The type system can encode many things, but it cannot encode conditions outside of its control, such as the passage of time between two syscalls. Idiomatic Rust, then, is not just code that the borrow checker accepts or that leaves alone. It is code whose types, names, and control flow tell the truth about the system they run in. And that truth is sometimes ugly. It could mean using file descriptors instead of paths, instead of , instead of , and bug-for-bug compatibility over clean semantics. None of it is as pretty as the version you would write on a whiteboard. But it is more honest. Need Help Hardening Your Rust Codebase? Is your team shipping Rust into production and want to make sure you’re not falling into the same traps? I offer Rust consulting services, from code reviews and security-focused audits to training your team on the patterns that the compiler won’t enforce for you. Get in touch to learn more. To be fair to GNU: GNU coreutils is 40 years old and has had a very long time to surface and fix this class of bug. And we don’t know there are no memory-safety bugs in the Rust rewrite, only that the audit didn’t find any. Still, the difference is noticeable when comparing the same duration of development activity. ↩ It’s worth noting that the / TOCTOU class of bug is in some ways easier to avoid in C than in Rust. C code naturally reaches for an open file descriptor and the family of syscalls ( , , , ), and most creation syscalls take a argument directly. Rust’s high-level APIs abstract over the file descriptor and operate on values, which makes the path-based, re-resolving call the path of least resistance. The handle-based APIs exist on every Unix platform; Rust just doesn’t put them front and center. ↩ 🫩 Lossy conversion with silently rewrites invalid bytes to U+FFFD. That’s just fancy data corruption. 🫤 Strict conversion with or crashes or refuses to operate. 😚 Staying in bytes with or is what you should usually do. No buffer overflows. No use-after-free. No double-free. No data races on shared mutable state. No null-pointer dereferences. No uninitialized memory reads. buffer overflow on deep paths longer than (9.11, 2026) out-of-bounds read on trailing blanks (9.9, 2025) heap buffer overflow (9.9, 2025) writes a NUL byte past a heap buffer (9.8, 2025) 1-byte read before a heap buffer with a key offset (9.8, 2025) and crashes with SELinux but no xattr support (9.7, 2025) heap overwrite ( CVE-2024-0684 , 9.5, 2024) reads unallocated memory on malformed input (9.4, 2023) stack buffer overrun with many files and a high (9.0, 2021) To be fair to GNU: GNU coreutils is 40 years old and has had a very long time to surface and fix this class of bug. And we don’t know there are no memory-safety bugs in the Rust rewrite, only that the audit didn’t find any. Still, the difference is noticeable when comparing the same duration of development activity. ↩ It’s worth noting that the / TOCTOU class of bug is in some ways easier to avoid in C than in Rust. C code naturally reaches for an open file descriptor and the family of syscalls ( , , , ), and most creation syscalls take a argument directly. Rust’s high-level APIs abstract over the file descriptor and operate on values, which makes the path-based, re-resolving call the path of least resistance. The handle-based APIs exist on every Unix platform; Rust just doesn’t put them front and center. ↩

0 views

Debugging WASM in Chrome DevTools

When I was working on the WASM backend for my Scheme compiler , I ran into several tricky situations with debugging generated WASM code. It turned out that Chrome has a very capable WASM debugger in its DevTools, so in this brief post I want to share how it can be used. I'll be using an example from my wasm-wat-samples project for this post. In fact, everything is already in place in the gc-print-scheme-pairs sample. This sample shows how to construct Scheme-like s-exprs in WASM using gc references and print them out recursively. The sample supports nested pairs of integers, booleans and symbols. To see this in action, we have to first compile the WAT file to WASM, for example using watgo : The browser-loader.html file in that directory already expects to load gc-print-scheme-pairs.wasm . But we can't just open it directly from the file-system; since it loads WASM, this file needs to be served with a local HTTP server. I personally use static-server for this, but you can use anything else - like Python's built-in http.server : Now it can be opened in the browser by following the printed link and selecting the browser-loader.html file. Open the Chrome DevTools, and in Sources , open the Page view on the left. It should have one entry under wasm , which will show the decompiled WAT code for our module. Note: this code is disassembled from the binary WASM, so it will lose some WAT syntactic sugar (like folded instructions): You can set a breakpoint by clicking on the address column to the left of the code, and then refresh the page. The DevTools debugger will run the program again and stop at the breakpoint: Here you can step over, into, see local values and call stack, etc - a real debugger! The most important use case for me while developing the compiler was debugging unexpected exceptions (coming from instructions like ref.cast ). Notice the checkboxes saying "Pause on ... exceptions" on the right-hand side of the previous screenshot. With these selected, the DevTools debugger will automatically stop on an exception and show where it is coming from. Let's modify the gc-print-scheme-pairs.wat sample to see this in action. The $emit_value function performs a set of ref.test checks to see which kind of reference it's dealing with before casting; let's add this line at the very start: It's clearly wrong to assume that $v is a bool reference without first testing it; this is just for demonstration purposes. Without setting any breakpoints, recompiling this code with watgo and reloading the page, we get: The debugger stopped at the instruction causing the exception; moreover, in the Scope pane on the right we can see that the actual type of $v is (ref $Pair) , so it's immediately clear what's going on. I've found this capability extremely valuable when writing (or emitting from a compiler) non-trivial chunks of WASM code using gc types and instructions. "Should I use a debugger or just printfs" is a common topic of debate among programmers. While I'm usually in the "printf debugging" camp, I'm not dogmatic, and will certainly reach for a debugger when the situation calls for it. Specifically, when investigating reference exceptions in WASM, two strong factors tilt the decision towards using a debugger: In general, WASM's printf capabilities aren't great. We can import print-like functions from the host (and - in fact - our sample does just that), but they're not very flexible and dealing with strings in WASM is painful in general. This is compounded even more when working with gc types, because these aren't even visible to the host (they're opaque references). If we want to do printf debugging of gc values, we have to build a lot of scaffolding first. Exception debugging - in general - is much easier with a supportive debugger in hand. Our ref.cast exception from the example above could have happened anywhere in the code. Imagine having to debug a very large WASM program (emitted by a compiler) to find the source of a failed ref.cast ; the debugger takes you right to the spot! In fact, even for C programming, I've always found gdb most useful for pinpointing the source of segmentation faults and similar crashes. In general, WASM's printf capabilities aren't great. We can import print-like functions from the host (and - in fact - our sample does just that), but they're not very flexible and dealing with strings in WASM is painful in general. This is compounded even more when working with gc types, because these aren't even visible to the host (they're opaque references). If we want to do printf debugging of gc values, we have to build a lot of scaffolding first. Exception debugging - in general - is much easier with a supportive debugger in hand. Our ref.cast exception from the example above could have happened anywhere in the code. Imagine having to debug a very large WASM program (emitted by a compiler) to find the source of a failed ref.cast ; the debugger takes you right to the spot! In fact, even for C programming, I've always found gdb most useful for pinpointing the source of segmentation faults and similar crashes.

0 views
Simon Willison 1 months ago

Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Anthropic didn't release their latest model, Claude Mythos ( system card PDF ), today. They have instead made it available to a very restricted set of preview partners under their newly announced Project Glasswing . The model is a general purpose model, similar to Claude Opus 4.6, but Anthropic claim that its cyber-security research abilities are strong enough that they need to give the software industry as a whole time to prepare. Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser . Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities or weaknesses in their foundational systems—systems that represent a very large portion of the world’s shared cyberattack surface. We anticipate this work will focus on tasks like local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing of systems. There's a great deal more technical detail in Assessing Claude Mythos Preview’s cybersecurity capabilities on the Anthropic Red Team blog: In one case, Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex  JIT heap spray  that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets. Plus this comparison with Claude 4.6 Opus: Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development. But Mythos Preview is in a different league. For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more. Saying "our model is too dangerous to release" is a great way to build buzz around a new model, but in this case I expect their caution is warranted. Just a few days ( last Friday ) ago I started a new ai-security-research tag on this blog to acknowledge an uptick in credible security professionals pulling the alarm on how good modern LLMs have got at vulnerability research. Greg Kroah-Hartman of the Linux kernel: Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real. Daniel Stenberg of : The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense. And Thomas Ptacek published Vulnerability Research Is Cooked , a post inspired by his podcast conversation with Anthropic's Nicholas Carlini. Anthropic have a 5 minute talking heads video describing the Glasswing project. Nicholas Carlini appears as one of those talking heads, where he said (highlights mine): It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently. But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome. [...] I've found more bugs in the last couple of weeks than I found in the rest of my life combined . We've used the model to scan a bunch of open source code, and the thing that we went for first was operating systems, because this is the code that underlies the entire internet infrastructure. For OpenBSD, we found a bug that's been present for 27 years, where I can send a couple of pieces of data to any OpenBSD server and crash it . On Linux, we found a number of vulnerabilities where as a user with no permissions, I can elevate myself to the administrator by just running some binary on my machine. For each of these bugs, we told the maintainers who actually run the software about them, and they went and fixed them and have deployed the patches patches so that anyone who runs the software is no longer vulnerable to these attacks. I found this on the OpenBSD 7.8 errata page : 025: RELIABILITY FIX: March 25, 2026 All architectures TCP packets with invalid SACK options could crash the kernel. A source code patch exists which remedies this problem. I tracked that change down in the GitHub mirror of the OpenBSD CVS repo (apparently they still use CVS!) and found it using git blame : Sure enough, the surrounding code is from 27 years ago. I'm not sure which Linux vulnerability Nicholas was describing, but it may have been this NFS one recently covered by Michael Lynch . There's enough smoke here that I believe there's a fire. It's not surprising to find vulnerabilities in decades-old software, especially given that they're mostly written in C, but what's new is that coding agents run by the latest frontier LLMs are proving tirelessly capable at digging up these issues. I actually thought to myself on Friday that this sounded like an industry-wide reckoning in the making, and that it might warrant a huge investment of time and money to get ahead of the inevitable barrage of vulnerabilities. Project Glasswing incorporates "$100M in usage credits ... as well as $4M in direct donations to open-source security organizations". Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation. It would be great to see OpenAI involved as well - GPT-5.4 already has a strong reputation for finding security vulnerabilities and they have stronger models on the near horizon. The bad news for those of us who are not trusted partners is this: We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview. I can live with that. I think the security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options .

0 views
devansh 1 months ago

On LLMs and Vulnerability Research

I have been meaning to write this for six months. The landscape kept shifting. It has now shifted enough to say something definitive. I work at the intersection of vulnerability triage. I see, every day, how this landscape is changing. These views are personal and do not represent my employer. Take them with appropriate salt. Two things happened in quick succession. Frontier models got dramatically better (Opus 4.6, GPT 5.4). Agentic toolkits (Claude Code, Codex, OpenCode) gave those models hands. The combination produces solid vulnerability research. "LLMs are next-token predictors." This framing was always reductive. It is now actively misleading. The gap between what these models theoretically do (predict the next word) and what they actually do (reason about concurrent thread execution in kernel code to identify use-after-free conditions) has grown too wide for the old frame to hold. Three mechanisms explain why. Implicit structural understanding. Tokenizers know nothing about code. Byte Pair Encoding treats , , and as frequent byte sequences, not syntactic constructs. But the transformer layers above tell a different story. Through training on massive code corpora, attention heads specialise: some track variable identity and provenance, others develop bias toward control flow tokens. The model converges on internal representations that capture semantic properties of code, something functionally equivalent to an abstract syntax tree, built implicitly, never formally. Neural taint analysis. The most security-relevant emergent capability. The model learns associations between sources of untrusted input (user-controlled data, network input, file reads) and dangerous sinks (system calls, SQL queries, memory operations). When it identifies a path from source to sink without adequate sanitisation, it flags a vulnerability. This is not formal taint analysis. No dataflow graph as well. It is a statistical approximation. But it works well for intra-procedural bugs where the source-to-sink path is short, and degrades as distance increases across functions, files, and abstraction layers. Test-time reasoning. The most consequential advance. Standard inference is a single forward pass: reactive, fast, fundamentally limited. Reasoning models (o-series, extended thinking, DeepSeek R1) break this constraint by generating internal reasoning tokens, a scratchpad where the model works through a problem step by step before answering. The model traces execution paths, tracks variable values, evaluates branch conditions. Symbolic execution in natural language. Less precise than formal tools but capable of handling what they choke on: complex pointer arithmetic, dynamic dispatch, deeply nested callbacks. It self-verifies, generating a hypothesis ("the lock isn't held across this path"), then testing it ("wait, is there a lock acquisition I missed?"). It backtracks when reasoning hits dead ends. DeepSeek R1 showed these behaviours emerge from pure reinforcement learning with correctness-based rewards. Nobody taught the model to check its own work. It discovered that verification produces better answers. The model is not generating the most probable next token. It is spending variable compute to solve a specific problem. Three advances compound on each other. Mixture of Experts. Every frontier model now uses MoE. A model might contain 400 billion parameters but activate only 17 billion per token. Vastly more encoded knowledge about code patterns, API behaviours, and vulnerability classes without proportional inference cost. Million-token context. In 2023, analysing a codebase required chunking code into a vector database, retrieving fragments via similarity search, and feeding them to the model. RAG is inherently lossy: code split at arbitrary boundaries, cross-file relationships destroyed, critical context discarded. For vulnerability analysis, where understanding cross-module data flow is the entire point, this information loss is devastating. At one million tokens, you fit an entire mid-size codebase in a single prompt. The model traces user input from an HTTP handler through three middleware layers into a database query builder and spots a sanitisation gap on line 4,200 exploitable via the endpoint on line 890. No chunking. No retrieval. No information loss. Reinforcement-learned reasoning. Earlier models trained purely on next-token prediction. Modern frontier models add an RL phase: generate reasoning chains, reward correctness of the final answer rather than plausibility of text. Over millions of iterations, this shapes reasoning to produce correct analyses rather than plausible-sounding ones. The strategies transfer across domains. A model that learned to verify mathematical reasoning applies the same verification to code. A persistent belief: truly "novel" vulnerability classes exist, bugs so unprecedented that only human genius could discover them. Comforting. Also wrong. Decompose the bugs held up as examples. HTTP request smuggling: the insight that a proxy and backend might disagree about where one request ends and another begins feels like a creative leap. But the actual bug is the intersection of known primitives: ambiguous protocol specification, inconsistent parsing between components, a security-critical assumption about message boundaries. None novel individually. The "novelty" was in combining them. Prototype pollution RCEs in JavaScript frameworks. Exotic until you realise it is dynamic property assignment in a prototype-based language, unsanitised input reaching object modification, and a rendering pipeline evaluating modified objects in a privileged context. Injection, type confusion, privilege boundary crossing. Taxonomy staples for decades. The pattern holds universally. "Novel" vulnerabilities decompose into compositions of known primitives: spec ambiguities, type confusions, missing boundary checks, TOCTOU gaps, trust boundary violations. The novelty is in the composition, not the components. This is precisely what frontier LLMs are increasingly good at. A model that understands protocol ambiguity, inconsistent component behaviour, and security boundary assumptions has all the ingredients to hypothesise a request-smuggling-class vulnerability when pointed at a reverse proxy codebase. It does not need to have seen that exact bug class. It needs to recognise that the conditions for parser disagreement exist and that parser disagreement at a trust boundary has security implications. Compositional reasoning over known primitives. Exactly what test-time reasoning enables. LLMs will not discover the next Spectre tomorrow. Microarchitectural side channels in CPU pipelines are largely absent from code-level training data. But the space of "LLM-inaccessible" vulnerabilities is smaller than the security community assumes, and it shrinks with every model generation. Most of what we call novel vulnerability research is creative recombination within a known search space. That is what these models do best. Effective AI vulnerability research = good scaffolding + adequate tokens. Scaffolding (harness design, prompt engineering, problem framing) is wildly underestimated. Claude Code and Codex are general-purpose coding environments, not optimised for vulnerability research. A purpose-built harness provides threat models, defines trust boundaries, highlights historical vulnerability patterns in the specific technology stack, and constrains search to security-relevant code paths. The operator designing that context determines whether the model spends its reasoning budget wisely or wastes it on dead ends. Two researchers, same model, same codebase, dramatically different results. Token quality beats token quantity. A thousand reasoning tokens on the right code path with the right threat model outperform a million tokens sprayed across a repo with "find vulnerabilities." The search space is effectively infinite. You cannot brute-force it. You narrow it with human intelligence encoded as context, directing machine intelligence toward where bugs actually live. "LLMs are non-deterministic, so you can't trust their findings." Sounds devastating. Almost entirely irrelevant. It confuses the properties of the tool with the properties of the target. The bugs are deterministic. They are in the code. A buffer overflow on line 847 is still there whether the model notices it on attempt one or attempt five. Non-determinism in the search process does not make the search less valid. It makes it more thorough under repetition. Each run samples a different trajectory through the hypothesis space. The union of multiple runs covers more search space than any single run. Conceptually identical to fuzzing. Nobody says "fuzzers are non-deterministic so we can't trust them." You run the fuzzer longer, cover more input space, find more bugs. Same principle. Non-determinism under repetition becomes coverage. In 2023 and 2024, the state of the art was architecture. Multi-agent systems, RAG pipelines, tool integration with SMT solvers and fuzzers and static analysis engines. The best orchestration won. That era is ending. A frontier model ingests a million tokens of code in a single prompt. Your RAG pipeline is not an advantage when the model without RAG sees the whole codebase while your pipeline shows fragments selected by retrieval that does not know what is security-relevant. A reasoning model spends thousands of tokens tracing execution paths and verifying hypotheses. Your external solver integration is not a differentiator when the model approximates what the solver does with contextual understanding the solver lacks. Agentic toolkits handle orchestration better than your custom tooling. The implication the security industry has not fully processed: vulnerability research is being democratised. When finding a memory safety bug in a C library required a Project Zero-calibre researcher with years of experience, the supply was measured in hundreds worldwide. When it requires a well-prompted API call, the supply is effectively unlimited. What replaces architecture as the competitive advantage? Two things. Domain expertise encoded as context. Not "find bugs in this code" but "this is a TLS implementation; here are three classes of timing side-channel that have affected similar implementations; analyse whether the constant-time guarantees hold across these specific code paths." The human provides the insight. The model does the grunt work. Access to compute. Test-time reasoning scales with inference compute. More tokens means deeper analysis, more self-verification, more backtracking. Teams that let a model spend ten minutes on a complex code path will find bugs that teams limited to five-second responses will miss. The end state: vulnerability discovery for known bug classes becomes a commodity, available to anyone with API access and a credit card. The researchers who thrive will focus where the model cannot: novel vulnerability classes, application-level logic flaws, architectural security review, adversarial creativity. This is not a prediction. It is already happening. The pace is set by model capability, which doubles on a timeline measured in months. Beyond next-token prediction Implicit structural understanding Neural taint analysis Test-time reasoning The architecture that enabled this Mixture of Experts Million-token context Reinforcement-learned reasoning The myth of novel vulnerabilities Scaffolding and tokens Non-determinism is a feature Orchestration is no longer your moat

0 views
Anton Zhiyanov 1 months ago

Porting Go's strings package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. I started with the io package , which provides core abstractions like and , as well as general-purpose functions like . But isn't very interesting on its own, since it doesn't include specific reader or writer implementations. So my next choices were naturally and — the workhorses of almost every Go program. This post is about how the porting process went. Bits and UTF-8 • Bytes • Allocators • Buffers and builders • Benchmarks • Optimizing search • Optimizing builder • Wrapping up Before I could start porting , I had to deal with its dependencies first: Both of these packages are made up of pure functions, so they were pretty easy to port. The only minor challenge was the difference in operator precedence between Go and C — specifically, bit shifts ( , ). In Go, bit shifts have higher precedence than addition and subtraction. In C, they have lower precedence: The simplest solution was to just use parentheses everywhere shifts are involved: With and done, I moved on to . The package provides functions for working with byte slices: Some of them were easy to port, like . Here's how it looks in Go: And here's the C version: Just like in Go, the ( → ) macro doesn't allocate memory; it just reinterprets the byte slice's underlying storage as a string. The function (which works like in Go) is easy to implement using from the libc API. Another example is the function, which looks for a specific byte in a slice. Here's the pure-Go implementation: And here's the C version: I used a regular C loop to mimic Go's : But and don't allocate memory. What should I do with , since it clearly does? I had a decision to make. The Go runtime handles memory allocation and deallocation automatically. In C, I had a few options: An allocator is a tool that reserves memory (typically on the heap) so a program can store its data structures there. See Allocators from C to Zig if you want to learn more about them. For me, the winner was clear. Modern systems programming languages like Zig and Odin clearly showed the value of allocators: An is an interface with three methods: , , and . In C, it translates to a struct with function pointers: As I mentioned in the post about porting the io package , this interface representation isn't as efficient as using a static method table, but it's simpler. If you're interested in other options, check out the post on interfaces . By convention, if a function allocates memory, it takes an allocator as its first parameter. So Go's : Translates to this C code: If the caller doesn't care about using a specific allocator, they can just pass an empty allocator, and the implementation will use the system allocator — , , and from libc. Here's a simplified version of the system allocator (I removed safety checks to make it easier to read): The system allocator is stateless, so it's safe to have a global instance: Here's an example of how to call with an allocator: Way better than hidden allocations! Besides pure functions, and also provide types like , , and . I ported them using the same approach as with functions. For types that allocate memory, like , the allocator becomes a struct field: The code is pretty wordy — most C developers would dislike using instead of something shorter like . My solution to this problem is to automatically translate Go code to C (which is actually what I do when porting Go's stdlib). If you're interested, check out the post about this approach — Solod: Go can be a better C . Types that don't allocate, like , need no special treatment — they translate directly to C structs without an allocator field. The package is the twin of , so porting it was uneventful. Here's usage example in Go and C side by side: Again, the C code is just a more verbose version of Go's implementation, plus explicit memory allocation. What's the point of writing C code if it's slow, right? I decided it was time to benchmark the ported C types and functions against their Go versions. To do that, I ported the benchmarking part of Go's package. Surprisingly, the simplified version was only 300 lines long and included everything I needed: Here's a sample benchmark for the type: Reads almost like Go's benchmarks. To monitor memory usage, I created — a memory allocator that wraps another allocator and keeps track of allocations: The benchmark gets an allocator through the function and wraps it in a to keep track of allocations: There's no auto-discovery, but the manual setup is quite straightforward. With the benchmarking setup ready, I ran benchmarks on the package. Some functions did well — about 1.5-2x faster than their Go equivalents: But (searching for a substring in a string) was a total disaster — it was nearly 20 times slower than in Go: The problem was caused by the function we looked at earlier: This "pure" Go implementation is just a fallback. On most platforms, Go uses a specialized version of written in assembly. For the C version, the easiest solution was to use , which is also optimized for most platforms: With this fix, the benchmark results changed drastically: Still not quite as fast as Go, but it's close. Honestly, I don't know why the -based implementation is still slower than Go's assembly here, but I decided not to pursue it any further. After running the rest of the function benchmarks, the ported versions won all of them except for two: Benchmarking details is a common way to compose strings from parts in Go, so I tested its performance too. The results were worse than I expected: Here, the C version performed about the same as Go, but I expected it to be faster. Unlike , is written entirely in Go, so there's no reason the ported version should lose in this benchmark. The method looked almost identical in Go and C: Go's automatically grows the backing slice, while does it manually ( , on the contrary, doesn't grow the slice — it's merely a wrapper). So, there shouldn't be any difference. I had to investigate. Looking at the compiled binary, I noticed a difference in how the functions returned results. Go returns multiple values in separate registers, so uses three registers: one for 8-byte , two for the interface (implemented as two 8-byte pointers). But in C, was a single struct made up of two unions and a pointer: Of course, this 56-byte monster can't be returned in registers — the C calling convention passes it through memory instead. Since is on the hot path in the benchmark, I figured this had to be the issue. So I switched from a single monolithic type to signature-specific types for multi-return pairs: Now, the implementation in C looked like this: is only 16 bytes — small enough to be returned in two registers. Problem solved! But it wasn't — the benchmark only showed a slight improvement. After looking into it more, I finally found the real issue: unlike Go, the C compiler wasn't inlining calls. Adding and moving to the header file made all the difference: 2-4x faster. That's what I was hoping for! Porting and was a mix of easy parts and interesting challenges. The pure functions were straightforward — just translate the syntax and pay attention to operator precedence. The real design challenge was memory management. Using allocators turned out to be a good solution, making memory allocation clear and explicit without being too difficult to use. The benchmarks showed that the C versions outperformed Go in most cases, sometimes by 2-4x. The only exceptions were and , where Go relies on hand-written assembly. The optimization was an interesting challenge: what seemed like a return-type issue was actually an inlining problem, and fixing it gave a nice speed boost. There's a lot more of Go's stdlib to port. In the next post, we'll cover — a very unique Go package. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod . The and packages are included, of course. implements bit counting and manipulation functions. implements functions for UTF-8 encoded text. Loop over the slice indexes with ( is a macro that returns , similar to Go's built-in). Access the i-th byte with (a bounds-checking macro that returns ). Use a reliable garbage collector like Boehm GC to closely match Go's behavior. Allocate memory with libc's and have the caller free it later with . Introduce allocators. It's obvious whether a function allocates memory or not: if it has an allocator as a parameter, it allocates. It's easy to use different allocation methods: you can use for one function, an arena for another, and a stack allocator for a third. It helps with testing and debugging: you can use a tracking allocator to find memory leaks, or a failing allocator to test error handling. Figuring out how many iterations to run. Running the benchmark function in a loop. Recording metrics (ns/op, MB/s, B/op, allocs/op). Reporting the results.

0 views
Maurycy 2 months ago

GopherTree

While gopher is usually seen as a proto-web, it's really closer to FTP. It has no markup format, no links and no URLs. Files are arranged in a hierarchically, and can be in any format. This rigid structure allows clients to get creative with how it's displayed ... which is why I'm extremely disappointed that everyone renders gopher menus like shitty websites: You see all that text mixed into the menu? Those are informational selectors: a non-standard feature that's often used to recreate hypertext. I know this "limited web" aesthetic appeals to certain circles, but it removes the things that make the protocol interesting. It would be nice to display gopher menus like what they are, a directory tree : This makes it easy to browse collections of files, and help avoid the Wikipedia problem: Absentmindedly clicking links until you realize it's 3 AM and you have a thousand tabs open... and that you never finished what you wanted to read in the first place. I've made the decision to hide informational selectors by default . These have two main uses: creating faux hypertext and adding ASCII art banners. ASCII art banners are simply annoying: Having one in each menu looks cute in a web browser, but having 50 copies cluttering up the directory tree is... not great. Hypertext doesn't work well. In the strict sense, looking ugly is better then not working at all — but almost everyone who does this also hosts on the web, so it's not a huge loss. The client also has a built in text viewer , with pagination and proper word-wrap. It supports both UTF-8 and Latin-1 text encodings, but this has to be selected manually: gopher has no mechanism to indicate encoding. (but most text looks the same in both) Bookmarks work by writing items to a locally stored gopher menu, which also serves as a "homepage" of sorts. Because it's just a file, I didn't bother implementing any advanced editing features: any text editor works fine for that. The bookmark code is UNIX/Linux specific, but porting should be possible. All this fits within a thousand lines of C code , the same as my ultra-minimal web browser. While arguably a browser, it was practically unusable: lacking basic features like a back button or pagination. The gopher version of the same size is complete enough to replace Lynx as my preferred client. Usage instructions can be found at the top of the source file. /projects/gopher/gophertree.c : Source and instructions /projects/tinyweb/ : 1000 line web browser https://datatracker.ietf.org/doc/html/rfc1436 : Gopher RFC

0 views
Maurycy 2 months ago

My ramblings are available over gopher

It has recently come to my attention that people need a thousand lines of C code to read my website. This is unacceptable. For simpler clients, my server supports gopher: The response is just a text file: it has no markup, no links and no embedded content. For navigation, gopher uses specially formatted directory-style menus: The first character on a line indicates the type of the linked resource: The type is followed by a tab-separated list containing a display name, file path, hostname and port. Lines beginning with an "i" are purely informational and do not link to anything. (This is non-standard, but widely used) Storing metadata in links is weird to modern sensibilities , but it keeps the protocol simple. Menus are the only thing that the client has to understand: there's no URLs, no headers, no mime types — the only thing sent to the server is the selector (file path), and the only thing received is the file. ... as a bonus, this one liner can download files: That's quite clunky , but there are lots of programs that support it. If you have Lynx installed, you should be able to just point it at this URL: ... although you will want to put in because it's not 1991 anymore [Citation Needed] I could use informational lines to replicate the webs navigation by making everything a menu — but that would be against the spirit of the thing: gopher is document retrieval protocol, not a hypertext format. Instead, I converted all my blog posts in plain text and set up some directory-style navigation. I've actually been moving away from using inline links anyways because they have two opposing design goals: While reading, links must be normal text. When you're done, links must be distinct clickable elements. I've never been able to find a good compromise: Links are always either distracting to the reader, annoying to find/click, or both. Also, to preempt all the emails : ... what about Gemini? (The protocol, not the autocomplete from google.) Gemini is the popular option for non-web publishing... but honestly, it feels like someone took HTTP and slapped markdown on top of it. This is a Gemini request... ... and this is an HTTP request: For both protocols, the server responds with metadata followed by hypertext. It's true that HTTP is more verbose, but 16 extra bytes doesn't create a noticeable difference. Unlike gopher, which has a unique navigation model and is of historical interest , Gemini is just the web but with limited features... so what's the point? I can already write websites that don't have ads or autoplaying videos, and you can already use browsers that don't support features you don't like. After stripping away all the fluff (CSS, JS, etc) the web is quite simple: a functional browser can be put together in a weekend. ... and unlike gemini, doing so won't throw out 35 years of compatibility: Someone with Chrome can read a barebones website, and someone with Lynx can read normal sites. Gemini is a technical solution to an emotional problem . Most people have a bad taste for HTTP due to the experience of visiting a commercial website. Gemini is the obvious choice for someone looking for "the web but without VC types". It doesn't make any sense when I'm looking for an interesting (and humor­ously outdated) protocol. /projects/tinyweb/ : A browser in 1000 lines of C ... /about.html#links : ... and thoughts on links for navigation. https://www.rfc-editor.org/rfc/rfc1436.html : Gopher RFC https://lynx.invisible-island.net/ : Feature complete text-based web browser

0 views
Max Bernstein 2 months ago

Using Perfetto in ZJIT

Originally published on Rails At Scale . Look! A trace of slow events in a benchmark! Hover over the image to see it get bigger. Now read on to see what the slow events are and how we got this pretty picture. The first rule of just-in-time compilers is: you stay in JIT code. The second rule of JIT is: you STAY in JIT code! When control leaves the compiled code to run in the interpreter—what the ZJIT team calls either a “side-exit” or a “deopt”, depending on who you talk to—things slow down. In a well-tuned system, this should happen pretty rarely. Right now, because we’re still bringing up the compiler and runtime system, it happens more than we would like. We’re reducing the number of exits over time. We can track our side-exit reduction progress with , which, on process exit, prints out a tidy summary of the counters for all of the bad stuff we track. It’s got side-exits. It’s got calls to C code. It’s got calls to slow-path runtime helpers. It’s got everything. Here is a chopped-up sample of stats output for the Lobsters benchmark, which is a large Rails app: (I’ve cut out significant chunks of the stats output and replaced them with because it’s overwhelming the first time you see it.) The first thing you might note is that the thing I just described as terrible for performance is happening over twelve million times . The second thing you might notice is that despite this, we’re staying in JIT code seemingly a high percentage of the time. Or are we? Is 80% high? Is a 4.5% class guard miss ratio high? What about 11% for shapes? It’s hard to say. The counters are great because they’re quick and they’re reasonably stable proxies for performance. There’s no substitute for painstaking measurements on a quiet machine but if the counter for Bad Slow Thing goes down (and others do not go up), we’re probably doing a good job. But they’re not great for building intuition. For intuition, we want more tangible feeling numbers. We want to see things. The third thing is that you might ask yourself “self, where are these exits coming from?” Unfortunately, counters cannot tell you that. For that, we want stack traces. This lets us know where in the guest (Ruby) code triggers an exit. Ideally also we would want some notion of time: we would want to know not just where these events happen but also when. Are the exits happening early, at application boot? At warmup? Even during what should be steady state application time? Hard to say. So we need more tools. Thankfully, Perfetto exists. Perfetto is a system for visualizing and analyzing traces and profiles that your application generates. It has both a web UI and a command-line UI. We can emit traces for Perfetto and visualize them there. Take a look at this sample ZJIT Perfetto trace generated by running Ruby with 1 . What do you see? I see a couple arrows on the left. Arrows indicate “instant” point-in-time events. Then I see a mess of purple to the right of that until the end of the trace. Hover over an arrow. Find out that each arrow is a side-exit. Scream silently. But it’s a friendly arrow. It tells you what the side-exit reason is. If you click it, it even tells you the stack trace in the pop-up panel on the bottom. If we click a couple of them, maybe we can learn more. We can also zoom by mousing over the track, holding Ctrl, and scrolling. That will get us look closer. But there are so many… Fortunately, Perfetto also provides a SQL interface to the traces. We can write a query to aggregate all of the side exit events from the table and line them up with the topmost method from the backtrace arguments in the table: This pulls up a query box at the bottom showing us that there are a couple big hotspots: It even has a helpful option to export the results Markdown table so I can paste (an edited version) into this blog post: Looks like we should figure out why we’re having shape misses so much and that will clear up a lot of exits. (Hint: it’s because once we make our first guess about what we think the object shape will be, we don’t re-assess… yet .) This has been a taste of Perfetto. There’s probably a lot more to explore. Please join the ZJIT Zulip and let us know if you have any cool tracing or exploring tricks. Now I’ll explain how you too can use Perfetto from your system. Adding support to ZJIT was pretty straightforward. The first thing is that you’ll need some way to get trace data out of your system. We write to a file with a well-known location ( ), but you could do any number of things. Perhaps you can stream events over a socket to another process, or to a server that aggregates them, or store them internally and expose a webserver that serves them over the internet, or… anything, really. Once you have that, you need a couple lines of code to emit the data. Perfetto accepts a number of formats. For example, in his excellent blog post , Tristan Hume opens with such a simple snippet of code for logging Chromium Trace JSON-formatted events (lightly modified by me): This snippet is great. It shows, end-to-end, writing a stream of one event. It is a complete (X) event, as opposed to either: It was enough to get me started. Since it’s JSON, and we have a lot of side exits, the trace quickly ballooned to 8GB large for a several second benchmark. Not great. Now, part of this is our fault—we should side exit less—and part of it is just the verbosity of JSON. Thankfully, Perfetto ingests more compact binary formats, such as the Fuchsia trace format . In addition to being more compact, FXT even supports string interning. After modifying the tracer to emit FXT, we ended with closer to 100MB for the same benchmark. We can reduce further by sampling —not writing every exit to the trace, but instead every K exits (for some (probably prime) K). This is why we provide the option. Check out the trace writer implementation from the point this article was written. We could trace: Visualizations are awesome. Get your data in the right format so you can ask the right questions easily. Thanks for Perfetto! Also, looks like visualizations are now available in Perfetto canary. Time to go make some fun histograms… This is also sampled/strobed, so not every exit is in there. This is just 1/K of them for some K that I don’t remember.  ↩ two discrete timestamped begin (B) and end (E) events that book-end something, or an instant (i) event that has no duration, or a couple other event types in the Chromium Trace Event Format doc When methods get compiled How big the generated code is How long each compile phase takes When (and where) invalidation events happen When (and where) allocations happen from JITed code Garbage collection events This is also sampled/strobed, so not every exit is in there. This is just 1/K of them for some K that I don’t remember.  ↩

0 views
Anton Zhiyanov 2 months ago

Porting Go's io package to C

Creating a subset of Go that translates to C was never my end goal. I liked writing C code with Go, but without the standard library it felt pretty limited. So, the next logical step was to port Go's stdlib to C. Of course, this isn't something I could do all at once. So I started with the standard library packages that had the fewest dependencies, and one of them was the package. This post is about how that went. io package • Slices • Multiple returns • Errors • Interfaces • Type assertion • Specialized readers • Copy • Wrapping up is one of the core Go packages. It introduces the concepts of readers and writers , which are also common in other programming languages. In Go, a reader is anything that can read some raw data (bytes) from a source into a slice: A writer is anything that can take some raw data from a slice and write it to a destination: The package defines many other interfaces, like and , as well as combinations like and . It also provides several functions, the most well-known being , which copies all data from a source (represented by a reader) to a destination (represented by a writer): C, of course, doesn't have interfaces. But before I get into that, I had to make several other design decisions. In general, a slice is a linear container that holds N elements of type T. Typically, a slice is a view of some underlying data. In Go, a slice consists of a pointer to a block of allocated memory, a length (the number of elements in the slice), and a capacity (the total number of elements that can fit in the backing memory before the runtime needs to re-allocate): Interfaces in the package work with fixed-length slices (readers and writers should never append to a slice), and they only use byte slices. So, the simplest way to represent this in C could be: But since I needed a general-purpose slice type, I decided to do it the Go way instead: Plus a bound-checking helper to access slice elements: Usage example: So far, so good. Let's look at the method again: It returns two values: an and an . C functions can only return one value, so I needed to figure out how to handle this. The classic approach would be to pass output parameters by pointer, like or . But that doesn't compose well and looks nothing like Go. Instead, I went with a result struct: The union can store any primitive type, as well as strings, slices, and pointers. The type combines a value with an error. So, our method (let's assume it's just a regular function for now): Translates to: And the caller can access the result like this: For the error type itself, I went with a simple pointer to an immutable string: Plus a constructor macro: I wanted to avoid heap allocations as much as possible, so decided not to support dynamic errors. Only sentinel errors are used, and they're defined at the file level like this: Errors are compared by pointer identity ( ), not by string content — just like sentinel errors in Go. A error is a pointer. This keeps error handling cheap and straightforward. This was the big one. In Go, an interface is a type that specifies a set of methods. Any concrete type that implements those methods satisfies the interface — no explicit declaration needed. In C, there's no such mechanism. For interfaces, I decided to use "fat" structs with function pointers. That way, Go's : Becomes an struct in C: The pointer holds the concrete value, and each method becomes a function pointer that takes as its first argument. This is less efficient than using a static method table, especially if the interface has a lot of methods, but it's simpler. So I decided it was good enough for the first version. Now functions can work with interfaces without knowing the specific implementation: Calling a method on the interface just goes through the function pointer: Go's interface is more than just a value wrapper with a method table. It also stores type information about the value it holds: Since the runtime knows the exact type inside the interface, it can try to "upgrade" the interface (for example, a regular ) to another interface (like ) using a type assertion : The last thing I wanted to do was reinvent Go's dynamic type system in C, so dropping this feature was an easy decision. There's another kind of type assertion, though — when we unwrap the interface to get the value of a specific type: And this kind of assertion is quite possible in C. All we have to do is compare function pointers: If two different types happened to share the same method implementation, this would break. In practice, each concrete type has its own methods, so the function pointer serves as a reliable type tag. After I decided on the interface approach, porting the actual types was pretty easy. For example, wraps a reader and stops with EOF after reading N bytes: The logic is straightforward: if there are no bytes left, return EOF. Otherwise, if the buffer is bigger than the remaining size, shorten it. Then, call the underlying reader, and decrease the remaining size. Here's what the ported C code looks like: A bit more verbose, but nothing special. The multiple return values, the interface call with , and the slice handling are all implemented as described in previous sections. is where everything comes together. Here's the simplified Go version: In Go, allocates its buffer on the heap with . I could take a similar approach in C — make take an allocator and use it to create the buffer like this: But since this is just a temporary buffer that only exists during the function call, I decided stack allocation was a better choice: allocates memory on a stack with a bounds-checking macro that wraps C's . It moves the stack pointer and gives you a chunk of memory that's automatically freed when the function returns. People often avoid using because it can cause a stack overflow, but using a bounds-checking wrapper fixes this issue. Another common concern with is that it's not block-scoped — the memory stays allocated until the function exits. However, since we only allocate once, this isn't a problem. Here's the simplified C version of : Here, you can see all the parts from this post working together: a function accepting interfaces, slices passed to interface methods, a result type wrapping multiple return values, error sentinels compared by identity, and a stack-allocated buffer used for the copy. Porting Go's package to C meant solving a few problems: representing slices, handling multiple return values, modeling errors, and implementing interfaces using function pointers. None of this needed anything fancy — just structs, unions, functions, and some macros. The resulting C code is more verbose than Go, but it's structurally similar, easy enough to read, and this approach should work well for other Go packages too. The package isn't very useful on its own — it mainly defines interfaces and doesn't provide concrete implementations. So, the next two packages to port were naturally and — I'll talk about those in the next post. In the meantime, if you'd like to write Go that translates to C — with no runtime and manual memory management — I invite you to try Solod . The package is included, of course.

0 views
Anton Zhiyanov 2 months ago

Solod: Go can be a better C

I'm working on a new programming language named Solod ( So ). It's a strict subset of Go that translates to C, without hidden memory allocations and with source-level interop. Highlights: So supports structs, methods, interfaces, slices, multiple returns, and defer. To keep things simple, there are no channels, goroutines, closures, or generics. So is for systems programming in C, but with Go's syntax, type safety, and tooling. Hello world • Language tour • Compatibility • Design decisions • FAQ • Final thoughts This Go code in a file : Translates to a header file : Plus an implementation file : In terms of features, So is an intersection between Go and C, making it one of the simplest C-like languages out there — on par with Hare. And since So is a strict subset of Go, you already know it if you know Go. It's pretty handy if you don't want to learn another syntax. Let's briefly go over the language features and see how they translate to C. Variables • Strings • Arrays • Slices • Maps • If/else and for • Functions • Multiple returns • Structs • Methods • Interfaces • Enums • Errors • Defer • C interop • Packages So supports basic Go types and variable declarations: is translated to ( ), to ( ), and to ( ). is not treated as an interface. Instead, it's translated to . This makes handling pointers much easier and removes the need for . is translated to (for pointer types). Strings are represented as type in C: All standard string operations are supported, including indexing, slicing, and iterating with a for-range loop. Converting a string to a byte slice and back is a zero-copy operation: Converting a string to a rune slice and back allocates on the stack with : There's a stdlib package for heap-allocated strings and various string operations. Arrays are represented as plain C arrays ( ): on arrays is emitted as compile-time constant. Slicing an array produces a . Slices are represented as type in C: All standard slice operations are supported, including indexing, slicing, and iterating with a for-range loop. As in Go, a slice is a value type. Unlike in Go, a nil slice and an empty slice are the same thing: allocates a fixed amount of memory on the stack ( ). only works up to the initial capacity and panics if it's exceeded. There's no automatic reallocation; use the stdlib package for heap allocation and dynamic arrays. Maps are fixed-size and stack-allocated, backed by parallel key/value arrays with linear search. They are pointer-based reference types, represented as in C. No delete, no resize. Only use maps when you have a small, fixed number of key-value pairs. For anything else, use heap-allocated maps from the package (planned). Most of the standard map operations are supported, including getting/setting values and iterating with a for-range loop: As in Go, a map is a pointer type. A map emits as in C. If-else and for come in all shapes and sizes, just like in Go. Standard if-else with chaining: Init statement (scoped to the if block): Traditional for loop: While-style loop: Range over an integer: Regular functions translate to C naturally: Named function types become typedefs: Exported functions (capitalized) become public C symbols prefixed with the package name ( ). Unexported functions are . Variadic functions use the standard syntax and translate to passing a slice: Function literals (anonymous functions and closures) are not supported. So supports two-value multiple returns in two patterns: and . Both cases translate to C type: Named return values are not supported. Structs translate to C naturally: works with types and values: Methods are defined on struct types with pointer or value receivers: Pointer receivers pass in C and cast to the struct pointer. Value receivers pass the struct by value, so modifications operate on a copy: Calling methods on values and pointers emits pointers or values as necessary: Methods on named primitive types are also supported. Interfaces in So are like Go interfaces, but they don't include runtime type information. Interface declarations list the required methods: In C, an interface is a struct with a pointer and function pointers for each method (less efficient than using a static method table, but simpler; this might change in the future): Just as in Go, a concrete type implements an interface by providing the necessary methods: Passing a concrete type to functions that accept interfaces: Type assertion works for concrete types ( ), but not for interfaces ( ). Type switch is not supported. Empty interfaces ( and ) are translated to . So supports typed constant groups as enums: Each constant is emitted as a C : is supported for integer-typed constants: Iota values are evaluated at compile time and translated to integer literals: Errors use the type (a pointer): So only supports sentinel errors, which are defined at the package level using (implemented as compiler built-in): Errors are compared using . This is an O(1) operation (compares pointers, not strings): Dynamic errors ( ), local error variables ( inside functions), and error wrapping are not supported. schedules a function or method call to run at the end of the enclosing scope. The scope can be either a function (as in Go): Or a bare block (unlike Go): Deferred calls are emitted inline (before returns, panics, and scope end) in LIFO order: Defer is not supported inside other scopes like or . Include a C header file with : Declare an external C type (excluded from emission) with : Declare an external C function (no body or ): When calling extern functions, and arguments are automatically decayed to their C equivalents: string literals become raw C strings ( ), string values become , and slices become raw pointers. This makes interop cleaner: The decay behavior can be turned off with the flag: The package includes helpers for converting C pointers back to So string and slice types. The package is also available and is implemented as compiler built-ins. Each Go package is translated into a single + pair, regardless of how many files it contains. Multiple files in the same package are merged into one file, separated by comments. Exported symbols (capitalized names) are prefixed with the package name: Unexported symbols (lowercase names) keep their original names and are marked : Exported symbols are declared in the file (with for variables). Unexported symbols only appear in the file. Importing a So package translates to a C : Calling imported symbols uses the package prefix: That's it for the language tour! So generates C11 code that relies on several GCC/Clang extensions: You can use GCC, Clang, or to compile the transpiled C code. MSVC is not supported. Supported operating systems: Linux, macOS, and Windows (partial support). So is highly opinionated. Simplicity is key . Fewer features are always better. Every new feature is strongly discouraged by default and should be added only if there are very convincing real-world use cases to support it. This applies to the standard library too — So tries to export as little of Go's stdlib API as possible while still remaining highly useful for real-world use cases. No heap allocations are allowed in language built-ins (like maps, slices, new, or append). Heap allocations are allowed in the standard library, but they must clearly state when an allocation happens and who owns the allocated data. Fast and easy C interop . Even though So uses Go syntax, it's basically C with its own standard library. Calling C from So, and So from C, should always be simple to write and run efficiently. The So standard library (translated to C) should be easy to add to any C project. Readability . There are several languages that claim they can transpile to readable C code. Unfortunately, the C code they generate is usually unreadable or barely readable at best. So isn't perfect in this area either (though it's arguably better than others), but it aims to produce C code that's as readable as possible. Go compatibility . So code is valid Go code. No exceptions. Raw performance . You can definitely write C code by hand that runs faster than code produced by So. Also, some features in So, like interfaces, are currently implemented in a way that's not very efficient, mainly to keep things simple. Hiding C entirely . So is a cleaner way to write C, not a replacement for it. You should know C to use So effectively. Go feature parity . Less is more. Iterators aren't coming, and neither are generic methods. I have heard these several times, so it's worth answering. Why not Rust/Zig/Odin/other language? Because I like C and Go. Why not TinyGo? TinyGo is lightweight, but it still has a garbage collector, a runtime, and aims to support all Go features. What I'm after is something even simpler, with no runtime at all, source-level C interop, and eventually, Go's standard library ported to plain C so it can be used in regular C projects. How does So handle memory? Everything is stack-allocated by default. There's no garbage collector or reference counting. The standard library provides explicit heap allocation in the package when you need it. Is it safe? So itself has few safeguards other than the default Go type checking. It will panic on out-of-bounds array access, but it won't stop you from returning a dangling pointer or forgetting to free allocated memory. Most memory-related problems can be caught with AddressSanitizer in modern compilers, so I recommend enabling it during development by adding to your . Can I use So code from C (and vice versa)? Yes. So compiles to plain C, therefore calling So from C is just calling C from C. Calling C from So is equally straightforward. Can I compile existing Go packages with So? Not really. Go uses automatic memory management, while So uses manual memory management. So also supports far fewer features than Go. Neither Go's standard library nor third-party packages will work with So without changes. How stable is this? Not for production at the moment. Where's the standard library? There is a growing set of high-level packages ( , , , ...). There are also low-level packages that wrap the libc API ( , , , ...). Check the links below for more details. Even though So isn't ready for production yet, I encourage you to try it out on a hobby project or just keep an eye on it if you like the concept. Further reading: Go in, C out. You write regular Go code and get readable C11 as output. Zero runtime. No garbage collection, no reference counting, no hidden allocations. Everything is stack-allocated by default. Heap is opt-in through the standard library. Native C interop. Call C from So and So from C — no CGO, no overhead. Go tooling works out of the box — syntax highlighting, LSP, linting and "go test". Binary literals ( ) in generated code. Statement expressions ( ) in macros. for package-level initialization. for local type inference in generated code. for type inference in generic macros. for and other dynamic stack allocations. Installation and usage So by example Language description Stdlib description Source code

0 views
daniel.haxx.se 2 months ago

Dependency tracking is hard

curl and libcurl are written in C. Rather low level components present in many software systems. They are typically not part of any ecosystem at all. They’re just a tool and a library. In lots of places on the web when you mention an Open Source project, you will also get the option to mention in which ecosystem it belongs. npm, go, rust, python etc. There are easily at least a dozen well-known and large ecosystems. curl is not part of any of those. Recently there’s been a push for PURLs ( Package URLs ), for example when describing your specific package in a CVE. A package URL only works when the component is part of an ecosystem. curl is not. We can’t specify curl or libcurl using a PURL. SBOM generators and related scanners use package managers to generate lists of used components and their dependencies . This makes these tools quite frequently just miss and ignore libcurl. It’s not listed by the package managers. It’s just in there, ready to be used. Like magic. It is similarly hard for these tools to figure out that curl in turn also depends and uses other libraries. At build-time you select which – but as we in the curl project primarily just ships tarballs with source code we cannot tell anyone what dependencies their builds have. The additional libraries libcurl itself uses are all similarly outside of the standard ecosystems. Part of the explanation for this is also that libcurl and curl are often shipped bundled with the operating system many times, or sometimes perceived to be part of the OS. Most graphs, SBOM tools and dependency trackers therefore stop at the binding or system that uses curl or libcurl, but without including curl or libcurl. The layer above so to speak. This makes it hard to figure out exactly how many components and how much software is depending on libcurl. A perfect way to illustrate the problem is to check GitHub and see how many among its vast collection of many millions of repositories that depend on curl. After all, curl is installed in some thirty billion installations, so clearly it used a lot . (Most of them being libcurl of course.) It lists one dependency for curl. Repositories that depend on curl/curl: one. Screenshot taken on March 9, 2026 What makes this even more amusing is that it looks like this single dependent repository ( Pupibent/spire ) lists curl as a dependency by mistake.

0 views