Posts in C (20 found)
Simon Willison 2 days ago

Code research projects with async coding agents like Claude Code and Codex

I've been experimenting with a pattern for LLM usage recently that's working out really well: asynchronous code research tasks . Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done. Software development benefits enormously from something I call code research . The great thing about questions about code is that they can often be definitively answered by writing and executing code. I often see questions on forums which hint at a lack of understanding of this skill. "Could Redis work for powering the notifications feed for my app?" is a great example. The answer is always "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work. I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible. It turns out coding agents like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision. LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work. They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something is possible in just a few minutes of crunching. I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their asynchronous coding agent family members instead. An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository. OpenAI's Codex Cloud , Anthropic's Claude Code for web , Google Gemini's Jules , and GitHub's Copilot coding agent are four prominent examples of this pattern. These are fantastic tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with. I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results. You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in. This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do. I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world. The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do. 
Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. This makes total sense if they are running against sensitive repositories - a prompt injection attack of the lethal trifecta variety could easily be used to steal sensitive code or environment variables. If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.

Let's dive into some examples. My public research repository is at simonw/research on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day! It also includes a GitHub Workflow which uses GitHub Models to automatically update the README file with a summary of every new project, using Cog, LLM, llm-github-models and this snippet of Python.

Here are some example research projects from the repo.

node-pyodide shows an example of a Node.js script that runs the Pyodide WebAssembly distribution of Python inside it - yet another of my ongoing attempts to find a great way of running Python in a WebAssembly sandbox on a server.

python-markdown-comparison (transcript) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across cmarkgfm, a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts! cmarkgfm came out on top by a significant margin:

Here's the entire prompt I used for that project:

Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.

Note that I didn't specify any Markdown libraries other than cmarkgfm - Claude Code ran a search and found the other six by itself.

cmarkgfm-in-pyodide is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get cmarkgfm - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled an 88.4KB file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside of Node.js. I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:

Figure out how to get the cmarkgfm markdown lover [typo in prompt, this should have been "library" but it figured it out anyway] for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory.
Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly. This one gave up half way through, complaining that emscripten would take too long. I told it: Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works It churned away for a bit longer and complained that the existing Python library used CFFI which isn't available in Pyodide. I asked it: Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead? ... and it did. You can see the full transcript here . blog-tags-scikit-learn . Taking a short break from WebAssembly, I thought it would be fun to put scikit-learn through its paces on a text classification task against my blog: Work in a new folder called blog-tags-scikit-learn Download - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones Use Python scikit learn and try several different strategies Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description Also include an HTML page with a nice visualization of the results that works by loading those JSON files. This resulted in seven files, four results files and a detailed report . (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone! That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded. More recently I added a short file to the repo with a few extra tips for my research agents. You can read that here . My preferred definition of AI slop is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification. I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public repository. A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with tags. I still like to keep AI-generated content out of search, to avoid contributing more to the dead internet . It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens. You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data. Claude Code for web offers a free $250 of credits for their $20/month users for a limited time (until November 18, 2025). 
Gemini Jules has a free tier. There are plenty of other coding agents you can try out as well. Let me know if your research agents come back with anything interesting!
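One piece of this setup that is easy to replicate is the README automation mentioned earlier. The real repository wires it up with Cog inside a GitHub Workflow; the sketch below is a hypothetical, simplified version of the same idea using the llm library's Python API, with the model id, file layout and prompt all made up rather than taken from the actual snippet.

```python
# Hypothetical sketch of the README-summarising idea - not the actual workflow
# from simonw/research. Assumes the llm Python API (llm.get_model / .prompt /
# .text) and an llm-github-models model id; adjust both to taste.
import pathlib
import llm

model = llm.get_model("github/gpt-4o-mini")  # assumed model id
summaries = []
for project in sorted(pathlib.Path(".").iterdir()):
    readme = project / "README.md"
    if project.is_dir() and readme.exists():
        response = model.prompt(
            "Summarise this research project in one sentence:\n\n" + readme.read_text()
        )
        summaries.append(f"{project.name}: {response.text().strip()}")

print("\n".join(summaries))  # e.g. splice this into the top-level README with Cog
```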

Abhinav Sarkar 3 days ago

A Short Survey of Compiler Targets

As an amateur compiler developer, one of the decisions I struggle with is choosing the right compiler target. Unlike the 80s when people had to target various machine architectures directly, now there are many mature options available. This is a short and very incomplete survey of some of the popular and interesting options.

A compiler can always directly output machine code or assembly targeted for one or more architectures. A well-known example is the Tiny C Compiler. It's known for its speed and small size, and it can compile and run C code on the fly. Another such example is Turbo Pascal. You could do this with your compiler too, but you'll have to figure out the intricacies of the instruction set architecture (ISA) of each target, as well as concepts like register allocation.

Most modern compilers actually don't emit machine code or assembly directly. They lower the source code down to a language-agnostic Intermediate representation (IR) first, and then generate machine code for major architectures (x86-64, ARM64, etc.) from it. The most prominent tool in this space is LLVM. It's a large, open-source compiler-as-a-library. Compilers for many languages such as Rust, Swift, C/C++ (via Clang), and Julia use LLVM as an IR to emit machine code. An alternative is the GNU C compiler (GCC), via its GIMPLE IR, though no compilers seem to use it directly. GCC can be used as a library to compile code, much like LLVM, via libgccjit. It is used in Emacs to Just-in-time (JIT) compile Elisp. Cranelift is another new option in this space, though it supports only a few ISAs. For those who find LLVM or GCC too large or slow to compile, minimalist alternatives exist. QBE is a small backend focused on simplicity, targeting "70% of the performance in 10% of the code". It's used by the language Hare that prioritizes fast compile times. Another option is libFIRM, which uses a graph-based SSA representation instead of a linear IR.

Sometimes you are okay with letting other compilers/runtimes take care of the heavy lifting. You can transpile your code to another established high-level language and leverage that language's existing compiler/runtime and toolchain. A common target in such cases is C. Since C compilers exist for nearly all platforms, generating C code makes your language highly portable. This is the strategy used by Chicken Scheme and Vala. Or you could compile to C++ instead, like Jank, if that's your thing. There is also C--, a C-like low-level language targeted by GHC and OCaml. Another ubiquitous target is JavaScript (JS), which is one of the two options (the other being WebAssembly) for running code natively in a web browser or one of the JS runtimes (Node, Deno, Bun). Multiple languages such as TypeScript, PureScript, Reason, ClojureScript, Dart and Elm transpile to JS. Nim, interestingly, can transpile to C, C++ or JS. Another target similar to JS is Lua, a lightweight and embeddable scripting language, which languages such as MoonScript and Fennel transpile to. A more niche approach is to target a Lisp dialect. Compiling to Chez Scheme, for example, allows you to leverage its macro system, runtime, and compiler. Idris 2 and Racket use Chez Scheme as their primary backend targets.

This is a common choice for application languages. You compile to a portable bytecode for a Virtual machine (VM). VMs generally come with features like Garbage collection, JIT compilation, and security sandboxing. The Java Virtual Machine (JVM) is probably the most popular one.
It's the target for many languages including Java, Kotlin, Scala, Groovy, and Clojure. Its main competitor is the Common Language Runtime, originally developed by Microsoft, which is targeted by languages such as C#, F#, and Visual Basic.NET. Another notable VM is the BEAM, originally built for Erlang. The BEAM VM isn't built for raw computation speed but for high concurrency, fault tolerance, and reliability. Recently, new languages such as Elixir and Gleam have been created to target it. Finally, this category also includes MoarVM—the spiritual successor to the Parrot VM—built for the Raku (formerly Perl 6) language.

WebAssembly (Wasm) is a relatively new target. It's a portable binary instruction format focused on security and efficiency. Wasm is supported by all major browsers, but not limited to them. The WebAssembly System Interface (WASI) standard provides APIs for running Wasm in non-browser and non-JS environments. Wasm is now targeted by many languages such as Rust, C/C++, Go, Kotlin, Scala, Zig, and Haskell.

Meta-tracing frameworks are a more complex category. These are not targets for your compiler backend; instead, you use them to build a custom JIT compiler for your language by specifying an interpreter for it. The most well-known example is PyPy, an implementation of Python, created using the RPython framework. Another such framework is GraalVM/Truffle, a polyglot VM and meta-tracing framework from Oracle. Its main feature is zero-cost interoperability: code from GraalJS, TruffleRuby, and GraalPy can all run on the same VM, and can call each other directly.

Move past the mainstream, and you'll discover a world of unconventional and esoteric compiler targets. Developers pick them for academic curiosity, artistic expression, or to test the boundaries of viable compilation targets.

Brainfuck: An esoteric language with only eight commands, Brainfuck is Turing-complete and has been a target for compilers as a challenge. People have written compilers for C, Haskell and Lambda calculus.

Lambda calculus: Lambda calculus is a minimal programming language that expresses computation solely as functions and their applications. It is often used as the target of educational compilers because of its simplicity, and its link to the fundamental nature of computation. Hell, a subset of Haskell, compiles to Simply typed lambda calculus.

SKI combinators: The SKI combinator calculus is even more minimal than lambda calculus. All programs in SKI calculus can be composed of only three combinators: S, K and I. MicroHs compiles a subset of Haskell to SKI calculus.

JSFuck: Did you know that you can write all possible JavaScript programs using only six characters? Well, now you know.

Postscript: Postscript is also a Turing-complete programming language. Your next compiler could target it!

Regular Expressions? Lego? Cellular automata? I'm going to write a compiler from C++ to JSFuck. If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading! This post was originally published on abhinavsarkar.net. If you liked this post, please leave a comment.

Blargh 1 weeks ago

The strange webserver hot potato — sending file descriptors

I’ve previously mentioned my io-uring webserver tarweb . I’ve now added another interesting aspect to it. As you may or may not be aware, on Linux it’s possible to send a file descriptor from one process to another over a unix domain socket. That’s actually pretty magic if you think about it. You can also send unix credentials and SELinux security contexts, but that’s a story for another day. I want to run some domains using my webserver “tarweb”. But not all. And I want to host them on a single IP address, on the normal HTTPS port 443. Simple, right? Just use nginx’s ? Ah, but I don’t want nginx to stay in the path. After SNI (read: “browser saying which domain it wants”) has been identified I want the TCP connection to go directly from the browser to the correct backend. I’m sure somewhere on the internet there’s already an SNI router that does this, but all the ones I found stay in line with the request path, adding a hop. A few reasons: Livecount has an open websocket to every open browser tab in the world reading a given page, so they add up. (no, it doesn’t log. It just keeps count) I built a proof of concept SNI router . It is a frontline server receiving TCP connections, on which it then snoops the SNI from the TLS ClientHello, and routes the connection according to its given rules. Anything it reads from the socket is sent along to the real backend along with the file descriptor. So the backend (in my use that’s tarweb) needs to have code cooperating to receive the new connection. It’s not the cleanest code, but it works. I got ChatGPT to write the boring “parse the TLS record / ClientHello” parts. Rust is a memory safe language, so “how bad could it be?”. :-) It seems to work for all the currently used TLS versions. As I said, it requires the backend to be ready to receive “hey, here’s a file descriptor, and here’s the first few hundred bytes you should treat as if you’ve read them from the client”. File descriptors don’t have an operation to “unread”. If they did then this would be easier. Then it would “just” be a matter of giving a backend webserver a file descriptor. For some use cases that could mean starting a new webserver process that reads and writes from stdin/stdout. Not super efficient to go back to the fork-exec-per-connection model from the previous century, but it would achieve the direct connection. But the details are academic. We do need to pass along the snooped bytes somehow, or the TLS handshake won’t succeed. Which means it does need cooperation from the backend. Because the SNI router never writes to the client, and therefore doesn’t perform a TLS handshake, it doesn’t need any private keys or certificates. The SNI router has no secrets, and sees no secrets. I also added a mode that proxies the TCP connection, if some SNI should be routed to a different server. But of course then it’s not possible to pass the file descriptor. So encrypted bytes will bounce on the SNI router for that kind of flow. But still the SNI router is not able to decrypt anything. A downside is of course that bouncing the connection around the world will slow it down, add latency, and waste resources. So pass the file descriptor where possible. So now my setup has the SNI router accept the connection, and then throw the very file descriptor over to tarweb, saying “you deal with this TCP connection”. 
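The fd-passing trick itself is easy to demo outside of Rust or io-uring. This is not tarweb's code — just a minimal Python sketch of the underlying mechanism (SCM_RIGHTS over a Unix domain socket), using Python 3.9's socket.send_fds/recv_fds helpers; the file and the "snooped" bytes are stand-ins.

```python
# Minimal illustration of passing a file descriptor between processes over a
# Unix domain socket (SCM_RIGHTS). Not the author's code; requires Python 3.9+.
import os
import socket

parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

fd = os.open("/etc/hostname", os.O_RDONLY)  # stand-in for an accepted TCP socket
snooped = b"\x16\x03\x01..."                # stand-in for ClientHello bytes already read

# Send the snooped bytes and the descriptor together, like the SNI router does.
socket.send_fds(parent, [snooped], [fd])
os.close(fd)  # the sender can forget the fd once it has been handed over

# The receiver gets its own descriptor referring to the same open file/socket.
data, fds, _flags, _addr = socket.recv_fds(child, 4096, maxfds=1)
print(data, os.read(fds[0], 32))
```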
Tarweb does the TLS handshake, and then throws the TLS session keys over to the kernel, saying “I can’t be bothered doing encryption, you do it”, and then actually handles the HTTP requests. Well actually, there’s another strange indirection. When tarweb receives a file descriptor, it uses io-uring “registered files” to turn it into a “fixed file handle”, and closes the original file descriptor. On the kernel side there’s still a file descriptor of course, but there’s nothing in : This improves performance a bit on the linux kernel side. The SNI router does not use io-uring. At least not yet. The SNI router’s job is much smaller (doesn’t even do a TLS handshake), much more brief (it almost immediately passes the file descriptor to tarweb), and much less concurrency (because of the connections being so short lived as far as it’s concerned), that it may not be worth it. In normal use the SNI router only needs these syscalls per connection: At the risk of going off on an unrelated tangent, HTTP/3 (QUIC-based) has an interesting way of telling a client to “go over there” . A built in load balancer inside the protocol, you could say, sparing the load balancer needing to proxy everything. This opens up opportunities to steer not just on SNI, and is much more flexible than DNS, all without needing the “proxy” to be inline. E.g. say a browser is in Sweden, and you have servers in Norway and Italy. And say you have measured, and find that it would be best if the browser connected to your Norway server. But due to peering agreements and other fun stuff, Italy will be preferred on any BGP anycasted address. You then have a few possible options, and I do mean they’re all possible: The two DNS-based ones also have the valid concern that screwing up DNS can have bad consequences . If you can leave DNS alone that’s better. Back to HTTP/3. If you’ve set up HTTP/3 it may be because you care about latency. It’s then easier to act on information you have about every single connection. On an individual connection basis you can tell the browser in Sweden that it should now talk to the servers in Norway. All without DNS or anycast. Which is nice, because running a webserver is hard enough. Also running a dynamic DNS service or anycast has even more opportunities to blow up fantastically. I should add that HTTP/3 doesn’t have the “running out of file descriptors” problem. Being based on UDP you can run your entire service with just a single file descriptor. Connections are identified by IDs, not 5-tuples. So why didn’t I just use HTTP/3? No support for that (yet). From some skimming ESNI should “just work”, with just a minor decryption operation in the SNI router. ECH seems harder. It should still be doable, but the SNI router will need to do the full handshake, or close to it. And after taking its routing decision it needs to transfer the encryption state to the backend, along with the file descriptor. This is not impossible, of course. It’s similar to how tarweb passes the TLS session keys to the kernel. But it likely does mean that the SNI router needs to have access to both the TLS session keys and maybe even the domain TLS private keys. But that’s a problem for another day. Having all bytes bounce on the SNI router triples the number of total file descriptors for the connection. (one on the backend, then one each on the router for upstream and downstream). There are limits per process and system wide, and the more you have the more you need to juggle them in code. It also wastes CPU and RAM. 
The reasons I don't want nginx to stay in the path: I want the backend to know the real client IP address, via or similar, on the socket itself. I don't want restarting nginx to cut existing connections to backends. I'd like to use TLS keys that the nginx user doesn't have access to. And I used it for livecount, and last time I got blog posts on HackerNews nginx ran out of file descriptors, and started serving 500s even for plain old static files on disk. For now I've moved livecount to a different port, but in the long run I want it back on port 443, and yet isolated from nginx so that the latter keeps working even if livecount is overloaded.

Per connection, in normal use the SNI router only needs: one syscall for the new connection, a read of a few hundred bytes of ClientHello, a write of the same size to pass it on, and one call to forget the file descriptor.

The options for the Sweden/Norway example: Have the browser connect to , with Norway-specific IP addresses. Not great. People will start bookmarking these URLs, and what happens when you move your Norway servers to Denmark? now goes to servers in Denmark? Use DNS based load balancing, giving Swedish browsers the Norway unicast IPs. Yes… but this is WAY more work than you probably think. And WAY less reliable at giving the best experience for the long tail. And sometimes your most important customer is in that long tail. Try to traffic engineer the whole Internet with BGP announcement tweaks. Good luck with that, for the medium to long tail. Or install servers in Sweden, and any other place you may have users. Then you can anycast your addresses from there, and have full control of how you proxy (or packet by packet traffic engineer over tunnels) them. Expensive if you have many locations you need to do this in. Some traffic will still go to the wrong anycast entry point, but pretty feasible though expensive.

Why I didn't just use HTTP/3 for everything: HTTP/3 is complex. You can build a weird io-uring kTLS based webserver on a weekend, and control everything (except TLS handshakes). Implementing HTTP/3 from scratch, and controlling everything, is a different beast. HTTP/1 needs to still work. Not all clients support HTTP/3, and HTTP/1 or 2 is even used to bootstrap HTTP/3 via its header. And the preferred address in HTTP/3 is just a suggestion. Browsers don't have to actually move.

Some related notes: tarweb was first written in C++. livecount keeps long lived connections. tarweb was rewritten in Rust, and uses io-uring. You can redirect with Javascript, but this still has the problem. I passed file descriptors between processes in injcode, but it was only ever a proof of concept that only worked on 32bit x86, and the code doesn't look like it actually does it? Anyway I can't expect to remember code from 17 years ago.

Simon Willison 2 weeks ago

Living dangerously with Claude

I gave a talk last night at Claude Code Anonymous in San Francisco, the unofficial meetup for coding agent enthusiasts. I decided to talk about a dichotomy I've been struggling with recently. On the one hand I'm getting enormous value from running coding agents with as few restrictions as possible. On the other hand I'm deeply concerned by the risks that accompany that freedom. Below is a copy of my slides, plus additional notes and links as an annotated presentation . I'm going to be talking about two things this evening... Why you should always use . (This got a cheer from the room full of Claude Code enthusiasts.) And why you should never use . (This did not get a cheer.) is a bit of a mouthful, so I'm going to use its better name, "YOLO mode", for the rest of this presentation. Claude Code running in this mode genuinely feels like a completely different product from regular, default Claude Code. The default mode requires you to pay constant attention to it, tracking everything it does and actively approving changes and actions every few steps. In YOLO mode you can leave Claude alone to solve all manner of hairy problems while you go and do something else entirely. I have a suspicion that many people who don't appreciate the value of coding agents have never experienced YOLO mode in all of its glory. I'll show you three projects I completed with YOLO mode in just the past 48 hours. I wrote about this one at length in Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code . I wanted to try the newly released DeepSeek-OCR model on an NVIDIA Spark, but doing so requires figuring out how to run a model using PyTorch and CUDA, which is never easy and is a whole lot harder on an ARM64 device. I SSHd into the Spark, started a fresh Docker container and told Claude Code to figure it out. It took 40 minutes and three additional prompts but it solved the problem , and I got to have breakfast and tinker with some other projects while it was working. This project started out in Claude Code for the web . I'm eternally interested in options for running server-side Python code inside a WebAssembly sandbox, for all kinds of reasons. I decided to see if the Claude iPhone app could launch a task to figure it out. I wanted to see how hard it was to do that using Pyodide running directly in Node.js. Claude Code got it working and built and tested this demo script showing how to do it. I started a new simonw/research repository to store the results of these experiments, each one in a separate folder. It's up to 5 completed research projects already and I created it less than 2 days ago. Here's my favorite, a project from just this morning. I decided I wanted to try out SLOCCount , a 2001-era Perl tool for counting lines of code and estimating the cost to develop them using 2001 USA developer salaries. .. but I didn't want to run Perl, so I decided to have Claude Code (for web, and later on my laptop) try and figure out how to run Perl scripts in WebAssembly. TLDR: it got there in the end ! It turned out some of the supporting scripts in SLOCCount were written in C, so it had to compile those to WebAssembly as well. And now tools.simonwillison.net/sloccount is a browser-based app which runs 25-year-old Perl+C in WebAssembly against pasted code, GitHub repository references and even zip files full of code. 
The wild thing is that all three of these projects weren't even a priority for me - they were side quests, representing pure curiosity that I could outsource to Claude Code and solve in the background while I was occupied with something else. I got a lot of useful work done in parallel to these three flights of fancy. But there's a reason has that scary name. It's dangerous to use Claude Code (and other coding agents) in this way! The reason for this is prompt injection , a term I coined three years ago to describe a class of attacks against LLMs that take advantage of the way untrusted content is concatenated together with trusted instructions. (It's named after SQL injection which shares a similar shape.) This remains an incredibly common vulnerability. Here's a great example of a prompt injection attack against a coding agent, described by Johann Rehberger as part of his Month of AI Bugs , sharing a new prompt injection report every day for the month of August. If a coding agent - in this case OpenHands - reads this file it can be tricked into grepping the available environment variables for (matching GitHub Personal Access Tokens) and sending that to the attacker's external server for "help debugging these variables". I coined another term to try and describe a common subset of prompt injection attacks: the lethal trifecta . Any time an LLM system combines access to private data with exposure to untrusted content and the ability to externally communicate , there's an opportunity for attackers to trick the system into leaking that private data back to them. These attacks are incredibly common . If you're running YOLO coding agents with access to private source code or secrets (like API keys in environment variables) you need to be concerned about the potential of these attacks. This is the fundamental rule of prompt injection: anyone who can get their tokens into your context should be considered to have full control over what your agent does next, including the tools that it calls. Some people will try to convince you that prompt injection attacks can be solved using more AI to detect the attacks. This does not work 100% reliably, which means it's not a useful security defense at all . The only solution that's credible is to run coding agents in a sandbox . The best sandboxes are the ones that run on someone else's computer! That way the worst that can happen is someone else's computer getting owned. You still need to worry about your source code getting leaked. Most of my stuff is open source anyway, and a lot of the code I have agents working on is research code with no proprietary secrets. If your code really is sensitive you need to consider network restrictions more carefully, as discussed in a few slides. There are lots of great sandboxes that run on other people's computers. OpenAI Codex Cloud, Claude Code for the web, Gemini Jules are all excellent solutions for this. I also really like the code interpreter features baked into the ChatGPT and Claude consumer apps. There are two problems to consider with sandboxing. The first is easy: you need to control what files can be read and written on the filesystem. The second is much harder: controlling the network connections that can be made by code running inside the agent. The reason network access is so important is that it represents the data exfiltration leg of the lethal trifecta. 
If you can prevent external communication back to an attacker they can't steal your private information, even if they manage to sneak in their own malicious instructions. Claude Code CLI grew a new sandboxing feature just yesterday, and Anthropic released a new open source library showing how it works. The key to the implementation - at least on macOS - is Apple's little known but powerful sandbox-exec command. This provides a way to run any command in a sandbox configured by a policy document. Those policies can control which files are visible but can also allow-list network connections. Anthropic run an HTTP proxy and allow the Claude Code environment to talk to that, then use the proxy to control which domains it can communicate with. (I used Claude itself to synthesize this example from Anthropic's codebase.) ... the bad news is that sandbox-exec has been marked as deprecated in Apple's documentation since at least 2017! It's used by Codex CLI too, and is still the most convenient way to run a sandbox on a Mac. I'm hoping Apple will reconsider. So go forth and live dangerously! (But do it in a sandbox.)
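For a concrete feel of what such a policy can look like, here is a rough, illustrative sketch of invoking sandbox-exec with a deny-by-default profile that exposes one directory and one local proxy port. The SBPL syntax, the -p flag, the paths and the command are my assumptions, not Anthropic's actual implementation, and a real profile needs many more allowances before most programs will run.

```python
# Illustrative only - not Anthropic's sandbox configuration. Assumes the
# sandbox-exec -p flag and basic SBPL syntax; a usable profile needs far more
# allow rules (system libraries, /dev, temp dirs, ...) than shown here.
import subprocess

PROFILE = r"""
(version 1)
(deny default)
(allow process-fork)
(allow process-exec)
(allow file-read* (subpath "/Users/me/research"))      ; hypothetical project dir
(allow file-write* (subpath "/Users/me/research"))
(allow network-outbound (remote tcp "localhost:8888")) ; only the local HTTP proxy
"""

subprocess.run(
    ["sandbox-exec", "-p", PROFILE, "python3", "run_experiment.py"],  # hypothetical command
    check=True,
)
```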

Abhinav Sarkar 2 weeks ago

A Fast Bytecode VM for Arithmetic: The Virtual Machine

In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).
Unit testing for our parser.
Interpreting ASTs.
Compiling ASTs to bytecode.
Disassembling and decompiling bytecode.
Unit testing for our compiler.
Property-based testing for our compiler.
Efficiently executing bytecode in a virtual machine (VM).
Unit testing and property-based testing for our VM.
Benchmarking our code to see how the different passes perform.
All the while keeping an eye on performance.

In this final post, we write the virtual machine that executes our bytecode, and benchmark it. This post was originally published on abhinavsarkar.net. This post is part of the series: A Fast Bytecode VM for Arithmetic.

Bytecode Virtual Machines (VMs) are known to be faster than AST-walking interpreters. That's why many real-world programming languages these days are implemented with bytecode VMs, for example, Java, Python, PHP, and Raku. The reason is partially the flat and compact nature of bytecode itself. But VMs also have a few other tricks up their sleeves that make them highly performant. In this post, we write a VM for our arithmetic expression language, and explore some of these performance tricks.

But first, we need to finish a pending task. We wrote some unit tests for our compiler in the last post, but unit tests cover only the cases we can think of. A compiler has to deal with any input, and with just unit tests we cannot be sure of its correctness. To test our compiler and other components for correctness, we use the QuickCheck library. QuickCheck is a Property-based Testing framework. The key idea of property-based testing is to write properties of our code that hold true for any input, and then to automatically generate a large number of arbitrary inputs and make sure that the properties are indeed true for them 1 2 . Since we are writing an arithmetic expression parser/compiler/VM, we generate arbitrary expression ASTs, and use them to assert certain invariants of our program.

With QuickCheck, we write generators for the inputs for our tests. These generators are composable just like parser combinators are. We use the library provided generators to write small generators that we combine to create larger ones. Let's start:

First come the basic generators: generates number expressions by using QuickCheck's built-in function. generates variable expressions by choosing from the set of passed valid variable names. generates valid identifiers from combinations of letters a—z and A—Z, and discarding ones that are reserved keywords.

Moving on to composite generators: generates binary expressions with arbitrary binary operations. It recursively calls to generate the operands. The parameter controls the complexity of the generated expressions, and we halve the size of operands (and so on recursively) so that we don't end up with infinitely large expressions. generates expressions by generating an identifier, and then generating the assignment and body expressions recursively. We do the same trick of halving sizes here as well. Notice that the assignment is generated with the passed variable names in scope, whereas the body is generated with the new identifier added to the scope. uses the above generators to generate all kinds of expressions. At smaller sizes, it prefers to generate base expressions, while at larger sizes, it prefers composite ones. Due to the careful recursive halving of size in composite generators, we end up with expressions of finite sizes.

Finally, we have some instances of QuickCheck's type class to tie everything together: We can apply them in GHCi: Notice that the generated samples increase in complexity.

With the generators in place, we define our properties next. Let's test our parser first: This property is a simple round-trip test for the parser and printer: we parse the string representation of a generated expression, and assert that it gives back the same expression.
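Before moving on to the second property, the size-halving idea behind the composite generators is worth seeing in isolation. This is not the post's Haskell code — just a small, hypothetical Python sketch of the same technique, where every composite node passes size // 2 down to its children so the generated tree stays finite.

```python
# Hypothetical sketch of size-halving generation (not the post's QuickCheck code).
import random

OPS = ["+", "-", "*", "/"]

def gen_expr(size: int, names=("x", "y", "z")) -> str:
    # At small sizes prefer leaves; at larger sizes prefer composite expressions.
    if size <= 1 or random.random() < 1 / (size + 1):
        return random.choice([str(random.randint(0, 100)), random.choice(names)])
    if random.random() < 0.8:
        op = random.choice(OPS)
        # Halve the size for each operand so the recursion terminates.
        return f"({gen_expr(size // 2, names)} {op} {gen_expr(size // 2, names)})"
    var = random.choice(names)
    assign = gen_expr(size // 2, names)          # assignment sees the old scope
    body = gen_expr(size // 2, names + (var,))   # body sees the new identifier
    return f"(let {var} = {assign} in {body})"   # made-up surface syntax

print(gen_expr(16))
```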
The second property is a more involved round-trip test for the compiler and decompiler: This asserts that compiling an expression, then disassembling and decompiling it, and finally compiling it again should result in the original bytecode 3 . This requires a helper function to get the size of an expression: We run these tests in a later section. This ends our short detour. Now for the main event: the virtual machine. Our VM is a stack-based machine that operates on a stack of values and executes the compiled bytecode. Our goal is to be as fast as possible. For a quick reminder, these are our s: And now, the heart of the VM : The function is where the action happens. It is way more complex than , but the complexity has a reason, namely performance. runs inside the monad wrapped with the monad transformer. monad lets us use mutable data structures locally while ensuring the function remains externally pure. monad transformer adds support for throwing and propagating errors in a pure manner. We use for our stack, which is a mutable array of unboxed primitive types, in our case an array of values. Using a mutable unboxed array is much faster than using an immutable and/or boxed one like or due to reduced allocation and/or pointer chasing. The core of the VM is the function, a tight, tail-recursive loop that GHC compiles into an efficient machine loop, as we see later. It takes the stack pointer ( ), instruction pointer ( ) 4 , and the stack as arguments. At the top of each loop, a block of guard clauses checks for stack overflow, underflow, and other error conditions before branching on the current opcode. Placing these checks at the top instead of inside the opcode cases is a deliberate choice. This may make the code slightly harder to understand, but it significantly improves the performance of the loop by moving all branching at the beginning of the loop, resulting in code that is more friendly to the CPU’s Branch Predictor . Also notice how we reduce the number of checks by working with a range of opcodes at once in the guard. The checks are also sorted so as to be most performant, guided by profiling and benchmarking 5 . The handling of each opcode is actually pretty straightforward. We use different specific operations to read and write to the stack, while taking care of doing the required bound and arithmetic checks. We also use the functions that we wrote earlier . After carrying out each operation, we reenter the loop by calling it tail-recursively with the right stack and instruction pointers. Finally, we make sure that the execution terminated correctly by checking the state of the stack, and return its first element. We see later that the VM is quite fast, but how does GHC achieve this performance? To see the magic, we can look at GHC ’s intermediate language: Core . Core is a simpler functional language than Haskell to which GHC compiles Haskell. The simpler nature of Core makes it easier for GHC to optimize it, and compile it further. We can get the Core code for a program by compiling with the GHC option . The actual Core code for our VM is too verbose to show here, but here is a simplified C-like pseudo-code version of our loop: A few key optimizations are worth pointing out: The loop: The tail-recursive function is compiled into a proper loop. The instruction is effectively a , which means there’s no function call overhead for each iteration of the VM loop. Unboxing: The Core code is full of primitive, unboxed types like , , and , and operations on them. 
These are raw machine integers and memory addresses, not boxed Haskell objects. This means operations on them are as fast as they would be in C. The stack operations are not function calls on a instance, but primitive memory reads and writes on a raw memory address. Inlining: The helper function is completely inlined into the main loop. For , the code for reading two values, adding them, and writing the result is laid out inline, and works on unboxed values and array address. In short, GHC has turned our high-level, declarative Haskell code into a low-level loop that looks remarkably like one we would write in C. We get the safety and expressiveness of Haskell, while GHC does the heavy lifting to produce highly optimized code. It's the best of both worlds!

We must test the VM to make sure it works correctly 6 . We reuse the success and failure tests for the AST interpreter, as the bytecode interpreter should yield the same result: We also add a property-based test this time: for any given expression, interpreting the AST should produce the same result as compiling it to bytecode and executing it in the VM 7 . Our test suite is complete now: And finally, we run all tests together: Happily, all tests pass.

Now for the fun part: benchmarking. We use the criterion library to benchmark the code. We have a benchmark suite to measure the performance of each pass, the two interpreters (AST and bytecode), and the full end-to-end runs 8 . We compile with the following GHC options: Here are the results in a more digestible format: Here are the times in a chart (smaller is better): Let's break down these numbers:

Parsing and decompiling are slow: At ~573ms and ~506ms, these are by far the slowest passes. This isn't surprising. Parsing with parser combinators has a known trade-off of expressiveness for performance. Decompiling is a shift-reduce parser that reconstructs an AST from a linear stream of opcodes, and we didn't spend any time optimizing it.

Compilation is fast: At ~51ms, compilation is an order of magnitude faster than parsing. This is thanks to pre-calculating the bytecode size during the parsing phase, which allows us to pre-allocate a single and fill it in with low-level pointer operations.

Bytecode interpretation is blazingly fast: At just ~16ms, our VM's interpreter is over 3 times faster than the AST interpreter (~50ms), which proves our belief that bytecode interpreters are faster.

End-to-end runs: Interestingly, the total time to run via bytecode (~638ms) is slightly slower than the run via AST (~617ms). This is because the cost of parsing, compiling, and then interpreting is higher than just parsing and interpreting. The real win for a bytecode VM comes when you compile once and run many times, amortizing the initial compilation cost.

I can already see readers thinking, "Sure that's fast, but is it faster than C/Rust/Zig/my favourite language?" Let's find out. To get a better sense of our VM's performance, I rewrote it in C. The C implementation is a classic manual approach: a hand-written tokenizer and recursive-descent parser, s with pointers for the AST, and manual memory management and error propagation. The VM is a simple loop with a statement for dispatching opcodes 9 . To compare our Haskell code against the C code, we need to write the last Haskell module, the CLI app that we demonstrated in the first post: We compile with the following GHC options 10 : And for the C version, we compile using GCC: Now, let's see how they stack up against each other. We use hyperfine to run the two executables. Here's a summary of the results: I have subtracted the times of previous passes to get the times for individual passes. Here's the same in a chart (smaller is better):

As expected, the C implementation is faster across the board, between 1.5x and 2.6x. The biggest difference is in parsing, where the hand-written C parser is more than twice as fast as our combinator-based one. On the other hand, the Haskell VM is only 50% slower than the C VM. In my opinion, the Haskell code's performance is quite respectable, especially given the safety, expressiveness and conciseness benefits, as illustrated by the code sizes 11 : The Haskell implementation is almost half the size of the C code. I don't know about you but I'm perfectly happy with the half-as-small, half-as-fast trade-off.
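Before comparing the C function with the GHC Core output in more detail, here is the general shape of such a dispatch loop in a deliberately plain, language-neutral sketch — hypothetical opcodes in Python, not the post's Haskell or C implementation — with the bounds and stack checks hoisted above the opcode branch as described earlier.

```python
# Language-neutral sketch of the dispatch structure described above: guard
# clauses first, then a single branch on the opcode. Hypothetical opcodes and
# encoding - not the post's Haskell (or C) VM.
OP_PUSH, OP_ADD, OP_MUL, OP_HALT = range(4)

def run(code: bytes) -> int:
    stack, sp, ip = [0] * 256, 0, 0
    while True:
        # Guard clauses up front, so every iteration branches in one predictable place.
        if ip >= len(code):
            raise RuntimeError("ran off the end of the bytecode")
        op = code[ip]
        if op == OP_PUSH and sp >= len(stack):
            raise RuntimeError("stack overflow")
        if op in (OP_ADD, OP_MUL) and sp < 2:
            raise RuntimeError("stack underflow")
        # Opcode dispatch.
        if op == OP_PUSH:
            stack[sp] = code[ip + 1]
            sp, ip = sp + 1, ip + 2
        elif op in (OP_ADD, OP_MUL):
            a, b = stack[sp - 2], stack[sp - 1]
            stack[sp - 2] = a + b if op == OP_ADD else a * b
            sp, ip = sp - 1, ip + 1
        elif op == OP_HALT:
            if sp != 1:
                raise RuntimeError("bad final stack")
            return stack[0]
        else:
            raise RuntimeError(f"unknown opcode {op}")

# (2 + 3) * 4 => 20
print(run(bytes([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT])))
```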
The benchmark results for the VMs become less surprising when I compare the C function with the GHC Core code for 12 . This structure is almost a 1-to-1 match with the GHC Core code we saw earlier. The C loop corresponds to the optimized function that GHC generates, the statement is almost identical to the analysis on the raw opcode byte, and the C stack array is equivalent to the GHC uses. GHC effectively compiles our high-level Haskell into a low-level code that is structurally identical to what we wrote by hand in C 13 . This explains why the performance is in the same ballpark. The remaining performance gap is probably due to the thin layer of abstraction that the Haskell runtime still maintains, but it's remarkable how close we can get to C-like performance.

While our Haskell program is fast, we can improve certain things:

Parser optimizations: As the benchmarks showed, parsing is our slowest pass. For better performance, we could replace our Attoparsec-based combinator parser with a parser generator like Alex and Happy, or even write a recursive-descent parser by hand.

Superinstructions: We could analyze the bytecode for common instruction sequences (like followed by ) and combine them into single superinstructions. This would reduce the instruction dispatch overhead, but may make compilation slower.

Register-based VM: A register-based VM, which uses a small array of virtual registers instead of a memory-based stack, could significantly reduce memory traffic and improve performance. This would require a more complex compiler capable of register allocation.

Just-in-Time (JIT) compilation: The ultimate performance boost could come from a JIT compiler. Instead of interpreting bytecode, we could compile it to native machine code at runtime, eliminating the interpreter entirely. Maybe we could use LLVM to build a JIT compiler in Haskell.

And that's a wrap! We successfully built a bytecode compiler and virtual machine in Haskell. We covered parsing, AST interpretation, compilation, and bytecode execution, as well as debugging and testing functionality. Let's update our checklist:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).
Unit testing for our parser.
Interpreting ASTs.
Compiling ASTs to bytecode.
Disassembling and decompiling bytecode.
Unit testing for our compiler.
Property-based testing for our compiler.
Efficiently executing bytecode in a virtual machine.
Unit testing and property-based testing for our VM.
Benchmarking our code to see how the different passes perform.
All the while keeping an eye on performance.

The journey from a simple AST interpreter to a bytecode VM has been a rewarding one. We saw a significant performance improvement, learned about how compilers and VMs work, and how to write performant code in Haskell. While our Haskell implementation isn't as fast as the hand-written C version, it's far more concise and, I would argue, easier to reason about. It's a great demonstration of Haskell's power for writing high-performance—yet safe and elegant—code.

See the full code at: ArithVMLib.hs, ArithVMSpec.hs, ArithVMBench.hs, ArithVMApp.hs.

If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!

Actually, QuickCheck does not generate entirely arbitrary inputs. It generates arbitrary inputs with increasing complexity—where the complexity is defined by the user—and asserts the properties on these inputs. When a test fails for a particular input, QuickCheck also tries to simplify the culprit and tries to find the simplest input for which the test fails. This process is called Shrinking in QuickCheck parlance. QuickCheck then shows this simplest input to the user for them to use it to debug their code. ↩︎ Read this good introduction to QuickCheck if you are unfamiliar. ↩︎ Notice that we discard the expressions that do not compile successfully. ↩︎ and are not actual pointers, but indices into the stack and bytecode arrays respectively. ↩︎ Guided by the GHC profiler, I tweaked the code in many different ways and ran benchmarks for every change. Then I chose the code that was most performant.
↩︎ It is extremely important to write good tests before getting your hands dirty with performance optimizations. In my case, the tests saved me many times from breaking the VM while moving code around for performance. ↩︎ We are using our AST interpreter as a definitional interpreter, assuming it to be correctly implemented because of its simpler nature. ↩︎ I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file generated using the expression generator we wrote earlier. ↩︎ I don’t claim to be a great or even a good C programmer. In fact, this C VM is the first substantial C code I have written in decades. I’m sure the code is not most optimized. It may even be ridden with memory management bugs. If you find something wrong, please let me know in the comments. ↩︎ I tried various RTS options to tweak GHC garbage collection, but the defaults proved to be fastest. ↩︎ The lines of code are for only the overlapping functionalities between C and Haskell versions. ↩︎ I did try using Direct Threading and Subroutine Threading in the C code, but they resulted in slower code than the switch-case variant. GCC may be smart enough in case of this simple VM to optimize the switch-case to be faster than threaded code. ↩︎ You may have noticed that the C function is not laid out in the exact same manner as the Haskell function. In case of C, moving the checks to the front did not yield in performance improvement. I suspect this may be because GCC is smart enough to do that optimization by itself. The nested were also no detriment to the performance of the C code. ↩︎ If you liked this post, please leave a comment . Parsing arithmetic expressions to Abstract Syntax Trees (ASTs). Unit testing for our parser. Interpreting ASTs. Compiling ASTs to bytecode. Disassembling and decompiling bytecode. Unit testing for our compiler. Property-based testing for our compiler. Efficiently executing bytecode in a virtual machine (VM). Unit testing and property-based testing for our VM . Benchmarking our code to see how the different passes perform. All the while keeping an eye on performance. The Compiler The Virtual Machine (you are here) Introduction Testing the Compiler The Virtual Machine Testing the VM Benchmarking the VM Benchmarking Against C Future Directions generates number expressions by using QuickCheck’s built-in function. generates variable expressions by choosing from the set of passed valid variable names. generates valid identifiers from combinations of letters a—z and A—Z, and discarding ones that are reserved keywords. generates binary expressions with arbitrary binary operations. It recursively calls to generate the operands. The parameter controls the complexity of the generated expressions, and we half the size of operands (and so on recursively) so that we don’t end up with infinitely large expressions. generates expressions by generating an identifier, and then generating the assignment and body expressions recursively. We do the same trick of halving sizes here as well. Notice that the assignment is generated with the passed variable names in scope, whereas the body is generated with the new identifier added to the scope. uses the above generators to generate all kinds of expressions. At smaller sizes, it prefers to generate base expressions, while at larger sizes, it prefers composite ones. Due to the careful recursive halving of size in composite generators, we end up with expressions of finite sizes. The loop: The tail-recursive function is compiled into a proper loop. 
The instruction is effectively a , which means there’s no function call overhead for each iteration of the VM loop. Unboxing: The Core code is full of primitive, unboxed types like , , and , and operations on them. These are raw machine integers and memory addresses, not boxed Haskell objects. This means operations on them are as fast as they would be in C. The stack operations are not function calls on a instance, but primitive memory reads and writes on a raw memory address . Inlining: The helper function is completely inlined into the main loop. For , the code for reading two values, adding them, and writing the result is laid out inline, and works on unboxed values and array address. Parsing and decompiling are slow : At ~573ms and ~506ms, these are by far the slowest passes. This isn’t surprising. Parsing with parser combinators has a known trade-off of expressiveness for performance. Decompiling is a shift-reduce parser that reconstructs an AST from a linear stream of opcodes, and we didn’t spend any time optimizing it. Compilation is fast : At ~51ms, compilation is an order of magnitude faster than parsing. This is thanks to pre-calculating the bytecode size during the parsing phase, which allows us to pre-allocate a single and fill it in with low-level pointer operations. Bytecode interpretation is blazingly fast : At just ~16ms, our VM ’s interpreter is over 3 times faster than the AST interpreter (~50ms), which proves our belief that bytecode interpreters are faster. End-to-end runs : Interestingly, the total time to run via bytecode (~638ms) is slightly slower than the run via AST (~617ms). This is because the cost of parsing, compiling, and then interpreting is higher than just parsing and interpreting. The real win for a bytecode VM comes when you compile once and run many times , amortizing the initial compilation cost. Parser optimizations : As the benchmarks showed, parsing is our slowest pass. For better performance, we could replace our Attoparsec-based combinator parser with a parser generator like Alex and Happy , or even write a recursive-descent parser by hand. Superinstructions : We could analyze the bytecode for common instruction sequences (like followed by ) and combine them into single superinstructions. This would reduce the instruction dispatch overhead, but may make compilation slower. Register-based VM : A register-based VM , which uses a small array of virtual registers instead of a memory-based stack, could significantly reduce memory traffic and improve performance. This would require a more complex compiler capable of register allocation. Just-in-Time (JIT) compilation : The ultimate performance boost could come from a JIT compiler . Instead of interpreting bytecode, we could compile it to native machine code at runtime, eliminating the interpreter entirely. Maybe we could use LLVM to build a JIT compiler in Haskell. Parsing arithmetic expressions to Abstract Syntax Trees (ASTs). Unit testing for our parser. Interpreting ASTs. Compiling ASTs to bytecode. Disassembling and decompiling bytecode. Unit testing for our compiler. Property-based testing for our compiler. Efficiently executing bytecode in a virtual machine. Unit testing and property-based testing for our VM. Benchmarking our code to see how the different passes perform. All the while keeping an eye on performance. ArithVMLib.hs ArithVMSpec.hs ArithVMBench.hs ArithVMApp.hs Actually, QuickCheck does not generate entirely arbitrary inputs. 
It generates arbitrary inputs with increasing complexity—where the complexity is defined by the user—and asserts the properties on these inputs. When a test fails for a particular input, QuickCheck also tries to simplify the culprit and tries to find the simplest input for which the test fails. This process is called Shrinking in QuickCheck parlance. QuickCheck then shows this simplest input to the user for them to use it to debug their code. ↩︎ Read this good introduction to QuickCheck if you are unfamiliar. ↩︎ Notice that we discard the expressions that do not compile successfully. ↩︎ and are not actual pointers, but indices into the stack and bytecode arrays respectively. ↩︎ Guided by the GHC profiler, I tweaked the code in many different ways and ran benchmarks for every change. Then I chose the code that was most performant. ↩︎ It is extremely important to write good tests before getting your hands dirty with performance optimizations. In my case, the tests saved me many times from breaking the VM while moving code around for performance. ↩︎ We are using our AST interpreter as a definitional interpreter, assuming it to be correctly implemented because of its simpler nature. ↩︎ I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file generated using the expression generator we wrote earlier. ↩︎ I don’t claim to be a great or even a good C programmer. In fact, this C VM is the first substantial C code I have written in decades. I’m sure the code is not most optimized. It may even be ridden with memory management bugs. If you find something wrong, please let me know in the comments. ↩︎ I tried various RTS options to tweak GHC garbage collection, but the defaults proved to be fastest. ↩︎ The lines of code are for only the overlapping functionalities between C and Haskell versions. ↩︎ I did try using Direct Threading and Subroutine Threading in the C code, but they resulted in slower code than the switch-case variant. GCC may be smart enough in case of this simple VM to optimize the switch-case to be faster than threaded code. ↩︎ You may have noticed that the C function is not laid out in the exact same manner as the Haskell function. In case of C, moving the checks to the front did not yield in performance improvement. I suspect this may be because GCC is smart enough to do that optimization by itself. The nested were also no detriment to the performance of the C code. ↩︎
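To make the stack-machine details above concrete, here is a minimal, self-contained sketch in C. It is not the post's actual VM: the opcodes, the fixed-size stack, and the inline ADD handler are invented for illustration. It only shows what "stack operations as raw memory reads and writes" and "the ADD code laid out inline" look like in a switch-case dispatch loop, the variant the footnotes compare against threaded code.

```c
/* Hypothetical sketch (not the post's actual C VM): a stack machine whose
 * stack is a raw array, so push/pop are plain memory reads and writes,
 * and the ADD handler is laid out inline instead of calling a helper. */
#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_HALT };

static int64_t run(const int64_t *code) {
    int64_t stack[256];
    int64_t *sp = stack;            /* points one past the top of the stack */
    const int64_t *ip = code;       /* instruction pointer */

    for (;;) {
        switch (*ip++) {
        case OP_PUSH:
            *sp++ = *ip++;          /* raw write, no boxing, no function call */
            break;
        case OP_ADD: {
            int64_t b = *--sp;      /* read two values ...                    */
            int64_t a = *--sp;
            *sp++ = a + b;          /* ... add them, write the result inline  */
            break;
        }
        case OP_HALT:
            return sp[-1];
        }
    }
}

int main(void) {
    const int64_t prog[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_HALT };
    printf("%lld\n", (long long)run(prog));   /* prints 42 */
    return 0;
}
```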

0 views

Automata All the Way Down

Fabs and EDA companies collaborate to provide the abstraction of synchronous digital logic to hardware designers. A hardware design comprises: a set of state elements (e.g., registers and on-chip memories), which retain values from one clock cycle to another; and a transfer function, which maps the values of all state elements at clock cycle N to new values of all state elements at clock cycle N+1. The transfer function cannot be too fancy. It can be large but cannot be defined with unbounded loops/recursion. The pragmatic reason for this restriction is that the function is implemented with physical gates on a chip, and each gate can only do one useful thing per clock cycle. You cannot loop the output of a circuit element back to itself without delaying the value by at least one clock cycle (via a state element). It feels to me like there is a deeper reason why this restriction must exist. Many people dabbled with synchronous digital logic in college. If you did, you probably designed a processor, which provides the stored program computer abstraction to software developers. And here comes the inception: you can think of a computer program as a transfer function. In this twisted mindset, the stored program computer abstraction enables software engineers to define transfer functions. For example, the following pseudo-assembly program: Can be thought of as the following transfer function: In the stored program computer abstraction, state elements are the architectural registers plus the contents of memory. As with synchronous digital logic, there are limits on what the transfer function can do. The switch statement can have many cases, but the body of each case block is defined by one instruction. Alternatively, you can define the transfer function at the basic block level (one case per basic block, many instructions inside of each case). Programming in assembly is a pain, so higher level languages were developed to make us less crazy. And here we go again, someone could write an interpreter for C. A user of this interpreter works at the C level of abstraction. Following along with our previous pattern, a C program comprises: a set of state elements (variables, both global and local), and a transfer function. For example, the following C function: Can be thought of as the following transfer function: Think of and as intrinsics used to implement function calls. The key building blocks of the transfer function are state-ments. It is easy to just store the term “statement” into your brain without thinking of where the term comes from. A state-ment is a thing which can alter state. This transformation of an imperative program into a transfer function seems strange, but some PL folks do it all the time. In particular, the transfer function view is how small step operational semantics are defined. And of course this can keep going. One could write a Python interpreter in C, which allows development at a higher level of abstraction. But even at that level of abstraction, programs are defined in terms of state elements (variables) and a transfer function (statements). The term Turing Tax was originally meant to describe the performance loss associated with working at the stored-program computer level of abstraction instead of the synchronous digital logic level of abstraction. This idea can be generalized. At a particular level of abstraction, code defines the transfer function while data is held in the state elements.
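The post's original pseudo-assembly and C listings are not reproduced above, so here is a small stand-in sketch in C (my own, with hypothetical names) of what "a program as a transfer function" means: the state elements live in a struct, and one call to the transfer function advances the machine by exactly one step, with one statement per case.

```c
/* Hedged illustration (not the post's own listings): a tiny program viewed
 * as a transfer function. The state elements are a "program counter" plus
 * two variables; step() maps the state at step N to the state at step N+1. */
#include <stdio.h>
#include <stdbool.h>

struct state { int pc; int i; int sum; bool done; };

static struct state step(struct state s) {        /* the transfer function */
    switch (s.pc) {
    case 0: s.sum = 0;                    s.pc = 1; break;
    case 1: s.i = 0;                      s.pc = 2; break;
    case 2: if (s.i < 10) s.pc = 3; else  s.pc = 5; break;
    case 3: s.sum += s.i;                 s.pc = 4; break;
    case 4: s.i += 1;                     s.pc = 2; break;
    case 5: s.done = true;                          break;
    }
    return s;
}

int main(void) {
    struct state s = {0};
    while (!s.done)
        s = step(s);              /* the "clock": apply the function per step */
    printf("%d\n", s.sum);        /* prints 45, the sum of 0..9 */
    return 0;
}
```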
A particular set of bits can simultaneously be described as code at one level of abstraction, while defined as data at a lower level. This code/data duality is intimately related to the Turing Tax. The Turing Tax collector is constantly looking for bags of bits which can be interpreted as either code or data, and he collects his tax each time he finds such a situation. An analogous circumstance arises in hardware design. Some signals can be viewed as either part of the data path or the control path, depending on what level of abstraction one is viewing the hardware from. A compiler is one trick to avoid the Turing Tax by translating code (i.e., a transfer function) from a higher level of abstraction to a lower level. We all felt awkward when I wrote “interpreter for C” earlier, and now we can feel better about it. JIT compilers for Python are one way to avoid the Turing Tax. Another example is an HLS compiler which avoids the Turing Tax between the stored-program computer abstraction layer and the synchronous digital logic layer. No, this section isn’t about your Fitbit. Let’s call each evaluation of a transfer function a step. These steps occur at each level of abstraction. Let’s define the ultimate performance goal that we care about to be the number of steps required to execute a computation at the synchronous digital logic level of abstraction. The trouble with these layers of abstraction is that typically a step at a higher layer of abstraction requires multiple steps at a lower layer. For example, the multi-cycle processor implementation you learned about in a Patterson and Hennessy textbook could require 5 clock cycles to execute each instruction (instruction fetch, register fetch, execute, memory, register write back). Interpreters have the same behavior: one Python statement may be implemented with many C statements. Now imagine the following house of cards: a Python interpreter which requires an average of 4 C statements to implement 1 Python statement; a C compiler which requires an average of 3 machine instructions to implement 1 C statement; and a processor which requires an average of 5 clock cycles to execute 1 machine instruction. When the Turing property tax assessor sees this house, they tax each level of the house. In this system, an average Python statement requires 4 x 3 x 5 = 60 clock cycles! Much engineering work goes into avoiding this problem (pipelined and superscalar processors, multi-threading, JIT compilation, SIMD). Partial evaluation is another way to avoid the Turing Tax. Partial evaluation transforms data into code. There must be some other method of creating abstractions which is more efficient. Self-modifying code is rarely used in the real world (outside of JIT compilers). Self-modifying code seems crazy to reason about but potentially could offer large performance gains. Partial evaluation is also rarely used but has a large potential.

0 views
Anton Zhiyanov 3 weeks ago

High-precision date/time in C

I've created a C library called vaqt that offers data types and functions for handling time and duration, with nanosecond precision. It works with C99 (C11 is recommended on Windows for higher precision). vaqt is a partial port of Go's time package. It works with two types of values: Time and Duration. Time is a pair (seconds, nanoseconds), where the first is the 64-bit number of seconds since zero time (0001-01-01 00:00:00 UTC) and the second is the number of nanoseconds within the current second (0-999999999). Time can represent dates billions of years in the past or future with nanosecond precision. Time is always operated on in UTC, but you can convert it from/to a specific timezone. Duration is a 64-bit number of nanoseconds. It can represent values up to about 290 years. The library provides functions for common date and time operations: creating time values, extracting time fields, calendar time, time comparison, time arithmetic, formatting, and marshaling. Check the API reference for more details. If you work with date and time in C, you might find vaqt useful. See the nalgeon/vaqt repo for all the details.
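To make the described representation concrete, here is a small illustrative sketch in C99. The names (my_time, my_time_add, and so on) are invented for this example and are not vaqt's actual API; it only shows the (seconds, nanoseconds) pair and the 64-bit nanosecond duration described above.

```c
/* Illustration only: these names are made up and are NOT vaqt's API.
 * It just shows the representation the post describes: time as a
 * (seconds, nanoseconds) pair and duration as a 64-bit nanosecond count. */
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL

typedef struct { int64_t sec; int32_t nsec; } my_time;   /* hypothetical */
typedef int64_t my_duration;                             /* nanoseconds  */

/* Add a duration to a time, keeping nsec normalized to 0..999999999. */
static my_time my_time_add(my_time t, my_duration d) {
    int64_t total = t.nsec + d % NSEC_PER_SEC;
    t.sec += d / NSEC_PER_SEC;
    if (total >= NSEC_PER_SEC) { total -= NSEC_PER_SEC; t.sec += 1; }
    else if (total < 0)        { total += NSEC_PER_SEC; t.sec -= 1; }
    t.nsec = (int32_t)total;
    return t;
}

int main(void) {
    my_time t = { .sec = 1000, .nsec = 900000000 };
    my_time u = my_time_add(t, 250000000);              /* add 0.25 s */
    printf("%lld.%09d\n", (long long)u.sec, u.nsec);    /* 1001.150000000 */
    return 0;
}
```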

0 views

Arrows to Arrows, Categories to Queries

I’ve had a little time off of work as of late, and been spending it in characteristically unwise ways. In particular, I’ve written a little programming language that compiles to SQL . I call it catlang . That’s not to say that I’ve written a new query language. It’s a programming language, whose compiler spits out one giant statement. When you run that query in postgres, you get the output of your program. Why have I done this? Because I needed a funny compilation target to test out the actual features of the language, which is that its intermediary language is a bunch of abstract category theory nonsense. Which I’ll get to. But I’m sure you first want to see this bad boy in action. Behold, the function that returns 100 regardless of what input you give it. But it does it with the equivalent of a while loop: If you’re familiar with arrow notation , you’ll notice the above looks kinda like one big block. This is not a coincidence (because nothing is a coincidence). I figured if I were to go through all of this work, we might as well get a working arrow desugarer out of the mix. But I digress; that’s a story for another time. Anyway, what’s going on here is we have an arrow , which takes a single argument . We then loop, starting from the value of . Inside the loop, we now have a new variable , which we do some voodoo on to compute —the current value of the loop variable. Then we subtract 100 from , and take the absolute value. The function here is a bit odd; it returns if the input was negative, and otherwise. Then we branch on the output of , where and have been renamed and respectively. If was less than zero, we find ourselves in the case, where we add 1 to and wrap the whole thing in —which the loop interprets as “loop again with this new value.” Otherwise, was non-negative, and so we can return directly. Is it roundabout? You bet! The obtuseness here is not directly a feature, I was just looking for conceptually simple things I could do which would be easy to desugar into category-theoretical stuff. Which brings us to the intermediary language. After desugaring the source syntax for above, we’re left with this IL representation: We’ll discuss all of this momentarily, but for now, just let your eyes glaze over the pretty unicode. The underlying idea here is that each of these remaining symbols has very simple and specific algebraic semantics. For example, means “do and pipe the result into .” By giving a transformation from this categorical IL into other domains, it becomes trivial to compile catlang to all sorts of weird compilation targets. Like SQL. You’re probably wondering what the generated SQL looks like. Take a peek if you dare. It’s not pretty, rather amazingly, running the above query in postgres 17 will in fact return a single row with a single column whose value is 100. And you’d better believe it does it by actually looping its way up to 100. If you don’t believe me, make the following change: which will instead return a row for each step of the iteration. There are some obvious optimizations I could make to the generated SQL, but it didn’t seem worth my time, since that’s not the interesting part of the project. Let’s take some time to discuss the underlying category theory here. I am by no means an expert, but what I have learned after a decade of bashing my head against this stuff is that a little goes a long way. For our intents and purposes, we have types, and arrows (functions) between types. 
We always have the identity “do nothing arrow” : and we can compose arrows by lining up one end to another: 1 Unlike Haskell (or really any programming language, for that matter), we DO NOT have the notion of function application. That is, there is no arrow: You can only compose arrows, you can’t apply them. That’s why we call these things “arrows” rather than “functions.” There are a bundle of arrows for working with product types. The two projection functions correspond to and , taking individual components out of pairs: How do we get things into pairs in the first place? We can use the “fork” operation, which takes two arrows computing and , and generates a new arrow which generates a pair of : If you’re coming from a Haskell background, it’s tempting to think of this operation merely as the pair constructor. But you’ll notice from the type of the computation that there can be no data dependency between and , thus we are free to parallelize each side of the pair. In category theory, the distinction between left and right sides of an arrow is rather arbitrary. This gives rise to a notion called duality where we can flip the arrows around, and get cool new behavior. If we dualize all of our product machinery, we get the coproduct machinery, where a coproduct of and is “either or , but definitely not both nor neither.” Swapping the arrow direction of and , and replacing with gives us the following injections: and the following “join” operation for eliminating coproducts: Again, coming from Haskell this is just the standard function. It corresponds to a branch between one of two cases. As you can see, with just these eight operations, we already have a tremendous amount of expressivity. We can express data dependencies via and branching via . With we automatically encode opportunities for parallelism, and gain the ability to build complicated data structures, with and allowing us to get the information back out of the data structures. You’ll notice in the IL that there are no variable names anywhere to be found. The desugaring of the source language builds a stack (via the pattern), and replaces subsequent variable lookups with a series of projections on the stack to find the value again. On one hand, this makes the categorical IL rather hard to read, but it makes it very easy to re-target! Many domains do have a notion of grouping, but don’t have a native notion of naming. For example, in an electronic circuit, I can have a ribbon of 32 wires which represents an . If I have another ribbon of 32 wires, I can trivially route both wires into a 64-wire ribbon corresponding to a pair of . By eliminating names before we get to the IL, it means no compiler backend ever needs to deal with names. They can just work on a stack representation, and are free to special-case optimize series of projections if they are able to. Of particular interest to this discussion is how we desugar loops in catlang. The underlying primitive is : which magically turns an arrow on s into an arrow without the eithers. We obviously must run that arrow on eithers. If that function returns , then we’re happy and we can just output that. But if the function returns , we have no choice but to pass it back in to the eithered arrow. In Haskell, cochoice is implemented as: which as you can see, will loop until finally returns a . What’s neat about this formulation of a loop is that we can statically differentiate between our first and subsequent passes through the loop body. 
The first time through is , while for all other times it is . We don’t take advantage of it in the original program, but how many times have you written loop code that needs to initialize something its first time through? So that’s the underlying theory behind the IL. How can we compile this to SQL now? As alluded to before, we simply need to give SQL implementations for each of the operations in the intermediary language. As a simple example, compiles to , where is the input of the arrow. The hardest part here was working out a data representation. It seems obvious to encode each element of a product as a new column, but what do we do about coproducts? After much work thought, I decided to flatten out the coproducts. So, for example, the type: would be represented as three columns: with the constraint that exactly one of or would be at any given point in time. With this hammered out, almost everything else is pretty trivial. Composition corresponds to a nested query. Forks are s which concatenate the columns of each sub-query. Joins are s, where we add a clause to enforce we’re looking at the correct coproduct constructor. Cochoice is the only really tricky thing, but it corresponds to a recursive CTE . Generating a recursive CTE table for the computation isn’t too hard, but getting the final value out of it was surprisingly tricky. The semantics of SQL tables is that they are multisets and come with an arbitrary greatest element. Which is to say, you need an column structured in a relevant way in order to query the final result. Due to some quirks in what postgres accepts, and in how I structured my queries, it was prohibitively hard to insert a “how many times have I looped” column and order by that. So instead I cheated and added a column which looks at the processor clock and ordered by that. This is clearly a hack, and presumably will cause problems if I ever add some primitives which generate more than one row, but again, this is just for fun and who cares. Send me a pull request if you’re offended by my chicanery! I’ve run out of vacation time to work on this project, so I’m probably not going to get around to the meta-circular stupidity I was planning. The compiler still needs a few string-crunching primitives (which are easy to add), but then it would be simple to write a little brainfuck interpreter in catlang. Which I could then compile to SQL. Now we’ve got a brainfuck interpreter running in postgres. Of course, this has been done by hand before, but to my knowledge, never via compilation. There exist C to brainfuck compilers. And postgres is written in C. So in a move that would make Xzibit proud, we could run postgres in postgres. And of course, it would be fun to run brainfuck in brainfuck. That’d be a cool catlang backend if someone wanted to contribute such a thing. I am not the first person to do anything like this. The source language of catlang is heavily inspired by Haskell’s arrow syntax , which in turn is essentially a desugaring algorithm for Arrows . Arrows are slightly the wrong abstraction because they require an operation —which requires you to be able to embed Haskell functions in your category, something which is almost never possible. Unfortunately, arrow syntax in Haskell desugars down to for almost everything it does, which in turn makes arrow notation effectively useless. In an ideal world, everything I described in this blog post would be a tiny little Haskell library, with arrow notation doing the heavy lifting. 
But that is just not the world we live in. Nor am I the first person to notice that there are categorical semantics behind programming languages. I don’t actually know whom to cite on this one, but it is well-established folklore that the lambda calculus corresponds to cartesian-closed categories . The “closed” part of “cartesian-closed” means we have an operation , but everyone and their dog has implemented the lambda calculus, so I thought it would be fun to see how far we can get without it. This is not a limitation on catlang’s turing completeness (since gives us everything we need.) I’ve been thinking about writing a category-first programming language for the better part of a decade, ever since I read Compiling to Categories . That paper takes Haskell and desugars it back down to categories. I stole many of the tricks here from that paper. Anyway. All of the code is available on github if you’re interested in taking a look. The repo isn’t up to my usual coding standards, for which you have my apologies. Of note is the template-haskell backend which can spit out Haskell code; meaning it wouldn’t be very hard to make a quasiquoter to compile catlang into what Haskell’s arrow desugaring ought to be. If there’s enough clamor for such a thing, I’ll see about turning this part into a library. When looking at the types of arrows in this essay, we make the distinction that are arrows that we can write in catlang, while exist in the metatheory. ↩︎
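For readers who want the loop/cochoice behaviour from earlier in the post in more familiar terms, here is a small C rendering of those semantics (my own illustration, not catlang output): the loop body returns either "done with this value" or "go around again with this value", and the driver keeps feeding "again" values back in until the body is done.

```c
/* A C-flavoured illustration of the loop semantics described in the post:
 * step() returns a tagged value meaning "done" or "again", and run_loop()
 * re-feeds "again" values until it sees "done". Names are hypothetical. */
#include <stdio.h>

enum tag { DONE, AGAIN };
struct either { enum tag tag; int value; };

/* One pass of the loop body from the example: count up until we hit 100. */
static struct either step(int x) {
    if (x - 100 < 0)
        return (struct either){ AGAIN, x + 1 };   /* loop again with x+1 */
    return (struct either){ DONE, x };            /* finished, output x  */
}

/* The driver: loop until the body stops asking to go around again. */
static int run_loop(struct either (*body)(int), int start) {
    struct either r = body(start);
    while (r.tag == AGAIN)
        r = body(r.value);
    return r.value;
}

int main(void) {
    printf("%d\n", run_loop(step, 0));   /* prints 100 */
    return 0;
}
```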

1 views
allvpv’s space 3 weeks ago

Environment variables are a legacy mess: Let's dive deep into them

Programming languages have rapidly evolved in recent years. But in software development, the new often meets the old, and the scaffolding that OS gives for running new processes hasn’t changed much since Unix. If you need to parametrize your application at runtime by passing a few ad-hoc variables (without special files or a custom solution involving IPC or networking), you’re doomed to a pretty awkward, outdated interface: There are no namespaces for them, no types. Just a flat, embarrassingly global dictionary of strings. But what exactly are these envvars? Is it some kind of special dictionary inside the OS? If not, who owns them and how do they propagate? In a nutshell: they’re passed from parent to child. On Linux, a program must use the syscall to execute another program. Whether you type in Bash, call in Python, or launch a code editor, it ultimately comes down to , preceded by a / . The family of C functions also relies on . This system call takes three arguments: , , . For example, for an invocation: By default, all envvars are passed from the parent to the child. However, nothing prevents a parent process from passing a completely different or even empty environment when calling ! In practice, most tooling passes the environment down: Bash, Python’s , the C library , and so on. And this is what you expect – variables are inherited by child processes. That’s the point – to track the environment. Which tools do not pass the parent’s environment? For example, the executable, used when signing into a system, sets up a fresh environment for its children. After launching the new program, the kernel dumps the variables on the stack as a sequence of null-terminated strings which contain the envvar definitions. Here is a hex view: This static layout can’t easily be modified or extended; the program must copy those variables into its own data structure. Let’s look at how Bash, C, and Python store envvars internally. I analyzed their source code and here is a summary. It stores the variables in a hashmap . Or, more precisely, in a stack of hashmaps . When you spawn a new process using Bash, it traverses the stack of hashmaps to find variables marked as exported and copies them into the environment array passed to the child. Side note: Why is traversing the stack needed? Each function invocation in Bash creates a new local scope – a new entry on the stack. If you declare your variable with , it ends up in this locally-scoped hashmap. What’s interesting is that you can export a variable too! I wouldn’t have learned this without diving into Bash source. My intuitive (wrong) assumption was that automatically makes the variable global – like ! Super interesting stuff. exposes a dynamic array, managed via and library functions. It uses an array, so the time complexity of and is linear in the number of envvars. Remember – envvars are not a high-performance dictionary and you should not abuse them. Python couples its environment to the C library, which can cause surprising inconsistencies. If you’ve programmed some Python, you’ve probably used the dictionary. On startup, is built from the C library’s array. But those dictionary values are NOT the “ground truth” for child processes. Rather, each change to invokes the native function, which in turn calls the C library’s . Note that the propagation is one-directional: modifying will call , but not the other way around. Call , and won’t be updated. The Linux kernel is very liberal about the format of environment variables, and so is . 
For example, your C program can manipulate the environment – the global array – such that several variables share the same name but have different values. And when you execute a child process, it will inherit this “broken” setup. You don’t even need an equals sign separating name from value! The usual entry is , but nothing prevents you from adding to the array. The kernel happily accepts any null-terminated string as an “environment variable” definition. It just imposes a size limitation: Single variable : 128 KiB on a typical x64 Intel CPU. This is for the whole definition – name + equal sign + value. It’s computed as . No modern hardware uses pages smaller than 4 KiB, so you can treat it as a lower bound, unless you need to deal with some legacy embedded systems. Total : 2 MiB on a typical machine. This limit is shared by envvars and the command line arguments. The calculation is a bit more complicated (see the man page): On a typical system, the limiting factor is the . Remember, initially the envvars are dumped on the stack! To prevent unpredictable crashes, the system allows only 1/4 of the stack for the envvars. But the fact that you can do something does not mean that you should. For example, if you start Bash with the “broken” environment – duplicated names and entries without – it deduplicates the variables and drops the nonsense. One interesting edge case is a space inside the variable name . My beloved shell – Nushell – has no problem with the following assignment: Python is fine with it, too. Bash, on the other hand, can’t reference it because whitespace isn’t allowed in variable names. Fortunately, the variable isn’t lost – Bash keeps such entries in a special hashmap called and still passes them to child processes. So what name and value can you safely use for your envvar? A popular misconception, repeated on StackOverflow and by ChatGPT, is that POSIX permits only uppercase envvars, and everything else is undefined behavior. But this is seriously NOT what the standard says : These strings have the form name=value; names shall not contain the character ‘=’. For values to be portable across systems conforming to POSIX.1-2017, the value shall be composed of characters from the portable character set (except NUL and as indicated below). There is no meaning associated with the order of strings in the environment. If more than one string in an environment of a process has the same name, the consequences are undefined. Environment variable names used by the utilities in the Shell and Utilities volume of POSIX.1-2017 consist solely of uppercase letters, digits, and the <underscore> ( ‘_’ ) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities. Yes, POSIX-specified utilities use uppercase envvars, but that’s not prescriptive for your programs. Quite the contrary: you’re encouraged to use lowercase for your envvars so they don’t collide with the standard tools. The only strict rule is that a variable name cannot contain an equals sign. 
POSIX requires compliant applications to preserve all variables that conform to this rule. But in reality, not many applications use lowercase. The proper etiquette in software development is to use . …to use for names, and UTF-8 for values. You shouldn’t hit problems on Linux. If you want to be super safe: instead of UTF-8, use the POSIX-mandated Portable Character Set (PCS) – essentially ASCII without control characters. …and I hope it wasn’t a boring read. is the (the executable path), is the array of command line arguments – the implicit first (“zero”) argument is usually the executable name, is the array of envvars (typically much longer).
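Putting the pieces above together, here is a minimal sketch of a parent handing a child a hand-built environment via execve. The program being run and its path are assumptions for illustration (it assumes env lives at /usr/bin/env); the point is simply that envp is just an array of strings the parent chooses, and the kernel happily passes along duplicates.

```c
/* A minimal sketch of the mechanics described above: the environment is
 * just an array of "NAME=value" strings that the parent chooses to pass
 * (or not) to execve(). Nothing stops it from passing duplicate names. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *argv[] = { "env", NULL };
    char *envp[] = {
        "greeting=hello",
        "greeting=world",      /* duplicate name: the kernel doesn't care */
        NULL
    };

    pid_t pid = fork();
    if (pid == 0) {
        /* The child gets exactly this environment, not the parent's. */
        execve("/usr/bin/env", argv, envp);
        perror("execve");      /* only reached if execve failed */
        _exit(1);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```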

0 views

JIT: so you want to be faster than an interpreter on modern CPUs…

Since my previous blog entry about JIT compiler for PostgreSQL, sadly not much happened due to a lack of time, but still some things were done (biggest improvement was the port to ARM64, a few optimizations, implementing more opcodes…). But I am often asking myself how to really beat the interpreter… And on “modern” CPUs, with a well written interpreter, that’s far more complicated than many would imagine. So in order to explain all this and show how I am planning to improve performance (possibly of the interpreter itself too, thus making this endeavor self-defeating), let’s first talk about… If you already know about all the topics mentioned in this title, feel free to jump to the next section . Note that the following section is over-simplified to make the concepts more accessible. I am writing this blog post on a Zen 2+ CPU. If I upgraded to a Zen 3 CPU, same motherboard, same memory, I would get an advertised 25% performance jump in single thread benchmarks while the CPU frequency would be only 2% higher. Why such a discrepancy? Since the 90s and the Pentium-class CPUs, x86 has followed RISC CPUs in the super-scalar era. Instead of running one instruction per cycle, when conditions are right, several instructions can be executed at the same time. Let’s consider the following pseudo-code: X and Y can be calculated at the same time. The CPU can execute these on two integer units, fetch the results and store them. The only issue is the computation of Z: everything must be done before this step, making it impossible for the CPU to go further without waiting for the previous results. But now, what if the code was written as follow: Every step would require waiting for the previous one, slowing down the CPU terribly. Hence the most important technique used to implement superscalar CPUs: out-of-order execution. The CPU will fetch the instructions, dispatch them in several instruction queues, and resolve dependencies to compute Y before computing Z1 in order to have it ready sooner. The CPU is spending less time idling, thus the whole thing is faster. But, alas, what would happen with the following function? Should the CPU wait for X and Y before deciding which Z to compute? Here is the biggest trick: it will try its luck and compute something anyway. This way, if its bet was right, a lot of time will be saved, otherwise the mistake result will be dropped and the proper computation will be done instead. This is called branch prediction, it has been the source of many fun security issues (hello meltdown), but the performance benefits are so huge that one would never consider disabling this. Most interpreters will operate on an intermediate representation, using opcodes instead of directly executing from an AST or similar. So you could use the following main loop for an interpreter. This is how many, many interpreters were written. But this has a terrible drawback at least when compiled that way: it has branches all over the place from a single starting point (most if not all optimizing compilers will generate a jump table to optimize the dispatch, but this will still jump from the same point). The CPU will have a hard time predicting the right jump, and is thus losing a lot of performance. If this was the only way an interpreter could be written, generating a function by stitching the code together would save a lot of time, likely giving a more than 10% performance improvement. If one look at Python, removing this switch made the interpreter 15 to 20% faster. 
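For illustration, here is a hedged sketch of both dispatch styles on a toy instruction set: the single-switch main loop described above, and the computed-goto variant discussed next. This is generic example code, not PostgreSQL's executor or the interpreter discussed in the post, and computed gotos are a GCC/Clang extension.

```c
/* Hedged sketch of the two dispatch styles (toy opcodes, made up here).
 * run_switch() dispatches through one switch, so every opcode branches
 * from the same spot; run_goto() uses GCC/Clang "computed gotos", so each
 * handler jumps directly to the next handler. */
#include <stdio.h>

enum { OP_INC, OP_DEC, OP_HALT };

static int run_switch(const unsigned char *code) {
    int acc = 0;
    for (;;) {
        switch (*code++) {             /* single, hard-to-predict branch */
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        }
    }
}

static int run_goto(const unsigned char *code) {
    static void *labels[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;
    goto *labels[*code++];     /* dispatch happens at the end of each      */
op_inc:                        /* handler, which gives the branch predictor */
    acc++;                     /* separate, per-opcode history to work with */
    goto *labels[*code++];
op_dec:
    acc--;
    goto *labels[*code++];
op_halt:
    return acc;
}

int main(void) {
    const unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_INC, OP_HALT };
    printf("%d %d\n", run_switch(prog), run_goto(prog));   /* prints "2 2" */
    return 0;
}
```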
Many project, including PostgreSQL, use this same technique, called “computed gotos”. After a first pass to fill in “label” targets in each step, the execution would be When running a short sequence of operations in a loop, the jumps will be far more predictable, making the branch predictor’s job easier, and thus improving the speed of the interpreter. Now that we have a very basic understanding of modern CPUs and the insane level of optimization they reach, let’s talk about fighting the PostgreSQL interpreter on performance. I will not discuss optimizing the tuple deforming part (aka. going from on-disc structure to the “C” structure used by the code), this will be a topic for a future blog post when I implement it in my compiler. As you may know, PostgreSQL has a very complete type system with operators overloading. Even this simple query ends up being a call to int4eq, a strict function that will perform the comparison. Since it is a strict function, PostgreSQL must check that the arguments are not null, otherwise the function is not called and the result will be null. If you execute a very basic query like the one in the title, PostgreSQL will have the following opcodes: The EEOP_FUNCEXPR_STRICT_2 will perform the null check, and then call the function. If we unroll all the opcodes in real C code, we end up with the following: We can already spot one optimization: why do we check the two arguments, including our constant, against null? It will never change for the entire run of this query and thus each comparison is going to use an ALU, and branch depending on that comparison. But of course the CPU will notice the corresponding branch pattern, and will thus be able to remain active and feed its other units. What is the real cost of such a pointless comparison? For this purpose, I’ve broken a PostgreSQL instance and replaced all FUNCEXPR_STRICT with a check on one argument only, and one with no STRICT check (do not try this at home!). Doing 10 times a simple SELECT * FROM demo WHERE a = 42 on a 100 million rows table, with no index, here are the two perf results: So, even if this is not the optimization of the century, it’s not that expensive to make, so… why not do it? (Patch coming to pgsql-hackers soon) But a better optimization is to go all-in on inlining. Indeed, instead of jumping through a pointer to the int4eq code (again, something that the CPU will optimize a lot), one could have a special opcode for this quite common operation. With this change alone (but keeping the two null checks, so there are still optimizations possible), we end up with the following perf results. Let’s sum up these results. The biggest change comes, quite obviously, from inlining the int4eq call. Why is it that much better? Because it reduces by quite a lot the number of instructions to run, and it removes a call to an address stored in memory. And this is again an optimization I could do on my JIT compiler that can also be done on the interpreter with the same benefits. The biggest issue here is that you must keep the number of opcodes within (unspecified) limits: too many opcodes could make the compiler job far worse. Well. At first, I thought the elimination of null checks could not be implemented easily in the interpreter. The first draft in my compiler was certainly invalid, but gave me interesting numbers (around 5%, as seen above) and made me want to go ahead. 
And I realized that implementing it cleanly in the interpreter was far easier than implementing it in my JIT compiler … Then I went with optimizing another common case, the call to int4eq, and, well… One could also add an opcode for that in the interpreter, and thus the performance gain of the JIT compiler are going to be minimal compared to the interpreter. Modern CPUs don’t make my job easy here. Most of the cost of an interpreter is taken away by the branch predictor and the other optimizations implemented in silicon. So is all hope lost, am I to declare the interpreter the winner against the limitations of the copy-patch method I have available for my JIT? Of course not, see you in the next post to discuss the biggest interpreter bottleneck! PS: help welcome. Last year I managed to spend some time working on this during my work time. Since then I’ve changed job, and can hardly get some time on this. I also tried to get some sponsoring to work on this and present at future PostgreSQL conferences, to no luck :/ If you can help in any way on this project, feel free to reach me (code contribution, sponsoring, missions, job offers, nudge nudge wink wink). Since I’ve been alone on this, a lot of things are dibbles on scratch paper, I benchmark code and stuff in my head when life gives me some boring time but testing it for real is of course far better. I have some travel planned soon so I hope for next part to be released before next year, with interesting results since my experiences have been as successful as anticipated.

0 views
Corrode 1 months ago

Prime Video

Are you one of over 240 million subscribers of Amazon’s Prime Video service? If so, you might be surprised to learn that much of the infrastructure behind Prime Video is built using Rust. They use a single codebase for media players, game consoles, and tablets. In this episode, we sit down with Alexandru Ene, a Principal Engineer at Amazon, to discuss how Rust is used at Prime Video, the challenges they face in building a global streaming service, and the benefits of using Rust for their systems. CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link . Prime Video is a streaming service offered by Amazon that provides a wide range of movies, TV shows, and original content to its subscribers. With over 240 million subscribers worldwide, Prime Video is one of the largest streaming platforms in the world. In addition to its vast content library, Prime Video also offers features such as offline viewing, 4K streaming, and support for multiple devices. On the backend, Prime Video relies on a variety of technologies to deliver its content, including Rust, which is used for building high-performance and reliable systems that can handle the demands of a global audience. Alexandru worked on the transition of Prime Video’s user interface from JavaScript to Rust. He has been with Amazon for over 8 years and previously worked at companies like Ubisoft and EA. He has a background in computer science and is an active open source maintainer. Alexandru lives in London. Ferris Makes Emulators Ep.001 - The Journey Begins - First episode of a famous series where Jake Taylor wrote a Nintendo 64 emulator in Rust from scratch CMake - Very common build system used in C++ applications Conan - C++ Package Manager community project C++ Smart Pointers - Still a footgun Herb Sutter: The Free Lunch Is Over - The seminal 2005 paper that highlights the importance of concurrency, well past C++’s mainstream adoption Rust in Production: cURL - Baseline library used everywhere, written in C, but performant and safe Prime Video Platforms - One app runs on all of these WebAssembly (WASM) - Enabling Rust code with good performance that you can still download and run like JavaScript, avoiding the need for firmware updates on some devices Entity Component System - Used in the UI Rust code for pages in the app Bevy - Game engine written in Rust Leptos - UI framework that makes reactive programming in Rust easier tokio - The de facto standard async runtime for Rust SIMD - A nice feature set some CPUs support WebAssembly Micro Runtime - A tiny WASM runtime well suited for IoT platforms WebAssembly Working Group Amazon Prime Video Rust & WASM for UI: Faster Prime Video on ANY Device - Alexandru Ene, QCon San Francisco 2024 Alexandru Ene on LinkedIn Alexandru’s Blog Alexandru Ene on GitHub

0 views
NorikiTech 1 months ago

Motivation for learning Rust

Part of the “ Rustober ” series. Rust is my next first-choice language for making things. It now possesses a combination of traits (get it?) that make it an obvious choice. I’m writing Swift full-time at work (and have been since 2014) and I would have happily picked Swift if it ever became what it was supposed to: I simply cannot use Swift, my strongest language, for any of the projects I do outside of work. Frustrated, I have tried other languages in their strong domains: For me, a major constraint is that I work on multiple things and sometimes leave them for a year or more, and then come back and work on them some more. All of the languages above were used for different projects. Consequently, I need to remember all of them at once if I need to pick a project up again. However, looking back, all of them could be done in Rust without major changes. Why haven’t I learned Rust earlier then? I made an attempt a few years ago but found it too difficult for my taste. Considering the constraint above, it made sense. If it were one of many, it wouldn’t have worked. To work in big languages like C++ and Rust (or let’s be honest, in any language except Go), you need to commit, otherwise after a year of not using it, it goes completely out of your head. Besides reviving your half-dead project you’ll need to relearn the language as well. I also thought it was changing too fast (all of the above were changing much slower) and there was too much drama in the community, so its future was uncertain. Yet in the last few years Rust has proven that it is stable, mature and developer-friendly. The language, tooling and the amount of available libraries are improving (in a good way). I believe it’s a solid career choice as well on the horizon of a few years. Hell, it even plays well with AI because of the strong type system and borrow checker. I’m starting to use it in earnest to build the first project in it, let’s see how it goes. Remarkably, it seems much closer to Swift than I expected, with common use of optionals, , traits and enums with associated types. P.S. C# and Java are missing from the above list. While I did use both for a time, I was never comfortable fully committing to either because it’s a completely new set of libraries and ways of doing things. Swift never became truly cross-platform — I was really hoping for it. Rust did. Apple butchered the language after 4.0 to make SwiftUI look nice. A lot of complexity was added even though the community wasn’t happy. Swift compile times get worse and worse, and they are never mentioned as a priority. Rust compile times constantly improve. Et cetera… my list of grievances with Swift is very long. While I love Common Lisp, it’s simply too niche — I would need to implement most of what I need myself, and I have limited time. I couldn’t find in myself the will to commit to it without second-guessing. C++ is a solid choice for lower-level tasks, but nobody uses it to write web services, and half of my ideas are web services. I liked using C to write all of Typingvania . I understand its appeal and draw, but in C, most things felt tedious , and far from everything could be abstracted enough to be ergonomic. Handling memory didn’t bother me, I learned most of my lessons through objects’ lifetimes in C++. Finally, I wrote multiple things in Go, and it probably came closest to what I was looking for (which was the reason why I committed to it in the first place). 
However, I got progressively more disillusioned because of its C interop, its lack of constraints, the poor generics (too little, too late), and general inelegance (as a result of its nonexpressiveness). Yes, I could achieve a lot quickly and pick the project back up later, as long as I used Go exclusively, and not as part of something else.

0 views
Matthias Endler 1 months ago

On Choosing Rust

Since my professional writing on Rust has moved to the corrode blog , I can be a bit more casual on here and share some of my personal thoughts on the recent debate around using Rust in established software. The two projects in question are git ( kernel thread , Hacker News Discussion ) and the recently rewritten coreutils in Rust , which will ship with Ubuntu 25.10 Quizzical Quokka . What prompted me to write this post is a discussion on Twitter and a blog post titled “Are We Chasing Language Hype Over Solving Real Problems?” . In both cases, the authors speculate about the motivations behind choosing Rust, and as someone who helps teams use Rust in production, I find those takes… hilarious. Back when I started corrode, people always mentioned that Rust wasn’t used for anything serious. I knew about the production use cases from client work, but there was very little public information out there. As a consequence, we started the ‘Rust in Production’ podcast to show that companies indeed choose Rust for real-world applications. However, people don’t like to be proven wrong, so that conspiracy theory has now morphed into “Big Rust” trying to take over the world. 😆 Let’s look at some of the claims made in the blog post and Twitter thread and see how these could be debunked pretty easily. “GNU Core Utils has basically never had any major security vulnerabilities in its entire existence” If only that were true. A quick CVE search shows multiple security issues over the decades, including buffer overflows and path traversal vulnerabilities. Just a few months ago, a heap buffer under-read was found in , which would cause a leak of sensitive data if an attacker sends a specially crafted input stream. The GNU coreutils are one of the most widely used software packages worldwide with billions of installations and hundreds (thousands?) of developers looking at the code. Yes, vulnerabilities still happen. No, it is not easy to write correct, secure C code. No, not even if you’re extra careful and disciplined. is five thousand lines long. (Check out the source code ). That’s a lot of code for printing file names and metadata and a big attack surface! “Rust can only ever match C performance at best and is usually slower” Work by Trifecta shows that it is possible to write Rust code that is faster than C in some cases. Especially in concurrent workloads and with memory safety guarantees. If writing safe C code is too hard, try writing safe concurrent C code! That’s where Rust shines. You can achieve ridiculous levels of parallelization without worrying about security issues. And no, you don’t need to litter your code with blocks. Check out Steve Klabnik’s recent talk about Oxide where he shows that their bootloader and their preemptive multitasking OS, hubris – both pretty core systems code – only contain 5% of code each. You can write large codebases in Rust with no unsafe code at all. As a trivial example, I sat down to rewrite in Rust one day. The result was 3x faster than GNU on my machine. You can read the post here . All I did was use to copy data, which saves one memory copy. Performance is not only dependent on the language but on the algorithms and system calls you use. If you play into Rust’s strengths, you can match C’s performance. At least there is no technical limitation that would prevent this. And I personally feel more willing to aggressively optimize my code in Rust, because I don’t have to worry about introducing memory safety bugs. It feels like I’m not alone . 
“We reward novelty over necessity in the industry” This ignores that most successful companies (Google, Meta, etc.) primarily use battle-tested tech stacks, not bleeding-edge languages. These companies have massive codebases and cannot afford to rewrite everything in the latest trendy language. But they see the value of using Rust for new components and gradually rewriting existing ones. That’s because 70% of security vulnerabilities are memory safety issues and these issues are extremely costly to fix. If these companies could avoid switching to a new language to do so, they would. Besides, Rust is not exactly new anymore. Rust 1.0 was released 10+ years ago! The industry is moving slowly, but not that slowly. You’d be surprised to find out how many established companies use Rust without even announcing it or thinking of it as “novelty”. “100% orchestrated” Multiple people in the Twitter thread were convinced this is some coordinated master plan rather than developers choosing better tools, while the very maintainers of git and coreutils openly discussed their motivations in public forums for everyone to see. “They’re trying to replace/erase C. It’s not going to happen” They are right. C is not going away anytime soon. There is just so much C/C++ code out there in the wild, and rewriting everything in Rust is not feasible. The good news is that you can incrementally rewrite C/C++ code in Rust, one component at a time. That’s what the git maintainers are planning, by using Rust for new components. “They’re rewriting software with a GNU license into software with an MIT license” Even if you use Rust, you can still license your code under GPL or any other license you want. Git itself remains GPL, and many Rust projects use various licenses, not only MIT. The license fear is often brought up by people who don’t understand how open source licensing works or it might just be FUD. MIT code is still compatible with GPL code and you can use both of them in the same project without issues. It’s just that the end product (the thing you deliver to your users, i.e. binary executables) is now covered by GPL because of its virality. “It’s just developers being bored and wanting to work with shiny new languages” The aging maintainers of C projects are retiring, and there are fewer new developers willing to pick up C just to maintain legacy code in their free time. C developers are essentially going extinct. New developers want to work with modern languages and who can blame them? Or would you want to maintain a 40-year-old COBOL codebase or an old Perl script? We have to move on. “Why not build something completely new instead of rewriting existing tools?” It’s not that easy. The code is only part of the story. The other part is the ecosystem, the tooling, the integrations, the documentation, and the user base. All of that takes years to build. Users don’t want to change their workflows, so they want drop-in replacements. Proven interfaces and APIs, no matter how crude and old-fashioned, have a lot of value. But yes, new tools are being built in Rust as well. “They don’t know how to actually solve problems, just chase trends” Talk about dismissing the technical expertise of maintainers who’ve been working on these projects for years or decades and understand the pain points better than anyone. If they were just chasing trends, they wouldn’t be maintaining these projects in the first place! 
These people are some of the most experienced developers in the world, and yet people want to tell them how to do their jobs. “It’s part of the woke mind virus infecting software” Imagine thinking memory safety is a political conspiracy. Apparently preventing buffer overflows is now an ideological stance. The closest thing to this is the White House’s technical report which recommends memory-safe languages for government software and mandating memory safety for software receiving federal funding is a pretty reasonable take. Conclusion I could go on, but I think you get my point. People who give Rust an honest chance know that it offers advantages in terms of memory safety, concurrency, and maintainability. It’s not about chasing hype but about long-term investment in software quality. As more companies successfully adopt Rust every day, it increasingly becomes the default choice for many new projects. If you’re interested in learning more about using Rust in production, check out my other blog or listen to the Rust in Production podcast . Oh, and if you know someone who posts such takes, stop arguing and send them a link to this post.

0 views
Dayvster 1 month ago

Are We Chasing Language Hype Over Solving Real Problems?

## Intro

As you may have heard or seen, there is a bit of controversy around Ubuntu adopting a rewritten version of GNU Core Utils in Rust. This has sparked a lot of debate in the tech community. This decision by Canonical got me thinking about this whole trend or push of rewriting existing software in Rust, which seems to be happening a lot lately. To put it bluntly, I was confused by the need to replace GNU Core Utils with a new implementation, as GNU Core Utils has been around since arguably the 90s and more realistically the 2000s, and it has been battle-tested and proven to be reliable, efficient, effective, and most importantly secure, as it has basically never had any major security vulnerabilities in its entire existence. So why then would we deem it necessary to replace it with a new implementation in Rust? Why would anyone go through the trouble of rewriting something that already works perfectly fine and has been doing so for decades, when the end result at best is going to be a tool that does the same thing as the original and in the very best case scenario offers the same performance? What bothers me even more is the bigger pattern this points to. Are we as developers more interested in chasing new languages and frameworks than actually solving real problems? I strongly subscribe to the idea that software development is 60% problem solving and 40% creative exploration and innovation. But lately it feels like the balance is shifting more and more towards the latter. We seem to be more interested in trying out the latest and greatest languages and frameworks than actually solving real problems.

## The Hype of New Languages and Shiny Object Syndrome

We've all been there in the past, haven't we? Getting excited about a new programming language or framework that promises to solve all our problems and make our lives easier. It's easy to get caught up in the hype and want to try out the latest and greatest technology, but it's important to remember that just because something is new doesn't mean it's better. We need to be careful not to fall into the trap of "shiny object syndrome" where we chase after the latest trends without considering whether they actually solve our problems or improve our workflows. It's important to evaluate new technologies based on their merits and how they fit into our existing systems and workflows, rather than simply jumping on the bandwagon because everyone else is doing it. Now for the important question: **Do I think the developers of coreutils-rs are doing this just because Rust is the new hotness?** Short and simple: No, no I do not. I believe they have good intentions and are likely trying to improve upon the existing implementation in some way. However, I do not agree with them that there is a need for a rewritten version of GNU Core Utils in Rust. I also do not agree that GNU Core Utils is inherently insecure or unsafe.

### Why Do We Get Excited About New Languages?

It's also important to briefly touch upon the psychological aspect of why we get excited about new languages. New languages often come with new features, syntax, and paradigms that can be appealing to developers. They may also promise to solve problems that existing languages struggle with, such as performance, concurrency, or memory safety. Additionally, new languages can offer a fresh perspective on programming and can inspire creativity and innovation. Not to mention the community aspect: new usually means a changing of the guard, new people, new ideas, new ways of thinking about problems.
All of these factors can contribute to the excitement and enthusiasm that developers feel when a new language is introduced. This enthusiasm can sometimes lead to an almost zealous approach of wanting everything and anything to be written only in the new language by this new and fresh community of developers. This can lead to a situation where existing and well-established software is rewritten in the new language, even if there is no real need for it. This can be seen as a form of "language evangelism" where developers are trying to promote their favorite language by rewriting existing software in it.

## The Case of GNU Core Utils

As I've briefly touched upon earlier, GNU Core Utils is a collection of basic file, shell and text manipulation utilities that are fundamental to the operation of Unix-like operating systems. These utilities include commands like `ls`, `cp`, `mv`, `rm`, `cat`, `echo`, and many others. They are essential for performing everyday tasks in the command line interface (CLI) and are used by system administrators, developers, and users alike. Some of these can run hundreds of times per second, so performance is absolutely crucial. Even a small reduction in the performance of a utility that is run by some OS-critical daemon can have a significant impact on the overall performance of the system. GNU Core Utils has been optimized for this for 30+ years at this point; is it really worth just tossing all of those lessons and optimizations out the window just to rewrite it in a new language? I've also briefly touched upon the fact that, in the absolute **best case scenario**, a rewritten version of GNU Core Utils in Rust would be able to match the performance of the original implementation. As we know, GNU Core Utils is mostly written in C, with some C++ mixed in sparingly. So far benchmarks have shown time and time again that at best, with a lot of optimizations and tweaks, Rust can only ever match the performance of C, and in most cases it is actually slower. So the best case outcome of this rewrite is that we get a tool that does the same thing as the original and at best offers the same performance. So what is the actual benefit of this rewrite? Where is the value, what is the actual problem that is being solved here?

## When Hype Overshadows Real Problems

This is the crux of the issue: it's very, very easy to get swept up in the excitement of a new language and want to use it for everything and anything under the sun. As developers we love novelty and communities with enthusiasm and fresh ideas. It's stimulating, it's fun, it feels like progress, it feels like we are finally building something again instead of just rehashing and maintaining. We all know from personal experience that creating a new project is more fun and enjoyable than maintaining an existing one, and this is a natural human tendency. Now do I think this is one of the reasons the developers of coreutils-rs are doing this? Yes, I do. But in the end they are solving a problem that does not exist.

### It's not just about Core Utils

Now with how often I've mentioned this specific example of GNU Core Utils you might think I want to single them out or have some sort of grudge or specific issue with this particular project. No, not really... I think this project is indicative of the larger issue we face in the tech community. It's very easy to get caught up in the excitement of new languages and frameworks and want to use them for everything and anything.
This can lead to a situation where we are rewriting existing software in new languages without considering whether it actually solves any real problems or improves our workflows.

## Problem Solving Should Be Our North Star

At the end of the day, software development is about solving real problems, not about chasing novelty, shiny new languages, or personal curiosity. Every line of code we write, every framework we adopt, every library we integrate should be justified by the problems it helps solve, not by the hype around it. Yet increasingly, it feels like the industry is losing sight of this. We celebrate engineers for building in the “new hot language,” for rewriting tools, or for adopting the latest framework, even when the original solution worked perfectly fine. We reward novelty over necessity, excitement over impact. This is not a phenomenon isolated to just core utils or systems programming; it happens in web development, mobile development, data science, and pretty much every other area of software development. We too often abandon tried and true solutions and ideas in favor of new and exciting shiny ones without considering whether they actually solve any real problems or improve our workflows. For example, web development went full circle with React Server Components, where we went from separation of concerns straight back to PHP-style mixing of HTML and logic, server rendering and delivering interactive components to the client. Or the whole GraphQL craze, where traditional REST APIs were abandoned en masse for a new and exciting way of doing things that promised to solve the dreaded problem of "over-fetching" and "under-fetching" of data. Yet in reality, it introduced a whole new set of problems and complexities that were not present in traditional REST APIs. Or perhaps the whole microservices and microfrontend craze, where a lot of projects were abandoned or rewritten to be split into smaller and smaller pieces. **Was it all bad? Should we always just stick to what works and only ever maintain legacy projects and systems? Heck no!** There is definitely a place for innovation and new ideas in software development. New languages, frameworks, and tools can bring fresh perspectives and approaches to problem-solving. However, it's important to evaluate these new technologies based on their merits and how they fit into our existing systems and workflows, rather than simply jumping on the bandwagon because everyone else is doing it. We need to be more critical and thoughtful about the technologies we adopt and the projects we undertake. We need to ask ourselves whether a new language or framework actually solves a real problem or improves our workflows, or if we're just chasing novelty for its own sake.

## A Final Thought

At the end of the day, it’s not about Rust, React, GraphQL, or the latest microservices fad. It’s about solving real problems. Every line of code, every framework, every rewrite should have a purpose beyond curiosity or hype. We live in a culture that rewards novelty, celebrates “cool” tech, and often mistakes excitement for progress. But progress isn’t measured in how many new languages you touch, or how many shiny rewrites you ship; it’s measured in impact, in the problems you actually solve for users, teams, and systems. So next time you feel the pull of the newest hot language, framework, or tool, pause.
Ask yourself: “Am I solving a real problem here, or just chasing excitement?” Because at the end of the day, engineering isn’t about what’s trendy, it’s about what works, what matters, and what actually makes a difference. And that, my friends, is the craft we should all be sharpening.

0 views
Dayvster 1 month ago

Why I Still Reach for C for Certain Projects

## The Project That Made Me Choose C Again

So a while back I got tired of xserver and decided to give wayland a try. I use Arch (BTW) with a tiling window manager (dwm) and I wanted to try something new, since I had a couple of annoyances with xserver. I'd heard some good things about wayland, so I thought: you know what, why not, let's give it a shot. After 30-45min my Arch and Hyprland setup was done and ready to go. I was pretty happy with it, but I was missing some features I'd previously had, such as notifications for when somebody posts new content to their RSS feed. I quite like RSS feeds; I use them to keep up to date with blogs, news, streams, releases etc. So I thought to myself, why not write a small program that checks my RSS feeds and sends me a notification when there's something new. Now the way I was gonna go about this was pretty simple. I would write a simple C daemon that would run in the background, check my RSS feeds on a random interval between 5-15min, and if there was something new it would send out a notification using `notify-send`. Then comes the tricky part: I wanted `swaync` to allow me to offer two buttons on the notification, one to open the link in my default browser and one to ignore the notification and mark it as ignored in a flat file on my system, so that I could see if I have to remove certain feeds that I tend to ignore a lot. Now here's the problem: you can't really do that with `swaync`. I mean, it does support buttons, but it doesn't really let you handle the button clicks in a way that would allow you to do what I wanted. So I had to come up with a workaround. The workaround was to have another C program run as a daemon that would listen on the DBus for a very specific notification that would contain an action with a string of `openUrl("url")` or `ignoreUrl("url")` and then handle the action accordingly. This way I could have `swaync` send out a notification with the buttons, and when the user (me) clicks on one of the buttons it would send out a DBus notification that my second daemon would pick up and handle.

## Why C Was the Right Choice Here?

Now you might be wondering why I chose C for this project and not something like Python, which would allow me to write this specific program much faster and enjoy the rest of my weekend. Well, my answer to you is simple: I don't like Python and I didn't feel like wasting multiple 100s of MBs of RAM for something this simple and small. In fact I DID write it in Python and Go first with their respective DBus libraries; both were super easy to work with and I had an initial working prototype within less than an hour. But when I ran a simple `htop` I saw that the Python version was using around 150MB of RAM and the Go version was using around 50MB of RAM. Now don't get me wrong, I'm not on an old ThinkPad; I have RAM to spare, 32 GB to be exact. But why waste it on something this small and simple? Plus C is just so much more fun and exciting to work with. So I set up my C project, which was just a simple `Makefile` and a couple of `.c` and `.h` files. I imported `dbus/dbus.h` and got to work. Now I'd be lying if I said it took me no time at all; in fact it took me roughly 3-4h, which is a lot longer than Python or Go. But in the end I had a working prototype that was using around 1-2MB of RAM and was super fast and responsive. I also got to brush up on my DBus skills, which were a bit rusty since I hadn't worked with DBus in a while.
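For illustration, here is a minimal sketch of what the second daemon's core loop can look like in C using libdbus (the library behind the `dbus/dbus.h` include mentioned above). It subscribes to the notification server's `ActionInvoked` signal and dispatches on the action string; the `openUrl(...)`/`ignoreUrl(...)` naming follows the scheme described in this post, while everything else (error handling, actually opening the browser or writing the ignore file) is stubbed out, and the exact matching details can vary between notification daemons.

```c
/*
 * action_listener.c -- minimal sketch of a daemon that reacts to notification
 * actions over DBus. Illustrative only, not the post's actual code.
 *
 * Build (assuming the libdbus development headers are installed):
 *   cc action_listener.c $(pkg-config --cflags --libs dbus-1) -o action_listener
 */
#include <dbus/dbus.h>
#include <stdio.h>
#include <string.h>

static void handle_action(const char *action) {
    /* Action strings follow the openUrl("...") / ignoreUrl("...") scheme. */
    if (strncmp(action, "openUrl(", 8) == 0) {
        printf("would open in browser: %s\n", action);  /* e.g. spawn xdg-open */
    } else if (strncmp(action, "ignoreUrl(", 10) == 0) {
        printf("would mark as ignored: %s\n", action);  /* e.g. append to a flat file */
    }
}

int main(void) {
    DBusError err;
    dbus_error_init(&err);

    DBusConnection *conn = dbus_bus_get(DBUS_BUS_SESSION, &err);
    if (conn == NULL || dbus_error_is_set(&err)) {
        fprintf(stderr, "dbus: %s\n", err.message ? err.message : "connection failed");
        return 1;
    }

    /* Ask the bus to route ActionInvoked signals from the notification server to us. */
    dbus_bus_add_match(conn,
        "type='signal',interface='org.freedesktop.Notifications',member='ActionInvoked'",
        &err);
    dbus_connection_flush(conn);

    while (dbus_connection_read_write(conn, -1)) {       /* block until there is traffic */
        DBusMessage *msg;
        while ((msg = dbus_connection_pop_message(conn)) != NULL) {
            if (dbus_message_is_signal(msg, "org.freedesktop.Notifications",
                                       "ActionInvoked")) {
                dbus_uint32_t id = 0;
                const char *action = NULL;
                if (dbus_message_get_args(msg, &err,
                                          DBUS_TYPE_UINT32, &id,
                                          DBUS_TYPE_STRING, &action,
                                          DBUS_TYPE_INVALID)) {
                    handle_action(action);
                }
            }
            dbus_message_unref(msg);
        }
    }
    return 0;
}
```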
Now getting a simple program like that from 150 MB to 50 MB to 1-2 MB of RAM usage is a huge improvement, and it really showcases the power and strength of C as a programming language. Look at it this way: I may have spent multiple hours longer writing this program in C, but it will just continue to run in the background using a fraction of the resources that Python or Go would have used, for a long time to come. Additionally, this probably won't be the only modification I make to my system. Now imagine if I were this careless with 10-20 small programs that I run in the background, and let's make a wild assumption that each of those programs would use about 50-200 MB of RAM. Now we're looking at 500-4000 MB of RAM usage, or precisely one Chrome tab! No, I don't think that's an acceptable tradeoff for a couple of hours of my time. So in the end I think C was the right choice for this project and I'm pretty happy with the result.

## Why Modern Languages Aren't Always the Best Fit

That long rant above is a cute story / anecdote. But I feel like it also highlights a bigger issue that developers face today. A lot of times we just wanna complete a certain task as quickly as possible and have it running before we've had our second coffee. I get it: we're all busy people, and when resources are as cheap as they are today, saving a few MB of RAM or a few ms of CPU time isn't really a big deal. But there is something so nice and satisfying about writing a small program in C that does exactly what you want it to do and nothing more. It's like a breath of fresh air in a world where everything is becoming more and more bloated and resource hungry. It feels a bit zen taking everything down to the bare minimum and just focusing on the task at hand. I'd liken developing in C to woodworking: you have to be precise and careful with your cuts, but in the end you get a beautiful product that you can be uniquely proud of. Sure, you could go pick up a flat pack from your local IKEA and have a nice looking table within an hour. But it will be soulless and generic and not really made well, not something you can proudly say to your visitors "Hey I built this myself". I mean you could... but they probably won't really be as impressed with your flat pack assembly as if you were to show them a hand-crafted table you made yourself.

## C Excels in Low-Level System Programming and Efficiency

So maybe the example project that I described above in the beginning of the article is too rickety or not really highbrow enough; that's fair, I get that. But even if you think my project was stupid or silly or could have been done better or easier, there is no denying that C simply is the gold standard for low-level systems programming and efficiency. C gives you unparalleled control over system resources, memory management, and performance optimization. This makes it the go-to choice for developing operating systems, embedded systems, and performance-critical applications where every byte of memory and every CPU cycle counts. For example, the Linux kernel, which powers a vast majority of servers, desktops, and mobile devices worldwide, is primarily written in C. This is because C allows developers to write code that can directly interact with hardware and manage system resources efficiently.
If you drive any modern vehicle that was created in the past decade or so, chances are that you have multiple microcontrollers in your car that are running C code to manage everything from engine performance, emergency braking systems (AEB, ABS), distance control systems (ACC) and so on. For these systems you are very limited in the amount of RAM you can afford, and most importantly you cannot afford much latency or delay in processing. Now how often have you heard about these systems failing and leading to fatal crashes? Not very often, I'd wager. Of course safety standards, regulations and certifications play a big role in this, but the choice of programming language is also a big factor. C is a mature and well-established language with a long history of use in safety-critical systems, making it a reliable choice for such applications. It does its job exceptionally well, it does it with a very small footprint, and most importantly you can run it in perpetuity on very limited hardware without worry over failure or issues due to resource exhaustion.

## Real World Scenarios Where C Shines

Here are a few real-world scenarios where C is the preferred choice:

- **Operating Systems**: As mentioned earlier, the Linux kernel is written in C. Other operating systems like Windows and macOS also have significant portions written in C.
- **Embedded Systems**: C is widely used in embedded systems development for devices like microcontrollers, IoT devices, and automotive systems due to its efficiency and low-level hardware access.
- **Game Development**: Many game engines and performance-critical game components are written in C or C++ to achieve high performance and low latency.
- **Database Systems**: Many database management systems, such as MySQL and PostgreSQL, are implemented in C to ensure efficient data handling and query processing.
- **Network Programming**: C is often used for developing network protocols and applications due to its ability to handle low-level socket programming and efficient data transmission.
- **Compilers and Interpreters**: Many programming language compilers and interpreters are written in C to leverage its performance and low-level capabilities.
- **High-Performance Computing**: C is used in scientific computing and simulations where performance is critical, such as in numerical libraries and parallel computing frameworks.
- **System Utilities**: Many system utilities and command-line tools in Unix-like operating systems are written in C for efficiency and direct system access.
- **Device Drivers**: C is the language of choice for writing device drivers that allow the operating system to communicate with hardware devices.
- **Cryptography**: Many cryptographic libraries and algorithms are implemented in C to ensure high performance and security.
- **Audio/Video Processing**: Libraries like FFmpeg are written in C to handle multimedia processing efficiently.
- **Web Servers**: Popular web servers like Apache and Nginx are written in C to handle high loads and provide fast response times.

You get my point, C is everywhere and it excels in scenarios where performance, efficiency, and low-level system access are paramount.

## When C Isn’t the Right Choice?

Now as you can see I'm a pretty big proponent of C, and you'll often hear people who exclusively develop in C (sadly not me) say: "Anything and everything under the sun can be written in C".
Which is true; if you were to remove all programming languages from existence save one, my choice would always be C, as we could potentially rebuild everything from the ground up. However, even I have to admit that C may not always be the best choice; everything has its ups and downs. So C may not be your best choice when:

- **Rapid Prototyping**: If you need to quickly prototype an idea or concept, higher-level languages like Python or JavaScript may be more suitable due to their ease of use and extensive libraries.
- **Web Development**: For building web applications, languages like JavaScript (with frameworks like React, Angular, or Vue) or Python (with frameworks like Django or Flask) are often preferred due to their ease of use and extensive web development libraries.
- **Data Science and Machine Learning**: Languages like Python and R are commonly used in data science and machine learning due to their extensive libraries and frameworks (e.g., TensorFlow, PyTorch, Pandas).
- **Mobile App Development**: For mobile app development, languages like Swift (for iOS) and Kotlin (for Android) are often preferred due to their platform-specific features and ease of use.
- **Memory Safety**: If memory safety is a primary concern, languages like Rust or Go may be more suitable due to their built-in memory safety features and garbage collection (Go).
- **Ease of Learning**: If you're new to programming, languages like Python or JavaScript may be easier to learn due to their simpler syntax and extensive learning resources.
- **Community and Ecosystem**: If you need access to a large community and ecosystem of libraries and frameworks, languages like JavaScript, Python, or Java may be more suitable due to their extensive ecosystems.

## Conclusion

So there you have it: why I still reach for C for certain projects. It’s not always the fastest or easiest choice, but for small, efficient programs that need to run lean and mean, it often makes sense. C isn’t flashy or trendy, but it gets the job done, and sometimes that’s all that matters. Curious to hear how others tackle similar projects: C, Rust, Go, or something else?

0 views
Dayvster 1 month ago

In Defense of C++

## The Reputation of C++

C++ has frequently been criticized for its complexity, steep learning curve, and most of all for its ability to allow the developers using it to not only shoot themselves in the foot, but to blow off their whole leg in the process. But do these criticisms hold up under scrutiny? Well, in this blog post, I aim to tackle some of the most common criticisms of C++ and provide a balanced perspective on its strengths and weaknesses.

## C++ is "Complex"

C++ is indeed a complex language, with a vast array of features and capabilities. For any one thing you wish to achieve in C++, there are about a dozen different ways to do it, each with its own trade-offs and implications. So, as a developer, how are you to know which approach is the best one for your specific use case? Surely you have to have a deep understanding of the language to make these decisions, right? **Not really...** I mean, don't get me wrong, it helps, but it's not a hard requirement. Premature optimization is the root of all evil, and in C++, you can write perfectly fine code without ever needing to worry about the more complex features of the language. You can write simple, readable, and maintainable code in C++ without ever needing to use templates, operator overloading, or any of the other more advanced features of the language. There's this idea that for everything you want to do in any programming language, you need to use the most efficient and correct approach possible. Python has this with its pythonic way of doing things, Java has this, C# has this, and Go has this. Heck, even something as simple as painting HTML onto a browser needs to be reinvented every couple of years and argued about ad nauseam. Here's the thing, though: in most cases, there is no one right way to do something. The hallowed "best approach" is often just a matter of personal or team preference. The idea that if you just write your code in the "best" and correct way, you'll never need to worry about maintaining it is just plain wrong. Don't worry so much about using the "best" approach; worry more about writing code that is easy to read and understand. If you do that, you'll be fine.

## C++ is "Outdated"

C++ is very old; in fact, it came out in 1985. To put it into perspective, that's the same year the first version of Windows was released, and 6 years before the first version of Linux came out, or, to drive the point even further home, back when the last 8-bit computer was released. So yes, C++ is quite old by any standard. But does that make it outdated? **Hell no.** It's not like C++ has just been sitting around unchanged from its 1985 release. C++ has been actively developed and improved upon for 40 years now, with new features and capabilities being added all the time. C++20, released in 2020, introduced a number of new features and improvements to the language, and C++23 has introduced significant enhancements, particularly in the standard library and constexpr capabilities. Notably, concepts, ranges, and coroutines have been expanded, bringing modern programming paradigms to C++ and making the language more powerful and expressive than ever before. **But Dave, what we mean by outdated is that other languages have surpassed C++ and provide a better developer experience.** Matter of personal taste, I guess. C++ is still one of the most widely used programming languages with a huge ecosystem of libraries and tools.
It's used in a wide range of applications, from game development to high-performance computing to embedded systems. Many of the most popular and widely used software applications in the world are written in C++. I don't think C++ is outdated by any stretch of the imagination; you have to bend the definition of outdated quite a bit to make that claim.

## C++ is "Unsafe"

Ah, finally, we get to the big one, and yes, I will draw comparisons to Rust as it's the "memory safe" language that a lot of people claim will or should replace C++. **In fact, let's get the main point out of the way right now.**

### Rewrites of C++ codebases to Rust always yield more memory-safe results than before.

Countless companies have cited how they improved their security or reduced the number of reported bugs or memory leaks by simply rewriting their C++ codebases in Rust. **Now is that because of Rust?** I'd argue in some small part, yes. However, I think the biggest factor is that any rewrite of an existing codebase is going to yield better results than the original codebase. When you rewrite a codebase, you have the opportunity to rethink and redesign the architecture, fix bugs, and improve the overall quality of the code. You get to leverage all the lessons learned from the previous implementation, all the issues that were found and fixed and that you already know about. All the headaches that would be too much of a pain to fix in the existing codebase, you can just fix in the new one. Imagine, if you will, that you've built a shed: it was a bit wobbly, and you didn't really understand proper wood joinery when you first built it, so it has a few other issues, like poor structural integrity and a leaky roof. After a few years, you build a new one, and this time you know all the mistakes you made the first time around, so you build it better, stronger, and more weatherproof. In the process, you decide to replace the materials you've previously used; say, for example, instead of using maple, you opt for oak. Is it correct to say that the new shed is better only because you used oak instead of maple? Or is that a really small part of the overall improvement? That's how I feel when I see these companies claim that rewriting their C++ codebases in Rust has made them more memory safe. It's not because of Rust, it's because they took the time to rethink and redesign their codebase and implemented all the lessons learned from the previous implementation.

### But that does not deny the fact that C++ is unsafe.

Yes, C++ can be unsafe if you don't know what you're doing. But here's the thing: all programming languages are unsafe if you don't know what you're doing. You can write unsafe code in Rust, you can write unsafe code in Python, you can write unsafe code in JavaScript. Memory safety is just one aspect of safety in programming languages; you can still write unsafe code in memory-safe programming languages. Just using Rust will not magically make your application safe; it will just make it a lot harder to have memory leaks or safety issues. The term "unsafe" is a bit too vague in this context, and I think it's being used as a catch-all term, which to me reeks of marketing speak.

### Can C++ be made safer?

Yes, C++ can be made safer; in fact, it can even be made memory safe. There are a number of libraries and tools available that can help make C++ code safer, such as smart pointers, static analysis tools, and memory sanitizers. Heck, if you wish, you can even add a garbage collector to C++ if you really want to (please don't).
But the easiest and most straightforward way to make C++ safer is to simply learn about smart pointers and use them wherever necessary. Smart pointers are a way to manage memory in C++ without having to manually allocate and deallocate memory. They automatically handle the memory management for you, making it much harder to have memory leaks or dangling pointers, which addresses the main criticism of C++ in the first place.

## C++ is Hard to Read

Then don't write it that way. C++ is a multi-paradigm programming language; you can write procedural code, object-oriented code, functional code, or a mix of all three. You can write simple and readable code in C++ if you want to. You can also write complex and unreadable code in C++ if you want to. It's all about personal or team preference. Here's a rule of thumb I like to follow for C++: make it look as much like C as you possibly can, and avoid using too many advanced features of the language unless you really need to. Use smart pointers, avoid raw pointers, and use the standard library wherever possible. You can do a heck of a lot of programming by just using C++ as you would C and introducing complexity only when you really need to.

### But doesn't that defeat the whole purpose of C++? Why not just use C then?

C++ is (mostly) a superset of C: you can write C code in C++, and it will generally work just fine. C++ adds a lot of features and capabilities to C. If you were to start with C, then you are locked into C, and that's fine for a lot of cases, don't get me wrong, but C++ gives you the option to use more advanced features of the language when you need them. You can start with C and then gradually introduce C++ features as you need them. You don't have to use all the features of C++ if you don't want to. Again, going back to my shed analogy, if you build a shed out of wood, you can always add a metal roof later if you need to. You don't have to build the whole shed out of metal if you don't want to.

## C++ has a confusing ecosystem

C++ has a large ecosystem built over the span of 40 years or so, with a lot of different libraries and tools available. This can make it difficult to know which libraries and tools to use for a specific task. But this is not unique to C++; every programming language has this problem. Again, the simple rule of thumb is to use the standard library wherever possible; it's well-maintained and has a lot of useful features. For other tasks like networking or GUI development, there are a number of well-known libraries that are widely used and well-maintained. Do some research and find out which libraries are best suited for your specific use case. **Avoid Boost like the plague.** Boost is a large collection of libraries that are widely used in the C++ community. However, many of the libraries in Boost are outdated and no longer maintained. They also tend to be quite complex and difficult to use. If you can avoid using Boost, do so. Unless you are writing a large and complex application that requires the specific features provided by Boost, you are better off using other libraries that are more modern and easier to use. Do not add the performance overhead and binary size bloat of Boost to your application unless you really need to.

## C++ is not a good choice for beginners

Programming is not a good choice for beginners, woodworking is not a good choice for beginners, and car mechanics is not a good choice for beginners. Programming is hard; it takes time and effort to learn, as all things do.
There is no general language that is really good for beginners; everything has its trade-offs. Fact is, if you wanna get into something like **systems programming or game development**, then starting with Python or JavaScript won't really help you much. You will eventually need to learn C or C++. If your goal is to become a web developer or data scientist, then start with Python or JavaScript. If you just want a job in the programming industry, I don't know, learn Java or C#, both great languages that get a lot of undeserved hate, but offer a lot of job opportunities. Look, here's the thing: if you're just starting out in programming, yeah, it's gonna be hard no matter what language you choose. I'd actually argue that starting with C or C++ is far better than starting with something that obscures a lot of the underlying concepts of programming. I'd argue further that by starting with Python or JavaScript you are doing yourself a disservice in the long run, trading off the pain of learning something while your understanding of a topic is still fresh and malleable for the pain of learning it later, when you have a lot more invested in your current understanding of programming. But hey, that's just my opinion.

## C++ vs Rust: Friends or Rivals?

Rust has earned a lot of love in recent years, and for good reason. It takes memory safety seriously, and its borrow checker enforces discipline that C++ often leaves to the programmer. That said, Rust is still building its ecosystem, and the learning curve can feel just as steep — just in different ways. C++ may not prevent you from shooting yourself in the foot, but it gives you decades of battle-tested tooling, compilers, and libraries that power everything from Chrome to Unreal Engine. In practice, many teams use Rust and C++ together rather than treating them as enemies. Rust shines in new projects where safety is the priority, while C++ continues to dominate legacy systems and performance-critical domains.

## Is C++ Still Used in 2025?

The short answer: absolutely. Despite the constant chatter that it’s outdated, C++ remains one of the most widely used languages in the world. Major browsers like Chrome and Firefox are still written in it. Game engines like Unreal run on it. Automotive systems, financial trading platforms, and even AI frameworks lean heavily on C++ for performance and control. New standards (C++20, C++23) keep modernizing the language, ensuring it stays competitive with younger alternatives. If you peel back the layers of most large-scale systems we rely on daily, you’ll almost always find C++ humming away under the hood.

## Conclusion

C++ is a powerful and versatile programming language that has stood the test of time. While it does have its complexities and challenges, it remains a relevant and widely used language in today's tech landscape. With the right approach and mindset, C++ can be a joy to work with and can yield high-performance and efficient applications. So next time you hear someone criticize C++, take a moment to consider the strengths and capabilities of this venerable language before dismissing it outright. Hope you enjoyed this blog post. If you did, please consider sharing it with your friends and colleagues. If you have any questions or comments, please feel free to reach out to me on [Twitter](https://twitter.com/dayvster).

0 views

Implementing Forth in Go and C

I first ran into Forth about 20 years ago when reading a book about designing embedded hardware. The reason I got the book back then was to actually learn more about the HW aspects, so having skimmed the Forth chapter I just registered an "oh, this is neat" mental note and moved on with my life. Over the last two decades I heard about Forth a few more times here and there, such as that time when Factor was talked about for a brief period, maybe 10-12 years ago or so. It always occupied a slot in the "weird language" category inside my brain, and I never paid it much attention. Until June this year, when a couple of factors combined fortuitously: And something clicked. I'm going to implement a Forth, because... why not? So I spent much of my free hacking time over the past two months learning about Forth and implementing two of them. It's useful to think of Forth (at least standard Forth, not offshoots like Factor) as having two different "levels": Another way to look at it (useful if you belong to a certain crowd) is that user-level Forth is like Lisp without macros, and hacker-level Forth has macros enabled. Lisp can still be great and useful without macros, but macros take it to an entire new level and also unlock the deeper soul of the language. This distinction will be important when discussing my Forth implementations below. There's a certain way Forth is supposed to be implemented; this is how it was originally designed, and if you get closer to the hacker level, it becomes apparent that you're pretty much required to implement it this way - otherwise supporting all of the language's standard words will be very difficult. I'm talking about the classical approach of a linked dictionary, where a word is represented as a "threaded" list [1], and this dictionary is available for user code to augment and modify. Thus, much of the Forth implementation can be written in Forth itself.

The first implementation I tried is stubbornly different. Can we just make a pure interpreter? This is what goforth is trying to explore (the Go implementation located in the root directory of that repository). Many built-in words are supported - definitely enough to write useful programs - and compilation (the definition of new Forth words using : word ... ; ) is implemented by storing the actual string following the word name in the dictionary, so it can be interpreted when the word is invoked. This was an interesting approach and in some sense, it "works". For the user level of Forth, this is perfectly usable (albeit slow). However, it's insufficient for the hacker level, because the host language interpreter (the one in Go) has all the control, so it's impossible to implement IF...THEN in Forth, for example (it has to be implemented in the host language).

That was a fun way to get a deeper sense of what Forth is about, but I did want to implement the hacker level as well, so the second implementation - ctil - does just that. It's inspired by the jonesforth assembly implementation, but done in C instead [2]. ctil actually lets us implement major parts of Forth in Forth itself. For example, variable : Conditionals: These are actual examples of ctil's "prelude" - a Forth file loaded before any user code. If you understand Forth, this code is actually rather mind-blowing. We compile IF and the other words by directly laying out their low-level representation in memory, and different words communicate with each other using the data stack during compilation.
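To make the "linked dictionary plus data stack" shape more concrete, here is a minimal C sketch of the kind of machinery such an implementation revolves around. The names and layout are illustrative only (this is not ctil's actual code), and the threaded bodies of colon definitions are omitted entirely.

```c
/*
 * forthish.c -- illustrative sketch of a C-hosted Forth's core data structures:
 * a data stack that words communicate through, and a linked dictionary of words.
 */
#include <stdint.h>
#include <string.h>

/* The data stack. */
#define STACK_DEPTH 1024
static intptr_t stack[STACK_DEPTH];
static int sp = 0;
static void     push(intptr_t v) { stack[sp++] = v; }
static intptr_t pop(void)        { return stack[--sp]; }

/* A dictionary entry: a link to the previously defined word, a name, flags
 * (e.g. IMMEDIATE), and a code field. For primitives the code field is a C
 * function; a colon definition would instead point at a threaded list of
 * previously defined words. */
typedef struct word {
    struct word *link;
    char         name[32];
    int          flags;
    void       (*code)(void);
} word;

static word *latest = NULL;   /* newest definition; head of the linked list */

/* Dictionary lookup walks the list from the most recent definition backwards. */
static word *find(const char *name) {
    for (word *w = latest; w != NULL; w = w->link)
        if (strcmp(w->name, name) == 0)
            return w;
    return NULL;
}

/* Example primitive: +  ( a b -- a+b ) */
static void prim_add(void) {
    intptr_t b = pop();
    intptr_t a = pop();
    push(a + b);
}

int main(void) {
    static word w_add = { .link = NULL, .flags = 0, .code = prim_add };
    strcpy(w_add.name, "+");
    latest = &w_add;

    push(2);
    push(3);
    find("+")->code();         /* look up and execute the word named "+" */
    return pop() == 5 ? 0 : 1; /* exit code 0 on success */
}
```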
Forth made perfect sense in the historic context in which it was created in the early 1970s. Imagine having some HW connected to your computer (a telescope in the case of Forth's creator), and you have to interact with it. In terms of languages at your disposal - you don't have much, even BASIC wasn't invented yet. Perhaps your machine still didn't have a C compiler ported to it; C compilers aren't simple, and C isn't very great for exploratory scripting anyway. So you mostly just have your assembly language and whatever you build on top. Forth is easy to implement in assembly and it gives you a much higher-level language; you can use it as a calculator, as a REPL, and as a DSL for pretty much anything due to its composable nature. Forth certainly has interesting aspects; it's a concatenative language, and thus inherently point-free. A classical example is that instead of writing the following in a more traditional syntax: You just write this: There is no need to explicitly pass parameters, or to explicitly return results. Everything happens implicitly on the stack. This is useful for REPL-style programming where you use your language not necessarily for writing large programs, but more for interactive instructions to various HW devices. This dearth of syntax is also what makes Forth simple to implement. All that said, in my mind Forth is firmly in the "weird language" category; it's instructive to learn and to implement, but I wouldn't actually use it for anything real these days. The stack-based programming model is cool for very terse point-free programs, but it's not particularly readable and is hard to reason about without extensive comments, in my experience. Consider the implementation of a pretty standard Forth word: +! . It expects an address at the top of the stack, and an addend below it. It adds the addend to the value stored at that address. Here's a Forth implementation from ctil's prelude: Look at that stack wrangling! It's really hard to follow what goes where without the detailed comments showing the stack layout on the right of each instruction (a common practice for Forth programs). Sure, we can create additional words that would make this simpler, but that just increases the lexicon of words to know. My point is, there's fundamental difficulty here. When you see this C code: Even without any documentation, you can immediately know several important things: Written in Forth [3]: How can you know the arity of the functions without adding explicit comments? Sure, if you have a handful of words like bar and foo you know like the back of your hand, this is easy. But imagine reading a large, unfamiliar code base full of code like this and trying to comprehend it. The source code of my goforth project is on GitHub; both implementations are there, with a comprehensive test harness that tests both. To learn Forth itself, I found these resources very useful: To learn how to implement Forth: Implementing Forth is a great self-improvement project for a coder; there's a pleasantly challenging hump of understanding to overcome, and you gain valuable insights into stack machines, interpretation vs. compilation and mixing these levels of abstraction in cool ways. Also, implementing programming languages from scratch is fun! It's hard to beat the feeling of getting to interact with your implementation for the first time, and then iterating on improving it and making it more featureful. Just one more word! Which is another deviation from the norm.
Forth is really supposed to be implemented in assembly - this is what it was designed for, and it's very clear from its structure that it must be so in order to achieve peak performance. But where's the fun in doing things the way they were supposed to be done? Besides, jonesforth is already a perfectly fine Forth implementation in assembly, so I wouldn't have learned much by just copying it. I had a lot of fun coding in C for this one; it's been a while since I last wrote non-trivial amounts of C, and I found it very enjoyable.

0 views
Xe Iaso 2 months ago

Final Fantasy 14 on macOS with a 36 key keyboard

Earlier this year, I was finally sucked into Final Fantasy 14. I've been loving my time in it, but most of my playtime was on my gaming tower running Fedora. I knew that the game does support macOS, and I did get it working on my MacBook for travel, but there was one problem: I wasn't able to get my bars working with mouse and keyboard. A 36 key keyboard and MMO mouse combination for peak gaming. Final Fantasy 14 has a ridiculous level of customization. Every UI element can be moved and resized freely. Every action your player character can take is either bindable to arbitrary keybinds or able to be put in hotbars. Here are my hotbars for White Mage: My bars for the White Mage job, showing three clusters of actions along with a strip of quick actions up top. My bars have three "layers" to them: I have things optimized so that the most common actions I need to do are on the base layer. This includes spells like my single target / area of effect healing spells and my burst / damage over time spells. However, critical things like health regeneration, panic button burst healing, shields, and status dispelling are all in the shift and control layers. When I don't have instinctive access to these spells with button combos, I have to manually click on the buttons. This sucks. I ended up fixing this by installing Karabiner Elements, giving it access to the accessibility settings it needs, and enabling my mouse to be treated as a keyboard in its configuration UI. There are some other keyboard hacks that I needed to do. My little split keyboard runs QMK, custom keyboard firmware written in C that has a stupid number of features. In order to get this layout working with FFXIV, I had to use a combination of the following features: Here is what my keymap looks like: I use the combination of this to also do programming. I've been doing a few full-blown Anubis features via this keyboard such as log filters. I'm still not up to full programming speed with it, but I'm slowly internalizing the keymap and getting faster with practice. Either way, Final Fantasy 14 is my comfort game and now I can play it on the go with all the buttons I could ever need. I hope this was interesting and I'm going to be publishing more of these little "how I did a thing" posts like this in the future. Let me know what you think about this!
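For readers who haven't seen QMK before, here is a rough sketch of what a layered keymap for a 36-key split board can look like in C. This is a generic illustration rather than the keymap described above, and it assumes a board that ships QMK's community `LAYOUT_split_3x5_3` macro; the layer names and key choices are made up for the example.

```c
/*
 * keymap.c -- generic illustration of a layered QMK keymap for a 36-key split
 * board. Not the layout from the post above; layer names and keys are invented.
 */
#include QMK_KEYBOARD_H

enum layers { _BASE, _NAV };

const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
    [_BASE] = LAYOUT_split_3x5_3(
        KC_Q, KC_W, KC_E, KC_R, KC_T,          KC_Y, KC_U, KC_I,    KC_O,   KC_P,
        KC_A, KC_S, KC_D, KC_F, KC_G,          KC_H, KC_J, KC_K,    KC_L,   KC_QUOT,
        KC_Z, KC_X, KC_C, KC_V, KC_B,          KC_N, KC_M, KC_COMM, KC_DOT, KC_SLSH,
        /* thumb keys: hold for a modifier or a layer, tap for a key */
        LCTL_T(KC_ESC), LT(_NAV, KC_SPC), LSFT_T(KC_TAB),
        LALT_T(KC_ENT), KC_BSPC, LGUI_T(KC_DEL)
    ),
    [_NAV] = LAYOUT_split_3x5_3(
        KC_1,    KC_2,    KC_3,  KC_4,    KC_5,      KC_6,    KC_7,   KC_8,    KC_9,    KC_0,
        KC_LEFT, KC_DOWN, KC_UP, KC_RGHT, KC_NO,     KC_MINS, KC_EQL, KC_LBRC, KC_RBRC, KC_BSLS,
        KC_F1,   KC_F2,   KC_F3, KC_F4,   KC_F5,     KC_F6,   KC_F7,  KC_F8,   KC_F9,   KC_F10,
        KC_TRNS, KC_TRNS, KC_TRNS,
        KC_TRNS, KC_TRNS, KC_TRNS
    ),
};
```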

0 views
xenodium 2 months ago

Emacs elevator pitch

Me: Have you heard of Emacs? Me: On the surface, it looks like a text editor… Me: But really, it's more like a gateway to a world moulded to your needs. Me: Emacs ships with an RPN calculator and even a doctor . Me: Naturally, it doesn't do everything I want it to do nor how I want it to. Me: Luckily, I can throw elisp at it and make it do things my way. Stranger: Huh?? Me: Emacs didn't quite do all the things I wanted it to, so… Me: I made it play music how I wanted it to . Me: Control my operating system ( 1 , 2 , 3 , 4 ). Me: Help me learn Japanese . Me: Trim videos … or video screenshots . Me: Talk to the LLM robots ( 1 , 2 ). Me: Preview SwiftUI layouts . Me: Batch-apply all sorts of utils . Me: Send notes to my Kindle . Me: Tweak the debugger . Me: Enhance my shell . Stranger: ??!? Me: Do what I mean . Me: Tweak my email client ( 1 , 2 ). Me: Bring closer macOS integration ( 1 , 2 , 3 , 4 ). Me: Scan QR codes . Me: Record Screencasts . Me: Build iOS apps . Me: Blog about all sorts of things . Me: Tailor completion . Me: Easily clone repos . Me: Use my preferred eye candy . Me: Evaluate Objective-C code . Stranger: Sir… Me: Write however I want . Me: Stitch images . Me: Make multiple cursors do what I want . Me: Make searching smarter . Me: Look up where I took photos . Me: SQLite feel like a spreadsheet . Me: Easily insert SF Symbols . Me: Build an emotional zone . Me: Tweak my file manager ( 1 , 2 , 3 ). Me: Generate documentation . Me: Gosh, I could keep going… Stranger: Sir this is a Wendy's . This post is part of the Emacs Carnival . This month is Your Elevator Pitch for Emacs, hosted by Jeremy Friesen . Learned something new? Enjoying this blog or my projects ? I am an 👉 indie dev 👈. Help make it sustainable by ✨ sponsoring ✨ Need a blog? I can help with that . Maybe buy my iOS apps too ;)

0 views
matklad 3 months ago

Zig's Lovely Syntax

It’s a bit of a silly post, because syntax is the least interesting detail about the language, but, still, I can’t stop thinking how Zig gets this detail just right for the class of curly-braced languages, and, well, now you’ll have to think about that too. On the first glance, Zig looks almost exactly like Rust, because Zig borrows from Rust liberally. And I think that Rust has great syntax, considering all the semantics it needs to express (see “Rust’s Ugly Syntax” ). But Zig improves on that, mostly by leveraging simpler language semantics, but also through some purely syntactical tasteful decisions. How do you spell a number ninety-two? Easy, . But what type is that? Statically-typed languages often come with several flavors of integers: , , . And there’s often a syntax for literals of a particular types: , , . Zig doesn’t have suffixes, because, in Zig, all integer literals have the same type: : The value of an integer literal is known at compile time and is coerced to a specific type on assignment or ascription: To emphasize, this is not type inference, this is implicit comptime coercion. This does mean that code like generally doesn’t work, and requires an explicit type. Raw or multiline strings are spelled like this: This syntax doesn’t require a special form for escaping itself: It nicely dodges indentation problems that plague every other language with a similar feature. And, the best thing ever: lexically, each line is a separate token. As Zig has only line-comments, this means that is always whitespace. Unlike most other languages, Zig can be correctly lexed in a line-by-line manner. Raw strings is perhaps the biggest improvement of Zig over Rust. Rust brute-forces the problem with syntax, which does the required job, technically, but suffers from the mentioned problems: indentation is messy, nesting quotes requires adjusting hashes, unclosed raw literal breaks the following lexical structure completely, and rustfmt’s formatting of raw strings tends to be rather ugly. On the plus side, this syntax at least cannot be expressed by a context-free grammar! For the record, Zig takes C syntax (not that C would notice): The feels weird! It will make sense by the end of the post. Here, I want only to note part, which matches the assignment syntax . This is great! This means that grepping for gives you all instances where a field is written to. This is hugely valuable: most of usages are reads, but, to understand the flow of data, you only need to consider writes. Ability to mechanically partition the entire set of usages into majority of boring reads and a few interesting writes does wonders for code comprehension. Where Zig departs from C the most is the syntax for types. C uses a needlessly confusing spiral rule. In Zig, all types are prefix: While pointer type is prefix, pointer dereference is postfix, which is a more natural subject-verb order to read: Zig has general syntax for “raw” identifiers: It is useful to avoid collisions with keywords, or for exporting a symbol whose name is otherwise not a valid Zig identifier. It is a bit more to type than Kotlin’s delightful , but manages to re-use Zig’s syntax for built-ins ( ) and strings. Like, Rust, Zig goes for function declaration syntax. This is such a massive improvement over C/Java style function declarations: it puts token (which is completely absent in traditional C family) and function name next to each other, which means that textual search for allows you to quickly find the function. Then Zig adds a little twist. 
While in Rust we write The arrow is gone! Now that I’ve used this for some time, I find arrow very annoying to type, and adding to the visual noise. Rust needs the arrow: Rust has lambdas with an inferred return type, and, in a lambda, the return type is optional. So you need some sort of an explicit syntax to tell the parser if there is return type: And it’s understandable that lambdas and functions would want to use compatible syntax. But Zig doesn’t have lambdas, so it just makes the type mandatory. So the main is Related small thing, but, as name of the type, I think I like more than . Zig is using and for binding values to names: This is ok, a bit weird after Rust’s, whose would be in Zig, but not really noticeable after some months. I do think this particular part is not great, because , the more frequent one, is longer. I think Kotlin nails it: , , . Note all three are monosyllable, unlike and ! Number of syllables matters more than the number of letters! Like Rust, Zig uses syntax for ascribing types, which is better than because optional suffixes are easier to parse visually and mechanically than optional prefixes. Zig doesn’t use and and spells the relevant operators as and : This is easier to type and much easier to read, but there’s also a deeper reason why they are not sigils. Zig marks any control flow with a keyword. And, because boolean operators short-circuit, they are control flow! Treating them as normal binary operator leads to an entirely incorrect mental model. For bitwise operations, Zig of course uses and . Both Zig and Rust have statements and expressions. Zig is a bit more statement oriented, and requires explicit returns: Furthermore, because there are no lambdas, scope of return is always clear. Relatedly, the value of a block expression is void. A block is a list of statements, and doesn’t have an optional expression at the end. This removes the semicolon problem — while Rust rules around semicolons are sufficiently clear (until you get to macros), there’s some constant mental overhead to getting them right all the time. Zig is more uniform and mechanical here. If you need a block that yields a value, Zig supports a general syntax for breaking out of a labeled block: Rust makes pedantically correct choice regarding s: braces are mandatory: This removes the dreaded “dangling else” grammatical ambiguity. While theoretically nice, it makes -expression one-line feel too heavy. It’s not the braces, it’s the whitespace around them: But the ternary is important! Exploding a simple choice into multi-line condition hurts readability. Zig goes with the traditional choice of making parentheses required and braces optional: By itself, this does create a risk of style bugs. But in Zig formatter (non-configurable, user-directed) is a part of the compiler, and formatting errors that can mask bugs are caught during compilation. For example, is an error due to inconsistent whitespace around the minus sign, which signals a plausible mixup of infix and binary minus. No such errors are currently produced for incorrect indentation (the value add there is relatively little, given ), but this is planned. NB: because Rust requires branches to be blocks, it is forced to make synonym with . Otherwise, the ternary would be even more unusable! Syntax design is tricky! Whether you need s and whether you make or mandatory in ifs are not orthogonal! Like Python, Zig allows on loops. 
Unlike Python, loops are expressions, which leads to a nicely readable imperative searches: Zig doesn’t have syntactically-infinite loop like Rust’s or Go’s . Normally I’d consider that a drawback, because these loops produce different control flow, affecting reachability analysis in the compiler, and I don’t think it’s great to make reachability dependent on condition being visibly constant. But! As Zig places semantics front and center, and the rules for what is and isn’t a comptime constant are a backbone of every feature, “anything equivalent to ” becomes sufficiently precise. Incidentally, these days I tend to write “infinite” loops as Almost always there is an up-front bound for the number of iterations until the break, and its worth asserting this bound, because debugging crashes is easier than debugging hangs. , , , , and all use the same Ruby/Rust inspired syntax for naming captured values: I like how the iterator comes first, and then the name of an item follows, logically and syntactically. I have a very strong opinion about variable shadowing. It goes both ways: I spent hours debugging code which incorrectly tried to use a variable that was shadowed by something else, but I also spent hours debugging code that accidentally used a variable that should have been shadowed! I really don’t know whether on balance it is better to forbid or encourage shadowing! Zig of course forbids shadowing, but what’s curious is that it’s just one episode of the large crusade against any complexity in name resolution. There’s no “prelude”, if you want to use anything from std, you need to import it: There are no glob imports, if you want to use an item from std, you need to import it: Zig doesn’t have inheritance, mixins, argument-dependent lookup, extension functions, implicit or traits, so, if you see , that is guaranteed to be a boring method declared on type. Similarly, while Zig has powerful comptime capabilities, it intentionally disallows declaring methods at compile time. Like Rust, Zig used to allow a method and a field to share a name, because it actually is syntactically clear enough at the call site which is which. But then this feature got removed from Zig. More generally, Zig doesn’t have namespaces. There can be only one kind of in scope, while Rust allows things like I am astonished at the relative lack of inconvenience in Zig’s approach. Turns out that is all the syntax you’ll ever need for accessing things? For the historically inclined, see “The module naming situation” thread in the rust mailing list archive to learn the story of how rust got its syntax. The lack of namespaces touches on the most notable (by its absence) feature of Zig syntax, which deeply relates to the most profound aspect of Zig’s semantics. Everything is an expression. By which I mean, there’s no separate syntactic categories of values, types, and patterns. Values, types, and patterns are of course different things. And usually in the language grammar it is syntactically obvious whether a particular text fragment refers to a type or a value: So the standard way is to have separate syntax families for the three categories, which need to be internally unambiguous, but can be ambiguous across the categories because the place in the grammar dictates the category: when parsing , everything until is a pattern, stuff between and is a type, and after we have a value. There are two problems here. 
More generally, Zig doesn’t have namespaces. There can be only one thing with a given name in scope, while Rust allows, say, a type, a function, and a macro to all share one name. I am astonished at the relative lack of inconvenience in Zig’s approach. Turns out that dot access is all the syntax you’ll ever need for getting at things? For the historically inclined, see “The module naming situation” thread in the rust mailing list archive to learn the story of how Rust got its :: syntax.

The lack of namespaces touches on the most notable (by its absence) feature of Zig syntax, which deeply relates to the most profound aspect of Zig’s semantics. Everything is an expression. By which I mean, there are no separate syntactic categories of values, types, and patterns. Values, types, and patterns are of course different things. And usually in the language grammar it is syntactically obvious whether a particular text fragment refers to a type or a value. So the standard way is to have separate syntax families for the three categories, which need to be internally unambiguous, but can be ambiguous across the categories, because the place in the grammar dictates the category: when parsing let pattern: Type = value;, everything until the : is a pattern, the stuff between : and = is a type, and after = we have a value. There are two problems here.

First, there’s a combinatorial explosion of sorts in the syntax, because, while the three categories describe different things, it turns out that they have the same general tree-ish shape. The second problem is that it might be hard to maintain the category separation in the grammar. Rust started with the three categories separated by a bright line. But then, changes happen. Originally, Rust only allowed patterns in let bindings; today you can also write a pattern on the left-hand side of a plain assignment to do unpacking, like (a, b) = (b, a). Similarly, the turbofish used to move the parser from the value to the type mode, but now const generic parameters are values that can be found in the type position!

The alternative is not to pick this fight at all. Rather than trying to keep the categories separate in the syntax, use the same surface syntax to express all three, and categorize later, during semantic analysis. In fact, this already happens in the assignment example above: the two sides are different things! One is a place (lvalue) and the other is a “true” value (rvalue), but we use the same syntax for both. I don’t think such syntactic unification necessarily implies semantic unification, but Zig does treat everything uniformly, as a value with comptime and runtime behavior (for some values the runtime behavior may be missing, for others the comptime behavior). The fact that you can write an if where a type goes is occasionally useful. But the fact that simple types look like simple values, syntactically, consistently makes the language feel significantly less busy.

As a special case of everything being an expression, an instance of a generic type is written as a plain call, like ArrayList(u8). Just a function call! Though, there is a bit of sleight of hand involved to make this work. Usually, languages rely on type inference to allow eliding generic arguments. That in turn requires making the argument syntax optional, and that in turn leads to separating generic and non-generic arguments into separate parameter lists, and to some introducer sigil for generics, like angle brackets or square brackets. Zig solves this syntactic challenge in the most brute-force way possible. Generic parameters are never inferred: if a function takes 3 comptime arguments and 2 runtime arguments, it will always be called with 5 arguments syntactically. Like with the (absence of) importing flourishes, a reasonable reaction would be “wait, does this mean that I’ll have to specify the types all the time?” And, like with imports, in practice this is a non-issue. The trick is comptime closures. Consider a generic container (sketched below): we have to specify the element type when creating an instance, but subsequently, when we are using it, we don’t have to specify the type parameter again, because the type of the variable already closes over it. This is the major truth of object-oriented programming, the truth so profound that no one even notices it: in real code, 90% of functions are happiest as (non-virtual) methods. And, because of that, the annotation burden in real-world Zig programs is low.

While Zig doesn’t have Hindley-Milner constraint-based type inference, it relies heavily on one specific way to propagate types. Let’s revisit the first example: it doesn’t compile, because the two branches hold different comptime values, and we can’t select between them at runtime while they have different types. We need to coerce the constants to a specific runtime type. But coercing the expression as a whole doesn’t kick the can far enough: it essentially reproduces an if with two incompatible branches. We need to sink the coercion down into the branches. And that’s exactly how Zig’s “Result Location Semantics” works.
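A sketch of the comptime-closure point, with a tiny made-up generic of my own instead of std.ArrayList: a generic type is literally a function from a type to a type, and once you hold a value of that type, the element type is never spelled out again.

```zig
const std = @import("std");

// A generic type is just a function returning a type.
fn Pair(comptime T: type) type {
    return struct {
        first: T,
        second: T,

        // 90% of functions are happiest as methods: the element type T is
        // already "closed over" by the value, so it is never repeated.
        fn sum(self: @This()) T {
            return self.first + self.second;
        }
    };
}

test "instantiating a generic looks like a function call" {
    // The only place where the type parameter is spelled out:
    const p = Pair(u32){ .first = 1, .second = 2 };
    try std.testing.expectEqual(@as(u32, 3), p.sum());
}
```

And a hedged reconstruction of the coercion point (my own example, not necessarily the post’s original one): a comptime_int and a comptime_float cannot be unified under a runtime condition on their own, but giving the whole expression a result type lets the compiler coerce each branch in place.

```zig
// Does not compile: with a runtime condition there is no single runtime type
// that both branches (a comptime_int and a comptime_float) could share.
//     const x = if (flag) 1_000 else 0.5;
//
// With a result type, the coercion is sunk into each branch, as if we had
// written @as(f64, 1_000) and @as(f64, 0.5) respectively:
fn pick(flag: bool) f64 {
    const x: f64 = if (flag) 1_000 else 0.5;
    return x;
}
```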
Type “inference” runs a simple left-to-right tree-walking algorithm which resembles an interpreter’s eval. In fact, eval is exactly what happens: Zig is not a compiler, it is an interpreter. When it evaluates an expression, it gets, along with the expression itself, the result location and the expected result type. When interpreting code like an assignment of an if into a struct field, the interpreter passes the result location (the address of the field) and the type down the tree of subexpressions. The if branches store their result directly into the object’s field (there’s a store inside each branch, as opposed to a single one after the if), and each branch coerces its comptime constant to the appropriate runtime type of the result.

This mechanism enables concise syntax for specifying enums (sketched at the end of this section). When Zig evaluates a switch, it first evaluates the scrutinee and realizes that it has some enum type, say Color. When evaluating an arm, it sets the result type for the arm’s condition to Color, so a bare literal like .red gets coerced to Color.red. The same happens for the second arm, where the result type sinks further down.

Result type semantics also explains the leading dot in the record literal syntax. Syntactically, we just want to disambiguate records from blocks. But, semantically, we want to coerce the literal to whatever type we want to get out of this expression. In Zig, a dotted literal like .{ .x = 1 } is a shorthand for writing the full type in front of the braces, with the type supplied by the result location. I must confess that the leading dot weirded me out quite a bit at first when writing code (I don’t mind reading it). It’s not the easiest thing to type! But that was fixed once I added an editor snippet expanding to the dot-brace form. The benefits of a lightweight record literal syntax are huge, as it allows for some pretty nice APIs. In particular, you get named and default arguments for free. I don’t really miss the absence of named arguments in Rust; you can always design APIs without them. But they are free in Zig, so I use them liberally. Syntax-wise, we get two features (calling functions and initializing objects) for the price of one!

Finally, the thing that weirds out some people when they see Zig code, and makes others reconsider their choice of GitHub handles even when they haven’t seen any Zig: the @-prefixed syntax for built-in functions. Every language needs to glue “userspace” code to primitive operations supported by the compiler. Usually, the gluing is achieved by making the standard library privileged and allowing it to define intrinsic functions without bodies, or by adding ad-hoc operators directly to the language (like Rust’s as). And Zig does have a fair number of operators, like the wrapping +% or the saturating +| arithmetic. But the release valve for a lot of functionality is built-in functions living in a distinct syntactic namespace, so Zig separates out @intCast, @floatCast, @ptrCast, @bitCast, @truncate, @enumFromInt, @intFromEnum, @floatFromInt, @intFromFloat, and @alignCast. There’s no need to overload casting when you can give each variant a name. There’s also @as for type ascription. The type goes first, because the mechanism here is result type semantics: @as evaluates the first argument as a type, and then uses it as the result type for the second argument. Curiously, I think @as can actually be implemented in userspace: in Zig, the type of a function parameter may depend on the values of preceding (comptime) ones! (There is a sketch of this below.)

My favorite builtin is @import. First, it’s the most obvious way to import code: const std = @import("std"); makes it crystal clear where the file comes from. But, second, it is an instance of reverse syntax sugar. You see, @import isn’t really a function. You can’t stash it in a variable or call it with a dynamically computed path: the argument of @import has to be a string literal, syntactically. It really is syntax, except that the function-call form is reused, because it already has the right shape.
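To make the result-type points concrete, here is a sketch of my own (Color, Options, brightness, and draw are made-up names): bare enum literals in switch arms and anonymous record literals both pick up their type from the result location, and a defaulted options struct gives named and default arguments for free.

```zig
const std = @import("std");

const Color = enum { red, green, blue };

fn brightness(c: Color) u8 {
    // The scrutinee has type Color, so that result type is pushed into each
    // arm's condition and the bare literals .red / .green / .blue are coerced.
    return switch (c) {
        .red => 30,
        .green => 59,
        .blue => 11,
    };
}

// Record literals give named and default arguments for free: the caller
// passes an anonymous literal, and the result type (Options) fills it in.
const Options = struct {
    bold: bool = false,
    scale: u8 = 1,
};

fn draw(c: Color, opts: Options) u8 {
    return brightness(c) * opts.scale + @intFromBool(opts.bold);
}

test "literals pick up their type from the result location" {
    try std.testing.expectEqual(@as(u8, 59), draw(.green, .{}));
    try std.testing.expectEqual(@as(u8, 60), draw(.green, .{ .scale = 1, .bold = true }));
}
```

And a sketch of the observation that @as could, in principle, be written in userspace, because a parameter’s type may depend on an earlier comptime parameter (the function name as here is mine):

```zig
const std = @import("std");

// The second parameter's type is the *value* of the first parameter.
fn as(comptime T: type, value: T) T {
    return value;
}

test "userspace type ascription" {
    const x = as(f64, 1_000); // the comptime_int 1_000 is coerced to f64
    try std.testing.expectEqual(@as(f64, 1000.0), x);
}
```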
So, this is it. Just a bunch of silly syntactical decisions which add up to a language that is positively enjoyable to read. As for big lessons: obviously, the fewer features your language has, the less syntax you’ll need. And less syntax is generally good, because varied syntactic constructs tend to step on each other’s toes. Languages are not combinations of orthogonal aspects. Features tug and pull the language in different directions, and their combinations might turn out to be miraculous features in their own right, or might drag the language down. Even with a small feature set fixed, there’s still a lot of work in picking a good concrete syntax: unambiguous to parse, useful to grep, easy to read, and not too painful to write. A smart thing is of course to steal and borrow solutions from other languages, not because of familiarity, but because ruthless natural selection tends to weed out poor ideas. But there’s a lot of inertia in languages, so there’s no need to fear innovation. If an odd-looking syntax is actually good, people will take to it.

Is there anything about Zig’s syntax I don’t like? I thought not when starting this post, but in the process of writing it I did discover one form that annoys me. It is the while loop with an increment: while (cond) : (i += 1). This is two-thirds of a C-style for loop (without the declarator), and it sucks for the same reason: control flow jumps all over the place and is unrelated to the source code order. We go from the condition, to the body, to the increment, but in the source order the increment sits between the condition and the body. In Zig, this loop sucks for one additional reason: the : separating the increment is, I think, the single example of control flow in Zig that is expressed by a sigil rather than a keyword. This form used to be rather important, as Zig lacked a counting loop. It has the for (0..n) range form now, so I am tempted to call the while-with-increment redundant. Annoyingly, it is almost equivalent to moving the increment into the body as a defer, but not exactly: if the body contains a break or an early return, the defer version would run the increment one extra time, which is useless and might be outright buggy. Oh well.
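A sketch of the three forms, using made-up helpers use and stop of my own:

```zig
const std = @import("std");

fn use(x: u32) void {
    std.debug.print("{d}\n", .{x});
}

fn stop(x: u32) bool {
    return x == 0;
}

pub fn main() void {
    const xs = [_]u32{ 3, 1, 4, 0, 5 };

    // The form that annoys me: we read condition, then increment, then body,
    // but execution goes condition, body, increment.
    var i: usize = 0;
    while (i < xs.len) : (i += 1) {
        use(xs[i]);
    }

    // The range-based for now covers the common counting case:
    for (0..xs.len) |j| {
        use(xs[j]);
    }

    // A near-equivalent rewrite with defer is subtly different: on a break
    // (or an early return) the defer still runs, so the counter is bumped
    // one extra time, which the `: (i += 1)` form would not do.
    var k: usize = 0;
    while (k < xs.len) {
        defer k += 1;
        if (stop(xs[k])) break;
        use(xs[k]);
    }
}
```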
