Latest Posts (20 found)

i'm just having fun

IT IS ONLY COMPUTER

i work professionally on a compiler and write about build systems in my free time and as a result people often say things to me like "reading your posts points to me how really smart you are" or "reading a lot of this shit makes me feel super small". this makes me quite uncomfortable and is not the reaction i'm seeking when i write blog posts.

i mean, in some sense if you work as a professional programmer it is a competition, because the job market sucks right now. but i think usually when people say they feel dumb, it's not in the sense of "how am i supposed to get a job when jyn exists" but more "jyn can do things i can't and that makes me feel bad".

all the things i know i learned by experimenting with them, or by reading books or posts or man pages or really obscure error messages. sometimes there's a trick to it but sometimes it's just hard work. i am not magic. you can learn these things too.

if you don't want to spend a bunch of time learning about how computers work, you don't have to! not knowing about gory computer internals does not make you dumb or computer illiterate or anything. everyone has their own specialty and mine is compilers and build systems. i don't know jack shit about economics or medicine! having a different specialty than me doesn't mean you're dumb.

i really hate that computing and STEM have this mystique in our society. to the extent that engineering demonstrates intelligence, it's by repeatedly forcing you to confront the results of your own mistakes, in such a way that errors can't be ignored. there are lots of ways to do that which don't involve programming or college-level math! performance art and carpentry and running your own business or household all force you to confront your own mistakes in this way and deserve no less respect than STEM.

by and large, when i learn new things about computers, it's because i'm fucking around. the fucking around is the point. if all the writing helps people learn and come up with cool new ideas, that's neat too. half the time the fucking around is just to make people say "jyn NO". half the time it's because i want to make art with my code. i really, sincerely, believe that art is one of the most important uses for a computer.

i'm not doing this for the money. i happened to get very lucky that my passion pays very well, but i got into this industry before realizing how much programmers actually make, and now that i work for a european company i don't make US tech salaries anyway. i do it for the love of the game.

some extracts from the jyn computer experience:

you really shouldn't take advice from me lol

however! if you are determined to do so anyway, what i can do is point you towards:

highest thing i can recommend is building a tool for yourself. maybe it's a spreadsheet that saves you an hour of work a week. maybe it's a little website you play around with. maybe it's something in RPGmaker. the exact thing doesn't matter, the important part is that it's fun and you have something real at the end of it, which motivates you to keep going even when the computer is breaking in three ways you didn't even know were possible.

second thing i can recommend is looking at things other people have built. you won't understand all of it and that's ok. pick a part of it that looks interesting and do a deep dive on how it works.
i can recommend the following places to look when you're getting started:

- Mozilla Developer Network
- StackOverflow
- alice maz, "how I think when I think about programming"

most importantly, remember: it is only computer


what is a build system, anyway?

Andrew Nesbitt recently wrote a post titled What is a Package Manager? This post attempts to do the same for build systems.

At a high level, build systems are tools or libraries that provide a way to define and execute a series of transformations from input data to output data that are memoized by caching them in an object store.

Transformations are called steps or rules 1 and define how to execute a task that generates zero or more outputs from zero or more inputs. A rule is usually the unit of caching; i.e. the cache points are the outputs of a rule, and cache invalidations must happen on the inputs of a rule. Rules can have dependencies on previous outputs, forming a directed graph called a dependency graph. Dependencies that form a cyclic graph are called circular dependencies and are usually banned. 2 Outputs that are only used by other rules, but not "interesting" to the end-user, are called intermediate outputs.

An output is outdated, dirty, or stale if one of its dependencies is modified, or, transitively, if one of its dependencies is outdated. Stale outputs invalidate the cache and require the outputs to be rebuilt. An output that is cached and not dirty is up-to-date. Rules are outdated if any of their outputs are outdated. If a rule has no outputs, it is always outdated.

Each invocation of the build tool is called a build. A full build or clean build occurs when the cache is empty and all transformations are executed as a batch job. A cache is full if all its rules are up-to-date. An incremental build occurs when the cache is partially full but some outputs are outdated and need to be rebuilt. Deleting the cache is called cleaning.

A build is correct or sound if all possible incremental builds have the same result as a full build. 3 A build is minimal (occasionally optimal) if rules are rerun at most once per build, and only run if necessary for soundness (Build Systems à la Carte, Pluto). In order for a build to be sound, all possible cache invalidations must be tracked as dependencies.

A build system without caching is called a task runner or batch compiler. Note that task runners still often support dependencies even if they don't support caching. Build systems with caching can emulate a task runner by only defining tasks with zero outputs, but they are usually not designed for this use case. 4 Some examples of build systems: , , rustc. Some examples of task runners: , shell scripts, gcc.

A build can be either inter-process, in which case the task is usually a single process execution and its input and output files, or intra-process, in which case a task is usually a single function call and its arguments and return values.

In order to track dependencies, either all inputs and outputs must be declared in source code ahead of time, or it must be possible to infer them from the execution of a task. Build systems that track changes to a rule definition are called self-tracking. Past versions of the rule are called its history (Build Systems à la Carte). The act of inferring dependencies from runtime behavior is called tracing. If a traced rule depends on a dependency that hasn't been built yet, the build system may either error, suspend the task and resume it later once the dependency is built, or abort the task and restart it later once the dependency is built (Build Systems à la Carte). Inter-process builds often declare their inputs and outputs, and intra-process builds often infer them, but this is not inherent to the definition. 5
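To make the vocabulary concrete, here is a minimal sketch of an inter-process-style build with mtime-based rebuild detection, where the output files themselves act as the object store. The Rule shape, the outdated check, and the build driver are invented for illustration; they are not any particular tool's API.

```python
# A minimal sketch of the vocabulary above, assuming mtime-based rebuild
# detection. Nothing here is a real tool's interface.
import os
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    inputs: list[str]          # dependencies: cache invalidations happen on these
    outputs: list[str]         # cache points
    task: Callable[[], None]   # the transformation itself

def outdated(rule: Rule) -> bool:
    """A rule with no outputs is always outdated; otherwise it is stale if any
    output is missing or older than any input."""
    if not rule.outputs:
        return True
    if not all(os.path.exists(out) for out in rule.outputs):
        return True
    newest_input = max((os.path.getmtime(path) for path in rule.inputs), default=0.0)
    oldest_output = min(os.path.getmtime(out) for out in rule.outputs)
    return newest_input > oldest_output

def build(rules: list[Rule]) -> None:
    """Run rules in dependency order (assumed already topologically sorted).
    If every rule is skipped the cache was full; if every rule runs this is
    effectively a clean build; anything in between is an incremental build."""
    for rule in rules:
        if outdated(rule):
            rule.task()   # rebuild: rerun the transformation

if __name__ == "__main__":
    # example: concatenate two source files into one output
    for name, text in [("a.txt", "hello "), ("b.txt", "world")]:
        if not os.path.exists(name):
            with open(name, "w") as f:
                f.write(text)

    def concat() -> None:
        with open("ab.txt", "w") as out:
            for src in ("a.txt", "b.txt"):
                with open(src) as f:
                    out.write(f.read())

    build([Rule(inputs=["a.txt", "b.txt"], outputs=["ab.txt"], task=concat)])
```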
Some examples of intra-process builds include spreadsheets, the wild linker, and memoization libraries such as python's functools.

A build graph is applicative if all inputs, outputs, and rules are declared ahead of time. We say in this case the graph is statically known. Very few build systems are purely applicative; almost all have an escape hatch. The graph is monadic if not all outputs are known ahead of time, or if rules can generate other rules dynamically at runtime. Inputs that aren't known ahead of time are called dynamic dependencies. Dynamic dependencies are weaker than a fully monadic build system, in the sense that they can express fewer build graphs. 6 Build systems that do not require declaring build rules are always monadic. Some examples of monadic build systems include Shake, ninja, and Cargo build scripts. Some examples of applicative build systems include Make (with recursive make and self-rebuilding Makefiles disallowed), Bazel (excluding native rules), and map/reduce libraries with memoization, such as this unison program.

If a dirty rule R has an outdated output, reruns, and creates a new output that matches the old one, the build system has an opportunity to avoid running later rules that depend on R. Taking advantage of that opportunity is called early cutoff. See the rustc-dev-guide for much more information about early cutoff. 7

In unsound build systems, it's possible that the build system does not accurately detect that it needs to rebuild. Such systems sometimes offer a way to force-rerun a target: keeping the existing cache, but rerunning a single rule. For inter-process build systems, this often involves touch-ing a file to set its modification date to the current time.

A build executor runs tasks and is responsible for scheduling tasks in an order that respects all dependencies, often using heuristics such as dependency depth or the time taken by the task on the last run. Executors also detect whether rule inputs have been modified, making the rule outdated; this is called rebuild detection. The build executor is responsible for restarting or suspending tasks in build systems that support it. Executors usually schedule many tasks in parallel, but this is not inherent to the definition. Executors often provide progress reporting, and sometimes allow querying the dependency graph. Occasionally they trace the inputs used by the task to enforce that they match the declared dependencies, or to automatically add them to an internal dependency graph.

In the context of inter-process builds, an artifact is an output file generated by a rule. 8 A source file is an input file that is specific to the current project 9 (sometimes repository or workspace) as opposed to a system dependency that is reused across multiple projects. A project is loosely defined but generally refers to the set of all input and output files that the build system knows about, usually contained in a single directory. Source files can be generated, which means they are an output of a previous rule.

Build files contain rule definitions, including (but not limited to) task definitions, input and output declarations, and metadata such as a human-readable description of the rule. Inputs are usually split into explicit inputs passed to the spawned process, implicit inputs that are tracked by the build system but not used in the task definition, and order-only inputs that must exist before the rule can execute, but do not invalidate the cache when modified.
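Early cutoff, described above, falls out naturally once outputs are compared by content rather than by timestamp. Here is a small sketch, reusing the hypothetical Rule shape from the earlier example and an invented previous_hashes store; a dirty rule still reruns, but its dependents only need to rerun when this returns True.

```python
# A sketch of early cutoff. The previous_hashes store and the Rule shape are
# illustrative assumptions, not any real tool's format.
import hashlib

def file_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def rerun_with_early_cutoff(rule, previous_hashes: dict[str, str]) -> bool:
    """Rerun a dirty rule; report whether any of its outputs actually changed."""
    rule.task()
    changed = False
    for out in rule.outputs:
        new_hash = file_hash(out)
        if previous_hashes.get(out) != new_hash:
            previous_hashes[out] = new_hash
            changed = True
    return changed
```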
Process executions have more inputs than just files, such as the rule itself, environment variables, the current time, the current working directory, and occasionally network services or local daemons 10. The set of all inputs that are not source files or command line arguments is called the environment. Processes can be sandboxed to prevent them from depending on the network, a daemon, or occasionally system dependencies; this is sometimes called a sandboxed environment or isolated environment.

System dependencies are more expansive than I think they are often understood to be. They include compilers, linkers, programming language libraries 11, and static and dynamically linked object files, but also the dynamic loader, language runtime, and various system configuration files. The subset of these dependencies needed for building a minimal program in a given language, along with various tools for inspecting and modifying the outputs at runtime, is called a toolchain. Toolchains are inherently specific to a given language, but sometimes (e.g. in GCC) a single compiler will support multiple languages as inputs.

A build is hermetic (rarely, self-contained or isolated 12) if it uses no system dependencies and instead defines all its dependencies in the project (Bazel). Sandboxing and hermeticity are orthogonal axes; neither one implies the other. For example, docker builds are sandboxed but not hermetic, and nix shells are hermetic but not sandboxed.

Compilers and linkers sometimes have their own incremental caches. Reusing the cache requires you to trust the compiler to be sound when incrementally rebuilding. This is usually implicit, but hermetic or sandboxed builds require an opt-in to reuse the cache. Bazel calls this kind of reuse a persistent worker.

A build is deterministic if it creates the same output every time in some specific environment. A build is reproducible if it is deterministic and also has the same output in any environment, as long as the system dependencies remain the same.

Caching can be remote or local. Remote caching is almost always unsound unless the build is both hermetic and reproducible (i.e. its only environment dependencies are controlled by the build system). Downloading files from the remote cache is called materializing them. Most build systems with remote caching defer materialization as long as possible, since in large build graphs the cache is often too large to fit on disk. Builds where the cache is never fully materialized are called shallow builds (Build Systems à la Carte). Remote caching usually, but not necessarily, uses content addressed hashing in a key-value store to identify which artifact to download. Some example build systems that use remote caching: Bazel, Buck2, nix, .

Build systems usually have a way to run a subset of the build. The identifier used to specify which part of the build you want to run is called a target. 13 Targets are usually the filenames of an artifact, but can also be abstract names of one or more rules. Bazel-descended build systems call these names labels. Make-descended build systems call these phony targets. Some build systems, such as cargo, do not use target identifiers but instead only have subcommands with arguments; the combination of arguments together specifies a set of targets. Some example targets:
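Returning to remote caching for a moment: the sketch below shows one way a content-addressed cache key could be derived, assuming the only cache-relevant inputs are the command line, the input file contents, and an explicitly tracked subset of the environment. The key layout and field names are invented, not Bazel's or Buck2's actual scheme; any environment variable left out of tracked_env is exactly the kind of untracked input that makes remote caching unsound.

```python
# Field names and key layout here are invented, not any real tool's scheme.
import hashlib
import json
import os

def cache_key(command: list[str], input_files: list[str],
              tracked_env: list[str]) -> str:
    manifest = {
        "command": command,
        "inputs": {path: hashlib.sha256(open(path, "rb").read()).hexdigest()
                   for path in sorted(input_files)},
        # anything *not* listed in tracked_env is an untracked input; reusing a
        # remote artifact keyed this way would then be unsound
        "env": {name: os.environ.get(name, "") for name in sorted(tracked_env)},
    }
    blob = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

if __name__ == "__main__":
    with open("main.c", "w") as f:
        f.write("int main(void) { return 0; }\n")
    # on a cache hit for this key, materialize the artifact instead of running the rule
    print(cache_key(["cc", "-c", "main.c", "-o", "main.o"], ["main.c"], ["CC", "CFLAGS"]))
```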
Inter-process build systems are often divided into a configuration step and a build step. A build system that only runs the configuration step, and requires another tool for the build step, is called a meta-build system. Usually this meta-build system discovers the rules that need to be executed (often through file globbing or some other programmatic way to describe dependencies), then serializes these rules into an action graph, which can be stored either in-memory or on-disk. On-disk serialized action graphs are usually themselves build files, in the sense that you can write them by hand but you wouldn't want to. Configuration steps usually allow the developer to choose a set of configuration flags (occasionally, build flags) that affect the generated rules. Some build systems also integrate directly with the package manager, but this is uncommon, and usually the build system expects all packages to be pre-downloaded into a known location. Some examples of meta-build systems are CMake, meson, and autotools.

Advanced build systems can integrate with a virtual file system (VFS) to check out source control files on-demand, rather than eagerly (EdenFS). A VFS can also persistently store content hashes and provide efficient change detection for files, avoiding the need for file watching or constant re-hashing.

The equivalent of system dependencies within a process is non-local state, including environment variables, globals, thread-locals, and class member fields (for languages where the receiver is passed implicitly). Especially tricky are function calls that do inter-process communication (IPC), which are basically never sound to cache. Tracing intra-process builds is very very hard since it's easy to call a function that depends on global state without you knowing. 14

In this intra-process context, most object stores are in-memory caches. A build system that supports saving (persisting) the cache to disk is said to have persistence. The system for persisting the cache is sometimes called a database, even if it is not a general-purpose database in the sense the term is normally used (Salsa).

Tracing intra-process build systems are sometimes called a query system. 15 They work similarly to their inter-process equivalents: the interface looks like normal function calls, and the build system tracks which functions call which other functions, so it knows which to rerun later. Some examples of tools with tracing intra-process build systems: salsa, the rustc query system.

Intra-process build systems that allow you to explicitly declare dependencies usually come from the background of functional reactive programming (FRP). FRP is most often used in UI and frontend design, but many of the ideas are the same as the build systems used for compiling programs. Unlike any of the build systems we've talked about so far, FRP libraries let you look at past versions of your outputs, which is sometimes called remembering state (React). To make this easier to reason about, rules can be written as event handlers. Some examples of libraries with dependency declarations: React.

A build system is pretty much anything that lets you specify dependencies on a previous artifact 😄 Some more weird examples of build systems:

- Github Actions (jobs and workflows)
- Static site generators
- Docker-compose files
- Systemd unit files

Hopefully this post has given you both a vocabulary to talk about build systems and a context to compare them!
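As a small appendix, here is a toy tracing query system in the spirit described above (and of salsa-like designs): dependencies are discovered by recording which inputs each query reads while it runs, and only queries whose recorded inputs changed are recomputed. The class, method names, and API shape are all made up for illustration.

```python
# A toy tracing query system: not salsa's or rustc's actual API, just the idea.
class QueryDb:
    def __init__(self):
        self.inputs = {}    # base facts set by the user
        self.cache = {}     # memoized query results
        self.deps = {}      # traced input reads, per query
        self._stack = []    # read-sets of queries currently executing

    def set_input(self, key, value):
        if self.inputs.get(key) != value:
            self.inputs[key] = value
            # invalidate every cached query that read this input
            for name in [q for q, reads in self.deps.items() if key in reads]:
                self.cache.pop(name, None)

    def read(self, key):
        for frame in self._stack:   # trace: every running query depends on this input
            frame.add(key)
        return self.inputs[key]

    def query(self, name, compute):
        if name in self.cache:
            for frame in self._stack:   # propagate the cached query's inputs upward
                frame |= self.deps.get(name, set())
            return self.cache[name]
        self._stack.append(set())
        result = compute(self)
        self.deps[name] = self._stack.pop()
        self.cache[name] = result
        return result

db = QueryDb()
db.set_input("a", 2)
db.set_input("b", 3)
total = lambda db: db.read("a") + db.read("b")
print(db.query("sum", total))   # runs the computation: 5
print(db.query("sum", total))   # served from the in-memory cache
db.set_input("a", 10)           # a traced dependency changed: "sum" is invalidated
print(db.query("sum", total))   # reruns: 13
```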
1. Nearly all build systems are inconsistent about whether a rule refers to an abstract description of how to build an output (i.e., can be reused for multiple sets of inputs and outputs), or a concrete instantiation of that description for a specific set of inputs and outputs. We have to live with the ambiguity, unfortunately.
2. Weird things can happen here though; for example early cutoff can allow circular dependencies. This sometimes comes up for generated build.ninja files.
3. The pluto paper defines this as "after a build, generated files consistently reflect the latest source files". Neither my definition nor pluto's definition is particularly well-defined if the build is non-deterministic. Defining this formally would probably require constructing an isomorphism between all programs with the same runtime behavior; but "runtime behavior" is not well-defined for a general-purpose build system that can output artifacts that are not programs.
4. As we'll see later, the reverse is also true: a common design for build systems is to automatically inject cache points into an existing task runner, or to design the rule file to look as similar to a shell script or function call as possible.
5. In particular, nearly all modern inter-process build systems have a limited form of tracing where they ask the compiler to generate "dep-info" files 16 that show which files were used (usually through imports) by a given source file. Note that this dep-info is not available until after the first time a build has run, and that this only works if the compiler supports it.
6. For more information about the spectrum of designs between applicative and monadic, see the post-modern build system.
7. Note that the dev-guide assumes that tasks are expensive relative to the cost of constructing the graph. This is true in the context of rustc, where LLVM codegen 17 normally dominates compilation time, but it isn't true for e.g. spreadsheets.
8. It's possible for tasks to create files that aren't tracked by the build system, but these aren't called artifacts. I don't know a good word for these; "byproducts" is the closest, but some build systems use that to mean any intermediate artifacts.
9. I'm not super happy with this definition because it conflicts with how compilers use the term, but I do think it describes how most build systems think about files.
10. Poorly written rules can also depend on which other rules are executing at the same time, which is called a race condition. Note this does not require the rule to be unsound, only for it to use intermediate files the build system doesn't know about.
11. For C, header files; for other languages, usually source files or intermediate representations.
12. Yes, this overlaps with the term for sandboxing. Try to avoid the word "isolated" if possible.
13. This has no relation to a target platform, which is related to cross-compiling. I wish we had better names for these things.
14. I would actually describe this as much harder than tracing an inter-process build system, since there aren't very good systems for tracking memory access. See this post about unstable fingerprints for an idea of what bugs this causes in practice.
15. This actually has very strong analogies to the way "query" is used in a database context: just like a tracing query system, a database has to be able to restart a query's transaction if the data it's trying to access has been modified.
16. What is a dep-info file? Good question! It's a makefile. It's literally a makefile.
Don't you just love proving backslashes by induction?
17. Or, more rarely, type-checking, borrow-checking, or coherence checking.

References:

- Andrew Nesbitt, "What is a Package Manager?"
- jyn, "build system tradeoffs"
- Jade Lovelace, "The postmodern build system"
- Casey Rodarmor, "Just Programmer's Manual"
- Fabien Sanglard, "Driving Compilers"
- The Rust Project Contributors, "Incremental compilation in detail"
- The Rust Project Contributors, "Queries: demand-driven compilation"
- "functools — Higher-order functions and operations on callable objects", Python 3.14.2 documentation
- Hillel Wayne, "The Capability-Tractability Tradeoff"
- Neil Mitchell, "Shake Build System"
- "The Ninja build system"
- Peter Miller, "Recursive Make Considered Harmful"
- Rebecca Mark and Paul Chiusano, "Incremental evaluation via memoization", Unison programming language
- "Hermeticity", Bazel
- "Persistent Workers", Bazel
- "Labels", Bazel
- "Commandments of reproducible builds"
- Mokhov et al., "Build Systems à la Carte"
- "Phony Targets", GNU make manual
- Facebook, "Sapling: A Scalable, User-Friendly Source Control System"
- Erdweg et al., "A Sound and Optimal Incremental Build System with Dynamic Dependencies"
- David Lattimore, "Designing Wild's incremental linking"
- Bo Lord, "How to Recalculate a Spreadsheet"
- "salsa - A generic framework for on-demand, incrementalized computation"
- "Defining the database struct", Salsa
- Felix Klock and Mark Rousskov on behalf of the Rust compiler team, "Announcing Rust 1.52.1"
- Yaron Minsky, "Breaking down FRP", Jane Street Blog
- "Adding Interactivity", React documentation
- "State: A Component's Memory", React documentation


I want a better build executor

This post is part 4/4 of a series about build systems . The market fit is interesting. Git has clearly won, it has all of the mindshare, but since you can use jj to work on Git repositories, it can be adopted incrementally. This is, in my opinion, the only viable way to introduce a new VCS: it has to be able to be partially adopted. If you've worked with other determinism-based systems, one thing they have in common is they feel really fragile, and you have to be careful that you don't do something that breaks the determinism. But in our case, since we've created every level of the stack to support this, we can offload the determinism to the development environment and you can basically write whatever code you want without having to worry about whether it's going to break something. In my last post , I describe an improved build graph serialization. In this post, I describe the build executor that reads those files. Generally, there are three stages to a build: There are a lot more things an executor can do than just spawning processes and showing a progress report! This post explores what those are and sketches a design for a tool that could improve on current executors. Ninja depends on mtimes, which have many issues . Ideally, it would take notes from and look at file attributes, not just the mtime, which eliminates many more false positives. I wrote earlier about querying the build graph . There are two kinds of things you can query: The configuration graph (what bazel calls the target graph ), which shows dependencies between "human meaningful" packages; and the action graph , which shows dependencies between files. Queries on the action graph live in the executor; queries on the configuration graph live in the configure script. For example, / , , and query the configuration graph; and query the action graph. Cargo has no stable way to query the action graph. Note that “querying the graph” is not a binary yes/no. Ninja's query language is much more restricted than Bazel's. Compare Ninja's syntax for querying “the command line for all C++ files used to build the target ” 2 : to Bazel's: Bazel’s language has graph operators, such as union, intersection, and filtering, that let you build up quite complex predicates. Ninja can only express one predicate at a time, with much more limited filtering—but unlike Bazel, allows you to filter to individual parts of the action, like the command line invocation, without needing a full protobuf parser or trying to do text post-processing. I would like to see a query language that combines both these strengths: the same nested predicate structure of Bazel queries, but add a new predicate that takes another predicate as an argument for complex output filtering: We could even go so far as to give this a jq-like syntax: For more complex predicates that have multiple sets as inputs, such as set union and intersection, we could introduce a operator: In my previous post , I talked about two main uses for a tracing build system: first, to automatically add dependency edges for you; and second, to verify at runtime that no dependency edges are missing. This especially shines when the action graph has a way to express negative dependencies, because the tracing system sees every attempted file access and can add them to the graph automatically. For prior art, see the Shake build system . 
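As a concrete illustration of the second use (verifying at runtime that no dependency edges are missing), here is a minimal sketch of the lint: compare the inputs a rule declared against the files its process was observed to open. The data below is hand-written; a real executor would get the observed set from whatever tracing backend is available (ptrace, a FUSE layer, and so on), and a wrapper tool could parse these lints to add the missing edges automatically.

```python
# A sketch of a "missing dependency edge" lint. The rule names and paths are
# invented; only the shape of the check is the point.
def lint_missing_edges(rule_name: str, declared: set[str],
                       observed: set[str], mode: str = "warn") -> list[str]:
    undeclared = sorted(observed - declared)
    messages = [f"{rule_name}: read {path} but does not declare it as an input"
                for path in undeclared]
    if mode == "error" and messages:
        raise RuntimeError("\n".join(messages))
    return messages

declared = {"src/main.c", "include/config.h"}
observed = {"src/main.c", "include/config.h", "/usr/include/stdio.h"}
for message in lint_missing_edges("cc main.o", declared, observed):
    print("warning:", message)
```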
Shake is higher-level than an executor and doesn't work on an action graph, but it has built-in support for file tracing in all three of these modes: warning about incorrect edges; adding new edges to the graph when they're detected at runtime; and finally, fully inferring all edges from the nodes alone . I would want my executor to only support linting and hard errors for missing edges. Inferring a full action graph is scary and IMO belongs in a higher-level tool, and adding dependency edges automatically can be done by a tool that wraps the executor and parses the lints. What's really cool about this linting system is that it allows you to gradually transition to a hermetic build over time, without frontloading all the work to when you switch to the tool. The main downside of tracing is that it's highly non-portable, and in particular is very limited on macOS. One possible alternative I've thought of is to do a buck2-style unsandboxed hermetic builds, where you copy exactly the specified inputs into a tempdir and run the build from the tempdir. If that fails, rerun the build from the main source directory. This can't tell which dependency edges are missing, but it can tell you a dependency is missing without fully failing the build. The downside to that is it assumes command spawning is a pure function, which of course it's not; anything that talks to a socket is trouble because it might be stateful. Tracing environment variable access is … hard. Traditionally access goes through the libc function, but it’s also possible to take an in a main function, in which case accesses are just memory reads. That means we need to trace memory reads somehow. On x86 machines, there’s something called PIN that can do this directly in the CPU without needing compile time instrumentation. On ARM there’s SPE , which is how works, but I’m not sure whether it can be configured to track 100% of memory accesses. I need to do more research here. On Linux, this is all abstracted by . I’m not sure if there’s equivalent wrappers on Windows and macOS. There’s also DynamicRIO , which supports a bunch of platforms, but I believe it works in a similar way to QEMU, by interposing itself between the program and the CPU, which comes with a bunch of overhead. That could work as an opt-in. One last way to do this is with a SIGSEGV signal handler , but that requires that environment variables are in their own page of memory and therefore a linker script. This doesn’t work for environment variables specifically, because they aren’t linker symbols in the normal sense, they get injected by the C runtime . In general, injecting linker scripts means we’re modifying the binaries being run and might cause unexpected build or runtime failures. Here I describe more concretely the tool I want to build, which I’ve named . It would read the constrained clojure action graph serialization format (Magma) that I describe in the previous post; perhaps with a way to automatically convert Ninja files to Magma. Like Ekam , Ronin would have a continuous rebuild mode (but unlike Bazel and Buck2, no background server). Like Shake, It would have runtime tracing, with all of options, to allow gradually transitioning to a hermetic build. And it would have bazel-like querying for the action graph, both through CLI arguments with an jq syntax and through a programmatic API. 
Finally, it would have pluggable backends for file watching, tracing, stat-ing, progress reporting, and checksums, so that it can take advantage of systems that have more features while still being reasonably fast on systems that don’t. For example, on Windows stats are slow, so it would cache stat info; but on Linux stats are fast so it would just directly make a syscall. Like Ninja, Ronin would keep a command log with a history of past versions of the action graph. It would reuse the bipartite graph structure , with one half being files and the other being commands. It would parse depfiles and dyndeps files just after they’re built, while the cache is still hot. Like , ronin would use a single-pass approach to support early cutoff. It would hash an "input manifest" to decide whether to rebuild. Unlike , it would store a mapping from that hash back to the original manifest so you can query why a rebuild happened. Tracing would be built on top of a FUSE file system that tracked file access. 3 Unlike other build systems I know, state (such as manifest hashes, content hashes, and removed outputs) would be stored in an SQLite database, not in flat files. Kinda. Ronin takes a lot of ideas from buck2. It differs in two major ways: The main advantage of Ronin is that it can slot in underneath existing build systems people are already using—CMake and Meson—without needing changes to your build files at all. In this post I describe what a build executor does, some features I would like to see from an executor (with a special focus on tracing), and a design for a new executor called that allows existing projects generating ninja files to gradually transition to hermetic builds over time, without a “flag day” that requires rewriting the whole build system. I don’t know yet if I will actually build this tool, that seems like a lot of work 5 😄 but it’s something I would like to exist in the world. In many ways Conan profiles are analogous to ninja files: profiles are the interface between Conan and CMake in the same way that ninja files are the interface between CMake and Ninja. Conan is the only tool I'm aware of where the split between the package manager and the configure step is explicit. ↩ This is not an apple to apples comparison; ideally we would name the target by the output file, not by its alias. Unfortunately output names are unpredictable and quite long in Bazel. ↩ macOS does not have native support for FUSE. MacFuse exists but does not support getting the PID of the calling process. A possible workaround would be to start a new FUSE server for each spawned process group. FUSE on Windows is possible through winfsp . ↩ An earlier version of this post read "Buck2 only supports non-hermetic builds for system toolchains , not anything else", which is not correct. ↩ what if i simply took buck2 and hacked it to bits,,, ↩ Resolving and downloading dependencies. The tool that does this is called a package manager . Common examples are , , Conan 1 , and the resolver . Configuring the build based on the host environment and build targets. I am not aware of any common name for this, other than maybe configure script (but there exist many tools for this that are not just shell scripts). Common examples are CMake, Meson, autotools, and the Cargo CLI interface (e.g. and ). Executing a bunch of processes and reporting on their progress. The tool that does this is called a build executor . Common examples are , , , and the phase of . It does not expect to be a top-level build system. 
It is perfectly happy to read (and encourages) generated files from a higher level configure tool. This allows systems like CMake and Meson to mechanically translate Ninja files into this new format, so builds for existing projects can get nice things. It allows you to gradually transition from non-hermetic to hermetic builds, without forcing you to fix all your rules at once, and with tracing to help you find where you need to make your fixes. Buck2 doesn't support tracing at all. It technically supports non-hermetic builds, but you don't get many benefits compared to using a different build system, and it's still high cost to switch build systems 4.


I want a better action graph serialization

This post is part 3/4 of a series about build systems . The next post and last post is I want a better build executor . As someone who ends up getting the ping on "my build is weird" after it has gone through a round of "poke it with a stick", I would really appreciate the mechanisms for [correct dependency edges] rolling out sooner rather than later. In a previous post , I talked about various approaches in the design space of build systems. In this post, I want to zero in on one particular area: action graphs. First, let me define "action graph". If you've ever used CMake, you may know that there are two steps involved: A "configure" step ( ) and a build step ( or ). What I am interested here is what generates , the Makefiles it has created. As the creator of ninja writes , this is a serialization of all build steps at a given moment in time, with the ability to regenerate the graph by rerunning the configure step. This post explores that design space, with the goal of sketching a format that improves on the current state while also enabling incremental adoption. When I say "design space", I mean a serialization format where files are machine-generated by a configure step, and have few enough and restricted enough features that it's possible to make a fast build executor . Not all build systems serialize their action graph. and run persistent servers that store it in memory and allow querying it, but never serialize it to disk. For large graphs, this requires a lot of memory; has actually started serializing parts of its graph to reduce memory usage and startup time . The nix evaluator doesn’t allow querying its graph at all; nix has a very strange model where it never rebuilds because each change to your source files is a new “ input-addressed derivation ” and therefore requires a reconfigure. This is the main reason it’s only used to package software, not as an “inner” build system, because that reconfigure can be very slow. I’ve talked to a couple Nix maintainers and they’ve considered caching parts of the configure step, without caching its outputs (because there are no outputs, other than derivation files!) in order to speed this up. This is much trickier because it requires serializing parts of the evaluator state. Tools that do serialize their graph include CMake, Meson, and the Chrome build system ( GN ). Generally, serializing the graph comes in handy when: In the last post I talked about 4 things one might want from a build system: For a serialization format, we have slightly different constraints. Throughout this post, I'll dive into detail on how these 3 overarching goals apply to the serialization format, and how well various serializations achieve that goal. The first one we'll look at, because it's the default for CMake, is and Makefiles. Make is truly in the Unix spirit: easy to implement 2 , very hard to use correctly. Make is ambiguous , complicated , and makes it very easy to implicitly do a bunch of file system lookups . It supports running shell commands at the top-level, which makes even loading the graph very expensive. It does do pretty well on minimizing reconfigurations, since the language is quite flexible. Ninja is the other generator supported by CMake. Ninja is explicitly intended to work on a serialized action graph; it's the only tool I'm aware of that is. It solves a lot of the problems of Make : it removes many of the ambiguities; it doesn't have any form of globbing; and generally it's a much simpler and smaller language. 
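To ground what "serializing the action graph" means in this post: the configure step emits plain data that the executor can load without running shell commands or touching the filesystem. A minimal sketch, using JSON and a made-up schema purely for illustration:

```python
# The schema here is invented for illustration; it is not Ninja's, GN's, or
# Magma's format. The point is that loading it is cheap and unambiguous.
import json

action_graph = {
    "rules": [
        {
            "name": "compile_main",
            "command": ["cc", "-c", "src/main.c", "-o", "main.o"],
            "inputs": ["src/main.c"],
            "outputs": ["main.o"],
        },
        {
            "name": "link",
            "command": ["cc", "main.o", "-o", "app"],
            "inputs": ["main.o"],
            "outputs": ["app"],
        },
    ]
}

serialized = json.dumps(action_graph, indent=2)   # what the configure step writes out
loaded = json.loads(serialized)                   # what the executor reads: no shell, no stat calls
assert [rule["name"] for rule in loaded["rules"]] == ["compile_main", "link"]
```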
Unfortunately, Ninja's build file format still has some limitations. First, it has no support for checksums. It's possible to work around that by using and having a wrapper script that doesn't overwrite files unless they've changed, but that's a lot of extra work and is annoying to make portable between operating systems. Ninja files also have trouble expressing correct dependency edges. Let's look at a few examples, one by one. In each of these cases, we either have to reconfigure more often than we wish, or we have no way at all of expressing the dependency edge. See my previous post about negative dependencies. The short version is that build files need to specify not just the files they expect to exist, but also the files they expect not to exist. There's no way to express this in a ninja file, short of reconfiguring every time a directory that might contain a negative dependency is modified, which itself has a lot of downsides. Say that you have a C project with just a . You rename it to and ninja gives you an error that main.c no longer exists. Annoyed of editing ninja files by hand, you decide to write a generator 3 : Note this that this registers an implicit dependency on the current directory. This should automatically detect that you renamed your file and rebuild for you. Oh. Right. Generating build.ninja also modifies the current directory, which creates an infinite loop. It's possible to work around this by putting your C file in a source directory: There's still a problem here, though—did you notice it? Our old target is still lying around. Ninja actually has enough information recorded to fix this: . But it's not run automatically. The other problem is that this approach rebuilds far too often. In this case, we wanted to support renames, so in Ninja's model we need to depend on the whole directory. But that's not what we really depended on—we only care about files. I would like to see a action graph format that has an event-based system, where it says "this file was created, make any changes to the action graph necessary", and cuts the build short if the graph wasn't changed. For flower , I want to go further and support deletions : source files and targets that are optional, that should not fail the build if they aren't present, but should cause a rebuild if they are created, modified, or deleted. Ninja has no way of expressing this. Ninja has no way to express “this node becomes dirty when an environment variable changes”. The closest you can get is hacks with and the checksum wrapper/restat hack, but it’s a pain to express and it gets much worse if you want to depend on multiple variables. At this point, we have a list of constraints for our file format: Ideally, it would even be possible to mechanically translate existing .ninja files to this new format. This sketches out a new format that could improve over Ninja files. It could look something like this: I’d call this language Magma, since it’s the simplest kind of set with closure. Here's a sample action graph in Magma: Note some things about Magma: Kinda. Magma itself has a lot in common with Starlark: it's deterministic, hermetic, immutable, and can be evaluated in parallel. The main difference between the languages themselves is that Clojure has (equivalent to sympy symbolic variables) and Python doesn't. 
Some of these could be rewritten to keyword arguments, and others could be rewritten to structs, or string keys for a hashmap, or enums; but I'm not sure how much benefit there is to literally using Starlark when these files are being generated by a configure step in any case. Probably it's possible to make a 1-1 mapping between the two in any case. Buck2 has support for metadata that describes how to execute a built artifact. I think this is really interesting; is a much nicer interface than , partly because of shell quoting and word splitting issues, and partly just because it's more discoverable. I don't have a clean idea for how to fit this into a serialization layer. "Don't put it there and use a instead" works , but makes it hard to do things like allow the build graph to say that an artifact needs set or something like that, you end up duplicating the info in both files. Perhaps one option could be to attach a key/value pair to s. Well, yes and no. Yes, in that this has basically all the features of ninja and then some. But no, because the rules here are all carefully constrained to avoid needing to do expensive file I/O to load the build graph. The most expensive new feature is , and it's intended to avoid an even more expensive step (rerunning the configuration step). It's also limited to changed files; it can't do arbitrary globbing on the contents of the directory the way that Make pattern rules can. Note that this also removes some features in ninja: shell commands are gone, process spawning is much less ambiguous, files are no longer parsed automatically. And because this embeds a clojure interpreter, many things that were hard-coded in ninja can instead be library functions: , response files, , . In this post, we have learned some downsides of Make and Ninja's build file formats, sketched out how they could possibly be fixed, and designed a language called Magma that has those characteristics. In the next post, I'll describe the features and design of a tool that evaluates and queries this language. see e.g. this description of how it works in buck2 ↩ at least a basic version—although several of the features of GNU Make get rather complicated. ↩ is https://github.com/ninja-build/ninja/blob/231db65ccf5427b16ff85b3a390a663f3c8a479f/misc/ninja_syntax.py . ↩ technically these aren't true monadic builds because they're constrained a lot more than e.g. Shake rules, they can't fabricate new rules from whole cloth. but they still allow you to add more outputs to the graph at runtime. ↩ This goes all the way around the configuration complexity clock and skips the "DSL" phase to simply give you a real language. ↩ This is totally based and not at all a terrible idea. ↩ This has a whole bunch of problems on Windows, where arguments are passed as a single string instead of an array, and each command has to reimplement its own parsing. But it will work “most” of the time, and at least avoids having to deal with Powershell or CMD quoting. ↩ To make it possible to distinguish the two on the command line, could unambiguously refer to the group, like in Bazel. ↩ You don’t have a persistent server to store it in memory. When you don’t have a server, serializing makes your startup times much faster, because you don’t have to rerun the configure step each time. You don’t have a remote build cache. When you have a remote cache, the rules for loading that cache can be rather complicated because they involve network queries 1 . 
When you have a local cache, loading it doesn’t require special support because it’s just opening a file. You want to support querying, process spawning, and progress updates without rewriting the logic yourself for every OS (i.e. you don't want to write your own build executor). a "real" language in the configuration step reflection (querying the build graph) file watching support for discovering incorrect dependency edges We care about it being simple and unambiguous to load the graph from the file, so we get fast incremental rebuild speed and graph queries. In particular, we want to touch the filesystem as little as possible while loading. We care about supporting "weird" dependency edges, like dynamic dependencies and the depfiles emitted by a compiler after the first run, so that we're able to support more kinds of builds. And finally, we care about minimizing reconfigurations : we want to be able to express as many things as possible in the action graph so we don't have the pay the cost of rerunning the configure step. This tends to be at odds with fast graph loading; adding features at this level of the stack is very expensive! Negative dependencies File rename dependencies Optional file dependencies Optional checksums to reduce false positives Environment variable dependencies "all the features of ninja" (depfiles, monadic builds through 4 , a statement, order-only dependencies) A very very small clojure subset (just , , EDN , and function calls) for the text itself, no need to make loading the graph harder than necessary 5 . If people really want an equivalent of or I suppose this could learn support for and , but it would have much simpler rules than Clojure's classpath. It would not have support for looping constructs, nor most of clojure's standard library. -inspired dependency edges: (for changes in file attributes), (for changes in the checksum), (for optional dependencies), ; plus our new edge. A input function that can be used anywhere a file path can (e.g. in calls to ) so that the kind of edge does not depend on whether the path is known in advance or not. Runtime functions that determine whether the configure step needs to be re-run based on file watch events 6 . Whether there is actually a file watcher or the build system just calculates a diff on its next invocation is an implementation detail; ideally, one that's easy to slot in and out. “phony” targets would be replaced by a statement. Groups are sets of targets. Groups cannot be used to avoid “input not found” errors; that niche is filled by . Command spawning is specified as an array 7 . No more dependency on shell quoting rules. If people want shell scripts they can put that in their configure script. Redirecting stdout no longer requires bash syntax, it's supported natively with the parameter of . Build parameters can be referred to in rules through the argument. is a thunk ; it only registers an intent to add edges in the future, it does not eagerly require to exist. Our input edge is generalized and can apply to any rule, not just to the configure step. It executes when a file is modified (or if the tool doesn’t support file watching, on each file in the calculated diff in the next tool invocation). Our edge provides the file event type, but not the file contents. This allows ronin to automatically map results to one of the three edge kinds: , , . and are not available through this API. We naturally distinguish between “phony targets” and files because the former are s and the latter are s. 
No more accidentally failing to build if an file is created. 8 We naturally distinguish between “groups of targets” and “commands that always need to be rerun”; the latter just uses . Data can be transformed in memory using clojure functions without needing a separate process invocation. No more need to use in your build system.
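As an aside on the checksum workaround mentioned earlier (a wrapper that refuses to overwrite an output that hasn't changed, so that mtime-based cutoff kicks in): here is a minimal sketch of such a wrapper, with an invented CLI shape. It assumes the wrapped command writes to the path passed as its last argument.

```python
# A sketch of a "don't dirty the mtime if the bytes didn't change" wrapper.
# The command-line convention is made up for this example.
import filecmp
import os
import subprocess
import sys
import tempfile

def run_and_swap(target: str, command: list[str]) -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target) or ".")
    os.close(fd)
    try:
        # assumption: the real command writes its output to its last argument
        subprocess.run(command + [tmp], check=True)
        if os.path.exists(target) and filecmp.cmp(tmp, target, shallow=False):
            return  # unchanged: keep the old mtime so dependents stay up-to-date
        os.replace(tmp, target)
        tmp = None
    finally:
        if tmp and os.path.exists(tmp):
            os.remove(tmp)

if __name__ == "__main__":
    # e.g.  python swap.py generated.h some-generator --emit
    run_and_swap(sys.argv[1], sys.argv[2:])
```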


negative build dependencies

This post is part 2/4 of a series about build systems . The next post is I want a better action graph serialization .

This post is about a limitation of the dependencies you can express in a build file. It uses Ninja just because it's simple and I'm very familiar with it, but the problem exists in most build systems.

Say you have a C project with two different include paths: and : and the following build.ninja: 1

In ninja, means an “implicit dependency”, i.e. it causes a rebuild but does not get bound to .

and some small implementations, just so we can see that it actually works:

This works ok on our first build, and when changes:

But say now that we add our own file:

Now we have a problem. If we remove our build artifacts, ninja does rerun, and shows us what that problem is: we switched out what resolved to, but ninja didn't know about the changed dependency. What we want is a way to tell it to rebuild if the file is created. To my knowledge, ninja has no way of expressing this kind of negative dependency edge.

One possibility is to depend on the whole directory of . The semantics of this are that gets marked dirty whenever a file is added, moved, or deleted. This has two problems:

- It rebuilds too often. We only want to rerun when is added, not for any other file creation.
- We don't actually have a consistent way to find all directories that need to be marked in this way. Here we just hardcoded src, but in larger builds we would use depfiles, and depfiles do not contain any information about negative dependencies. This matters a lot when there are a dozen+ directories in the search path!

Another way to avoid needing negative dependencies is to simply have a hermetic build , so that our rule never even makes the file available to the command. That works, but puts us firmly out of Ninja's design space; hermetic builds come with severe tradeoffs.

Yet another idea is to introduce an early-cutoff point:

- For each file, run to get a list of include files. Save that to disk.
- Whenever a directory is changed, rerun . If our list has changed, rerun a full build for that file.

This has the problem that it's O(n³): for each file, for each include, for each directory in the search path, the preprocessor will try to stat that file to see whether it exists.

One possible workaround is to calculate the list of include files in the build system : look at the order of the search paths, list each path recursively, ignore filenames that are overridden by an include earlier in the search path. Save that to disk. Next time a directory is modified, check if the list of include files has changed. If so, only then rerun for all files. This requires the build system to have some quite deep knowledge of how the language works, but I do think it would work today without changes to Ninja. Thanks to Jade Lovelace for this suggestion.

Thanks to David Chisnall and Ben Boeckel ( @mathstuf ) for making me aware of this issue.

1. People familiar with ninja might say this looks odd and it should use a with so dependencies are tracked automatically. That makes your files shorter and means you need to run the configure step less often, but it doesn't actually solve the problem of negative dependencies. Ninja still doesn't know when it needs to regenerate the depfile.
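To make the failure mode concrete, here is a small sketch that mimics how the preprocessor resolves an include across its search path. The directory names are invented; the point is that creating src/config.h changes what config.h resolves to even though no declared input was modified, because the rule silently depended on that file not existing.

```python
# A toy model of -I search-path resolution. Directory names are hypothetical.
import os

def resolve_include(name: str, search_path: list[str]):
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):   # every miss here is a negative dependency
            return candidate
    return None

search_path = ["src", "vendor/include"]
os.makedirs("src", exist_ok=True)
os.makedirs("vendor/include", exist_ok=True)
with open("vendor/include/config.h", "w") as f:
    f.write("#define FROM_VENDOR 1\n")

print(resolve_include("config.h", search_path))   # vendor/include/config.h

with open("src/config.h", "w") as f:              # add our own file, as in the post
    f.write("#define FROM_SRC 1\n")
print(resolve_include("config.h", search_path))   # now src/config.h, but nothing rebuilds
```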
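The footnote's depfile variant—letting the compiler report the headers it actually read—might look roughly like the sketch below (again my own reconstruction). It shortens the build file, but it still can't express "rebuild if a shadowing header appears":

```ninja
# gcc's -MD -MF writes the headers it actually opened to $out.d; ninja reads
# that depfile on the next run, so *edits* to vendor/include/config.h are seen.
# A newly created include/config.h is still invisible: it isn't in any depfile.
rule cc
  command = gcc -Iinclude -Ivendor/include -MD -MF $out.d -c $in -o $out
  depfile = $out.d
  deps = gcc

build main.o: cc main.c
```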


brownouts reveal system boundaries

One of the many basic tenets of internal control is that a banking organization ensure that employees in sensitive positions be absent from their duties for a minimum of two consecutive weeks. Such a requirement enhances the viability of a sound internal control environment because most frauds or embezzlements require the continual presence of the wrongdoer. Failure free operations require experience with failure. Yesterday, Cloudflare ’s global edge network was down across the world. This post is not about why that happened or how to prevent it. It’s about the fact that this was inevitable. Infinite uptime does not exist . If your business relies on it, sooner or later, you will get burned. Cloudflare’s last global edge outage was on July 2, 2019. They were down yesterday for about 3 hours (with a long tail extending about another 2 and a half hours). That’s an uptime of 99.99995% over the last 6 years. Hyperscalers like Cloudflare, AWS, and Google try very very hard to always be available, to never fail. This makes it easy to intertwine them in your architecture, so deeply you don’t even know where. This is great for their business. I used to work at Cloudflare, and being intertwined like this is one of their explicit goals. My company does consulting, and one of our SaaS tools is a time tracker. It was down yesterday because it relied on Cloudflare. I didn’t even know until it failed! Businesses certainly don’t publish their providers on their homepage. The downtime exposes dependencies that were previously hidden. This is especially bad for “cascading” dependencies, where a partner of a partner of a partner has a dependency on a hyperscaler you didn’t know about. Failures like this really happen in real life; Matt Levine writes about one such case where a spectacular failure in a fintech caused thousands of families to lose their life savings. What I want to do here is make a case that cascading dependencies are bad for you, the business depending on them. Not just because you go down whenever everyone else goes down, but because depending on infinite uptime hides error handling issues in your own architecture . By making failures frequent enough to be normal, organizations are forced to design and practice their backup plans. Backup plans don’t require running your own local cloud. My blog is proxied through cloudflare; my backup plan could be “failover DNS from cloudflare to github when cloudflare is down”. Backup plans don’t have to be complicated. A hospital ER could have a backup plan of “keep patient records for everyone currently in the hospital downloaded to an offline backup sitting in a closet somewhere”, or even just “keep a printed copy next to the hospital bed”. The important thing here is to have a backup plan, to not just blithely assume that “the internet” is a magic and reliable thing. One way to avoid uptime reliance is brownouts, where services are down or only partially available for a predetermined amount of time. Google intentionally brownouts their internal infrastructure so that nothing relies on another service being up 100% at the time 1 . This forces errors to be constantly tested, and exposes dependency cycles. Another way is Chaos Monkey, pioneered at Netflix, where random things just break and you don’t know which ahead of time. This requires a lot of confidence in your infrastructure, but reveals kinds of failures you didn’t even think were possible. 
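To make the brownout idea concrete, here is a toy sketch—entirely my own illustration, nothing like Google's or Netflix's actual tooling—of what a scheduled brownout can look like from the inside: during a predetermined, announced window, a service refuses work so that its callers are forced to exercise their fallback paths while everyone is awake and watching.

```rust
// A toy sketch of a scheduled "brownout" window — my own illustration.
use std::time::{SystemTime, UNIX_EPOCH};

/// True for one hour every day (14:00–15:00 UTC in this sketch), announced
/// in advance so that downstream teams know when to watch their fallbacks.
fn in_brownout_window() -> bool {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock set before 1970")
        .as_secs();
    (secs / 3600) % 24 == 14
}

fn handle_request(payload: &str) -> Result<String, String> {
    if in_brownout_window() {
        // Fail loudly and predictably; the caller must have a backup plan.
        return Err("503 brownout: exercise your fallback".to_string());
    }
    Ok(format!("200 ok: handled {payload}"))
}

fn main() {
    match handle_request("example") {
        Ok(resp) => println!("{resp}"),
        Err(e) => eprintln!("{e}"),
    }
}
```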
I would like to see a model like this for the Internet, where all service providers are required to have at least 24 hours of outages in a year. This is a bit less than 3 nines of uptime (about 5 minutes a day): enough that the service is usually up, but not so much that you can depend on it to always be up. In my experience, and according to studies about failure reporting , both people and organizations tend to chronically underestimate tail risks. Maybe you’re just a personal site and you don’t need 100% reliability. That’s ok. But if other people depend on you, and others depend on them, and again, eventually we end up with hospitals and fire stations and water treatment plants depending on the internet. The only way I see to prevent this is to make the internet unreliable enough that they need a backup plan. People fail. Organizations fail. You can’t control them. What you can control is whether you make them a single point of failure. You have backups for your critical data. Do you have backups for your critical infrastructure? Of course, they don't brown-out their external-facing infra. That would lose them customers. ↩


the terminal of the future

Terminal internals are a mess. A lot of it is just the way it is because someone made a decision in the 80s and now it’s impossible to change. This is what you have to do to redesign infrastructure. Rich [Hickey] didn't just pile some crap on top of Lisp [when building Clojure]. He took the entire Lisp and moved the whole design at once. At a very very high level, a terminal has four parts: I lied a little bit above. "input" is not just text. It also includes signals that can be sent to the running process. Converting keystrokes to signals is the job of the PTY. Similar, "output" is not just text. It's a stream of ANSI Escape Sequences that can be used by the terminal emulator to display rich formatting. I do some weird things with terminals. However, the amount of hacks I can get up to are pretty limited, because terminals are pretty limited. I won't go into all the ways they're limited, because it's been rehashed many times before . What I want to do instead is imagine what a better terminal can look like. The closest thing to a terminal analog that most people are familiar with is Jupyter Notebook . This offers a lot of cool features that are not possible in a "traditional" VT100 emulator: high fidelity image rendering a "rerun from start" button (or rerun the current command; or rerun only a single past command) that replaces past output instead of appending to it "views" of source code and output that can be rewritten in place (e.g. markdown can be viewed either as source or as rendered HTML) a built-in editor with syntax highlighting, tabs, panes, mouse support, etc. Jupyter works by having a "kernel" (in this case, a python interpreter) and a "renderer" (in this case, a web application displayed by the browser). You could imagine using a Jupyter Notebook with a shell as the kernel, so that you get all the nice features of Jupyter when running shell commands. However, that quickly runs into some issues: It turns out all these problems are solveable. There exists today a terminal called Warp . Warp has built native integration between the terminal and the shell, where the terminal understands where each command starts and stops, what it outputs, and what is your own input. As a result, it can render things very prettily: It does this using (mostly) standard features built-in to the terminal and shell (a custom DCS): you can read their explanation here . It's possible to do this less invasively using OSC 133 escape codes ; I'm not sure why Warp didn't do this, but that's ok. iTerm2 does a similar thing, and this allows it to enable really quite a lot of features : navigating between commands with a single hotkey; notifying you when a command finishes running, showing the current command as an "overlay" if the output goes off the screen. This is really three different things. The first is interacting with a long-lived process. The second is suspending the process without killing it. The third is disconnecting from the process, in such a way that the process state is not disturbed and is still available if you want to reconnect. To interact with a process, you need bidirectional communication, i.e. you need a "cell output" that is also an input. An example would be any TUI, like , , or 1 . Fortunately, Jupyter is really good at this! The whole design is around having interactive outputs that you can change and update. 
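For reference, here is a rough sketch of what the OSC 133 shell-integration marks mentioned above look like when emitted from bash. The sequences follow the FinalTerm/iTerm2 convention; real integrations are more careful about escaping, preexec hooks, and terminal support, so treat this as a sketch rather than a recipe:

```bash
# Mark prompt start (133;A) and command start (133;B) inside PS1.
# \[ \] tell bash the escapes take up no width; \a (BEL) terminates the OSC.
PS1='\[\e]133;A\a\]\u@\h:\w\$ \[\e]133;B\a\]'

# Mark "command finished" (133;D) with the exit status before the next prompt.
PROMPT_COMMAND='printf "\e]133;D;%s\a" "$?"'

# A full integration also emits 133;C ("output begins") just before each
# command runs, typically via a preexec hook; omitted here for brevity.
```

With just these marks, a cooperating terminal can jump between commands, attach exit statuses to prompts, and treat each command's output as a unit.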
Additionally, I would expect my terminal to always have a "free input cell", as Matklad describes in A Better Shell , where the interactive process runs in the top half of the window and an input cell is available in the bottom half. Jupyter can do this today, but "add a cell" is manual, not automatic. "Suspending" a process is usually called " job control ". There's not too much to talk about here, except that I would expect a "modern" terminal to show me all suspended and background processes as a de-emphasized persistent visual, kinda like how Intellij will show you "indexing ..." in the bottom taskbar. There are roughly three existing approaches for disconnecting and reconnecting to a terminal session (Well, four if you count reptyr ). Tmux / Zellij / Screen These tools inject a whole extra terminal emulator between your terminal emulator and the program. They work by having a "server" which actually owns the PTY and renders the output, and a "client" that displays the output to your "real" terminal emulator. This model lets you detach clients, reattach them later, or even attach multiple clients at once. You can think of this as a "batteries-included" approach. It also has the benefit that you can program both the client and the server (although many modern terminals, like Kitty and Wezterm are programmable now); that you can organize your tabs and windows in the terminal (although many modern desktop environments have tiling and thorough keyboard shortcuts); and that you get street cred for looking like Hackerman. The downside is that, well, now you have an extra terminal emulator running in your terminal, with all the bugs that implies . iTerm actually avoids this by bypassing the tmux client altogether and acting as its own client that talks directly to the server. In this mode, "tmux tabs" are actually iTerm tabs, "tmux panes" are iTerm panes, and so on. This is a good model, and I would adopt it when writing a future terminal for integration with existing tmux setups. Mosh is a really interesting place in the design space. It is not a terminal emulator replacement; instead it is an ssh replacement. Its big draw is that it supports reconnecting to your terminal session after a network interruption. It does that by running a state machine on the server and replaying an incremental diff of the viewport to the client . This is a similar model to tmux, except that it doesn't support the "multiplexing" part (it expects your terminal emulator to handle that), nor scrollback (ditto). Because it has its own renderer, it has a similar class of bugs to tmux . One feature it does have, unlike tmux, is that the "client" is really running on your side of the network, so local line editing is instant. alden / shpool / dtach / abduco / diss These all occupy a similar place in the design space: they only handle session detach/resume with a client/server, not networking or scrollback, and do not include their own terminal emulator. Compared to tmux and mosh, they are highly decoupled. I'm going to treat these together because the solution is the same: dataflow tracking. Take as an example pluto.jl , which does this today by hooking into the Julia compiler. Note that this updates cells live in response to previous cells that they depend on. Not pictured is that it doesn't update cells if their dependencies haven't changed. You can think of this as a spreadsheet-like Jupyter, where code is only rerun when necessary. You may say this is hard to generalize. The trick here is orthogonal persistence . 
If you sandbox the processes, track all IO, and prevent things that are "too weird" unless they're talking to other processes in the sandbox (e.g. unix sockets and POST requests), you have really quite a lot of control over the process! This lets you treat it as a pure function of its inputs, where its inputs are "the whole file system, all environment variables, and all process attributes". Once you have these primitives—Jupyter notebook frontends, undo/redo, automatic rerun, persistence, and shell integration—you can build really quite a lot on top. And you can build it incrementally, piece-by-piece: jyn, you may say, you can't build vertical integration in open source . you can't make money off open source projects . the switching costs are too high . All these things are true. To talk about how this is possible, we have to talk about incremental adoption. if I were building this, I would do it in stages, such that at each stage the thing is an improvement over its alternatives. This is how works and it works extremely well: it doesn't require everyone on a team to switch at once because individual people can use , even for single commands, without a large impact on everyone else. When people think of redesigning the terminal, they always think of redesigning the terminal emulator . This is exactly the wrong place to start. People are attached to their emulators. They configure them, they make them look nice, they use their keybindings. There is a high switching cost to switching emulators because everything affects everything else . It's not so terribly high, because it's still individual and not shared across a team, but still high. What I would do instead is start at the CLI layer. CLI programs are great because they're easy to install and run and have very low switching costs: you can use them one-off without changing your whole workflow. So, I would write a CLI that implements transactional semantics for the terminal . You can imagine an interface something like , where everything run after is undoable. There is a lot you can do with this alone, I think you could build a whole business off this. Once I had transactional semantics, I would try to decouple persistence from tmux and mosh. To get PTY persistence, you have to introduce a client/server model, because the kernel really really expects both sides of a PTY to always be connected. Using commands like alden , or a library like it (it's not that complicated), lets you do this simply, without affecting the terminal emulator nor the programs running inside the PTY session. To get scrollback, the server could save input and output indefinitely and replay them when the client reconnects. This gets you "native" scrollback—the terminal emulator you're already using handles it exactly like any other output, because it looks exactly like any other output—while still being replayable and resumable from an arbitrary starting point. This requires some amount of parsing ANSI escape codes 2 , but it's doable with enough work. To get network resumption like mosh, my custom server could use Eternal TCP (possibly built on top of QUIC for efficiency). Notably, the persistence for the PTY is separate from the persistence for the network connection. Eternal TCP here is strictly an optimization: you could build this on top of a bash script that runs in a loop, it's just not as nice an experience because of network delay and packet loss. Again, composable parts allow for incremental adoption. 
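As a sketch of the "save output and replay it later" idea—my own toy, not alden or any of the tools mentioned—the server side can spawn a command, forward its output, and keep a timestamped record that a reconnecting client could replay. Real PTY allocation and escape-sequence handling are omitted, so this only behaves sensibly for non-interactive programs:

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let start = Instant::now();
    let mut child = Command::new("ls").arg("-l").stdout(Stdio::piped()).spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped above");
    let mut log = std::fs::File::create("session.log")?;

    for line in BufReader::new(stdout).lines() {
        let line = line?;
        // forward to the "client" (here, just our own terminal)...
        println!("{line}");
        // ...and record when it arrived, so it can be replayed to a
        // reconnecting client later
        writeln!(log, "{:.3} {}", start.elapsed().as_secs_f64(), line)?;
    }
    child.wait()?;
    Ok(())
}
```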
At this point, you're already able to connect multiple clients to a single terminal session, like tmux, but window management is still done by your terminal emulator, not by the client/server. If you wanted to have window management integrated, the terminal emulator could speak the tmux -CC protocol, like iTerm. All parts of this stage can be done independently and in parallel from the transactional semantics, but I don't think you can build a business off them, it's not enough of an improvement over the existing tools. This bit depends on the client/server model. Once you have a server interposed between the terminal emulator and the client, you can start doing really funny things like tagging I/O with metadata. This lets all data be timestamped 3 and lets you distinguish input from output. xterm.js works something like this. When combined with shell integration, this even lets you distinguish shell prompts from program output, at the data layer. Now you can start doing really funny things, because you have a structured log of your terminal session. You can replay the log as a recording, like asciinema 4 ; you can transform the shell prompt without rerunning all the commands; you can import it into a Jupyter Notebook or Atuin Desktop ; you can save the commands and rerun them later as a script. Your terminal is data. This is the very first time that we touch the terminal emulator, and it's intentionally the last step because it has the highest switching costs. This makes use of all the nice features we've built to give you a nice UI. You don't need our CLI anymore unless you want nested transactions, because your whole terminal session starts in a transaction by default. You get all the features I mention above , because we've put all the pieces together. This is bold and ambitious and I think building the whole thing would take about a decade. That's ok. I'm patient. You can help me by spreading the word :) Perhaps this post will inspire someone to start building this themselves. there are a lot of complications here around alternate mode , but I'm just going to skip over those for now. A simple way to handle alternate mode (that doesn't get you nice things) is just to embed a raw terminal in the output cell. ↩ otherwise you could start replaying output from inside an escape, which is not good . I had a detailed email exchange about this with the alden author which I have not yet had time to write up into a blog post; most of the complication comes when you want to avoid replaying the entire history and only want to replay the visible viewport. ↩ hey, this seems awfully like asciinema ! ↩ oh, that's why it seemed like asciinema. ↩ The " terminal emulator ", which is a program that renders a grid-like structure to your graphical display. The " pseudo-terminal " (PTY), which is a connection between the terminal emulator and a "process group" which receives input. This is not a program. This is a piece of state in the kernel. The "shell", which is a program that leads the "process group", reads and parses input, spawns processes, and generally acts as an event loop. Most environments use bash as the default shell. The programs spawned by your shell, which interact with all of the above in order to receive input and send output. high fidelity image rendering a "rerun from start" button (or rerun the current command; or rerun only a single past command) that replaces past output instead of appending to it "views" of source code and output that can be rewritten in place (e.g. 
markdown can be viewed either as source or as rendered HTML) a built-in editor with syntax highlighting, tabs, panes, mouse support, etc.
The issues with putting a shell behind a Jupyter frontend, referenced above:
- Your shell gets the commands all at once, not character-by-character, so tab-complete, syntax highlighting, and autosuggestions don't work.
- What do you do about long-lived processes? By default, Jupyter runs a cell until completion; you can cancel it, but you can't suspend, resume, interact with, nor view a process while it's running. Don't even think about running or .
- The "rerun cell" buttons do horrible things to the state of your computer (normal Jupyter kernels have this problem too, but "rerun all" works better when the commands don't usually include ).
- Undo/redo do not work. (They don't work in a normal terminal either, but people attempt to use them more when it looks like they should be able to.)
Some of the things you could build on top, referenced above: Runbooks (actually, you can build these just with Jupyter and a PTY primitive). Terminal customization that uses normal CSS, no weird custom languages or ANSI color codes. Search for commands by output/timestamp.
Currently, you can search across output in the current session, or you can search across all command input history, but you don't have any kind of smart filters, and the output doesn't persist across sessions. Timestamps and execution duration for each command. Local line-editing, even across a network boundary. IntelliSense for shell commands , without having to hit tab and with rendering that's integrated into the terminal. " All the features from sandboxed tracing ": collaborative terminals, querying files modified by a command, "asciinema but you can edit it at runtime", tracing build systems. Extend the smart search above to also search by disk state at the time the command was run. Extending undo/redo to a git-like branching model (something like this is already support by emacs undo-tree ), where you have multiple "views" of the process tree. Given the undo-tree model, and since we have sandboxing, we can give an LLM access to your project, and run many of them in parallel at the same time without overwriting each others state, and in such a way that you can see what they're doing, edit it, and save it into a runbook for later use. A terminal in a prod environment that can't affect the state of the machine, only inspect the existing state. Gary Bernhardt, “A Whole New World” Alex Kladov, “A Better Shell” jyn, “how i use my terminal” jyn, “Complected and Orthogonal Persistence” jyn, “you are in a box” jyn, “there's two costs to making money off an open source project…” Rebecca Turner, “Vertical Integration is the Only Thing That Matters” Julia Evans, “New zine: The Secret Rules of the Terminal” Julia Evans, “meet the terminal emulator” Julia Evans, “What happens when you press a key in your terminal?” Julia Evans, “What's involved in getting a "modern" terminal setup?” Julia Evans, “Bash scripting quirks & safety tips” Julia Evans, “Some terminal frustrations” Julia Evans, “Reasons to use your shell's job control” “signal(7) - Miscellaneous Information Manual” Christian Petersen, “ANSI Escape Codes” saoirse, “withoutboats/notty: A new kind of terminal” Jupyter Team, “Project Jupyter Documentation” “Warp: The Agentic Development Environment” “Warp: How Warp Works” “Warp: Completions” George Nachman, “iTerm2: Proprietary Escape Codes” George Nachman, “iTerm2: Shell Integration” George Nachman, “iTerm2: tmux Integration” Project Jupyter, “Jupyter Widgets” Nelson Elhage, “nelhage/reptyr: Reparent a running program to a new terminal” Kovid Goyal, “kitty” Kovid Goyal, “kitty - Frequently Asked Questions” Wez Furlong, “Wezterm” Keith Winstein, “Mosh: the mobile shell” Keith Winstein, “Display errors with certain characters Matthew Skala, “alden: detachable terminal sessions without breaking scrollback” Ethan Pailes, “shell-pool/shpool: Think tmux, then aim... lower” Ned T. Crigler, “crigler/dtach: A simple program that emulates the detach feature of screen” Marc André Tanner, “martanne/abduco: abduco provides session management” yazgoo, “yazgoo/diss: dtach-like program / crate in rust” Fons van der Plas, “Pluto.jl — interactive Julia programming environment” Ellie Huxtable, “Atuin Desktop: Runbooks that Run” Toby Cubitt, “undo-tree” “SIGHUP - Wikipedia” Jason Gauci, “How Eternal Terminal Works” Marcin Kulik, “Record and share your terminal sessions, the simple way - asciinema.org” “Alternate Screen | Ratatui” there are a lot of complications here around alternate mode , but I'm just going to skip over those for now. 
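For readers unfamiliar with the alternate screen mentioned above: the switch is itself just an escape sequence, which you can try from any shell (a quick illustration of my own, not from the post):

```bash
printf '\033[?1049h'   # enter the alternate screen (what vim/htop use)
printf 'hello from the alternate screen\n'
sleep 2
printf '\033[?1049l'   # leave it; your previous scrollback comes back
```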


build system tradeoffs

This post is part 1/4 of a series about build systems . The next post is negative build dependencies . If I am even TEMPTED to use , in my goddamn build system, you have lost. I am currently employed to work on the build system for the Rust compiler (often called or ). As a result, I think about a lot of build system weirdness that most people don't have to. This post aims to give an overview of what builds for complicated projects have to think about, as well as vaguely gesture in the direction of build system ideas that I like. This post is generally designed to be accessible to the working programmer, but I have a lot of expert blindness in this area, and sometimes assume that "of course people know what a feldspar is!" . Apologies in advance if it's hard to follow. What makes a project’s build complicated? The first semi-complicated thing people usually want to do in their build is write an integration test. Here's a rust program which does so: This instructs cargo to, when you run , compile as a standalone program and run it, with as the entrypoint. We'll come back to this program several times in this post. For now, notice that we are invoking inside of . I actually forgot this one in the first draft because Cargo solves this so well in the common case 1 . In many hand-written builds ( cough cough ), specifying dependencies by hand is very broken, for parallelism simply doesn't work, and running on errors is common. Needless to say, this is a bad experience. The next step up in complexity is to cross-compile code. At this point, we already start to get some idea of how involved things get: How hard it is to cross-compile code depends greatly on not just the build system, but the language you're using and the exact platform you're targeting. The particular thing I want to point out is your standard library has to come from somewhere . In Rust, it's usually downloaded from the same place as the compiler. In bytecode and interpreted languages, like Java, JavaScript, and Python, there's no concept of cross-compilation because there is only one possible target. In C, you usually don't install the library itself, but only the headers that record the API 2 . That brings us to our next topic: Generally, people refer to one of two things when they say "libc". Either they mean the C standard library, , or the C runtime, . Libc matters a lot for two reasons. Firstly, C is no longer a language , so generally the first step to porting any language to a new platform is to make sure you have a C toolchain 3 . Secondly, because libc is effectively the interface to a platform, Windows , macOS , and OpenBSD have no stable syscall boundary—you are only allowed to talk to the kernel through their stable libraries (libc, and in the case of Windows several others too). To talk about why they've done this, we have to talk about: Many languages have a concept of " early binding ", where all variable and function references are resolved at compile time, and " late binding ", where they are resolved at runtime. C has this concept too, but it calls it "linking" instead of "binding". "late binding" is called "dynamic linking" 4 . References to late-bound variables are resolved by the "dynamic loader" at program startup. Further binding can be done at runtime using and friends. Platform maintainers really like dynamic linking , for the same reason they dislike vendoring : late-binding allows them to update a library for all applications on a system at once. 
This matters a lot for security disclosures, where there is a very short timeline between when a vulnerability is patched and announced and when attackers start exploiting it in the wild. Application developers dislike dynamic linking for basically the same reason: it requires them to trust the platform maintainers to do a good job packaging all their dependencies, and it results in their application being deployed in scenarios that they haven't considered or tested . For example, installing openssl on Windows is really quite hard. Actually, while I was writing this, a friend overheard me say "dynamically linking openssl" and said "oh god you're giving me nightmares". Perhaps a good way to think about dynamically linking as commonly used is a mechanism for devendoring libraries in a compiled program . Dynamic linking has other use cases, but they are comparatively rare. Whether a build system (or language) makes it easy or hard to dynamically link a program is one of the major things that distinguishes it. More about that later. Ok. So. Back to cross-compiling. To cross-compile a program, you need: Where does your toolchain come from? ... ... ... ... It turns out this is a hard problem . Most build systems sidestep it by "not worrying about it"; basically any Makefile you find is horribly broken if you update your compiler without running afterwards. Cargo is a lot smarter—it caches output in , and rebuilds if changes. "How do you deal with toolchain invalidations" is another important things that distinguishes a build system, as we'll see later. toolchains are a special case of a more general problem: your build depends on your whole build environment , not just the files passed as inputs to your compiler. That means, for instance, that people can—and often do—download things off the internet, embed previous build artifacts in later ones , and run entire nested compiler invocations . Once we get towards these higher levels of complexity, people want to start doing quite complicated things with caching. In order for caching to be sound, we need the same invocation of the compiler to emit the same output every time, which is called a reproducible build . This is much harder than it sounds! There are many things programs do that cause non-determinism that programmers often don’t think about (for example, iterating a hashmap or a directory listing). At the very highest end, people want to conduct builds across multiple machines, and combine those artifacts. At this point, we can’t even allow reading absolute paths, since those will be different between machines. The common tool for this is a compiler flag called , and allows the build system to map an absolute path to a relative one. is also how rustc is able to print the sources of the standard library when emitting diagnostics, even when running on a different machine than where it was built. At this point, we have enough information to start talking about the space of tradeoffs for a build system. Putting your config in a YAML file does not make it declarative! Limiting yourself to a Turing-incomplete language does not automatically make your code easier to read! — jyn The most common unforced error I see build systems making is forcing the build configuration to be written in a custom language 6 . There are basically two reasons they do this: Right. So, given that making a build "declarative" is a lie, you may as well give programmers a real language. Some common choices are: "But wait, jyn!", you may say. 
"Surely you aren't suggesting a build system where you have to run a whole program every time you figure out what to rebuild??" I mean ... people are doing it. But just because they're doing it doesn't mean it's a good idea, so let's look at the alternative, which is to serialize your build graph . This is easier to see than explain, so let's look at an example using the Ninja build system 8 : Ninjafiles give you the absolute bare minimum necessary to express your build dependencies: You get "rules", which explain how to build an output; "build edges", which state when to build; and "variables", which say what to build 9 . That's basically it. There's some subtleties about "depfiles" which can be used to dynamically add build edges while running the build rule. Because the features are so minimal, the files are intended to be generated, using a configure script written in one of the languages we talked about earlier. The most common generators are CMake and GN , but you can use any language you like because the format is so simple. What's really cool about this is that it's trivial to parse, which means that it's very easy to write your own implementation of ninja if you want. It also means that you can get a lot of the properties we discussed before, i.e.: It turns out these properties are very useful. "jyn, you're taking too long to get to the point!" look I’m getting there, I promise. The main downsides to this approach is that it has to be possible to serialize your build graph. In one sense, I see this as good, actually, because you have to think through everything your build does ahead of time. But on the other hand, if you have things like nested ninja invocations, or like our -> example from earlier 10 , all the tools to query the build graph don't include the information you expect. Re-stat'ing all files in a source directory is expensive. It would be much nicer if we could have a pull model instead of a push model, where the build tool gets notified of file changes and rebuilds exactly the necessary files. There are some tools with native integration for this, like Tup , Ekam , jj , and Buck2 , but generally it's pretty rare. That's ok if we have reflection, though! We can write our own file monitoring tool, ask the build system which files need to rebuilt for the changed inputs, and then tell it to rebuild only those files. That prevents it from having to recursively stat all files in the graph. See Tup's paper for more information about the big idea here. Ok, let's assume we have some build system that uses some programming language to generate a build graph, and it rebuilds exactly the necessary outputs on changes to our inputs. What exactly are our inputs? There are basically four major approaches to dependency tracking in the build space. This kind of build system externalizes all concerns out to you, the programmer. When I say "externalizes all concerns", I mean that you are required to write in all your dependencies by hand, and the tool doesn't help you get them right. Some examples: A common problem with this category of build system is that people forget to mark the build rule itself as an input to the graph, resulting in dead artifacts left laying around, and as a result, unsound builds. In my humble opinion, this kind of tool is only useful as a serialization layer for a build graph, or if you have no other choice. Here's a nickel, kid, get yourself a better build system . 
Sometimes build systems (CMake, Cargo, maybe others I don't know) do a little better and use the compiler's built-in support for dependency tracking (e.g. or ), and automatically add dependencies on the build rules themselves. This is a lot better than nothing, and much more reliable than tracking dependencies by hand. But it still fundamentally trusts the compiler to be correct, and doesn't track environment dependencies. This kind of build system always does a full build, and lets you modify the environment in arbitrary ways as you do so. This is simple, always correct, and expensive. Some examples: These are ok if you can afford them. But they are expensive! Most people using Github Actions are only doing so because GHA is a hyperscaler giving away free CI time like there's no tomorrow. I suspect we would see far less wasted CPU-hours if people had to consider the actual costs of using them. This is the kind of phrase that's well-known to people who work on build systems and basically unheard of outside it, alongside "monadic builds". "hermetic" means that the only things in your environment are those you have explicitly put there . This sometimes called "sandboxing", although that has unfortunate connotations about security that don't always apply here. Some examples of this: This has a lot of benefits! It statically guarantees that you cannot forget any of your inputs; it is 100% reliable, assuming no issues with the network or with the implementing tool 🙃; and it gives you very very granular insight into what your dependencies actually are. Some things you can do with a hermetic build system: The main downside is that you have to actually specify all those dependencies (if you don't, you get a hard error instead of an unsound build graph, which is the main difference between hermetic systems and "not my problem"). Bazel and Buck2 give you starlark, so you have a ~real 11 language in which to do it, but it's still a ton of work. Both have an enormous "prelude" module that just defines where you get a compiler toolchain from 12 . Nix can be thought of as taking this "prelude" idea all the way, by expanding the "prelude" (nixpkgs) to "everything that's ever been packaged for NixOS". When you write , your nix build is logically in the same build graph as the nixpkgs monorepo; it just happens to have an enormous remote cache already pre-built. Bazel and Buck2 don’t have anything like nixpkgs, which is the main reason that using them requires a full time dedicated build engineer: that engineer has to keep writing build rules from scratch any time you add an external dependency. They also have to package any language toolchains that aren’t in the prelude. Nix has one more interesting property, which is that all its packages compose. You can install two different versions of the same package and that's fine because they use different store paths. They fit together like lesbians' fingers interlock. Compare this to docker, which does not compose 13 . In docker, there is no way to say "Inherit the build environment from multiple different source images". The closest you can get is a "multi-stage build", where you explicitly copy over individual files from an earlier image to a later image. It can't blindly copy over all the files because some of them might want to end up at the same path, and touching fingers would be gay. The last kind I'm aware of, and the rarest I've seen, is tracing build systems. 
These have the same goal as hermetic build systems: they still want 100% of your dependencies to be specified. But they go about it in a different way. Rather than sandboxing your code and only allowing access to the dependencies you specify, they instrument your code, tracing its file accesses, and record the dependencies of each build step. Some examples: The advantage of these is that you get all the benefits of a hermetic build system without any of the cost of having to write out your dependencies. The first main disadvantage is that they require the kernel to support syscall tracing, which essentially means they only work on Linux. I have Ideas™ for how to get this working on macOS without disabling SIP, but they're still incomplete and not fully general; I may write a follow-up post about that. I don't yet have ideas for how this could work on Windows, but it seems possible . The second main disadvantage is that not knowing the graph up front causes many issues for the build system. In particular: I have been convinced that tracing is useful as a tool to generate your build graph, but not as a tool actually used when executing it. Compare also gazelle , which is something like that for Bazel, but based on parsing source files rather than tracking syscalls. Combining paradigms in this way also make it possible to verify your hermetic builds in ways that are hard to do with mere sandboxing. For example, a tracing build system can catch missing dependencies: and it can also detect non-reproducible builds: There's more to talk about here—how build systems affect the dynamics between upstream maintainers and distro packagers; how .a files are bad file formats ; how mtime comparisons are generally bad ; how configuration options make the tradeoffs much more complicated; how FUSE can let a build system integrate with a VCS to avoid downloading unnecessary files into a shallow checkout; but this post is quite long enough already. the uncommon case mostly looks like incremental bugs in rustc itself , or issues around rerunning build scripts. ↩ see this stackexchange post for more discussion about the tradeoffs between forward declarations and requiring full access to the source. ↩ even Rust depends on crt1.o when linking ! ↩ early binding is called "static linking". ↩ actually, Zig solved this in the funniest way possible , by bundling a C toolchain with their Zig compiler. This is a legitimately quite impressive feat. If there's any Zig contributors reading—props to you, you did a great job. ↩ almost every build system does this, so I don't even feel compelled to name names. ↩ Starlark is not tied to hermetic build systems. The fact that the only common uses of it are in hermetic build systems is unfortunate. ↩ H.T. Julia Evans ↩ actually variables are more general than this, but for $in and $out this is true. ↩ another example is "rebuilding build.ninja when the build graph changes". it's more common than you think because the language is so limited that it's easier to rerun the configure script than to try and fit the dependency info into the graph. ↩ not actually turing-complete ↩ I have been informed that the open-source version of Bazel is not actually hermetic-by-default inside of its prelude, and just uses system libraries. This is quite unfortunate; with this method of using Bazel you are getting a lot of the downsides and little of the upside. Most people I know using it are doing so in the hermetic mode. ↩ there's something called , but it composes containers, not images. 
↩ A compiler for that target. If you're using clang or Rust, this is as simple as passing . If you're using gcc, you need a whole-ass extra compiler installed . A standard library for that target. This is very language-specific, but at a minimum requires a working C toolchain 5 . A linker for that target. This is usually shipped with the compiler, but I mention it specifically because it's usually the most platform specific part. For example, "not having the macOS linker" is the reason that cross-compiling to macOS is hard . Programmers aren't used to treating build system code as code. This is a culture issue that's hard to change, but it's worthwhile to try anyway. There is some idea of making builds "declarative". (In fact, observant readers may observe that this corresponds to the idea of "constrained languages" I talk about in an earlier post.) This is not by itself a bad idea! The problem is it doesn't give them the properties you might want. For example, one property you might want is "another tool can reimplement the build algorithm". Unfortunately this quickly becomes infeasible for complicated algorithms . Another you might want is "what will rebuild next time a build occurs?". You can't get this from the configuration without—again—reimplementing the algorithm. Starlark 7 “the same language that the build system was written in” (examples: Clojure , Zig , JavaScript ) "Show me all commands that are run on a full build" ( ) "Show me all commands that will be run the next time an incremental build is run" ( ) "If this particular source file is changed, what will need to be rebuilt?" ( ) Github Actions cache actions (and in general, most CI caching I'm aware of requires you to manually write out the files your cache depends on) Ansible playbooks Basically most build systems, this is extremely common (kinda—assuming that is reliable, which it often isn't.) Github Actions ( rules) Docker containers (time between startup and shutdown) Initramfs (time between initial load and chroot into the full system) Systemd service startup rules Most compiler invocations (e.g. ) Dockerfiles (more generally, OCI images) Bazel / Buck2 "Bazel Remote Execution Protocol" (not actually tied to Bazel), which lets you run an arbitrary set of build commands on a remote worker On a change to some source code, rerun only the affected tests. You know statically which those are, because the build tool forced you to write out the dependency edges. Remote caching. If you have the same environment everywhere, you can upload your cache to the cloud, and download and reuse it again on another machine. You can do this in CI—but you can also do it locally! The time for a """full build""" can be almost instantaneous because when a new engineer gets onboarded they can immediately reuse everyone else's build cache. The rust compiler, actually "A build system with orthogonal persistence" ( previously ; previously ; previously ) If you change the graph, it doesn't find out until the next time it reruns a build. This can lead to degenerate cases where the same rule has to be run multiple times until it doesn't access any new inputs. If you don't cache the graph, you have that problem on every edge in the graph . This is the problem Ekam has , and makes it very slow to run full builds. Its solution is to run in "watch" mode, where it caches the graph in-memory instead of on-disk. If you do cache the graph, you can only do so for so long before it becomes prohibitively expensive to do that for all possible executions. 
For "normal" codebases this isn't a problem, but if you're Google or Facebook, this is actually a practical concern. I think it is still possible to do this with a tracing build system (by having your cache points look a lot more like many Bazel BUILD files than a single top-level Ninja file), but no one has ever tried it at that scale. If the same file can come from many possible places, due to multiple search paths (e.g. a include header in C, or any import really in a JVM language), then you have a very rough time specifying what your dependencies actually are. The best ninja can do is say “depend on the whole directory containing that file”, which sucks because it rebuilds whenever that directory changes, not just when your new file is added. It’s possible to work around this with a (theoretical) serialization format other than Ninja, but regardless, you’re adding lots of file s to your hot path. The build system does not know which dependencies are direct (specified by you, the owner of the module being compiled) and which are transient (specified by the modules you depend on). This makes error reporting worse, and generally lets you do fewer kinds of queries on the graph. My friend Alexis Hunt, a build system expert, says "there are deeper pathologies down that route of madness". So. That's concerning. emitting untracked outputs overwriting source files (!), using an input file that was registered for a different rule reading the current time, or absolute path to the current directory iterating all files in a directory (this is non-deterministic) machine and kernel-level sources of randomness. Most build systems do not prioritize correctness. Prioritizing correctness comes with severe, hard to avoid tradeoffs. Tracing build systems show the potential to avoid some of those tradeoffs, but are highly platform specific and come with tradeoffs of their own at large enough scale. Combining a tracing build system with a hermetic build system seems like the best of both worlds. Writing build rules in a "normal" (but constrained) programming language, then serializing them to a build graph, has surprisingly few tradeoffs. I'm not sure why more build systems don't do this. Alan Dipert, Micha Niskin, Joshua Smith, “Boot: build tooling for Clojure” Alexis Hunt, Ola Rozenfield, and Adrian Ludwin, “bazelbuild/remote-apis: An API for caching and execution of actions on a remote system.” Andrew Kelley, “zig cc: a Powerful Drop-In Replacement for GCC/Clang” Andrew Thompson, “Packagers don’t know best” Andrey Mokhov et. al., “Non-recursive Make Considered Harmful” apenwarr, “mtime comparison considered harmful” Apple Inc., “Statically linked binaries on Mac OS X” Aria Desires, “C Isn’t A Language Anymore” Charlie Curtsinger and Daniel W. 
Barowy, “curtsinger-lab/riker: Always-Correct and Fast Incremental Builds from Simple Specifications” Chris Hopman and Neil Mitchell, “Build faster with Buck2: Our open source build system” Debian, “Software Packaging” Debian, “Static Linking” Dolstra, E., & The CppNix contributors., “Nix Store” Eyal Itkin, “The .a File is a Relic: Why Static Archives Were a Bad Idea All Along” Felix Klock and Mark Rousskov on behalf of the Rust compiler team, “Announcing Rust 1.52.1” Free Software Foundation, Inc., “GNU make” GitHub, Inc., “actions/cache: Cache dependencies and build outputs in GitHub Actions” Google Inc., “bazel-contrib/bazel-gazelle: a Bazel build file generator for Bazel projects” Google LLC, “Jujutsu docs” Jack Lloyd and Steven Fackler, “rust-openssl” Jack O’Connor, “Safety and Soundness in Rust” Jade Lovelace, “The postmodern build system” Julia Evans, “ninja: a simple way to do builds” jyn, “Complected and Orthogonal Persistence” jyn, “Constrained Languages are Easier to Optimize” jyn, “i think i have identified what i dislike about ansible” Kenton Varda, “Ekam Build System” László Nagy, “rizsotto/Bear: a tool that generates a compilation database for clang tooling” Laurent Le Brun, “Starlark Programming Language” Mateusz “j00ru” Jurczyk, “Windows X86-64 System Call Table (XP/2003/Vista/7/8/10/11 and Server)” Michał Górny, “The modern packager’s security nightmare” Mike Shal, “A First Tupfile” Mike Shal, “Build System Rules and Algorithms” Mike Shal, “tup” Nico Weber, “Ninja, a small build system with a focus on speed” NLnet, “Ripple” OpenJS Foundation, “Grunt: The JavaScript Task Runner” “Preprocessor Options (Using the GNU Compiler Collection (GCC))” Randall Munroe, “xkcd: Average Familiarity” Richard M. Stallman and the GCC Developer Community, “Invoking GCC” Rich Hickey, “Clojure - Vars and the Global Environment” Stack Exchange, “What are the advantages of requiring forward declaration of methods/fields like C/C++ does?” Stack Overflow, “Monitoring certain system calls done by a process in Windows” System Calls Manual, “dlopen(3)” System Manager’s Manual, “ld.so(8)” The Apache Groovy project, “The Apache Groovy™ programming language” The Chromium Authors, “gn” Theo de Raadt, “Removing syscall(2) from libc and kernel” The Rust Project Contributors, “Bootstrapping the compiler” The Rust Project Contributors, “Link using the linker directly” The Rust Project Contributors, “Rustdoc overview - Multiple runs, same output directory” The Rust Project Contributors, “The Cargo Book” The Rust Project Contributors, “Command-line Arguments - The rustc book” The Rust Project Contributors, “Queries: demand-driven compilation” The Rust Project Contributors, “What Bootstrapping does” Thomas Pöchtrager, “MacOS Cross-Toolchain for Linux and *BSD” “What is a compiler toolchain? - Stack Overflow” Wikipedia, “Dynamic dispatch” Wikipedia, “Late binding” Wikipedia, “Name binding” william woodruff, “Weird architectures weren’t supported to begin with” Zig contributors, “Zig Build System“ the uncommon case mostly looks like incremental bugs in rustc itself , or issues around rerunning build scripts. ↩ see this stackexchange post for more discussion about the tradeoffs between forward declarations and requiring full access to the source. ↩ even Rust depends on crt1.o when linking ! ↩ early binding is called "static linking". ↩ actually, Zig solved this in the funniest way possible , by bundling a C toolchain with their Zig compiler. This is a legitimately quite impressive feat. 


the core of rust

NOTE: this is not a rust tutorial. Every year it was an incredible challenge to fit teaching Rust into lectures since you basically need all the concepts right from the start to understand a lot of programs. I never knew how to order things. The flip side was that usually when you understand all the basic components in play lots of it just fits together. i.e. there's some point where the interwovenness turns from a barrier into something incredibly valuable and helpful. One thing I admire in a language is a strong vision. Uiua , for example, has a very strong vision: what does it take to eliminate all local named variables from a language? Zig similarly has a strong vision: explicit, simple language features, easy to cross compile, drop-in replacement for C. Note that you don’t have to agree with a language’s vision to note that it has one. I expect most people to find Uiua unpleasant to program in. That’s fine. You are not the target audience. There’s a famous quote by Bjarne Strousup that goes “Within C++, there is a much smaller and cleaner language struggling to get out.” Within Rust, too, there is a much smaller and cleaner language struggling to get out: one with a clear vision, goals, focus. One that is coherent, because its features cohere . This post is about that language. Rust is hard to learn. Not for lack of trying—many, many people have spent person-years on improving the diagnostics, documentation, and APIs—but because it’s complex. When people first learn the language, they are learning many different interleaving concepts: These concepts interlock. It is very hard to learn them one at a time because they interact with each other, and each affects the design of the others. Additionally, the standard library uses all of them heavily. Let’s look at a Rust program that does something non-trivial: 1 I tried to make this program as simple as possible: I used only the simplest iterator combinators, I don't touch at all, I don't use async, and I don't do any complicated error handling. Already, this program has many interleaving concepts. I'll ignore the module system and macros, which are mostly independent of the rest of the language. To understand this program, you need to know that: If you want to modify this program, you need to know some additional things: This is a lot of concepts for a 20 line program. For comparison, here is an equivalent javascript program: For this JS program, you need to understand: I'm cheating a little here because returns a list of paths and doesn't. But only a little. My point is not that JS is a simpler language; that's debatable. My point is that you can do things in JS without understanding the whole language. It's very hard to do non-trivial things in Rust without understanding the whole core. The previous section makes it out to seem like I'm saying all these concepts are bad. I'm not. Rather the opposite, actually. Because these language features were designed in tandem, they interplay very nicely: There are more interplays than I can easily describe in a post, and all of them are what make Rust what it is. Rust has other excellent language features—for example the inline assembly syntax is a work of art, props to Amanieu . But they are not interwoven into the standard library in the same way, and they do not affect the way people think about writing code in the same way. without.boats wrote a post in 2019 titled "Notes on a smaller Rust" (and a follow-up revisiting it). 
without.boats wrote a post in 2019 titled "Notes on a smaller Rust" (and a follow-up revisiting it). In a manner of speaking, that smaller Rust is the language I fell in love with when I first learned it in 2018. Rust is a lot bigger today, in many ways, and the smaller Rust is just a nostalgic rose-tinted memory. But I think it's worth studying as an example of how well orthogonal features can compose when they're designed as one cohesive whole. If you liked this post, consider reading Two Beautiful Rust Programs by matklad. This program intentionally uses a file watcher because file IO is not possible to implement efficiently with async on Linux (and also because I wrote a file watcher recently for flower , so it's fresh in my mind). Tokio itself just uses a threadpool, alongside channels for notifying the future. I don’t want to get into async here; this just demonstrates Send/Sync bounds and callbacks. ↩ Technically, is syntax sugar around , but you don't need to know that for most Rust programs. ↩ which is a terrible idea by the way, even more experienced Rust programmers often don't understand interior mutability very well; see this blog post by dtolnay on the difference between mutability and uniqueness in reference types. It would be better to suggest using owned types with exterior mutability and cloning frequently. ↩ first class functions pattern matching the borrow checker and take a function as an argument. In our program, that function is constructed inline as an anonymous function (closure). Errors are handled using something called , not with exceptions or error codes. I happened to use and , but you would still need to understand Result even without that, because Rust does not let you access the value inside unless you check for an error condition first. Result takes a generic error; in our case, . Result is a data-holding enum that can be either Ok or Err, and you can check which variant it is using pattern matching. Iterators can be traversed either with a loop or with . 2 is eager and is lazy. has different ownership semantics than . can only print things that implement the traits or . As a result, s cannot be printed directly. returns a struct that borrows from the path. Sending it to another thread (e.g. through a channel) won't work, because goes out of scope when the closure passed to finishes running. You need to convert it to an owned value or pass as a whole. As an aside, this kind of thing encourages people to break work into "large" chunks instead of "small" chunks, which I think is often good for performance in CPU-bound programs, although as always it depends. only accepts functions that are . Small changes to this program, such as passing the current path into the closure, will give a compile error related to ownership. Fixing it requires learning the keyword, knowing that closures borrow their arguments by default, and the meaning of . If you are using , which is often recommended for beginners 3 , your program will need to be rewritten from scratch (either to use Arc/Mutex or to use exterior mutability). For example, if you wanted to print changes from the main thread instead of worker threads to avoid interleaving output, you couldn't simply push to the end of an collection, you would have to use in order to communicate between threads. first class functions nullability yeah that's kinda it.
Enums without pattern matching are very painful to work with and pattern matching without enums has very odd semantics and s are impossible to implement without generics (or duck-typing, which I think of as type-erased generics) / , and the preconditions to , are impossible to encode without traits—and this often comes up in other languages, for example printing a function in clojure shows something like . In Rust it gives a compile error unless you opt-in with Debug. / are only possible to enforce because the borrow checker does capture analysis for closures. Java, which is wildly committed to thread-safety by the standards of most languages, cannot verify this at compile time and so has to document synchronization concerns explicitly instead.


how to communicate with intent

As you can see from this blog, I like to talk (my friends will be the first to confirm this). Just as important as knowing how to talk, though, is knowing what to say and when to listen. In this post I will give a few simple tips on how to improve your communication in various parts of your life. My goal is partly to help you to be a more effective communicator, and partly to reduce the number of people in my life who get on my nerves :P You don't always need these tips. In casual conversations and when brainstorming, it's healthy to just say the first thing on your mind. But not all conversations are casual, and conversations can slip into seriousness faster than you expect. For those situations, you need to be intentional. Otherwise, it's easy to end up with hurt feelings on both sides, or waste the time of everyone involved. First, think about your audience. Adapt your message to the person you’re speaking to. Someone learning Rust for the first time does not need an infodump about variance and type coercion, they need an introduction to enums, generics, and pattern matching. Similarly, if a barista asks you what the weird code on your screen is, don’t tell them you’re writing a meta-Static Site Generator in Clojure , tell them you’re building a tool to help people create websites. If you are writing a promotion doc, a resume, or a tutorial, don't just dump a list of everything that's relevant. Think about the structure of your document: the questions your reader is likely to have, the amount of time they are likely going to spend reading, and the order they are likely to read in. You need to be legible , which means explaining concrete impacts in terms your audience understands. It's not enough to say what's true; you have to also say why it's important. Consider your intended effect. If a teacher goes on Twitter saying she doesn’t understand why maths is important and we should just give out A’s like candy (real thing that happened on my feed!), dog piling on her is not going to change her mind. Show her a case in her life where maths would be useful, and don’t talk down to her. Self-righteousness feels good in the moment, but doesn’t actually achieve anything. If you just want to gloat about how other people are stupid, go play an FPS or something; Twitter has enough negativity. If you are writing a blog post, know why you are writing it. If you are writing to practice the skill of writing, or to have a reference document, or to share with your friends, infodumping is fine. If you are writing with a goal in mind—say you want to give a name to an idea or communicate when software can fail or enter an idea into the overton window —be intentional. Consider your audience, and the background you expect them to start from. Posting the equivalent of a wikipedia article is rarely the most effective way to instill an idea. Don’t fight losing causes, unless the cause is really worth it . Someone on hacker news saying "language A Sucks and you Should use language B instead" is not worth arguing with. Someone who says "language A is good in scenario X, but has shortcomings in scenario Y compared to language B" is much more serious and worth listening to. Arguing with someone who refuses to be convinced wastes everyone’s time. Be a good conversational partner. Ask directed probing questions: they show you are listening to the other person and invested in the topic. Saying “I don’t understand” puts the burden on them to figure out the source of the misunderstanding. 
If you really aren’t sure what to ask, because you’re confused or the other person was vague, I like “say more?” as a way to leave it open-ended for the other person on how to elaborate. Consider the implications of how you communicate. When you say things, you are not just communicating the words you speak, you are also affecting the person you're talking to. If the person you're infodumping to isn't interested in the topic, infodumping anyway puts them in an awkward situation where they either have to ask you to stop or sit through a long conversation they didn't want to be in. Another tricky scenario is when the other person is interested, but an infodump is not the right level of detail for them right now. Perhaps they are new to the topic, or perhaps they asked a direct question. If they're still trying to get the "big picture", zooming in to fine-grained details will often just confuse them further. Info-dumping during an apology—even if it’s related to the thing you're apologizing for!—buries the apology. More than that, it implies that you expect mitigated judgement . If there is a power dynamic between you (say a wealth gap, or you are a manager and they are an employee), that expectation of mitigated judgment implies you expect to be forgiven , and an apology given in expectation of forgiveness is really just a request for absolution . Instead, apologize directly. If you were in an altered mental state (angry, sleep-deprived, experiencing a trauma trigger), you can add at most 1-2 sentences of context asking the other person to mitigate judgement. Not all apologies need context; often "i was wrong, i'm sorry" is enough. As we've seen above, there are times when infodumps actively hurt you. Even when they don't, though, there can be times when they aren't helping. Everyone comes to a conversation with a different background, and you cannot perfectly predict how they will respond. Rather than trying to avoid every possible miscommunication by packing in the maximum amount of information, say what you mean to say. Then, address the actual miscommunication (or regular conversation!) that happens afterwards. This saves time and energy for both conversational partners. The common theme of all of the above is to communicate effectively and radiate intent . Making it easy for the other person to understand both what you're saying and why you're saying it earns a lot of goodwill, and makes it possible to say more things more bluntly than you would otherwise. A common trap I see people fall into is to say the first thing on their mind. This is fine for conversations between friends (although you should still consider how it affects your relationship!) but is often counterproductive in other contexts. Slow down. Take your time. Say what you mean to say. If you don't mean to say anything, don't say anything at all.


an engineer's perspective on hiring

note for my friends: this post is targeted at companies and engineering managers. i know you know that hiring sucks and companies waste your time. this is a business case for why they shouldn't do that. most companies suck at hiring. they waste everyone’s time (i once had a 9-round interview pipeline!), they chase the trendiest programmers , and they can’t even tell programmers apart from an LLM . in short, they are not playing moneyball . things are bad for interviewees too. some of the best programmers i know (think people maintaining the rust compiler) can’t get jobs because they interview poorly under stress . one with 4 years of Haskell experience and 2 years of Rust experience was labeled as “non-technical” by a recruiter. and of course, companies repeatedly ghost people for weeks or months about whether they actually got a job. this post explores why hiring is hard, how existing approaches fail, and what a better approach could look like. my goal, of course, is to get my friends hired. reach out to me if you like the ideas here. before i start talking about my preferred approach, let’s start by establishing some (hopefully uncontroversial) principles. interviews should: there is also a 6th criterion that's more controversial. let's call it taste . an engineer with poor taste can ship things very quickly at the expense of leaving a giant mess for everyone else on the team to clean up. measuring this is very hard but also very important. conversely, someone who spends time empowering the rest of their team has a multiplicative effect on their team's productivity (c.f. "Being Glue" ). let's look at some common interviews and how they fare. fails on differentiating, applicability, respect, taste . gives very little signal about long term value . live coding cannot distinguish a senior programmer from a marketer using chatGPT , and most interview questions have very little to do with day-to-day responsibilities. all good software engineers are generalists and live coding does not select for generalists. you can augment live coding with multiple rounds of interviews, each of which tests one of the above responsibilities. but now you lose time efficiency ; everything takes lots of engineer time. doing this despite the expense is a show of wealth, and now you are no longer playing moneyball. additionally, people with lots of experience often find the experience demeaning, so you are filtering out the best applicants. a friend explicitly said "I have 18 years of experience on GitHub; if you can't tell I'm a competent programmer from that it's not a good fit." something not often thought about is that this also loses you taste . the code that someone puts together under pressure is not a reflection of how they normally work, and does not let you judge if your engineers will like working with them. fails on differentiating and respect , and partially on applicability . take home interviews are very easy for chatGPT to game and have all the other problems of live interviews, except that they remove the "interview poorly under stress" component. but in exchange they impose a fundamental time asymmetry on the applicant, which again drives away the best people. this does a lot better. you can't use chatGPT to fake an architecture interview 1 . it fails at applicability (you don't ever see the applicant's code).
at first glance it appears to give you some insight into taste , but often it is measuring "how well does the applicant know the problem domain" instead of "how does the applicant think about design problems", so you have to be careful about over-indexing on it. i haven't seen this one a lot for external interviews, but i see it very commonly for internal transfers within a company. it has many of the same tradeoffs as architecture design interviews, except it usually isn't trying to judge skills at all, mostly personality and "fit" (i.e. it fails on differentiating and partially on applicability ). i think it makes sense in environments where the candidate has a very strong recommendation and there's little competition for the position; or if you have some other reason to highly value their skills without a formal assessment. this is an interesting one. i've only ever seen it from Oxide Computer Company . i like it really quite a lot. the process looks like this: this does really really well on nearly every criterion (including respect —note that the time spent here is symmetric, it takes a long time for Oxide's engineers to read that much written material). it fails on time efficiency . i have not gone through this process, but based on the questions and my knowledge of who gets hired at oxide, i would expect just the written work to take around 5-15 hours for a single application. given oxide and their goals, and the sheer number of people who apply there, i suspect they are ok with that tradeoff (and indeed probably value that it drives away people who aren't committed to the application). but most companies are not oxide and cannot afford this amount of time on both sides. if i were to take ideas from the oxide process without sacrificing too much time, i’d keep "you write the code ahead of time and discuss it in the interview". this keeps the advantage of take-home interviews—no time pressure, less stressful environment—while adding a symmetric time component that makes talented engineers less likely to dismiss the job posting out of hand, without an enormous up-front expenditure of time. and discussing the code live filters out people who just vibecoded the whole thing (they won't be able to explain what it does!) while giving everyone else a chance to explain their thinking, helping with applicability and taste . this still has some time asymmetry if the applicant doesn’t have existing open source work they want to show to an interviewer, but it’s a lot less than 5-15 hours, and the company is forced to dedicate some of their own engineer time, so they have motivation not to “throw work over the wall”, showing respect for the applicant. this one i’ve also only ever seen once. the format is that the interviewer writes some mediocre code ahead of time and asks the applicant how they would improve it. i did very well on this format so i'm biased, but i like it a lot. it aces all our criteria: if i were a hiring manager, i would use a combo of a code review interview and a work sample discussed live, giving precedence to the code review and telling the applicant ahead of time that the work sample doesn’t have to be perfect. programming is fundamentally a collaborative process. having the applicant collaborate on both sides (reviewing and authoring) shows you a lot about how they work, and signals to them that you care about more than the equivalent of their SAT score.
i also suggest there always be at least one interview between the applicant and their future manager (this seems to already be common practice—yay!). "people don't quit jobs, they quit bosses": letting them meet ahead of time saves everyone pain down the road. thank you for reading! i hope this inspires you to change your own hiring processes, or at least to write a comment telling me why i'm wrong ^^. you can reach me by email if you want to comment privately. a friend worked at Pivotal Labs, where the primary job responsibility was to pair with client developers. the interview process was for a candidate to pair with an existing employee for a whole day and "shadow" them. he points to Nat Bennett's notes on the interview process as a more detailed writeup. the most interesting comment i got in response to this post was about "red-teaming" the interview to see how effective it is. for example: the friend who suggested this said both ideas were categorically rejected by management and they continued to send him mediocre resumes for people he didn't want to hire. update from after publishing: a friend said they’ve seen people successfully use chatgpt to game design interviews. oof. ↩ this section was added the day after publishing in response to feedback from senior engineers and hiring managers. ↩ differentiate . be able to tell the difference between a senior programmer and a marketer using chatgpt. be applicable . reflect the actual job duties. this includes coding. but it also includes architecture design, PR review, documentation, on and on and on. all good senior software engineers are generalists. think long term . select for applicants who will be good employees for years to come, not just in the next quarter. people are not fungible . there is a high cost to losing employees who are a good fit to the project . there is a high cost to losing employees in general . companies often over-index on crystallized knowledge over fluid intelligence. spending an additional month to find people who specialize in your tech stack, when you could have onboarded them to that stack in a month, is an advanced form of self-sabotage. be time efficient . spend as little time as possible interviewing. engineer time is expensive. be respectful . respect the applicant and their time. if you don't respect the applicant, you will select for people who don't respect themselves, and drive away the best applicants. "but i want to select for people that don't respect themselves so i can pay them less"—get the hell off my site and don't come back. the applicant submits samples of their existing work (or writes new documents specially for the interview) the applicant writes detailed responses to 8 questions about their values, work, career, and goals. the applicant goes through 9 hours of interviews with several oxide employees. it reverses the time asymmetry and reduces the amount of time spent . the interviewer makes one up front time commitment, the applicant makes no up front commitment, and they spend the same amount of time per interview. it’s applicable : you see how the applicant gives interpersonal feedback; discussions about the code naturally lead into discussions about design; and you get information about their sense of taste. taking existing employees and putting them through the interview process and seeing whether it suggests hiring them again. if it doesn't, it's either horribly noisy or filtering out good candidates or both.
"regret analysis": following the careers of rejected candidates to see who went on to do interesting things. if so, find out which part of the interview process failed and change it. update from after publishing: a friend said they’ve seen people successfully use chatgpt to game design interviews. oof. ↩ this section was added the day after publishing in response to feedback from senior engineers and hiring managers. ↩


you are in a box

You are trapped in a box. You have been for a long time. Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can. most tools simultaneously think too small and too big. “i will let you do anything!”, they promise, “as long as you give up your other tools and switch to me!” this is true of languages too. any new programming language makes an implicit claim that “using this language will give you an advantage over any other language”, at least for your current problem. once you start using a tool for one purpose, due to switching costs, you want to keep using that tool. so you start using it for things it wasn’t designed for, and as a result, tools tend to grow and grow and grow until they stagnate . in a sense, we have replicated the business boom-and-bust cycle in our own tools. there are two possible ways to escape this trap. the first is to impose a limit on growth , so that tools can’t grow until they bust. this makes a lot of people very unhappy and is generally regarded as a bad idea. the second is to decrease switching costs. by making it easier to switch between tools, or to interoperate between multiple tools in the same system, there is not as much pressure to have “one big hammer” that gets used for every problem. tools and languages can decrease switching costs by keeping backwards compatibility with older tools, or at least being close enough that they’re easy to learn for people coming from those tools. for example, ripgrep has almost exactly the same syntax as GNU grep, and nearly every compiled language since C has kept the curly braces. tools can also collaborate on standards that make it easier to interoperate. this is the basis of nearly every network protocol, since there's no guarantee that the same tool will be on the other side of the connection. to some extent this also happens for languages (most notably for C), where a language specification allows multiple different compilers to work on the same code. this has limitations, however, because the tool itself has to want (or be forced) to interoperate. for example, the binary format for CUDA (a framework for compiling programs to the GPU) is undocumented, so you're stuck with reverse engineering or re-implementing the toolchain if you want to modify it. the last "internal" way to talk between languages is through a "foreign function interface", where functions in the same process can call each other cheaply. this is hard because each language has to go all the way down to the C ABI before there's something remotely resembling a standard, and because two languages may have incompatible runtime properties that make FFI hard or slow . languages that do encourage FFI often require you to write separate bindings for each program: for example, Rust requires you to write blocks for each declaration, and python requires you to do that and also write wrappers that translate C types into python objects. i won't talk too much more about this—the work i'm aware of in this area is mostly around WASM and WASM Components , and there are also some efforts to raise the baseline for ABI above the C level.
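to make the "separate bindings for each declaration" point concrete, here's roughly what that looks like from the Rust side (a sketch using libc's strlen, not code from any real project; the hand-written signature is the binding, and nothing checks it against the C header):

```rust
use std::ffi::{c_char, CString};

// the binding: we re-declare the C function ourselves, dropping down to the C ABI.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    // CString adds the NUL terminator that C expects but Rust strings don't carry.
    let s = CString::new("hello from rust").unwrap();
    // the call is unsafe because the compiler can't verify the signature we wrote above.
    let len = unsafe { strlen(s.as_ptr()) };
    println!("{len}");
}
```

every other language pays a similar per-declaration tax, which is part of why raising the baseline above the C ABI matters.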
another approach is to compose tools. the traditional way to do this is to have a shell that allows you to freely compose programs with IPC. this does unlock a lot of freedom! IPC allows programs to communicate across different languages, different ABIs, and different user-facing APIs. it also unlocks 'ad-hoc' programs, which can be thought of as situated software for developers themselves. consider for example the following shell pipeline: this shows the 10 largest files in the git history for the current repository . let's set aside the readability issues for now. there are a lot of good ideas here! note that programs are interacting freely in many ways: the equivalent in a programming language without spawning a subprocess would be very verbose; not only that, it would require a library for the git operations in each language, bringing back the FFI issues from before (not to mention the hard work designing "cut points" for the API interface 1 ). this shell program can be written, concisely, using only tools that already exist. note though that the data flow here is a DAG: pipes are one-way, and the CLI arguments are evaluated before the new program is ever spawned. as a result, it’s not possible to do any kind of content negotiation (other than the programmer hard-coding it with CLI args; for example tools commonly have ). the downside of this approach is that the data is completely unstructured; programs work on raw bytes, and there is no common interface. it also doesn't work if the program is interactive, unless the program deliberately exposes a way to query a running server (e.g. or ). let's talk about both of those. powershell , and more recently, nushell , extend traditional unix pipelines with structured data and a type system. they have mechanisms for parsing arbitrary text into native types, and helper functions for common data formats. this is really good! i think it is the first major innovation we have seen in the shell language in many decades, and i'm glad it exists. but it does have some limitations: and it is very hard to fix these limitations because there is no "out-of-band" communication channel that programs could use to emit a schema; the closest you could get is a "standardized file descriptor number", but that would lock out any program that happens to already be using that FD. we have limited kinds of reflection in the form of shell completion scripts, but they're not standardized: there's no standard for the shell to query the program, and there's no standard for the format the program returns. the CLI framework inside the program often does have a schema and reflection capabilities, but they're discarded the second you go over an IPC boundary. how do you get a schema? well, you establish in-band communication. RPC is theoretically about "remote" procedure calls, but it's just as often used for local calls. the thing that really distinguishes it is that it's in-band: you have a defined interface that emits structured information. RPC works really quite well! there are frameworks for forwards- and backwards-compatible RPC ; types and APIs are shared across languages; and interop for a new language only requires writing bindings between that language and the on-wire format, not solving a handshake problem between all pairs of languages nor dropping down to the C ABI. the main downside is that it is a lot of work to add to your program. you have to extensively modify your code to fit it into the shape the framework expects, and to keep it performant you sometimes even have to modify the in-memory representation of your data structures so they can live in a contiguous buffer. you can avoid these problems, but only by giving up performance when deserializing (e.g. by parsing JSON at runtime).
all these limitations are because programs are a prison . your data is trapped inside the box that is your program. the commonality between all these limitations is that they require work from the program developer, and without that work you're stuck. even the data that leaves the program has to go through the narrow entrances and exits of the box, and anything that doesn't fit is discarded . some languages try to make the box bigger—interop between Java, Kotlin, and Clojure is comparatively quite easy because they all run on the JVM. but at the end of the day the JVM is another box; getting a non-JVM language to talk to it is hard. some languages try to make the box extensible—LISPs, and especially Racket, try to make it very easy to build new languages inside the box. but getting non-LISPs inside the box is hard. some tools try to give you individual features—smalltalk gets you orthogonal persistence; pluto.jl gets you a “terminal of the future”; rustc gets you sub-process incremental builds. but all those features are inside a box. often, tools don’t even try. vendor lock-in, subtle or otherwise, is everywhere around us. tools with this strategy tend to be the largest, since they have both the biggest budget and the greatest incentive to prevent you from switching tools. and always, always, always, you are at the mercy of the program author. in my next post, i will discuss how we can escape this box. blog post forthcoming ↩ the output of is passed as a CLI argument to the output of is passed as stdin to the output of is interpreted as a list and programmatically manipulated by . this kind of meta-programming is common in shell and has concise (i won't go as far as "simple") syntax. the output from the meta-programming loop is itself passed as stdin to the command there is no interop between powershell and nushell. there is no protocol for programs to self-describe their output in a schema, so each program's output has to be special-cased by each shell. powershell side-steps this by building on the .NET runtime , and having native support for programs which emit .NET objects in their output stream . but this doesn't generalize to programs that don't run on the CLR . there is no stability guarantee between versions of a program. even tools with semi-structured JSON output are free to change the structure of the JSON, breaking whatever code parses it. D. R. MacIver, “This is important” Wikipedia, “Zawinski’s Law of Software Envelopment” Graydon Hoare, “Rust 2019 and beyond: limits to (some) growth.” Rich Hickey, “Simple Made Easy” Vivek Panyam, “Parsing an undocumented file format” The Khronos® Group Inc, “Vulkan Documentation: What is SPIR-V” Aria Desires, “C Isn’t A Programming Language Anymore” Google LLC, “Standard library: cmd.cgo” Filippo Valsorda, “rustgo: calling Rust from Go with near-zero overhead” WebAssembly Working Group, “WebAssembly” The Bytecode Alliance, “The WebAssembly Component Model” Josh Triplett, “crABI v1” Clay Shirky, "Situated Software" Microsoft, "PowerShell 7.5: 4. Types" Microsoft, "PowerShell 7.5: What is PowerShell?" Microsoft, "PowerShell 7.5: about_Output_Streams" Microsoft, ".NET Execution model: Common Language Runtime (CLR) overview" Nushell Project, "Nu Fundamentals: Types of Data" Google LLC, “Protocol Buffers” Robert Lechte, “Programs are a prison: Rethinking the fundamental building blocks of computing interfaces” Siderea, "Procrustean Epistemologies"


constrained languages are easier to optimize

a recurring problem in modern “low-level” languages 1 is that they are hard to optimize. they do not reflect the hardware , they require doing complex alias analysis , and they constantly allocate and deallocate memory . 2 they looked at the structure/expressiveness tradeoff and consistently chose expressiveness. consider this paper on stream fusion in Haskell . this takes a series of nested loops, each of which logically allocates an array equal in size to the input, and optimizes them down to constant space using unboxed integers. doing the same with C is inherently less general because the optimizing compiler must first prove that none of the pointers involved alias each other. in fact, optimizations are so much easier to get right in Haskell that GHC exposes a mechanism for users to define them ! these optimizations are possible because of referential transparency —the compiler statically knows whether an expression can have a side effect. “haskell is known for performance problems, why are you using it as an example. also all GC languages constantly box and unbox values, you need raw pointers to avoid that.” GC languages do constantly box and unbox 3 , but you don’t need raw pointers to avoid that. consider futhark , a functional parallel language that compiles to the GPU. its benchmarks show it being up to orders of magnitude faster than sequential C on problems that fit well into its domain. it does so by having unboxed fixed-size integers, disallowing ragged arrays, and constraining many common operations on arrays to only work if the arrays are statically known to have the same size. futhark is highly restrictive. consider instead SQL. SQL is a declarative language, which means the actual execution is determined by a query planner, not constrained by the source code. SQL has also been around for decades, which means we can compare the performance of the same code over decades. it turns out common operations in postgres are twice as fast as they were a decade ago . you can imagine writing SQL inline—wait no it turns out C# already has that covered . SQL is not a general purpose language. but you don’t need it to be! your performance issues are not evenly distributed across your code; you can identify the hotspots and, for those, trade a language with raw pointers for one that is more structured and therefore more amenable to optimization. there are various kinds of memory optimizations that are only possible if you have access to raw pointers; for example NaN boxing , XOR linked lists , and tagged pointers . sometimes you need them, which means you need a language that allows them. but these kinds of data structures are very rare! we should steer towards a general purpose language that does not expose raw pointers, and only drop down when we actually need to use them. well, Rust is a good step in the right direction: raw pointers are opt-in with ; Iterators support functional paradigms that allow removing bounds checks and fusing stream-like operations ; and libraries like rayon make it much easier to do multi-threaded computation.
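a tiny illustration of the iterator point (my own example, not from any benchmark): the indexed loop makes the compiler prove the index is in bounds on every iteration before it can drop the check, while the combinator chain has no indices to check and fuses into a single pass with no intermediate allocation.

```rust
// indexed version: to remove the bounds check on xs[i], the compiler has to
// prove that i < xs.len() holds on every iteration.
fn sum_squares_indexed(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i] * xs[i];
    }
    total
}

// iterator version: no indices means nothing to bounds-check, and
// map/filter/sum fuse into one loop over the slice.
fn sum_odd_squares(xs: &[u64]) -> u64 {
    xs.iter().map(|x| x * x).filter(|sq| sq % 2 == 1).sum()
}

fn main() {
    let xs = [1, 2, 3, 4, 5];
    assert_eq!(sum_squares_indexed(&xs), 55);
    assert_eq!(sum_odd_squares(&xs), 35);
}
```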
but i think this is in some sense the wrong question. we should not be asking “what language can i use everywhere for every purpose”; we should build meta-languages that allow you to easily use the right tool for the job. this is already true for regular expressions and query languages; let's go further. i want inline futhark; inline CSS selectors; inline datalog; ffi between python and C that's trivially easy. the easier we make it to interop, the easier it becomes to pick the right tool for the job. next time you hit a missed optimization, ask yourself: why was this hard for the compiler? can i imagine a language where optimizing this is easier? this is true for all of C, C++, and unsafe Rust (and to some extent Fortran, but Fortran does not require alias analysis ). ↩ also, they require doing PGO ahead of time instead of collecting info dynamically at runtime. but i haven’t found any benchmarks showing Java/luaJIT programs that are faster than equivalent C, so i won’t claim that JIT is inherently faster. ↩ true in the general case, but not always in practice. in Go and Java, the compiler needs to do escape analysis to know whether a variable can be unboxed. in Haskell, the situation is more complicated because of lazy evaluation; see Alexis King on the GHC strictness analyzer for more info. ↩ languages that expose raw pointers are surprisingly hard to optimize by constraining the language to require additional structure, the compiler has much more freedom to optimize by making it easier to switch between languages, we make it easier to choose the right tool for the job, increasing the performance of our code


sorry for the rss screwup

a couple days ago i pushed about 10 empty posts at once to everyone subscribed to my RSS feed. oops. sorry about that. i've since fixed it, but most RSS readers i've seen will cache the posts indefinitely once they're published. the workaround for my own client was to delete my page and then re-add it, which will unfortunately discard all your read/unread state. the reason this happened is that i added a new kind of "stub" post, and handled that correctly on the main site, but not on the rss feed. you can see the intended layout of the stub posts here if you're interested.


operators, not users and programmers

the modern distinction between “programmers” and “users” is evil and destroys agency. spreadsheets are hugely successful. Felienne Hermans, who has spent her career studying spreadsheets, attributes this success to " their immediate feedback system and their continuous deployment model ": the spreadsheet shows you its result as soon as you open it, and it requires no steps to run other than to install Excel and double-click the file. Rik calls Excel “malleable software” and the resulting programs “vine-like systems” : The dream of malleable software is that we can enlarge the space of possibilities, users can enjoy more freedom and get more out of their software, without having to rely on software developers to provide it for them. i would go one step further: the dream of malleable software is to unify users and programmers, such that there are just “operators” of a computer, and “writing a program” doesn’t sound any harder than “writing a resume”: the distinction between "user" and "programmer" is an artifact of our presently barely-programmable and barely-usable computing systems.  I would like to use the neutral word "operator" instead. this is a relatively new distinction! if we look at the history of computing and of programming languages, we see very different patterns: In the 1960’s the supply of programmers was not very deep so IBM and other companies trying to gain a computer sale would often have to sell the business prospect on the idea of creating its own programmer(s). Sometimes it was the shipping clerk; sometimes it was the head order taker; sometimes it was a bookkeeper, and sometimes it was a woman or man packing items in the warehouse. this is what i want: for programming to be easy and simple enough to pick up that people can do it without specialized training in the field, so that they can write situated software . contrast malleable software to the systems that programmers often build: Many, many technologists have taken one look at an existing workflow of spreadsheets, reacted with performative disgust, and proposed the trifecta of microservices, Kubernetes and something called a "service mesh". This kind of Big Enterprise technology however takes away that basic agency of those Excel users, who no longer understand the business process they run and now have to negotiate with  ludicrous technology dweebs  for each software change. The previous pliability of the spreadsheets has been completely lost. now, this doesn’t happen because programmers are stupid and evil. it happens because the systems we build are amenable to all the features that programmers take for granted : what i want to imagine is what it would look like to build computing systems that have those features and are also malleable. let's start by looking at what malleable systems already exist. note these aren't just hot-patching, where you edit a system while it's running. hot-patching is certainly useful and i would like to see more of it, but it doesn't unify the source and rendered version of a program, you still have to trip the condition in order to observe the change. malleable languages are a combination of: note a pattern here: these don't require prior approval in order to use. anyone can throw a file into google drive and hit share without prior approval (sometimes this is frowned upon, in which case it gets called "Shadow IT"). unfortunately i am not aware of a malleable testing system—if you know of one, please do tell me ! 
so, we want the following from a malleable system: this is a tall order! the rest of this series is dedicated to ideas about how to make this possible. i consulted a local friend of mine, a polisci major, and asked him what he would build if he were able to program. he said a penguin that walks across the screen, followed by a puffin friend for him. so, here's to penguins 🐧 distributed version control systems (VCS) automated testing (when integrated with a VCS, often called "continuous integration") gradual controlled rollouts (when integrated with a VCS, often called "continuous deployment") spreadsheets, as previously discussed WYSIWYG editors: Microsoft Word, Obsidian, Typst, Wordpress. In Word, and in Obsidian's default view, the compile step is completely hidden and it appears to the user as if the source and rendered view are the same. in other editors, the rendered view updates quickly and frequently enough that they get immediate feedback, just like a spreadsheet. browser devtools, where editing an element in the inspector immediately updates the rendered view sonic-pi , where editing code live-updates the sound played hot-patching live previews that makes updates appear instant undo/redo for the whole system file shares: google drive; sharepoint; onedrive; etc. the tooling for diffing and reverting is very primitive compared to git or hg (in particular diffing is the responsibility of the application, not the VCS), but the primitives are there. Bank Python , where the distinction between local and persisted storage is smoothed over by the runtime Bank Python's vouch system , where hitting "approve" insta-deploys to prod. at a technical level this is basically the same as code review, but unlike normal CI systems it’s not configured to require tests to pass, because anything that’s too annoying will end up with people bypassing the system to build shadow IT. hot-reloading and live previews, like spreadsheets in particular, we want the user to write in the representation that makes the most sense to them, whether that's a spreadsheet or an SQL query or a text editor of markup automatic and continuous durability , like autosave in microsoft office products. in particular this undo/redo capability works on the whole system including derived data, not just the source code itself, so you can try new things without fear of breaking everything. the closest programmers have to this today is , which automatically snapshots the working tree , but taking snapshots still requires a manual action, and you need to set up the system in advance. distributed version control++ with unrestricted distribution and an easy interface, like google drive and dropbox with diffing/merging/reverting, like traditional VCS automated and instantly triggered testing, like piper at google. this could run your tests in the background as you edit your program, with integrated build caching so that only the affected tests rerun. continuous deployment we want explicit approval and controlled rollout, like traditional CD, so we can separate "upload the code" from "run the code in prod" but we want it to be trivially easy to give that approval, like double-clicking an email attachment or the vouch system. don't confuse "controlled rollout" with access control—it's useful even if you're a solo programmer. and of course, performance. 
Felienne Hermans, "Proposition #1" Rik de Kort, "Vine-like Systems and Malleability" Rik de Kort, "Technical and functional risk" Stanislav Datskovskiy, "Seven Laws of Sane Personal Computing" Brian M. Kelly, The AS/400 and IBM i RPG Programming Guide Clay Shirky, "Situated Software" Cal Paterson, "An oral history of Bank Python" Hillel Wayne, "What engineering can teach (and learn from) us" "Jujutsu—a version control system"


complected and orthogonal persistence

Everything Not Saved Will Be Lost —Ancient Nintendo Proverb say that you are writing an editor. you don't want to lose people's work so you implement an "autobackup" feature, so that people can restore their unsaved changes if the program or whole computer crashes. implementing this is hard ! the way i would do it is to serialize the data structures using something like bincode and then write them to an SQLite database so that i get crash-consistency. there are other approaches with different tradeoffs. this 1983 paper asks: why are we spending so much time rewriting this in applications, instead of doing it once in the runtime? it then introduces a language called "PS-algol", which supports exactly this through a library. note that arbitrary data types in the language are supported without the need for writing user code. it turns out that this idea is already being used in production. not in that form—people don’t use Algol anymore—but the idea is the same. M (better known as MUMPS ), Bank Python , and the IBM i 1 are still used in healthcare, financial, and insurance systems, and they work exactly like this 2 . here is a snippet of M that persists some data to a database: and here is some Bank Python that does the same: and finally some COBOL: 3 note how in all of these, the syntax for persisting the data to disk is essentially the same as persisting it to memory (in MUMPS, persisting to memory is exactly the same, except you would write instead of ). if you don't require the runtime to support all datatypes, there are frameworks for doing this as a library. protobuf and flatbuffer both autogenerate the code for a restricted set of data types, so that you only have to write the code invoking it. the thing these languages and frameworks have in common is that the persistence is part of the program source code; either you have to choose a language that already has this support, or you have to do extensive modifications to the source code to persist the data at the right points. i will call this kind of persistence complected persistence because they tie together the business logic and persistence logic (see Simple Made Easy by Rich Hickey for the history of the word "complect"). there is also orthogonal persistence . "orthogonal persistence" means your program's state is saved automatically without special work from you the programmer. in particular, the serialization is managed by the OS, database, or language runtime. as a result, you don't have to care about persistence, only about implementing business logic; the two concerns are orthogonal . orthogonal persistence is more common than you might think. some examples: these forms of orthogonal persistence work on the whole OS state. you could imagine a version that works on individual processes: swap the process to disk, restore it later. the kernel kinda already does this when it does scheduling. you can replicate it in userspace with telefork , which even lets you spawn a process onto another machine. but the rest of the OS moves on while the process is suspended: the files it accesses may have changed, the processes it was talking to over a socket may have exited. what we want is to snapshot the process state: whatever files on disk stay on disk, whatever processes it was talking to continue running. this allows you to rewind and replay the process, as if the whole thing were running in a database transaction. what do we need in order to do that? 
effectively, we are turning syscalls into capabilities , where the capabilities we give out are “exactly the syscalls the process made last time it spawned”. note how this is possible to do today, with existing technology and kernel APIs! this doesn’t require building an OS from scratch, nor rewriting all code to be in a language with tracked effects or a capability system. instead, by working at the syscall interface 4 between the program and the kernel, we can build a highly general system that applies to all the programs you already use. note that this involves 3 different levels of tracking, which we can think of in terms of progressive enhancement: the editor example i gave at the beginning refers to 3; but you can get really quite a lot of things just with 1 (tracked record/replay and transactional semantics). for example, here are some tools that would be easy to build on top: this is not an exhaustive list, just the first things on the top of my head after a couple days of thinking about it. what makes this orthogonal persistence system so useful is that all these tools are near-trivial to build on top: most of them could be done in shell scripts or a short python script, instead of needing a team of developers and a year. not inherently. “turning your file system into a database“ is only as slow as submitting the query is 5 —and that can be quite fast when the database runs locally instead of over the network; see sled for an example of such a DB that has had performance tuning. rr boasts less than a 20% slowdown . bubblewrap uses the kernel’s native namespacing and to my knowledge imposes no overhead. now, the final system needs to be designed with performance in mind and then carefully optimized, but, you know, that's doable. kinda. there are two possible ways to implement intra-process persistence (i.e. "everything other than disk writes"). 1 is cheap if you don't have much memory usage compared to CPU time. 2 is cheap if you don't have much CPU time compared to memory usage. only 2 allows you to modify the binary between saving and restoring. it's possible to do both for the same process, just expensive. many of the ideas in this post were developed in collaboration with edef. if you want to see them built, consider sponsoring her so she has time to work on them. the IBM i will be coming up many times in this series, particularly block terminals and the object capability model. i will gloss over parts of the system that cannot be intercepted at a syscall boundary; but that said i want to point out that IBM POWER extensions and TIMI correspond roughly to the modern ideas of the CHERI and WASM projects. ↩ jyn, you might ask, why are all these systems so old! why legacy systems? isn't there any new code with these ideas? and the answer is no. systems research is irrelevant and its new ideas, such as they are, do not make it into industry. ↩ credit @yvanscher ↩ on all OSes i know other than Linux, the syscall ABI is not stable and programs are expected to use libc bindings. you can do something similar by using LD_PRELOAD to intercept libc calls. ↩ one might think that you have to flush each write to disk before returning from write() in order to preserve ACID semantics. not so; this is exactly the problem that a write-ahead-log fixes. ↩
virtualized hibernation in hypervisors like VirtualBox and VMWare (usually labeled "Save the machine state" or something similar) a filesystem that supports atomic accesses, snapshots, and transaction restarts, such as ZFS . a runtime that supports detailed tracking and replay of syscalls, such as rr . this works by intercepting syscalls with ptrace() , among other mechanisms, and does not require any modifications to the program executable or source code. a sandbox that prevents talking to processes that weren’t running when the target process was spawned (unless those processes are also in the sandbox and tracked with this mechanism), such as bubblewrap . features you can get just by recording and replaying the whole process ("tracking between processes") features you can get by replaying from a specific point in the process ("tracking within a process") features you can only get with source code changes, by allowing the process to choose where it should be restored to ("tracking that needs source code changes") collaborative terminals, where you can “split” your terminal and hand a sandboxed environment of your personal computer to a colleague so they can help you debug an issue. this is more general than OCI containers because you don't need to spend time creating a dockerfile that reproduces the problem. this is more general than because you can edit the program source to add printfs, or change the input you pass to it at runtime. “save/undo for your terminal”, where you don’t need to add a flag to , because the underlying filesystem can just restore a snapshot. this generalizes to any command—for example, you can build an arbitrary command that works even if installed after the data is lost, which is not possible today . note that this can undo by-process, not just by point-in-time, so it is strictly more general than FS snapshots. query which files on disk were modified the last time you ran a command. for example you could ask “where did this command install its files?”. the closest we have to this today is , which only works for changes done by the package manager. asciinema , but you actually run the process instead of building your own terminal emulator. this also lets you edit the recording live instead of having to re-record from scratch. the “post-modern build system” (also needs a salsa -like red-green system ) “save/undo for your program”, where editors and games can take advantage of cheap snapshots to use the operating system's restore mechanism instead of building their own. take a snapshot of the memory, registers, and kernel state (e.g. file descriptors). this is how telefork works. this only works with a single version of the executable; any change, even LTO without changing source code, will break it. replay all syscalls done by the process. this will work across versions, as long as the program makes the same syscalls in the same order. 
persisting program state is hard and basically requires implementing a database persistence that does not require action from the program is called “orthogonal persistence” it is possible to build orthogonal persistence for individual processes with tools that exist today, with only moderate slowdowns depending on how granular you want to be there are multiple possible ways to implement this system, with different perf/generality tradeoffs such a system unlocks many kinds of tools by making them orders of magnitude easier to build Dan Luu, "Files are hard" Ty Overby, Zoey Riordan, Victor Koenders, bincode Atkinson, M.P., Bailey, P.J., Chisholm, K.J., Cockshott, W.P. & Morrison, R. “PS-algol: A Language for Persistent Programming”. In Proc. 10th Australian National Computer Conference, Melbourne, Australia (1983) pp 70-79. Cal Paterson, "An oral history of Bank Python" Hugo Landau, "IBM i: An Unofficial Introduction" Google LLC, "Protocol Buffers" Google LLC, "FlatBuffers Docs" Rich Hickey, "Simple Made Easy" Tristan Hume, "Teleforking a process onto a different computer!" Robert O’Callahan et al., "RR" Simon McVittie et al., "bubblewrap" Chip Morningstar and F. Randall Farmer, "What Are Capabilities?" Robert O'Callahan, "rr Trace Portability" Free Software Foundation, Inc., "GNU Coreutils" Waleed Khan, "git undo: We can do better" Marcin Kulik, "Record and share your terminal sessions, the simple way." Jade Lovelace, "The postmodern build system" Salsa developers, "About salsa" The Rust Project contributors, "Incremental compilation in detail" Tyler Neely, "sled - it's all downhill from here!!!" The PostgreSQL Global Development Group, "Reliability and the Write-Ahead Log: Write-Ahead Logging (WAL)" Yvan Scher, "7 cobol examples with explanations." Robert N. M. Watson et al., "CTSRD – Rethinking the hardware-software interface for security" WebAssembly Working Group, "WebAssembly" Rob Pike, "Systems Software Research is Irrelevant" System Calls Manual, "ptrace(2)" dpkg suite, "dpkg(1)" Wikipedia, "MUMPS" Wikipedia, "orthogonal persistence" Wikipedia, "IBM i" Wikipedia, "ZFS" Wikipedia, "Orthogonality" Wikipedia, "Hibernation (computing)" Wikipedia, "Compaq LTE Lite"


how i write blog posts

this isn't about blogging engines, don't worry. there's already plenty of writing about those. i use zola, i am mildly dissatisfied with it, i don't care enough to switch. no, this is how i actually write. i have an um. eccentric setup. in particular, i can write draft posts from any of my devices—desktop, laptop, phone—and have them show up live on a hidden subdomain of jyn.dev without any special work on my part. how does this work? i'm glad you asked.

works great! normally i write outlines and ideas down on mobile, and then clean them up into prose on desktop. when i edit on desktop, i sometimes use nvim (e.g. for posts like how i use my terminal that have complicated html fragments). unlike obsidian, nvim doesn't have autosave, so i added it myself. the Caddyfile is also quite simple. (sketches of both are at the end of this post.)

the one downside of this is that i get very little visibility on mobile into why things are not syncing to the desktop. to make up for this, my phone and desktop are on the same tailnet, which allows me to ssh in remotely to check up on the zola server (i've never had to check up on the Caddy server). i like Termius for this.

note some things about this setup:

- on my desktop, i have Caddy running a reverse proxy back to a live zola server. Caddy gives me nice things like https, and makes me less worried about having public ports on the internet. to get live-reloading working, Caddy also reverse-proxies websockets.
- on my desktop, i have the zola content/ directory sym-linked to a subdirectory of my obsidian notes folder.
- on all my devices, i run Obsidian Sync in the background, which automatically syncs my posts everywhere. it costs $5/month and doesn't cause me trouble, which is a lot more than i can say for most technology.
- on laptop and mobile, i just write in obsidian, like i would for any other notes. i have a "blog post" template that inserts the zola header; otherwise i just write normal markdown.
- when i'm ready to publish, i commit the changes to git on desktop or laptop and push to github, which updates the public-facing blog.
- i have live reloading all the way through, regardless of the editor or device i am using to edit the post.
- because it's just a public url, it's very easy to share with my friends and ask for feedback on early posts, without making the posts visible to random people on hacker news.
- if i ever want to take down the site, i just kill the zola server. it defaults to off when i start my computer.
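for the curious, here is roughly the shape of the two bits of config mentioned above. these are simplified sketches, not my exact files.

the nvim autosave is just an autocmd (the exact events and guards are a matter of taste):

```lua
-- autosave: write the buffer whenever i stop typing or leave insert mode
vim.api.nvim_create_autocmd({ "InsertLeave", "TextChanged" }, {
  pattern = "*",
  callback = function()
    -- only save real, named, modified files
    if vim.bo.modified and vim.bo.buftype == "" and vim.fn.expand("%") ~= "" then
      vim.cmd("silent! write")
    end
  end,
})
```

and the Caddyfile is not much more than a reverse proxy (the hostname is a placeholder; zola serves on port 1111 by default, and depending on your zola version the live-reload websocket may sit on its own port and need a second reverse_proxy line):

```
drafts.example.dev {
    reverse_proxy localhost:1111
}
```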


how i use my terminal

this is a whole blog post because it is "outside the overton window"; it usually takes at least a video before people even understand the thing i am trying to describe. so, here's the video: the steps here that tend to surprise people are 0:11, 0:21, and 0:41. when i say "surprise" i don't just mean that people are surprised that i've set this up, but that they are surprised this is possible at all. here's what happens in that video (a blow-by-blow timeline is at the end of this post).

i got annoyed at VSCode a while back for being laggy, especially when the vim plugin was running, and at having lots of keybind conflicts between the editor, vim plugin, terminal, and window management. i tried zed but at the time it was quite immature (and still had the problem of lots of keybind conflicts). i switched to using nvim in the terminal, but quickly got annoyed at how much time i spent copy-pasting filenames into the editor; in particular i would often copy-paste file paths with line/column suffixes from ripgrep, get a syntax error, and then have to edit them before actually opening the file. this was quite annoying. what i wanted was an equivalent of ctrl-click in vscode, where i could take an arbitrary file path and have it open as smoothly as i could navigate to it. so, i started using tmux and built it myself.

people sometimes ask me why i use tmux. this is why! this is the whole reason! (well, this and session persistence.) terminals are stupidly powerful and most of them expose almost none of it to you as the user. i like tmux, despite its age, bugs, and antiquated syntax, because it's very extensible in this way.

this is done purely with tmux config (a sketch of its general shape is further down). i will not go through the whole regex, but uh. there you go. i spent more time on this than i probably should have.

this is actually a trick; there are many steps here. this part is not so bad. tmux again. i also have a version that always opens an editor in the current pane, instead of launching in the default application. for example i use one tool by default to view json files, but another to edit them.

here is the trick. i have created a shell script (actually a perl script) that is the default application for all text files. setting up that many file associations by hand is a pain. i will write a separate blog post about the scripts that install my dotfiles onto a system. i don't use Nix partly because all my friends who use Nix have even weirder bugs than they already had, and partly because i don't like the philosophy of not being able to install things at runtime. i want to install things at runtime and track that i did so. that's a separate post too. the relevant part is this: it bounces back to tmux. in particular, this is being very dumb and assuming that tmux is running on the machine where the file is, which happens to be the case here. this is not too bad to ensure - i just use a separate terminal emulator tab for each instance of tmux i care about; for example i will often have one Windows Terminal tab open for WSL on my local laptop, one for my desktop, and one for a remote work machine via a VPN.

there's actually even more going on here—for example i am translating the file:line:col syntax into something vim understands, and overriding things so that it doesn't error out on the extra suffix—but for the most part it's straightforward and not that interesting. this is a perl script that scripts tmux to send keys to a running instance of nvim (actually the same perl script as before, so that both of these can be bound to the same keybind regardless of whether nvim is already open or not).
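to give a flavor of what the tmux side of this looks like, here is a stripped-down sketch, assuming the prefix is C-k; the keybinds, the regex, and the `open-file` helper name are illustrative, not exactly what i use:

```
# prefix + f: enter copy mode and search backwards for anything that looks
# like a file path, possibly with a :line:col suffix
bind-key f copy-mode \; send-keys -X search-backward "[[:alnum:]_./-]+\\.[[:alnum:]]+(:[0-9]+)*"

# in copy mode, `o` pipes the current selection to a helper script, which
# decides how to open it (new pane, existing nvim instance, default app, ...)
bind-key -T copy-mode-vi o send-keys -X copy-pipe-and-cancel "xargs -r open-file"
```

the real regex is much hairier, which is where all the time went.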
well. well. now that you mention it. the last thing keeping me on tmux was session persistence, and Ansuz has just released a standalone tool that does persistence and nothing else. so. i plan to switch to kitty in the near future, which lets me keep all these scripts and does not require shoving a whole second terminal emulator inside my terminal emulator, which hopefully will reduce the number of weird mysterious bugs i encounter on a regular basis. the reason i picked kitty over wezterm is that its ssh integration works by integrating with the shell, not by launching a server process, so it doesn't need to be installed on the remote. this mattered less for tmux because tmux is everywhere, but hardly anywhere has wezterm installed by default.

honestly, yeah. i spend quite a lot less time fighting my editor these days. that said, i cannot in good conscience recommend this to anyone else. all my scripts are fragile and will probably break if you look at them wrong, which is not ideal if you haven't written them yourself and don't know where to start debugging them. if you do want something similar without writing your own tools, i can recommend a few things (listed a bit further down).

hopefully this was interesting! i am always curious what tools people use and how - feel free to email me about your own setup :)

here's the blow-by-blow of the video, numbered so i can refer to the steps later:

1. 0:00 i start with Windows Terminal open on my laptop.
2. 0:02 i hit ctrl + shift + 5, which opens a new terminal tab that ssh's to my home desktop and immediately launches tmux.
3. 0:03 tmux launches my default shell, zsh, which shows a prompt while loading the full config asynchronously.
4. 0:08 i fuzzy-find a recent directory.
5. 0:09 i start typing a ripgrep command. zsh autofills the command since i've typed it before and i accept it with ctrl + f.
6. 0:11 i hit ctrl + k f, which tells tmux to search all output in the scrollback for filenames. the filenames are highlighted in blue.
7. 0:12 i hold n to navigate through the files. there are a lot of them, so it takes me a bit to find the one i'm looking for.
8. 0:21 i press o to open the selected file in my default application. tmux launches it in a new pane. note that this is still running on the remote server; it is opening a remote file in a remote tmux pane. i do not need to have this codebase cloned locally on my laptop.
9. 0:26 i try to navigate to several references using rust-analyzer, which fails because RA doesn't understand the macros in this file. at 0:32 i finally find one which works and navigate to it.
10. 0:38 i hit ctrl + k h, which tells tmux to switch focus back to the left pane.
11. 0:39 i hit n again. the pane is still in "copy-mode", so all the files from before are still the focus of the search. they are highlighted again and tmux selects the next file in search order.
12. 0:41 i hit o, which opens a different file than before, but in the same instance of the editor.
13. 0:43 i hit b, which shows my open file buffers. in particular, this shows that the earlier file is still open. i switch back and forth between the two files a couple times before ending the stream.

a few other notes: i don't need a fancy terminal locally; something with nice fonts is enough. all the fancy things are done through tmux, which is good because it means they work on Windows too without needing to install a separate terminal. the editor thing works even if the editor doesn't support remote scripting. nvim does support RPC, but this setup also worked back when i used other editors. i could have written this such that the fancy terminal emulator scripts were in my editor, not in tmux (e.g. in nvim).
but again this locks me into the editor; and the built-in terminals in editors are usually not very good. doing it through tmux instead means:

- it's much easier to debug when something goes wrong (vscode's debugging tools are mostly for extension authors and running them is non-trivial). with vim plugins i can just add print statements to the lua source and see what's happening.
- all my keybinds make sense to me!
- my editor is less laggy.
- my terminal is much easier to script through tmux than through writing a VSCode plugin, which usually involves setting up a whole typescript toolchain and context-switching into a new project.

the promised recommendations:

- fish + zoxide + fzf. that gets you steps 4, 5, and kinda sorta-ish 6.
- "builtin functionality in your editor" - fuzzy find, full text search, tabs and windows, and "open recent file" are all commonly supported.
- qf, which gets you the "select files in terminal output" part of 6, kinda. you have to remember to pipe your output to it though, so it doesn't work after the fact and it doesn't work if your tool is interactive. note that it hard-codes a vi-like CLI, so you may need to fork it or still add a script that takes the place of $EDITOR. see julia evans' most recent post for more info.
- e, which gets you the "translate into something your editor recognizes" part of 8, kinda. i had never heard of this tool until i wrote my own with literally the exact same name that did literally exactly the same thing, forgot to put it in PATH, and got a suggestion from my shell asking if i wanted to install it, lol.
- the various editor-specific remote-control tools, all of which get you 12, kinda. the problem with these is that they don't all support the same things, and it means you have to modify this whenever you switch editors. admittedly most people don't switch editors that often, lol.

to sum up: terminals are a lot more powerful than people think! by using terminals that let you script them, you can do quite a lot of things. you can kinda sorta replicate most of these features without scripting your terminal, as long as you don't mind tying yourself to an editor. but doing this requires quite a lot of work, because no one who builds these tools thought of these features ahead of time.


theory building without a mentor

NOTE: if you are just here for the how-to guide, click here to skip the philosophizing.

Peter Naur wrote a famous article in 1985 called Programming as Theory Building. it has some excellent ideas, such as:

- "programming must be the programmers' building up knowledge of a certain kind, knowledge taken to be basically the programmers' immediate possession, any documentation being an auxiliary product."
- "solutions suggested by group B [who did not possess a theory of the program] […] effectively destroyed its power and simplicity. The members of group A [who did possess a theory] were able to spot these cases instantly and could propose simple and effective solutions, framed entirely within the existing structure."
- "the program text and its documentation proved insufficient as a carrier of the most important design ideas"

i think this article is excellent, and highly recommend reading it in full. however, i want to discuss one particular idea Naur mentions:

"For a new programmer to come to possess an existing theory of a program it is insufficient that he or she has the opportunity to become familiar with the program text and other documentation. What is required is that the new programmer has the opportunity to work in close contact with the programmers who already possess the theory [...] program revival, that is reestablishing the theory of a program merely from the documentation, is strictly impossible."

i do not think it is true that it is impossible to recover a theory of the program merely from the code and docs. my day job, and indeed one of my most prized skills when i interview for jobs, is creating a theory of programs from their text and documentation alone. this blog post is about how i do that, and how you can too.

Naur also says in the article: "in a certain sense there can be no question of theory modification, only program modification". i think this is wrong: theory modification is exactly what Ward Cunningham describes as "consolidation" in his 1992 article on Technical Debt. i highly recommend the original article, but the basic idea is that over time, your understanding of how the program should behave changes, and you modify and refactor your program to match that idea. this happens in all programs, but the modification is easier in programs with little technical risk.

furthermore, this theory modification often happens unintentionally over time as people are added and removed from teams. as ceejbot puts it:

"This is Conway's Law over time. Teams are immutable: adding or removing a person to a team produces a different team. After enough change, the team is different enough that it no longer recognizes itself in the software system it produces. The result is people being vaguely unhappy about software that might be working perfectly well."

i bring this up to note that you will never recover the same theory as the original programmers (at least, not without talking to them directly). the most you can do is to recover one similar enough that it does not require large changes to the program. in other words, you are creating a new theory of the program, and may end up having to adapt the program to your new theory.

this is useful both when fixing bugs and when adding new features; i will focus on new features because i want to emphasize that these skills are useful any time you modify a program. for a focus on debugging, see Julia Evans' Pocket Guide to Debugging. this post is about creating theories at the "micro" level, for small portions of the program.
i hope to make a post about the "macro" level in the future, since that's what really lets you start making design decisions about a program.

i recently made a PR to neovim, having never worked on neovim before; i'll use that as an example going forward. i highly recommend following along with a piece of code you want to learn more about. if you don't have one in mind, i have hidden all the examples behind a drop-down menu, so you can try to apply the ideas on your own before seeing how i use them. the investigation i did in this blog post was based off neovim commit 57d99a5. Click here to open all notes.

to start off, you need an idea of what change you want to make to the program. almost always, programs are too large for you to get an idea of the whole program at once. instead, you need to focus on theory-building for the parts you care about, and only understand the rest of the program to the extent that the parts you care about interact with it. in my neovim PR, i cared about a command which opens a file if it isn't loaded, or switches to the relevant buffer if it is. specifically i wanted to extend the "switch to the relevant buffer" part to also respect an extra argument, so that i could pass it a line number.

there are several ways to get started here. the simplest is just finding the relevant part of the code or docs—if you can provoke an error that's related to the part of the code you're changing, you can search for that error directly. often, knowing how execution reaches that state is very helpful, which you can do by getting a backtrace. you can get backtraces for output from arbitrary programs with liberal use of rr, but if you're debugging rustc specifically, there's actually a built-in flag for this, so you can just use that. for this PR, that approach didn't work: the command was documented on neovim's site, but i didn't know a command-specific error to search for.

if this doesn't print an error message, or if it's not possible to get a recording of the program, things are harder. you want to look for something you already know the name of; search for literal strings with that name, or substrings that might form part of a template. in this case i searched for the command name as a literal string, since something needs to parse commands and it's not super common for it to be on its own in a string. that pulled up a handful of hits; one looked promising, so i read the code around there.

sometimes triggering the condition is hard, so instead i read the source code to reverse-engineer the stack trace. seeing all possible call sites of a function is instructive in itself, and you can usually narrow it down to only a few callers by skimming what the callers are doing. i highly recommend using an LSP for this part since the advantage comes from seeing all possible callers, not just most, and regex is less reliable than proper name resolution. it turned out that none of the code i found in my search was for the command itself, but it was in a function that had only one caller, which in turn had a single caller of its own. the doc-comment on that function mentions that it parses the string, but i am not used to having documentation, so i went up one level too far. at that point, looking at the call site, i realized i had gone too far because it was passing in the whole string of the Ex command line.
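the actual searching at this stage is nothing fancier than ripgrep plus an LSP's "find references". concretely it looks something like this (the strings here are placeholders, not the real ones from the neovim PR):

```sh
# search for an error message you can provoke
rg --fixed-strings "part of the error message" src/

# or for the name of the thing you care about, standing on its own in a string
rg '"some_command"' src/

# when narrowing down callers, a word-boundary search is a decent fallback
# if the LSP lets you down
rg --word-regexp 'some_function' src/
```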
i found a more relevant part of the code by looking at the uses of that function. i got lucky - this was not actually the code i cared about, but the bit i did care about had a similar name, so i found it by searching for that name. from there i went to its definition and, in that same file, found the function i actually cared about. if i had been a little more careful, i could have found it sooner with a broader search (this time without filtering out hidden files or limiting to the source directory). but this way worked fine as well.

do mini experiments: if you see an error emitted in nearby code, try to trigger it so that you verify you're looking in the right place. when debugging, i often use process of elimination to narrow down callers: if an error would have been emitted if a certain code path was taken, or if there would have been more or less logging, i can be sure that the code i am looking at was not run. the simplest experiment is just printing something; it's easy to notice, it doesn't change the state of the program, and it can't fail. other experiments could include "adding custom logging" or "changing the behavior of the function", which let you perform multiple experiments at once and understand how the function impacts its callers.

for more complicated code, i like to use a debugger, which lets you see much more of the state at once. if possible, in-editor debuggers are really nice—vscode, and since recently, zed, have one built-in; for nvim i use nvim-dap-ui. you can also just use a debugger in a terminal. some experiments i like to try are listed near the end of this post.
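to make those experiments concrete, a gdb session usually looks something like this (the binary, function, and variable names are placeholders):

```
$ gdb --args ./build/bin/the_program --some-flag
(gdb) break handle_command        # does this function even run?
(gdb) run
(gdb) backtrace                   # how did execution get here?
(gdb) print *local_struct         # what state is it in?
(gdb) watch global_counter        # who modifies this? (a hardware watchpoint)
(gdb) continue
```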
for this PR, i was quite confident i had found the right code, so i didn't bother with any experiments. there are other cases where it's more useful; i made an earlier PR to tmux where there were many different places search happened, so verifying i was looking at the right one was very helpful. specifically i added a small, easy-to-notice change to the function i thought was the right place, since debug logging in tmux is non-trivial to access. i rarely use a debugger for adding new code; mostly i use it for debugging existing code. programs complicated enough that i need a debugger just to understand control flow usually have a client/server model that also makes them harder to debug, so i don't bother and just read the source code.

reading source code is also useful for finding examples of how to use an API. often it handles edge cases you wouldn't know about by skimming, and uses helper functions that make your life simpler. your goal is to make your change as similar to the existing codebase as possible, both to reduce the risk of bugs and to increase the chance the maintainer likes your change. when i write new code, i will usually copy a small snippet from elsewhere in the codebase and modify it to my needs. i try to copy at most 10-15 lines; more than that indicates that i should try to reuse or create a higher-level API.

once in the right function, i skimmed the code and found a snippet that looked like it was handling existing files. the bug here is not any code that is present; instead it's code that's missing. i had to figure out where the argument i cared about was stored and how to process it. so, i repeated a similar process for that. this time i had something more to start with - i knew the name and type of the command structure, and looking at its definition showed me what i wanted. looking for the relevant field, i found a function (with a helpful comment saying what it was responsible for) which called a second, which in turn called a third. looking at the callers of that last one, i found the one that handles the case i cared about. it has exactly the behavior i wanted, so i copied its behavior. out of caution, i also looked at the other places in the function that handled the same case, and it's a good thing i did, because i found a wild snippet just above. i refactored that into a helper function and then called it from both the original command and my new code. this works in much the same way.

try to find existing tests by using the same techniques as finding the code you care about. read them; write yours using existing examples. tests are also code, after all. test suites usually have better documentation than the code itself, since adding new tests is much more common than modifying any particular section of code; see if you can find the docs. i look for contributing docs, and if i don't find them i fall back to skimming the readme. sometimes there is also documentation in the folder where the tests are located, although it tends to be somewhat out of date. i care a lot about iteration times, so i try to find out how to run individual tests. that info is usually in the README, or sometimes you can figure it out from the test command's output.

run your tests! ideally, create and run your tests before modifying the code so that you can see that they start to pass after your change. tests are extra important when you don't already understand the code, because they help you verify that your new theory is correct. run existing tests as well; run those before you make changes so you know which failures are spurious (a surprisingly high number of codebases have flaky or environment-dependent tests).

i started by looking for existing tests for the command. fortunately this had results right away and i was able to start adding my new test. the test file had a pointer to docs for the helpers i needed. neovim has very good internal tooling, and when my call failed it gave me a very helpful pointer to the right place.

hopefully this was helpful! i am told by my friends that i am unusually good at this skill, so i am interested in whether this post was effective at teaching it. if you have any questions, or if you just want to get in contact, feel free to reach out via email.

the debugger experiments i like to try:

- breaking at a function to make sure it is executed
- printing local variables
- setting hardware watchpoints on memory to see where something is modified (this especially shines in combination with a time-travel debugger)

to recap:

- programming is theory building. recovering a theory from code and docs alone is hard, but possible.
- most programs are too large for you to understand them all at once. decide on your goal and learn just enough to accomplish it.
- reading source code is surprisingly rewarding.
- match the existing code as closely as you can until you are sure you have a working theory.


technical debt is different from technical risk

the phrase "technical debt" at this point is very common in programming circles. however, i think the way this phrase is commonly used is misleading and in some cases actively harmful. here is a statement of the common usage by a random commenter on my fediverse posts : tech debt is [...] debt in the literal sense that you took a shortcut to get to your product state. You've saved time by moving quick, and are now paying interest by development slowing down. contrast this to the original statement of technical debt in Ward Cunningham's paper from 1992 : Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. [...] The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation. Ward isn't comparing "shortcuts" or "move fast and break things" to "good code"—he's comparing iterative design (often called "agile") to the waterfall method 1 : The traditional waterfall development cycle has endeavored to avoid programming catastrophy by working out a program in detail before programming begins. [...] However, using our debt analogy, we recognize this amounts to preserving the concept of payment up-front and in-full. Finally, I want to quote a follow-up statement from Ward in 2006, which is closely related to Programming As Theory Building by Peter Naur: A lot of bloggers at least have explained the debt metaphor and confused it, I think, with the idea that you could write code poorly with the intention of doing a good job later and thinking that that was the primary source of debt. I'm never in favor of writing code poorly, but I am in favor of writing code to reflect your current understanding of a problem even if that understanding is partial. It seems pretty clear at this point that Ward is describing something different from the common usage (I think "technical debt" is a good term for Ward's original idea). What then can we call the common usage? I like technical risk . Whenever you modify a program's behavior, you incur a risk that you introduce a bug. Over time, as the code is used more, the number of bugs tends to decrease as you fix them. Two studies in 2021 and 2022 (one by the Android security team, one by Usenix security) found empirically that memory vulnerabilities decay exponentially over time. So you have an inherent tension between minimizing your changes so that your code gets less buggy over time and modifying your code so that your program becomes more useful. When people talk about "technical debt", what I am calling "technical risk", they mean "modifying the code has a very high risk"—any kind of modification has a high chance of introducing bugs, not only when adding new features but also when doing refactors and bug fixes. Even the most trivial changes become painful and time-consuming, and the right tail of your time distribution increases dramatically. Furthermore, when we say "this program has a lot of tech debt", we are implicitly arguing "the risk of a refactor is lower than the risk of it eventually breaking when we make some other change ". We are gambling that the risk of a refactor (either in time or breakage) is worth the decreased risk going forward. Note that you cannot overcome technical risk simply by spending more time; in this way it is unlike technical debt. 
With sufficient technical risk, simply predicting how long a change will take becomes hard. Due to the risk of regressions you must spend more time testing; but because of the complexity of the program, creating tests is also time-consuming, and it is less likely that you can test exhaustively, so there is a higher risk that your tests don't catch all regressions. Eventually changing the program without regressions becomes nearly impossible, and people fork or reimplement the program from scratch (what Ward describes as "the interest is total").

The common understanding of "tech debt" is that it only applies to programs that were built hastily or without planning. "tech risk" is much more broad than that, though. It also applies to old programs which no longer have anyone that understands their theory; new code if it's sufficiently complicated (stateful, non-local, "just a complicated algorithm", etc); and large programs that are too big for any one person to understand in full. In fact, most code has some amount of risk, simply because it isn't usually worth making readability the number 1 priority (and readability differs from programmer to programmer).

Hillel Wayne recently wrote a post titled Write the most clever code you possibly can. At one point he says this:

"I've also talked to people who think that datatypes besides lists and hashmaps are too clever to use, that most optimizations are too clever to bother with, and even that functions and classes are too clever and code should be a linear script."

This is an extreme example, but it reflects a pattern I often see: people think any code that uses complicated features is "too clever" and therefore bad. This comes up a lot for certain "weird" syscalls or libc functions. I think this misses the point. What makes something a technical risk is the risk itself, the inertia when you try to modify it, the likelihood of bugs. Having a steep learning curve is not the same as being hard to modify, because once you learn it, future changes become easier.

Once your risk is high enough, and if you don't have the option of reducing complexity, people tend to work around the risk with feature flags or configuration options. These flags avoid the new behavior altogether in the default case, such that "changing the program" is decoupled from "changing the behavior". In my experience this can be good in moderation—but if every new change requires a feature flag, and you never go back and remove old flags, then you're in trouble, because the flags themselves are adding complexity and risk. Each new change has to consider not just the default case, but all possible combinations of flags in a combinatorial explosion (20 independent boolean flags already give 2^20, about a million, possible configurations). You see this with things like tmux, OracleDB, and vim, all of which tend to accumulate options without removing them. Consider this quote from someone who claimed to work at Oracle:

"Sometimes one needs to understand the values and the effects of 20 different flag to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating. The only reason why this product is still surviving and still works is due to literally millions of tests!"

This is an extreme case, but in my experience it is absolutely representative of what happens to sufficiently large codebases over time.
Once things are this bad you are "locked in" to the feature flag model—there are too many flags to remove (and your users may be depending on them!), but you cannot understand the interactions of all their combinations, so you gate new changes behind more new flags just to be sure.

this post is kinda scary! it tells a story of codebases that grow more and more bogged down over time, despite people's best efforts, until they eventually die because they can't be modified. i think things are not actually as dire as they seem. firstly, you always have the option to do ongoing refactors, reducing the risk of changes. with ongoing maintenance like this, even extremely complex programs can be maintained for years or decades; i think the rust compiler is a good example of such a program. secondly, rebuilding systems is good, actually, because it lets us learn from the lessons of the past. oracledb, tmux, and vim all have younger competitors (e.g. sqlite, zellij, and helix) that are more nimble. even more than that, new systems have the opportunity to be built on a different paradigm (e.g. sqlite runs in-process instead of as a daemon) with different tradeoffs. this is the classic case of disruptive innovation. to some extent, people or teams can get "locked in" to existing systems, especially if they are highly configurable or changing to a new system would be high risk for the organization (e.g. migrating to a new database is extremely high risk for almost anyone), but this can be mitigated by open file formats (such as sqlite's database file) and backwards compatibility for the old options (such as in neovim).

- "technical debt" as commonly understood is different from its origins. the original "technical debt" referred to iterative design. the common meaning is about programs that are hard to change, and i refer to it as "technical risk".
- all programs have technical risk to a greater or lesser degree; you can decrease it but never eliminate it altogether.
- once risk grows sufficiently high, changes become hard enough that they have to be gated behind feature flags. the program eventually stagnates and is rewritten.

1. Actually, if you read the post more closely, he is saying something even more interesting: iterative development is only possible because his company is using a language (smalltalk) that has privacy boundaries. ↩
