Latest Posts (20 found)

Letting Claude Play Text Adventures

The other day I went to an AI hackathon organized by my friends Lucia and Malin. The theme was mech interp, but I hardly know PyTorch, so I planned to do something at the API layer rather than the model layer.

Something I think about a lot is cognitive architectures (like Soar and ACT-R). This is like a continuation of GOFAI research, inspired by cognitive science. And like GOFAI it's never yielded anything useful. But I often think: can we scaffold LLMs with cog arch-inspired harnesses to overcome their limitations? LLM agents like Claude Code are basically "accidental" cognitive architectures: they are designed and built by practitioners rather than theorists, but they have commonalities: they all need a way to manage memory, tool use, a task agenda, etc. Maybe building an agent on a more "principled" foundation, one informed by cognitive science, yields a higher-performing architecture.

So I sat around a while thinking how to adapt Soar's architecture to an LLM agent. And I sketched something out, but then I thought: how can I prove this performs better than baseline? I need an eval, a task. Math problems? Too one-shottable. A chatbot? Too interactive; I want something hands-off and long-horizon. A coding agent? Too freeform, and it requires too much tool use. And then I thought: text adventures! You have a stylized, hierarchically-structured world accessible entirely through text, long-term goals, puzzles, physical exploration and discovery of the environment. Even the data model of text adventures resembles frame-based knowledge representation systems. And there's a vast collection of games available online.

Anchorhead, which I played years ago, is a Lovecraft-inspired text adventure by Michael S. Gentry. It takes on the order of hundreds of turns to win, across multiple in-game days. And the game world is huge and very open. In other words: a perfect long-horizon task. So I started hacking.

The frotz interpreter runs on the command line and has a "dumb" interface called dfrotz, which takes the ncurses fluff out and gives you a very stripped command-line experience. It is easy to write a little Python wrapper to drive the interpreter through stdin and stdout (a sketch follows below). Now we can play the game from Python: send commands, get game output.

Now we need the dual of this: a player. The trivial harness is basically nothing at all: treat the LLM/game interaction like a chat history. The LLM reads the game output from the interpreter, writes some reasoning tokens, and writes a command that is sent via stdin to the interpreter. And this works well enough. Haiku 4.5 would mostly wander around the game map, but Sonnet 4.5 and Opus 4.5 manage to solve the game's first puzzle—breaking into the real estate office, and finding the keys to the mansion—readily enough. It takes about ~200 turns for Claude to get to the second in-game day.

The way I thought this would fail is: attention gets smeared across the long context, the model gets confused about the geometry of the world, its goal and task state, and starts confabulating, going in circles, etc. As usual, I was outsmarting myself. The reason this fails is you run out of credits. By the time you get to day two, each turn costs tens of thousands of input tokens. No good! We need a way to save money. Ok, let's try something that's easier on my Claude credits.
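The wrapper can be a thin subprocess shim. A minimal sketch (assuming dfrotz is on the PATH and the story file is named anchor.z8; the names and the prompt-detection heuristic are illustrative, not the repository's actual code):

```python
import subprocess

class Interpreter:
    """Drives dfrotz over stdin/stdout."""

    def __init__(self, story: str):
        self.proc = subprocess.Popen(
            ["dfrotz", story],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def read(self) -> str:
        # Accumulate output until the game prompts for input.
        # Assumes the prompt is "> " at the start of a line.
        buf = ""
        while True:
            ch = self.proc.stdout.read(1)
            if not ch:
                break
            buf += ch
            if buf.endswith("\n> "):
                break
        return buf

    def send(self, command: str) -> str:
        self.proc.stdin.write(command + "\n")
        self.proc.stdin.flush()
        return self.read()

game = Interpreter("anchor.z8")
print(game.read())        # the opening text
print(game.send("look"))  # send a command, get the game's response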
We'll show Claude the most recent five turns (this is the perceptual working memory), and give it a simple semantic memory: a list of strings that it can append entries to, and remove entries from, using tool use (these tools are sketched after the todo list at the end of this post). This keeps the token usage down.

The problem is the narrow time horizon. With the trivial harness, Claude can break into the real estate office in ~10 turns, and does so right at the start of the game. With this new harness, Claude wanders about the town, taking copious notes, before returning to the real estate office, and it spends ~40 turns fumbling around with the garbage cans before managing to break in. The next step, after getting the keys to the house, is to meet your husband Michael at the university and head home. Claude with the trivial harness takes about ~100 turns to find the house, with some tangential wandering about the town, and reaches day two around turn 150. Claude with the memory harness took ~250 turns just to get the keys to the house. And then it spent hundreds of turns just wandering in circles around the town, accumulating redundant memories, and hit the turn limit before even finding the house.

Anchorhead is a long, broad game, and from the very beginning you can forget the plot and wander about most of the town. It takes a long time to see if a run with an agent goes anywhere. So I thought: I need something smaller. Unsurprisingly, Claude can make its own games. The Inform 7 package for NixOS was broken (though Mikael has fixed this recently) so I had to use Inform 6. I started with a trivial escape-the-room type game, which was less than 100 lines of code, and any Claude could beat it in less than 10 turns. Then I asked for a larger, multi-room heist game.

This one was more fun. It's short enough that Claude can win with just the trivial harness. I tried a different harness, where Claude has access to only the last five turns of the game's history, and a read-write memory scratchpad. And this one was interesting. First, because Claude only ever adds to its own memory; it never deletes memories. I thought it would do more to trim and edit its scratchpad. Second, because Claude became fixated on this red-herring room: a garden with a well. It kept going in circles, trying to tie a rope to the well and climb down. Because of the limited game history, it only realized it was stuck when it saw that the most recent ~20 entries it wrote to its memory related to various attempts to go down the well. Then I watched Claude walk away from the garden and solve the final puzzle, and hit the turn limit just two turns short of winning.

Tangent: I wonder if models are better at playing games created by other instances of the same model, by noticing tiny correlations in the text to infer what puzzles and obstacles they would have written.

In the end I abandoned the "small worlds" approach because the games are too stylized, linear, and uninteresting. Anchorhead is more unwieldy, but more natural. I have a bunch of ideas I want to test, to better learn how harness implementations affect performance. But I'm short on time, so I'm cutting it here and listing them as todos. The repository is here.

Domain-Specific Memories: Claude's notes are all jumbled with information on tasks, locations, etc. It might be better to have separate memories: a todo list, a memory of locations and their connections, etc. This is close to the Soar approach.
Automatic Geography: related to the above, the harness can inspect the game output and build up a graph of rooms and their connections, and format it in the context. This saves Claude having to note those things manually using a tool.

Manual Geography: the automatic geography approach has a few drawbacks. Without integration into the Z-machine interpreter, it requires some work to implement (parsing the current location from the output, keeping track of the command history to find standard travel commands, e.g. compass directions), and it isn't 100% deterministic, so mazes and dynamic rooms (e.g. elevators) will confuse the system. So, instead of doing it automatically, we could give Claude a map-updating tool.

Episodic Memory: this feels like cheating, but, at the end of a run, you can show Claude the session transcript and ask it to summarize what it accomplished and how, and where it failed and why, including a short walkthrough for how to get to the "last successful state". This allows future runs to save time in getting up to speed.
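Back to the memory harness: a sketch of the append/remove semantic-memory tools described above, in the Anthropic tool-use format (the names append_memory and remove_memory are illustrative, not the repository's actual code):

```python
memory: list[str] = []

TOOLS = [
    {
        "name": "append_memory",
        "description": "Add a note to long-term memory.",
        "input_schema": {
            "type": "object",
            "properties": {"note": {"type": "string"}},
            "required": ["note"],
        },
    },
    {
        "name": "remove_memory",
        "description": "Delete a note from long-term memory by index.",
        "input_schema": {
            "type": "object",
            "properties": {"index": {"type": "integer"}},
            "required": ["index"],
        },
    },
]

def handle_tool_call(name: str, args: dict) -> str:
    if name == "append_memory":
        memory.append(args["note"])
        return "ok"
    if name == "remove_memory":
        memory.pop(args["index"])
        return "ok"
    return "unknown tool"
```

Each turn, the prompt is rebuilt from scratch out of the last five game turns plus the current memory list, so input token usage stays roughly flat no matter how long the run goes.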


1Password Dependency Breaks Syntax Highlighting

Earlier today I noticed the syntax highlighting on this website was broken. But not fully: on reload I'd see a flash of highlighted text that then turned monochrome. The raw HTML from the server showed the Rouge tags, but the web inspector showed raw text inside the elements. This didn't happen in Chromium. My first thought was: there's malformed HTML, and Firefox is recovering in a way that loses the DOM inside the code blocks. Then I noticed it doesn't happen in incognito. Turning my extensions off one by one, I found that 1Password is responsible. Others (1, 2) have reported this also.

If you extract the latest XPI, unzip it, and dig around, you'll find they're using Prism.js, a JavaScript syntax highlighter. I don't know why a password manager needs a syntax highlighter. I imagine it has to do with the app feature where, if you have an SSH key, you can open a modal that tells you how to configure Git commit signing with it. Maybe they want to highlight the SSH configuration code block (which is unnecessary anyways, since you could write that HTML by hand). But I can't know for sure.

Why write about this? Because 1Password is a security-critical product, and they are apparently pulling random JavaScript dependencies and unwittingly running them in the tab context, where the code has access to everything. This is no good. I don't need to explain how bad a supply-chain attack on the 1Password browser extension would be. I like 1Password and I was sad when Apple Sherlocked them with the Passwords app, but this is a bad sign about their security practices.


Using the Brother DS-640 Scanner on NixOS

The DS-640 is a compact USB scanner from Brother. It was surprisingly hard to get it working on NixOS, so I wrote up my solution so others don't have this problem. The bad news is you need Brother's proprietary drivers to make this work; the configuration is sketched below. After applying it, you have to log out and in, or reboot, for the usergroup changes to apply. Note also: if you use the wrong backend (as I did initially), the scanner will kind of work, but it will only scan the first third or so of every page. And if you want a GUI, add a SANE frontend to your packages.

Now, make sure the scanner is there. If scanimage reports that no scanners were identified, you either have the wrong driver or (as I did, surprisingly) a faulty USB port, in which case move the scanner to another port. The device listing should recognize the model number. The most basic test that should work: put a page in the scanner until it locks, and run a bare scanimage. This will produce a (probably not very good) scan. Now, we can improve things using the device-specific options, which you can also list with scanimage. Note that some of the flags take their values in different formats, and if you mess it up you get a cryptic error message.
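A sketch of the configuration described above. I am assuming Brother's brscan5 SANE backend and a user named alice; check which Brother backend actually covers your unit, and adjust the names to your setup:

```nix
{ pkgs, ... }:
{
  # SANE with Brother's proprietary backend.
  hardware.sane.enable = true;
  hardware.sane.extraBackends = [ pkgs.brscan5 ];

  # The scanner device is only accessible to members of these groups.
  users.users.alice.extraGroups = [ "scanner" "lp" ];

  # Optional GUI frontend.
  environment.systemPackages = [ pkgs.simple-scan ];
}
```

And the scanimage invocations, roughly; exact option names and values are backend-specific, which is what scanimage -A is for:

```
$ scanimage -L          # list devices; should show the model number
$ scanimage > scan.pnm  # the most basic test scan
$ scanimage -A          # list device-specific options
$ scanimage --resolution 300 --mode Color > scan.pnm
```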


Books I Enjoyed in 2025

The Apocalypse of Herschel Schoen by nostalgebraist. A revelation (ἀποκάλυψις = "unveiling") told through the eyes of a developmentally-disabled teenager. You will never guess where it goes. This came across my desk because I really enjoyed The Northern Caves, which is both a great horror story and an evocation of the Internet forum culture of the late 2000s.

Algebraic Models for Accounting Systems. I like anything along the lines of "let's take a technical field that formed its ontology, vocabulary, methods, etc. before modern mathematics, and set it on a modern, algebraic, formal foundation". And this is that, for accounting. It is a pleasant read.

Confessions of a Mask by Yukio Mishima. The artist's confession. "I had decided I could love a girl without feeling any desire whatsoever".

Paul and Virginia by Jacques-Henri Bernardin de Saint-Pierre. Published in 1788, very sentimental, but I think it helped me to get in the mindset of late 18th century French society: the bucolic, Rousseauist kick, the whole "simplicity of nature" thing. This landed on my reading list because many, many years ago I read a Cordwainer Smith story called Alpha Ralpha Boulevard, and I read somewhere that the characters in the story, Paul and Virginia, were an allusion to Paul et Virginie.

De Monarchia by Dante. This is another "get into the mindset of another century" book. It's interesting because it's written like a logical, geometric proof: there's modus ponens and modus tollens and case analysis and proof by contradiction. But the axioms are very eclectic: quotations from Virgil, Plato, Livy, Cicero, Thomas Aquinas, et al., and Dante's private interpretation of bits from the Bible. The theorem he wants to prove is that to attain the highest development of humanity, the whole world must be unified into a world-state run by the Holy Roman Emperor.

Building SimCity: How to Put the World in a Machine by Chaim Gingold. Nominally an oral history of the development of SimCity. That's how he gets you. Then the trap is sprung, and you are given a history of cybernetics, WW2 fire control systems, cellular automata, artificial life, computation, Vannevar Bush, pedagogy, cognition, the World3 model, The Limits to Growth, Forrester's system dynamics. "Unexpectedly Borgesian technical book" is one of my favourite genres.

Antigone by Sophocles, in the translation of Robert Fagles. "Don't fear for me. Set your own life in order".

The Education of Cyrus by Xenophon. I'm not sure what to make of it, honestly, but when I have the time I want to read Leo Strauss's lectures on Xenophon, where he expounds on the hidden meaning of the text.

Borrowed Time: An AIDS Memoir by Paul Monette. The author's account of caring for his partner, who was dying of AIDS in the 80s, while he himself was actively dying of AIDS. Frightful. The author died just a few years before HAART therapy became available.

The Slave by Isaac Bashevis Singer. Singer is unique. I don't know quite how to characterize it. His writing is very disarming and innocent without being sentimental; he is earnest and free of cynicism. A love story in 17th century Poland, after the Khmelnytsky pogroms. It's very magical realist, in a good way, not in the hysterical sense. The world is shot through with the supernatural, but the inner lives of the characters oscillate between religious awe and a very contemporary cynicism.

Dream Story by Arthur Schnitzler. The inspiration for Eyes Wide Shut. I was surprised by how much of the movie, which I thought was mostly Kubrick's invention, is actually from the story. It's a great mood piece: you can feel the cold of early morning in Vienna, and see the paving stones, and the gas lamps, and the carriages disappearing in the fog.

The Cyberiad by Stanisław Lem. I like Lem when he's serious (Solaris, His Master's Voice) and not so much when he's doing satire (The Futurological Congress), so when I picked this up years ago and saw that it was a collection of fairy tales I put it away. I tried again this year and found I actually enjoyed it, but some of the later stories go on for far too long. I think The Seventh Sally is the one everyone likes.

The Magician of Lublin by Isaac Bashevis Singer. Another Singer, this time in 19th century Poland. A rake is punished by God. Short and fun. I like that Singer doesn't write giant doorstoppers, so the quality per page is high.

Mephisto by Klaus Mann. A socialist actor in interwar Germany saves his career by making friends with the Nazis. I was surprised by how Randian it was: the characters are divided into two disjoint categories, the Good, who are upper-middle-class, bourgeois people, or aristocrats from old and noble families, and the Bad, who are vulgar parvenus, thugs, and boors. It's kind of ironic to think people become Nazis because of bad breeding.

What Is Life? by Erwin Schrödinger. Before modern crystallography, NMR, DFT, etc., people had to learn about the nanoscale through clever reasoning. Schrödinger uses the limited knowledge of the day to set up a constraint system, and finds the solution: genetic information is stored in an aperiodic, covalently-bonded crystal. He even estimates the physical volume of the genome from experiments relating mutation rates to X-ray exposure.

Satan in Goray by Isaac Bashevis Singer. Another Singer, back in the 17th century. This one is more fire and brimstone, and it's about a historical episode I had not heard about until the last few pages of The Slave: the case of Sabbatai Zevi, a Jewish mystic who, at one point, had most of the Jewish world convinced he was the messiah. This happened in the year 1666. The novel is about what it's like, phenomenologically, to live in a remote village in 1600s Poland. How do you know anything about the world? People come in, from time to time, traders, and they have news, but the news is just words that come out of their mouths. And you have to interrogate them, ask questions, compare notes. Like living on a Pacific island. Has the messiah come? Is there such a place as the Ottoman Empire? Is there even a world outside Poland?

Tog on Interface by Bruce Tognazzini. A book about interface design from 1992. A lot of the advice is good, and a lot of it is interesting for the historical context, and the constraints people worked with in the past. One aspect I found interesting: how many products and companies are mentioned of whose existence I can find little to no evidence today. This makes the hoarder in me sad. This one came across my desk because I read a blog post implementing one of the UI ideas from the book.

Term Rewriting and All That by Franz Baader and Tobias Nipkow. I feel that I understand what computation is now.

Indistinguishable From Magic by Robert L. Forward. If you've spent years steeped in Orion's Arm, then most of the ideas in the book will not be new to you. But they were new once. And it's interesting to read a book and think: this is where starwisps and launch loops all come from.

The Shadow of the Torturer by Gene Wolfe. Surreal and a pleasure to read.

Knowledge Representation: Logical, Philosophical, and Computational Foundations by John F. Sowa. Delightful, particularly the early bits about the history of logic, and the many chapters explaining the work of Peirce and Whitehead on ontology.

I have not finished it, but I am in the first few pages of A Shorter Model Theory by Wilfrid Hodges, and I am delighted. The very first exercise in the book involves a formalization of Aquinas' account of the trinity.


Coarse is Better

When DALL-E came out, it took me a couple of weeks to pick my jaw up from the floor. I would go to sleep excited to wake up to a full quota, with a backlog of prompts to try. It was magical, miraculous. Like discovering a new universe. I compiled the best art in this post. The other day a friend ran some of my old prompts through Nano Banana Pro (NBP), and put the old models side by side with the new. It's interesting how, after years of progress, the models are much better at making images, but infinitely worse at making art.

Electron contours in the style of Italian futurism, oil on canvas, 1922, trending on ArtStation.

Admittedly, the old Midjourney v2's output doesn't look quite like futurism. But it looks like something. It looks compelling. The colours are bright and vivid. NBP's output is studiously in the style of Italian futurism, but the colours are so muted and dull. Maybe the "trending on ArtStation" is a bit of an archaism and impairs performance. Let's try again without:

Painting of an alley in the Kowloon Walled City, Eugène Boudin, 1895, trending on ArtStation.

MJ's output looks nothing like the Kowloon Walled City. But it's beautiful. It's coarse, impressionistic, vague, evocative, contradictory. It's brimming with mystery. And it is, in fact, in the style of Eugène Boudin. The NBP output, by contrast: sigh. It looks like every modern movie: so desaturated you feel you're going colourblind. Let's try forcing it:

Painting of an alley in the Kowloon Walled City, Eugène Boudin, 1895. Make it coarse, impressionistic, vague, evocative, contradictory, brimming with mystery.

This is somewhat better, but why is it so drab and colourless? Is the machine trying to make me depressed?

Attar and Ferdowsi in a dream garden, Persian miniature, circa 1300, from the British Museum.

The Midjourney v2 output doesn't quite look like anything. But it is beautiful, and evocative. I like to imagine that little splotch of paint on the upper right is a hoopoe. The NBP output: well, it looks like a Persian miniature. The "from the British Museum" bit, I meant to be interpreted evocatively, rather than literally. The prompt cites a fictional object, bringing it into existence. But NBP reads this as: no, this is a photograph of a Persian miniature in the British Museum.

The Burning of Merv by John William Waterhouse, 1896, from the British Museum.

Midjourney's output does look like Waterhouse. Semantically there's room to argue: it looks like a woman being burnt at the stake, not the sack of a city. But aesthetically: it's gorgeous. The flames are gorgeous, the reds of the dress are gorgeous. Look at the reeds in the background, and the black water that looks like tarnished silver or pewter. The faces of the crowd. Is that a minotaur on the lower left, or a flower? What is she holding on her bent left arm? A crucifix, a dagger? You could find entire universes in this image, in this 1024x1024 frame. By contrast, the NBP output: what can one say? It doesn't look like Waterhouse. The horsemen wear Arab or Central Asian dress, but Merv was sacked in the year 1221 by the Mongol Empire. And, again, the "British Museum" line is taken literally rather than evocatively.

Portrait of Ada Lovelace by Dante Gabriel Rossetti, 1859, auctioned by Christie's.

The Midjourney output is beautiful. It is beautiful because the coarse, impressionistic brushstroke is more evocative than literal. And it actually looks like a woman drawn by Rossetti. And look at the greens! Gorgeously green. The palette is so narrow, and the painting is so beautiful. The NBP output: pure philistinism. "Auctioned by Christie's", again, is meant to be evocative: "this is the kind of painting that would be sold at auction". But NBP makes it a photograph of a painting at an auction house. Fine, I suppose I got what I asked for. But the woman doesn't look like Rossetti! This is absurd. How can a model from 2022 get this right, while the SOTA image generation model gives us generic oil-painting slop?

A Persian miniature of the cosmic microwave background, from Herat circa 1600, trending on ArtStation.

Again: what can one say?

Dream Story, 1961, blurry black and white photograph, yellow tint, from the Metropolitan Museum of Art.

These are some of my favourite DALL-E 2 outputs. They remind me of The King in Yellow. I love them because of how genuinely creepy and mysterious they are. You could pull a hundred horror stories from these. It is hard to believe how bad the NBP output is. What are we doing here?

The old models were beautiful and compelling because the imperfections, vagueness, mistakes, and contradictions all create these little gaps through which your imagination can breathe life into the art. The images are not one fixed, static thing: they can be infinitely many things. The new models—do I even need to finish this sentence? They're too precise and high-resolution, so they cannot make abstract, many-faced things; they can only make specific, concrete things. We need to make AI art weird again.

Fernando Borretti 1 month ago

I Wish People Were More Public

Probably not a popular thing to say today. The zeitgeisty thing to say is that we should all log off and live terrible cottagecore solarpunk lives raising chickens and being mindful. I wish people were more online and more public. I have rarely wished the opposite. Consider this post addressed to you, the reader.

I will often find a blog post on Hacker News that really resonates. And when I go to check the rest of the site there's three other posts. And I think: I wish you'd write more! When I find someone whose writing I really connect with, I like to read everything they have written, or at least a tractable subset of their most interesting posts. If I like what I see, I reach out. This is one of the best things about writing online: your future friends will seek you out.

And, from the other side, I have often written a post where, just before publishing, I would think: "who would want to read this? It's too personal, obscure, idiosyncratic; probably a few people will unsubscribe from the RSS feed over this". And always those are the posts where people email me to say they always thought the same thing but could never quite put it into words. I really value those emails. "I am understood" is a wonderful feeling.

I try to apply a rule that if I do something, and don't write about it—or otherwise generate external-facing evidence of it—it didn't happen. I have built so many things in the dark: little experiments or software projects or essays that never saw the light of day. I want to put more things out. If it doesn't merit an entire blog post, then at least a tweet.

If I follow you on Twitter, and you have posted a picture of your bookshelf, I have probably scanned every book in it. This is why I appreciate Goodreads. Like many people I have been reading a lot less over the past ~5y, but since I made a Goodreads account earlier this year, I've read tens of books. Reading in public has helped to motivate me. You may say reading in public is performative. I say reading in private is solipsistic. Dante, in De Monarchia, writes:

All men on whom the Higher Nature has stamped the love of truth should especially concern themselves in laboring for posterity, in order that future generations may be enriched by their efforts, as they themselves were made rich by the efforts of generations past. For that man who is imbued with public teachings, but cares not to contribute something to the public good, is far in arrears of his duty, let him be assured; he is, indeed, not "a tree planted by the rivers of water that bringeth forth his fruit in his season," [Psalms 1:3] but rather a destructive whirlpool, always engulfing, and never giving back what it has devoured.

My default mode is solipsism. I read in private, build in private, learn in private. And the problem with that is self-doubt and arbitrariness. I'm halfway through a textbook and think: why? Why am I learning geology? Why this topic, and not another? There is never an a priori reason. I take notes, but why tweak the LaTeX if no one, probably not even future me, will read them? If I stop reading this book, what changes? And doing things in public makes them both more real and (potentially) useful. If you publish your study notes, they might be useful to someone. Maybe they get slurped up into the training set of the next LLM, marginally improving performance.

And Goodreads, for all its annoyances, is a uniquely tender social network. Finishing a book, and then seeing a friend mark it as "want to read", feels like a moment of closeness. I have a friend who lived in Sydney, who has since moved away, and we don't keep in touch too often, because the timezones are inconvenient, but occasionally she likes my book updates, and I like hers, and I will probably never read that avant-garde novel, but I'm glad she is reading it. It is like saying: "You exist. I exist. I remember. I wish you happiness."

Lots of people use spaced repetition, but most everyone's flashcard collections are private. They exist inside a database inside an app like Anki or Mochi. You can export decks, but that's not a living artifact but a dead snapshot, frozen in time. This is one reason I built hashcards: by using a Git repo of Markdown files as the flashcard database, you can trivially publish your deck to GitHub. My own flashcard collection is public. I hope that more people use hashcards and put their decks up on GitHub. The point is not that you can clone their repos (which is close to useless: you have to write your own flashcards) but that I'm curious what people are learning. Not the broad strokes, since we all want to learn thermo and econ and quantum chemistry and the military history of the Song dynasty and so on, but the minutiae. Why did you make a flashcard out of this Bible passage? Why does it resonate with you? Why do you care about the interpretation of that strange passage in Antigone? Why did you memorize this poem?

Computers mediate every aspect of our lives, yet most people use their computers the way they came out of the box. At most they might change the desktop background. Some people don't even change the default icons on the macOS dock. Even most Linux users just use the stock configuration, e.g. GNOME on Fedora or whatever. I'm interested in people who customize their experience of computing. This is often derided as "ricing". But agency is interesting. People who remake their environment to suit them are interesting. And I am endlessly curious about how people do this. I like reading people's dotfiles, their custom shell scripts, their NixOS configs. It's even better if they have some obscure hardware, e.g. some keyboard layout I've never heard of and a trackball with custom gestures. I put my dotfiles up on GitHub because I imagine someone will find them interesting.

And beyond my selfish curiosity there's also the Fedorovist ancestor-simulation angle: if you die and are not cryopreserved, how else are you going to make it to the other side of the intelligence explosion? Every tweet, blog post, Git commit, journal entry, keystroke, mouse click, every one of these things is a tomographic cut of the mind that created it.

Fernando Borretti 2 months ago

Ad-Hoc Emacs Packages with Nix

You can use Nix as a package manager for Emacs. Today I learned you can also use it to create ad-hoc packages for things not in MELPA or nixpkgs.

The other day I wanted to get back into Inform 7, and naturally the first stack frame of the yak shave was to look for an Emacs mode. One exists, but isn't packaged anywhere. So I had to vendor it in. You can use git submodules for this, but I have an irrational aversion to submodules. Instead I did something far worse: I wrote a Makefile to download the file from GitHub, and used home-manager to copy it into my Emacs directory. Which is nasty. And of course this only works for small, single-file packages. And, on top of that: whatever dependencies your vendored packages need have to be listed alongside the packages you actually want, which confuses the packages you want with the transitive dependencies of your vendored packages. I felt like the orange juice bit from The Simpsons. There must be a better way!

And there is. With some help from Claude, I wrote a Nix expression that fetches the mode straight from GitHub (sketched below). Nix takes care of everything: commit pinning, security (with the SHA-256 hash), dependencies for custom packages. And it works wonderfully. Armed with a new hammer, I set out to drive some nails.

Today I created a tiny Haskell project, and when I opened one of the files, I noticed it had no syntax highlighting. I was surprised to find there's no mode for it in MELPA. But coincidentally, someone started working on one literally three weeks ago! So I wrote a small expression to package this new mode.

A few weeks back I switched from macOS to Linux, and since I'm stuck on X11 because of stumpwm, I'm using XCompose to define keybindings for entering dashes, smart quotes, etc. It bothered me slightly that my XCompose file didn't have syntax highlighting. I found a mode in kragen's repo, but it's slightly broken (it's missing a provide call at the end). I started thinking how hard it would be to write a Nix expression to modify the source after fetching, when I found that Thomas Voss hosts a patched version here. Which made this very simple.

Somehow the version of the terminal package in nixpkgs unstable was missing the configuration option to use a custom shell. Since I want to use nu instead of bash, I had to package it myself from the latest commit.

I started reading Functional Programming in Lean recently, and while there is a lean4-mode, it's not packaged anywhere. This only required a slight deviation from the pattern: when I opened a file I got an error about a missing JSON file. Consulting the README for lean4-mode, it says: if you use a source-based package manager (e.g. Straight or Elpaca), make sure to list the data directory in your Lean4-Mode package recipe. To do this I had to use a recipe-based build rather than the simple one.
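The pattern looks roughly like this; a sketch with illustrative names and a placeholder hash, not the exact expression from the post:

```nix
# epkgs is the Emacs package set; pkgs is nixpkgs.
some-mode = epkgs.trivialBuild {
  pname = "some-mode";
  version = "unstable-2025-11-01";
  src = pkgs.fetchFromGitHub {
    owner = "someone";
    repo = "some-mode";
    rev = "0123456789abcdef0123456789abcdef01234567";
    # Placeholder; build once and Nix will tell you the real hash.
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };
  # Transitive dependencies of the vendored package go here,
  # not in your top-level package list.
  packageRequires = [ epkgs.dash ];
};
```

For packages that need extra files shipped alongside the elisp (like lean4-mode's data directory), epkgs.melpaBuild takes a recipe in which those files can be listed, which is presumably the deviation the post describes.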

Fernando Borretti 2 months ago

Linux on the Fujitsu Lifebook U729

This post describes my experience using Linux on the Fujitsu Lifebook U729. The tl;dr is that it's a delightful laptop, Linux runs flawlessly, and all the hardware I've needed works OOTB. The only difficulty I had was in disabling Secure Boot, but I figured out how to do it, which I explain below.

From early 2024 my daily driver was an M2 MacBook Air, until earlier this year I broke the screen, and the repair was quoted at almost 1000 AUD. Since I used it as a desktop most of the time, this didn't affect me much. After some flip-flopping I decided to get an M4 Mac mini. Partly for the faster CPU and more RAM, but partly because I liked the idea of LARPing like it's the 2000s, when computers, and by extension the Internet, were fixed in physical space, rather than following everyone around. Of course this was a terrible idea. I had three working computers—a Linux+Windows desktop, a Mac mini, and a MacBook Air that I could use as a desktop—and none of them were portable. When I went to RustForge 2025 I just brought my phone. If I wanted to travel, even within Sydney, to a demo night or math club or some such, I didn't have a laptop to bring with me.

So I needed a new laptop. And the Tahoe release of macOS was so ugly (see e.g. 1, 2, 3) it made me boot up the old Linux desktop, and start playing around with NixOS again. And I fell in love with Linux again: with the tinkering and the experimentation and the freedom it affords you. So, I wanted a Linux laptop. I had a ThinkPad X1 some years ago and it was terrible: flimsy plastic build and hardware that vastly underperformed its price. I looked around for old, refurbished workstation laptops, and, randomly, I ran into an eBay seller offering a refurbished Fujitsu laptop. The specs/price ratio was pretty good: 16 GiB of RAM and 512 GiB of SSD, all for 250 AUD. And it was 12in and 1.1kg, which I like: laptops should be small and lightweight. But the thing that got me, in all honesty, was the brand. "Fujitsu laptop" sounds like colour in a William Gibson novel: "crawling into the avionics bay, Case took out a battered Fujitsu refurb, and stuck a JTAG port in the flight computer—". I already use NixOS and a trackball and a mechanical keyboard, so a laptop that's even more obscure than a ThinkPad is perfect for me. And it was only 250 AUD. So I got it.

The only problem I had was disabling Secure Boot in order to install Linux. Otherwise: I love it. It's small and lightweight, feels solid, the keyboard is good, all the hardware works out of the box with NixOS, and the battery life is pretty good.

This section describes the problems I encountered. I tried to install Linux the usual way, and was greeted by a Secure Boot violation. Going into the BIOS, the option to disable Secure Boot was greyed out. I tried a bunch of random bullshit: wiping the TPM, disabling the TPM. That didn't work. What did work was this:

First, install Windows 11. This came with the laptop. And the installation makes installing Linux feel easy: I had to do so many weird tricks to avoid having to create an account with Microsoft during the installation. Once Windows is installed, go into Windows Update. Under "Advanced Options > Optional Updates", there should be an option to install Fujitsu-specific drivers. Install those. And for good measure, do a general Windows update. There should be a program called DeskUpdate on the Desktop. This is the Fujitsu BIOS update tool. Run this and go through the instructions: this should update the BIOS (the ordering seems to be important: first update the Fujitsu firmware through Windows Update, then the BIOS through DeskUpdate). Reboot and go into the BIOS (F2). You should have a new BIOS version. In my case, I went from BIOS 2.17 to 2.31, which was released on 2025-03-28. You now have the option to disable Secure Boot. After this, I was able to install NixOS from a live USB.

The laptop comes with this corporate spyware thing called Absolute Persistence. It's some anti-theft tracking device. Since the Lifebook is typically an enterprise laptop, it makes sense that it comes with this type of thing. I only noticed this because I was searching the BIOS thoroughly for a way to disable Secure Boot. The good news is disabling it is pretty straightforward: you just disable it in the BIOS. As I understand it, Absolute Persistence requires an agent running in the OS, so the BIOS support, by itself, doesn't do anything once disabled.

The following work flawlessly OOTB: sound (using PipeWire), display brightness control (using brightnessctl), the touchscreen (I didn't realize the screen was actually a touchscreen until I touched it by accident and saw the mouse move), and the webcam (not winning any awards on quality, but it works). Things I have not tested: the fingerprint sensor.

To enter the BIOS: smash F2 until you hear the beep. No need to hold down the key. To enter the boot menu: as above, but with F12.

Links: Fujitsu product page (archive.org), data sheet (PDF).

Fernando Borretti 2 months ago

Agda on NixOS

To install Agda and its standard library, add the withPackages incantation to your config, or the home-manager equivalent (both sketched below). The p there stands for the Agda package set. Note that simply installing the bare agda package will not work: Agda won't know where the standard library is.

If you use Emacs, you probably want agda2-mode, which can likewise be installed using Nix. Now, if you have a file with a simple, self-contained definition, type-checking it works. So far, so good. Using the standard library, however, is more complicated. If you have a file that imports something from it, type-checking will fail, because Agda doesn't know that the standard library is available. Instead, create an .agda-lib file in the same directory, declaring a dependency on standard-library. Now type-checking will succeed.
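A sketch of the pieces described above; the withPackages form is the standard nixpkgs approach, and the file names are illustrative:

```nix
# configuration.nix (or home.packages, with home-manager)
environment.systemPackages = [
  (pkgs.agda.withPackages (p: [ p.standard-library ]))
];
```

A file that imports from the standard library:

```agda
-- Test.agda
module Test where

open import Data.Nat

two : ℕ
two = suc (suc zero)
```

And, in the same directory, a test.agda-lib file that makes the import resolve:

```
name: test
depend: standard-library
include: .
```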

Fernando Borretti 3 months ago

Hashcards: A Plain-Text Spaced Repetition System

hashcards is a local-first spaced repetition app, along the lines of Anki or Mochi. Like Anki, it uses FSRS, the most advanced scheduling algorithm yet, to schedule reviews. The thing that makes hashcards unique: it doesn't use a database. Rather, your flashcard collection is just a directory of Markdown files. You write flashcards more or less like you'd write ordinary notes, with lightweight markup to denote basic (question/answer) flashcards and cloze deletion flashcards (the format is sketched later in this post). Then, to study, you run hashcards: it opens a web interface on localhost, where you can review the flashcards. Your performance and review history is stored in an SQLite database in the same directory as the cards.

Cards are content-addressed, that is, identified by the hash of their text. This central design decision yields many benefits: you can edit your flashcards with your editor of choice, store your flashcard collection in a Git repo, track its changes, and share it on GitHub with others (as I have). You can use scripts to generate flashcards from some source of structured data (e.g. a CSV of English/French vocabulary pairs). You can query and manipulate your collection using standard Unix tools, or programmatically, without having to dig into the internals of some app's database.

Why build a new spaced repetition app? Mostly because I was dissatisfied with both Anki and Mochi. But also because my flashcard collection is very important to me, and having it exist either in some remote database, or as an opaque, unusable data blob on my computer, doesn't feel good. "Markdown files in a Git repo" gives me a level of ownership that other approaches lack. The rest of this post explains my frustrations with Anki and Mochi, and how I landed on the design decisions for hashcards.

Anki was the first SR system I used. It's open source, so it will be around forever; it has a million plugins; it was the first SR system to use FSRS for scheduling. It has really rich stats, which I think are mostly useless but are fun to look at. And the note types feature is really good: it lets you generate a large number of flashcards automatically from structured data.

The central problem with Anki is that the interface is really bad. This manifests in various ways. First, it is ugly to look at, particularly the review screen. And this diminishes your enjoyment of what is already an often boring and frustrating process. Second, doing simple things is hard. A nice feature of Mochi is that when you start the app you go right into review mode. You're drilling flashcards before you even realize it. Anki doesn't have a "study all cards due today"; rather, you have to manually go into a deck and click the "Study Now" button. So what I would do is put all my decks under a "Root" deck, and study that. But this is a hack. And, third: card input uses WYSIWYG editing. So, you're either jumping from the keyboard to the mouse (which increases latency, and makes flashcard creation more frustrating) or you have to remember all these keybindings to do basic things like "make this text a cloze deletion" or "make this TeX math".

Finally, plugins are a double-edged sword. Having the option to use them is nice, but the experience of actually using most plugins is bad. The whole setup feels janky, like a house of cards. Most of the time, if a feature is not built into the app itself, I would rather live without it than use a plugin.
Mochi feels like it was built to address the main complaint about Anki: the interface. It is intuitive, good-looking, shortcut-rich. No jank. Instead of WYSIWYG, card text is Markdown: this is delightful.

There's a few problems. While Markdown is a very low-friction way to write flashcards, cloze deletions in Mochi are very verbose: the Mochi equivalent of a hashcards cloze is a lot of typing (see the comparison below). And you might object that it's only a few characters longer. But when you're studying from a textbook, or when you're copying words from a vocabulary table, these small frictions add up. If writing flashcards is frustrating, you'll write fewer of them: and that means less knowledge gained. Dually, a system that makes flashcard creation as frictionless as possible means more flashcards, and more knowledge.

Another problem is that Mochi doesn't have an equivalent of Anki's note types. For example: you can make a note type for chemical elements, with fields like atomic number, symbol, name, etc., and write templates to generate flashcards asking questions like: What is the atomic number of [name]? What element has atomic number [number]? What is the symbol for [name]? What element has symbol [symbol]? And so on for other properties. This is good. Automation is good. Less work, more flashcards. Mochi doesn't have this feature. It has templates, but these are not as powerful.

But the biggest problem with Mochi, I think, is the algorithm. Until very recently, when they added beta support for FSRS, the algorithm used by Mochi was even simpler than SM-2. It was based on multipliers: remembering a card multiplies its interval by a number >1, forgetting a card multiplies its interval by a number between 0 and 1. The supposed rationale for this is simplicity: the user can reason about the algorithm more easily. But I think this is pointless. The whole point of an SR app is that the software manages the schedule for you, and the user is completely unaware of how the scheduler works. The optimum is to have the most advanced possible scheduling algorithm (meaning the one that yields the most recall for the least review time) under the most intuitive interface possible, and the user just reaps the benefits.

Obviously without an RCT we can't compare Mochi/SM-2/FSRS, but my subjective experience is that the algorithm works well in the short term, and falters in the long term. It's very bad when you forget a mature card: if a card has an interval of sixty days, and you click forget, you don't reset the interval to one day (which would be good, because it helps you reconsolidate the lost knowledge). Rather, the interval is multiplied by the forget multiplier (by default: 0.5) down to thirty days. What's the use? If I forgot something after sixty days, I surely won't have better recall in thirty. You can fix this by setting the forget multiplier to zero. But you have to know this is how it works, and, crucially: I don't want to configure things! I don't want "scheduler parameter finetuning" to be yet another skill I have to acquire: I want the scheduler to just work.

In general, I think spaced repetition algorithms are too optimistic. I'd rather see cards slightly more often, and spend more time reviewing things, than get stuck in "forgetting hell". But developers have to worry that making the system too burdensome will hurt retention. In Anki, it's the interface that's frustrating, but the algorithm works marvelously. In Mochi, the interface is delightful, but it's the algorithm that's frustrating.
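To make the cloze comparison concrete, a sketch; the hashcards side follows the square-bracket syntax described in this post, and the Mochi side is my understanding of its double-curly-brace cloze syntax:

```
hashcards:  Mercury has [zero] moons.

Mochi:      Mercury has {{zero}} moons.
```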
Because you can spend months and months drilling flashcards, building up your collection, but when the cards cross some invisible age threshold, you start to forget them, and the algorithm does not help you relearn things you have forgotten. Eventually I burned out on it and stopped doing my reviews, because I expected to forget everything eventually anyhow. And now they've added support for FSRS, but by now I have 1700 cards overdue.

Additionally: Mochi has only two buttons, "Forgot" and "Remembered". This is simpler for the user, yes, but most SR scheduling algorithms have more options for a reason: different degrees of recall adjust the card parameters by different magnitudes.

What do I want from a spaced repetition system? The first thing is: card creation must be frictionless. I have learned that the biggest bottleneck in spaced repetition, for me, is not doing the reviews (I am very disciplined about this and have done SR reviews daily for months on end); it's not even converting conceptual knowledge into flashcards; the biggest bottleneck is just entering cards into the system.

The surest way to shore up your knowledge of some concept or topic is to write more flashcards about it: asking the same question in different ways, in different directions, from different angles. More volume means you see the same information more often; asking in different ways prevents "memorizing the shape of the card"; and it acts as a kind of redundancy: there are multiple edges connecting that bit of knowledge to the rest of your mind. And there have been many times where I have thought: I would make this more solid by writing another flashcard. But I opted not to because the marginal flashcard is too effortful. If getting cards into the system involves a lot of friction, you write fewer cards. And there's an opportunity cost: the card you don't write is a concept you don't learn. Integrated across time, it's entire oceans of knowledge which are lost.

So: the system should make card entry effortless. This was the guiding principle behind the design of the hashcards text format. For example, cloze deletions use square brackets because on a US keyboard, square brackets can be typed without pressing shift (compare Mochi's curly brace). And it's one bracket, not two.

Originally, the format was one line per card, with blank lines separating flashcards, and question-answer cards used slashes to separate the sides. And this is strictly less friction. But it creates a problem for multi-line flashcards, which are common enough that they should not be second-class citizens. Eventually, I settled on the current format, which is only slightly more typing, and has the benefit that you can easily visually identify where a card begins and ends, and what kind of card it is (both formats are sketched below). I spent a lot of time arguing back and forth with Claude about what the optimal format should be.

Another source of friction is not creating the cards but editing them. The central problem is that your knowledge changes and improves over time. Often textbooks take this approach where Chapter 1 introduces one kind of ontology, and by Chapter 3 they tell you, "actually that was a lie, here's the real ontology of this subject", and then you have to go back and edit the old flashcards to match. Because otherwise you have one card asking, e.g., for the undergraduate definition of some concept, while another asks you for the graduate-level definition, creating ambiguity.
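A sketch of the two formats, for illustration; the slash-separated form is as described above, while the markers in the current format are an assumption on my part rather than a quotation from the docs:

```
Original format (one line per card, a slash separating the sides):

    What is the capital of France? / Paris

Current format (marked, multi-line cards):

    Q: What is the capital of France?
    A: Paris
```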
For this reason, when studying from a textbook, I create a deck for the textbook, with sub-decks for each chapter. That makes it easy to match the flashcards to their source material (to ensure they are aligned), and each chapter deck usually has only a few tens of cards, keeping them navigable. Sometimes you've written multiple cards for the same concept, so you have to update them all at once. Finding the related ones can be hard if the deck is large. In hashcards, a deck is just a Markdown file. The cards immediately above and below a card are usually semantically related. You just scroll up and down and make the edits in place.

But why plain-text files in a Git repo? Why not use the above format, but in a "normal" app with a database? The vague idea of a spaced repetition system where flashcards are stored as plain-text files in a Git repo had been kicking around my cranium for a long time. I remember asking an Ankihead on IRC circa 2011 if such a thing existed. At some point I read Andy Matuschak's note on his implementation of an SR system. In his system, the flashcards are colocated with prose notes. The notation is similar to mine: short tags for question-answer cards, and an inline marker for cloze deletions. And the cards are content-addressed: identified by their hash. Which is an obviously good idea. But his code is private and, besides, I feel that prose notes and flashcards are very different beasts, and I don't need or want them to mix.

But I think the idea of plain-text spaced repetition got bumped up the priority queue because I spontaneously started using a workflow that was similar to my current hashcards workflow. When studying from a textbook or a website, I'd write flashcards in a Markdown file. Usually, I used a shorthand notation for cloze deletions. Then I'd use a Python script to transform the shorthand into the notation used by Mochi. And I'd edit the flashcards in the file, as my knowledge built up and my sense of what was relevant and important to remember improved. And then, when I was done with the chapter or document or whatever, only then, I would manually import the flashcards into Mochi. And it struck me that the last step was kind of unnecessary. I was already writing my flashcards as lightly-annotated Markdown in plain-text files. I had already implemented FSRS out of curiosity. I was looking for a personal project to build during funemployment. So hashcards was by then a very neatly-shaped hole that I just needed to paint inside.

It turns out that using plain-text storage has many synergies:

You can edit the cards using whatever editor you use, build up a library of card-creating macros, and navigate the collection using the editor's file browser.

You can query and update the collection using standard Unix tools, or a programming language: e.g. using wc to get the total number of words in the collection, or using sed to make a bulk update to a set of cards.

You can use Git for version control. Git is infinitely more featureful than the change-tracking of any SR app: you can edit multiple cards in one commit, branch, merge, use pull requests, etc.

You can make your flashcards public on GitHub. I often wish people put more of themselves out there: their blog posts, their dotfiles, their study notes. And why not their flashcards? Even if they are not useful to someone else, there is something enjoyable about reading what someone else finds interesting, or enjoyable, or worth learning.

You can generate flashcards using scripts (e.g., turn a CSV of foreign-language vocabulary into a deck of flashcards, as sketched below), and write a Makefile to tie the script, data source, and target together. I do this in my personal deck. Anki's note types don't have to be built into hashcards; rather, you can DIY it using some Python and make.

The result is a system where creating and editing flashcards is nearly frictionless, that uses an advanced spaced repetition scheduler, and which provides an elegant UI for drilling flashcards. I hope others will find it useful.
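The vocabulary-generation script mentioned above might look like this; a hypothetical sketch, where the file names and the Q/A card markers are assumptions:

```python
import csv

# Turn a CSV of English/French pairs into a hashcards deck,
# generating a card in each direction.
with open("vocab.csv") as f, open("french.md", "w") as deck:
    for english, french in csv.reader(f):
        deck.write(f"Q: What is the French for '{english}'?\n")
        deck.write(f"A: {french}\n\n")
        deck.write(f"Q: What does '{french}' mean?\n")
        deck.write(f"A: {english}\n\n")
```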

Fernando Borretti 5 months ago

Adding Planets to Celestia on macOS

tl;dr: you have to modify the application bundle.

Celestia is a space simulator: you can fly around space and look at moons and exoplanets, and fast-forward time. It is sometimes used by sci-fi artists for worldbuilding, because you can easily add new stars/planets/megastructures/spacecraft. Some people have built whole virtual worlds for storytelling in Celestia. The Orion's Arm collaborative worldbuilding project has a collection of Celestia addons so you can explore the world of the year 10,000 AT. But the documentation is sparse and old. As with many things: the biggest hurdle to starting is just knowing which files go in which directories.

Celestia uses ssc (solar system catalogue) files to define planets. These are plain-text files with a syntax resembling HCL. Let's create baby's first planet: a minimal file that adds a planet "Alpha" around the star Gliese 555 is sketched at the end of this post. Now, what you would hope is that there exists a standard directory, like an extras folder, you can put this into. I spent a lot of time looking through old docs and source code for this, and I'm writing this so others don't have to. Unfortunately, at least on macOS, you have to modify the application bundle itself. This feels morally wrong, but it works. Save the code as an ssc file, and copy it into the bundle.

Open Celestia, and navigate to Gliese 555 (press enter, type "Gliese 555", press enter, press g). You should see a new planet. Zooming in, you can see it's using the built-in asteroid texture. To verify it's reading the right file, press tilde and use the arrow keys to scroll up the logs, and you should see a line recording that the file was loaded. Celestia traverses the directory recursively, so you can put your files inside folders to organize large worldbuilding projects.
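The minimal ssc file and the install step, sketched; the orbital numbers are placeholders, and the exact path inside the application bundle may differ by Celestia version:

```
"Alpha" "Gliese 555"
{
    Class "planet"
    Radius 6371            # km; Earth-sized

    EllipticalOrbit {
        Period        0.2  # years
        SemiMajorAxis 0.3  # AU
    }
}
```

```
$ cp alpha.ssc /Applications/Celestia.app/Contents/Resources/extras/
```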

Fernando Borretti 7 months ago

Notes on Managing ADHD

The pleasure is in foreseeing it, not in bringing it to term. — Jorge Luis Borges, Selected Non-Fictions This post is about managing ADHD. It is divided into two sections: “Strategies” describes the high-level control system, “Tactics” is a list of micro-level improvements (really it should be called “stratagems”, since most are essentially about tricking yourself). High-level advice, control systems. ADHD has a biological cause and drugs are the first-line treatment for good reasons. There is no virtue in trying to beat it through willpower alone. The first-line treatment for ADHD is stimulants. Everything else in this post works best as a complement to, rather than as an alternative to, stimulant medication. In fact most of the strategies described here, I was only able to execute after starting stimulants. For me, chemistry is the critical node in the tech tree: the todo list, the pomodoro timers, etc., all of that was unlocked by the medication. Some people can’t tolerate a specific stimulant. But there are many stimulant and non-stimulant drugs for ADHD. I would prefer to exhaust all the psychiatric options before white-knuckling it. A lot of people don’t want to take medication for shame-based reasons. There is a lot of pill-shaming in the culture. You must learn to ignore it: we are automata, our minds are molecules in salt water. As a motivating example for the “salt water automaton” view: I struggled with sleep hygiene for a long time. It felt like WW1: throwing wave after wave of discipline at it and always failing. I would set an alarm, for, say, 10pm, that said: it is time to go to bed. How many times did I obey it? Never. I was always doing something more important. What fixed it? Melatonin. I have an alarm that goes off at 8pm to remind me to take melatonin. The point of the alarm is not, “now you must log off”, which is a very discipline-demanding task. The point of the alarm is simply: take this pill. It takes but a moment. Importantly, I’m not committing to anything other than taking a pill. Thirty, forty minutes later, I want to sleep. That is the key thing: the melatonin has changed my preferences. And then I don’t need willpower to close the sixteen Wikipedia tabs or whatever, because I want to sleep more than I want to scroll, or watch YouTube. The broader perspective here is that personal growth is a dialogue between internal changes and external changes. Internal changes might come from medication, meditation, therapy, coaching, or practicing habits for a long enough time. External changes are the scaffolding around the brain: using a todo list, and using it effectively. Using a calendar. Clearing your desk so you don’t get distracted by things. Journaling, so that you can introspect and notice patterns: which behaviours leads to a good workday, and which behaviours lead to a day being wasted. Are internal changes more important? Kind of. It’s more a back and forth, where internal changes unlock external changes which unlock further internal changes. Here’s an example: you (having undiagnosed ADHD) try to set a schedule, or use a todo list, or clean your bed every day, but it doesn’t stick. So you get on medication, and the medication lets you form your first habit: which is using a todo list app consistently, checking it every morning. 
Then, with the todo list as a core part of your exocortex, you start adding recurring tasks, and forming other simple habits: you have a daily recurring task to make your bed, and so every morning when you check the todo list, you see the task, and make your bed, and in time, with your now-functioning dopamine system, you make a habit to make your bed every day, such that you no longer need to have that in the todo list. So the timeline is:
Internal change: starting medication unlocks…
External change: using a todo list, which provides scaffolding (e.g. daily recurring tasks) for forming new habits, which unlocks…
Internal change: new habits formed (make bed, brush teeth in the morning).
Taking Ritalin with no plan for what you will do today/tomorrow/this week doesn’t work. Dually, an ambitious todo list will sit idle if your brain won’t let you execute it. So personal growth comes from using both internal and external changes, like a ladder with alternating left-right steps. A todo list is a neuroprosthesis that augments long-term memory for tasks. I use Todoist on my desktop and my phone. The pro plan is worth it. I don’t really think of it as an app, rather, it’s a cognitive prosthesis. The todo list provides three things:
Memory: the list remembers things for me. I’m not at the mercy of my brain randomly pinging me that I forgot to do X or I want to someday do Y. The todo list remembers.
Order: the todo list lets you drag and drop tasks around, so you can figure out the ordering in which you’re going to do them.
Hierarchy: the todo list lets you break tasks down hierarchically and without limit.
Of these, the most important is memory. The todo list is an action-oriented long-term memory prosthesis. This is especially useful for habit formation: my biggest blocker with forming habits was just remembering that I’d committed to doing something. If you think, I will make the bed every day, you might do it today, tomorrow, and by the third day you forget. You’re failing by simply forgetting to show up, which is a sad way to fail. Making something a recurring task on the todo list ensures I will see it. In a sense, the todo list turns many habits into one. You don’t need to remember “I will make my bed every day”, “I will floss my teeth every night”, etc., because the todo list remembers all those things for you. You only need to form a single habit: checking the todo list. Analogously, I often fail to finish projects simply because I forget about them. I start reading a book, but I don’t write it down anywhere (say, in Goodreads) that “I’m reading this book” is something I have committed to. I leave the book on a table where it’s out of sight (and therefore out of mind) for all of my waking hours. I glance at it occasionally and think, oh, yeah, I was reading that book, and then I’m distracted by something else. And weeks later, when I’ve already started another book, I notice the first book, with the bookmark on page 20, abandoned. The todo list prevents this failure mode: you create a project to represent reading the book, and that project is now tracked, and when you open the todo list, you can see it in the list of active projects. In Todoist, every task is part of a project (which really should just be called a list). My sidebar looks like this: Tasks, Groceries, Ideas, Blog, Reading List, Cycles, Projects. Tasks is the list for ad-hoc tasks. Mostly chores and things that don’t fit in elsewhere. Unload the dishwasher, reply to this email, etc. The only rule for this list is that everything in it must be scheduled. Groceries is self-explanatory. Ideas is where every half-formed goal, intention, project idea etc. goes. “Go deeper into metta” and “learn how to use the slide rule” and “go penguin watching in Manly” and “write a journalling app” and “learn PLT Redex”. I put these things here so that they don’t live in my brain. And occasionally I go through the list and promote something into an actual, active project. Blog is like the ideas list, but specifically for blog post ideas. Reading List is for media I want to consume. This is divided into: fiction books, non-fiction books, technical books, blog posts, papers, games, films. Cycles is for recurring tasks.
This one is divided into sections by period: daily, weekly, and above. The daily recurring tasks are things like “take vitamin D”, “meditate”, and the inbox-clearing task. Projects is a container for actual projects: an objective which takes multiple tasks to accomplish. Why lift projects into lists? Why not just use a top-level task to represent the project’s objective, and nested subtasks to represent the execution steps of the project? Because having the project in the sidebar is one mechanism I use to ensure I don’t forget about it. Every time I glance at the todo list, I can see the list of active projects. I can notice if something has not been worked on for a while, and act on it. Otherwise: out of sight, out of mind. The difficulty class of the tasks you can perform declines throughout the day. There are many metaphors for the concept of mental energy. Spoon theory, for example. The usual metaphor is that “mental energy” is like a battery that is drained through the day, in greater and lesser quantities, and is replenished by sleep. To me, energy is less like a battery and more like voltage. Some machines require a threshold voltage to operate. Below that voltage they don’t just operate slower, they don’t operate at all. Analogously, different categories of activity have different threshold voltages. For me, it’s like this:
Things I am averse to, the things I intuitively want to put off because they bring up painful emotions, are high-voltage.
Creative, open-ended work is high-voltage to start, but once you get started, keeping it going is medium-voltage.
Simple chores like cleaning, throwing clothes in the washing machine, etc. are low-voltage.
And when I wake up I have the highest possible voltage, and throughout the course of the day the voltage declines. And that’s the key difference from spoon theory: spoons are fungible across time, voltage is not. For each category of activity, there is a span of the day when I can action it. When I wake up, I do my morning routine, get some quick wins, and then I try to tackle the thing I dread the most, as early in the morning as possible, because that’s the time of day when I have the most energy and self-control. I get that done and I move on. (Another reason to do the dreaded tasks first: if you put it off to, say, late morning, well, why not put it off again? And again and again. And then it’s 7pm and you can’t even think about the task, and it’s late, and you don’t have energy, so you couldn’t do it even if you wanted to, so let’s do it tomorrow.) And then, when I have removed that burden, I work on projects. The creative, generative, intellectual things. The things that move some kind of needle, and aren’t just pointless chores. And when I run out of energy to create, I read. And when I run out of energy to read, I clean and go to the gym and do the other things. And when the sun goes down everything starts to unravel: I have zero energy and the lazy dopamine-seeking behaviour comes out. So I take melatonin, and try to be in bed before the instant gratification monkey seizes power. Typology of procrastination, approaches. In my ontology there are three types of procrastination:
ADHD procrastination: you want to do the task, but can’t because of distraction/hyperactivity.
Anxious procrastination: you know you have to do the task, but you don’t want to, because it triggers difficult emotions.
Decision-paralysis procrastination: you don’t know how to execute the task, because it involves a decision and you have difficulty making the decision.
ADHD procrastination is the easiest kind to address. The solution is pharmacological treatment for ADHD + having a productivity system and some tricks. Anxious procrastination is harder. The good thing is you know, cognitively, what you have to do. The hard part is getting over the aversion. In the short term, the way to fix this is to do it scared. Accept the anxiety. Asking for help also works: sometimes you just need someone in the room with you when you hit send on the email. You can also use techniques like CBT to rationally challenge the source of the anxiety and maybe overcome it.
In the long term: write down the things you procrastinate on due to anxiety, and find the common through-line, or the common ancestor. By identifying the emotional root cause, you can work on fixing it. Decision-paralysis procrastination is the hardest, because you don’t know, cognitively, what the right choice is, and also you probably have a lot of anxiety/aversion around it. Many things in life are susceptible to this: you have a set of choices, there are good arguments for/against each one, and you have a lot of uncertainty as to the outcomes. And so you ruminate on it endlessly. I don’t have a good general solution for this. Talking to people helps: friends, therapists, Claude. This works because thinking by yourself has diminishing returns: you will quickly exhaust all the thoughts you will have about the problem, and start going in circles. Often people will bring up options/considerations I would never have thought of. Sometimes, if you’re lucky, that’s all it takes: someone mentions an option you had not considered and you realize, oh, it was all so simple. One thing to consider is that thinking in your head is inherently circular, because you have a limited working memory, and you will inevitably start going in circles. Writing things down helps here. Treat the decision, or the emotions behind it, like an object of study, or an engineering problem. Sit down and write an essay about it. Name the arguments, number the bullet points, refer back to things. Make the thoughts into real, physical, manipulable entities. Journaling is good for detecting maladaptive patterns and tracking your progress. I keep a hierarchical journal in Obsidian. Hierarchical because I have entries for the days, weeks, months, and years; the directory tree mirrors this hierarchy. In the morning I finish yesterday’s journal entry, and begin today’s. Every Sunday I write the review of the week, the first of each month I write the review of the previous month, the first of each year I review the past year. The time allotted to each review is in inverse proportion to its frequency: so a monthly review might take an hour while a yearly review might take up a whole morning. The daily reviews are pretty freeform. Weekly and above there’s more structure. For example, for the weekly reviews I will write a list of the salient things that happened in the week. Then I list what went well and what went poorly. And then I reflect on how I will change my behaviour to make the next week go better. Journaling is a valuable habit. I started doing it for vague reasons: I wasn’t sure what I wanted to get out of it, and it took a long time (and long stretches of not doing it) until it became a regular, daily habit. I’ve been doing it consistently now for three years, and I can identify the benefits. The main benefit is that to change bad patterns, you have to notice them. And it is very easy to travel in a fixed orbit, day in, day out, and not notice it. Laying it out in writing helps to notice the maladaptive coping mechanisms. Reading back over the journal entries helps you notice: when an event of type X happens, I react with Y. Today’s journal entry is a good default place for writing ad-hoc notes or thoughts. Often I wanted to write something, but didn’t know where I would file it (how do you even file these little scraps of thought?) and from not knowing where to put it, I would not do it. Nowadays I just begin writing in the journal. Later, if it is valuable to file it away, I do so.
Creating a journal entry in the morning is a good opportunity to go over the goals and priorities for the day and explicitly restate them to myself. The final benefit is retrospection: I can look at the past and see how my life has changed. And this is often a positive experience, because the things that worried me didn’t come to pass, the things I used to struggle with are now easy, or at least easier. There’s a paradox with productivity: when you grind executive function enough, things that you used to struggle with become quotidian. And so what was once the ceiling becomes the new floor. You no longer feel proud that you did X, Y, Z because that’s just the new normal. It’s like the hedonic treadmill. You might feel that you never get to “productive”. Journaling helps to combat this because you can see how far you’ve come. Manage time at the macro level with calendars, at the micro level with timers. To manage time, you need a calendar (macro) and a timer (micro). At the macro level, I use the calendar very lightly. Mostly for social things (to ensure I don’t forget an event, and that I don’t double-book things). I also use it to schedule the gym: if the goal is to lift, say, five times a week, I schedule five time blocks to lift. Lifting is special because it has a lot of temporal constraints:
I lift exactly n times per week.
I lift at most once a day.
I lift in the evening, which potentially clashes with social things.
There are adjacency constraints, e.g. doing shoulders the day before chest is bad.
There is at least one rest day which has to be scheduled strategically (e.g. to have maximal distance between successive deadlift sessions).
But outside these two categories, my calendar is empty. The calendar might be useful to you as a self-binding device. If you keep dragging some project along because you “haven’t made time” for it: consider making a time block in the calendar, and sticking to it. Creating a calendar event is, literally, making time: it’s like calling an allocator. Some people use the calendar as their entire todo list. I think this kind of works if your todo list is very coarse-grained: “buy groceries” and “go to the dentist”. But I have a very fine-grained todo list, and putting my tasks in the calendar would make it overwhelming. Another problem with calendars is they are too time-bound: if I make a calendar block to do something, and I don’t do it, the calendar doesn’t know it. It just sits there, forgotten, in the past. In a todo list, everything gets dragged along until I explicitly complete it. Along the same lines, the calendar is not good for collecting vague ideas and plans for things you want to do in the future, while todo lists are ideal for this. The problem with todo lists is that they’re timeless: there is no sense of urgency. You look at the list and think, I could do the next task now, or in five minutes, or in an hour. There’s always some time left in the day. Or tomorrow. You need a way to manufacture urgency. If you have ADHD you’ve probably heard of the Pomodoro method, tried it, and bounced off it. The way it’s framed is very neurotypical: it’s scaffolding around doing, but ADHD people often have problems with the doing itself. And so the scaffolding is kind of pointless. The method works well in three kinds of contexts: Overcoming Aversion: when you have a large number of microtasks, each of which takes a few seconds to a few minutes, but the number of them, and the uncertainty factor, makes the sum seem a lot larger. A classic example for me is having to reply to like ten different people. Realistically, each person can be handled in 15s. One or two might require a couple of minutes to compose a longer reply. But often I will avoid those tasks like the plague and drag them across the entire day.
The pomodoro method works here because you’re basically trading (up to) 25m of pain for an entire day’s peace and quiet. So you get all the annoying little tasks together, start a timer, and go through them. And usually you’re done in maybe ten minutes. And you feel really good after, because all those annoying little tasks are done. It really is amazing what a little bit of fake urgency can do. Starting: sometimes the problem is just starting. It is very trite, but it’s true. You have something you want to want to do, but don’t want to do. I want to want to read this book, to learn this topic, to write this blog post, to work on this software project. But I don’t want to do it. The pomodoro method helps you start. You’re not committing to finishing the project. You’re not committing to months or weeks or days or even hours of work. You’re committing to a half hour. And if you work just that half hour: great, promise kept. 30m a day, over the course of a single month, is 15h of work. And often I start a 30m timer and end up working four hours, and maybe that’s a good outcome. Stopping: dually, sometimes the problem is stopping. If you’re trying to advance multiple projects at the same time, if you hyperfocus on one, it eats into the time you allocated for the others. And more broadly, spending too much time on one project can derail all your plans for the day. Maybe you meant to go to the gym at 6pm but you got so stuck in with this project that it’s 8:30pm and you’re still glued to the screen. So the gym suffers, your sleep schedule suffers, etc. Actually stopping when the pomodoro timer goes off can prevent excessive single-mindedness. Additionally, the five-minute break at the end of the pomodoro block is useful. It’s a time to get up from the computer, unround your shoulders, practice mindfulness, essentially, all those little things that you want to do a few times throughout the day. Stratagems, tricks. To select the next task, pick either the shortest or the most-procrastinated task. I don’t like the word “prioritize”, because it has two subtly different meanings: “weak prioritization” means to sort a list of tasks by some unspecified criterion, that is, to establish an order where some things are prior to another; “strong prioritization” is to sort a list specifically by importance. “Weak prioritization” is something everyone should do: it takes a moment to go over the todo list and drag the tasks into more or less the order in which you will do them. This keeps the most relevant tasks near the top, which is where your eyes naturally go. “Strong prioritization” is a terrible job scheduling algorithm. Importance alone is not good enough. Consider the case where you have a very important task A which takes a long time to finish, and a less important task B which takes 5m to finish. For example, writing an essay versus replying to an email. Which should you do first? I would execute B first, because doing so in turn unblocks B’s successor tasks. If you reply to the email and then get to work on task A, the other person has time to read your email and reply to you. And the conversation moves forward while you are otherwise engaged. Of course, the pathological version of this is where you only action the quick wins: all the minute little chores get done instantly, but the big tasks, requiring long periods of concentration, get postponed perpetually. My task-selection algorithm is basically: do the shortest task first, with two exceptions: stalled tasks get a priority bump (if I created a task weeks ago, or if I’ve been postponing it for many days in a row, it has to be done now); and context-dependence (if I’m working on a particular project, I’d rather focus on tasks from that project, rather than from the global todo list). To remember something, put it in your visual field. Dually: to forget, get it out of sight. Out of sight, out of mind. The corollary: to keep something in mind, put it in your visual field; to keep it out of mind, keep it out of sight.
My desk is very spartan: there’s a monitor, a mouse, a keyboard, and a few trinkets. My desktop is empty. There are no files on it. The dock has only the apps I use frequently. And at a higher level, I try to keep the apartment very clean and orderly. Because everything that’s out of place is a distraction, visual noise. That’s the negative aspect: the things I remove. The positive aspect, the things I keep in my visual field: most of the time, I have two windows open on my computer: the todo list occupies the left third of the screen, and the right two-thirds are occupied by whatever window I have open at the time. And so at a glance, I can see: what I’m currently working on, what I will work on next, and the list of active projects, so that I don’t forget they exist. Keep in regular contact with long-running projects. A common failure mode I have is, I will fail to finish a project because I forget I even started it. Or, relatedly: I will let a project drag on and on until enough time has passed that my interests have shifted, the sun has set on it, and it is now a slog to finish. One reason I do this is that creative/intellectual work often requires (or feels like it requires) long stretches of uninterrupted time. So I procrastinate working on something until I can find such a chunk of time. Which never comes. Time passes and the project begins to slip the moorings of my attention, as other new and shiny things arrive. And sometimes I will pick the project back up after months or years, and I have lost so much context, it’s impossible to know what I even intended. And then you procrastinate even more, because you don’t want to feel the guilt of picking up a project and realizing it has become strange and unfamiliar to you. One way to combat this is to make regular project check-ins. This could be a daily or few-times-a-week recurring task on Todoist that just says “spend 30m on this project”. You don’t even have to work on the thing: just allocate fifteen minutes to hold the project in your mind and nothing else. If it’s creative writing, you might open the Word document and just look at it. If it’s a programming project: read the Jira board and look at the code again. Don’t write anything. Just read the code. You will likely come up with a few tasks to do, so write those down. Think. Plan. Build up the structures in your mind, refresh the caches. If you can do, do; otherwise, plan; and if you can’t even do that, read. When you’re doing this regularly, when you’re in regular contact with the project, when the shape of it is clear in your mind, you will have the tasks at the top of your mind, you will no longer feel that you need a giant empty runway of time to work on it, you will be able to work on it in shorter chunks. To manage long-term creative work, keep in regular contact. That doesn’t mean working on a project every day, but maybe looking at it every day. The pomodoro method works here. Set a timer for just 25m to keep in touch with the project. Bring all tasks, broadly defined, into one todo list. Life is full of inboxes: DMs on Twitter, iMessage, WhatsApp, Signal, Discord, etc.; Twitter bookmarks; browser bookmarks; your Downloads folder; messages in my myGov inbox; the physical mailbox in my apartment. These are inboxes because they fill up over time and need action to empty. You can also think of them as little domain-specific task lists. “Centralizing your inboxes” means moving all these tasks from their silos into the one, central todo list. For example, I have a daily task called “catch up” to clear the digital inboxes: go through all my communication apps (email, Discord, Twitter DMs etc.) and triage the unread conversations (if something needs replying to, I either reply immediately or make a task to reply later so I don’t forget); file the contents of my Downloads folder; and go through Twitter/browser bookmarks and turn them into tasks (e.g., if I bookmark an article, the task is to read the article). In this way I mostly manage to stay on top of comms. All inboxes should be at zero. You have probably heard of inbox zero. It sounds like LinkedIn-tier advice.
But if you struggle with comms, with replying to people in a timely manner (or at all), inbox zero is a good strategy. There are two reasons, briefly: inbox zero has no false negatives (if an inbox is empty, you know you’ve handled everything), and important communications have a way of “camouflaging” themselves among irrelevance. And, like everything: before you make it into a habit, it feels incredibly time-consuming and labour-intensive. But once you make it into a habit, it’s almost effortless. So, I will give you an example. I come in to work, and read four emails. Three could’ve been archived outright, one needed a reply from me. And I said, oh, I’ll get to it in a second. And then I got distracted with other tasks. And throughout the day I kept glancing at the email client, and thinking, yeah, I will get to it. Eventually I got used to those four emails: they are the “new normal”, and what’s normal doesn’t require action. I would think: if those emails are there, and I already looked at them, then it’s probably fine. At the end of the day I looked at the inbox again and saw, wait, no, one of those emails was actually important. That’s the failure mode of inbox greater-than-zero: the important stuff hides among the irrelevant stuff, such that a quick glance at the inbox doesn’t show anything obviously wrong. Dually, with inbox zero, if you see a single email in the inbox, you know there’s work to do. Inbox zero removes ambiguity. If there’s anything in the inbox, you know, unambiguously, you have a task to complete. If there is nothing in the inbox, you know, unambiguously, there is nothing to do. Inbox zero frees you from false negatives, where you think you’ve handled your correspondence but there’s some important email, camouflaged among the trivial ones, that has not been replied to. A problem with doing inbox zero is most communication apps (like Discord, Slack, iMessage etc.) don’t have a concept of an inbox, just the read/unread flag on conversations. Since there’s no separation between the inbox and the archive, it takes more discipline to ensure every conversation is replied to. If an inbox is overwhelmed, archive it in a recoverable way. By the time I started to become organized I’d already accumulated thousands of bookmarks, unread emails, files in my downloads folder, papers in my physical inbox, etc. It would have been a Herculean effort to file these things away. So I didn’t. All the disorganized files, I wrapped them up in a folder and threw them in an archive folder. Emails? Archived. Bookmarks? Exported to HTML, archived the export, and deleted them from the browser. Ideally you should do this once, at the start. And by archiving things rather than deleting them, you leave open the possibility that at some point in the future, you might be able to action some of those things. Triage the old bookmarks, sort your filesystem, etc. Bring aversion-causing tasks into an environment that you control. If you’re averse to doing something, for emotional reasons, one way to overcome the aversion is to do it as much as possible on your own terms. An example: you have to fill out some government form. You’re averse to it because you worry about making a mistake. And just the thought of opening the form fills you with dread. So, take the boxes in the form, and make a spreadsheet for them. Add fonts/colours/emojis/etc., if that makes it feel more personal, or like something you designed and created. Then fill out the form in the spreadsheet. And then copy the values to the form and submit. This helps because instead of performing the task in this external domain where you feel threatened, you’re performing the task in your own domain, on your own terms.
Another example: you have an email you have to reply to, and you’re anxious about it. Just opening the email client gives you a bad feeling. Instead, try composing the email elsewhere, say, in a text editor. The change of environment changes the emotional connotation: you’re not replying to an email, you’re writing a text. You might even think of it as a work of fiction, a pseudepigraphy. Turn off notifications, check comms as an explicit task. “Interrupts” means notifications, which arrive at unpredictable and often inconvenient times. “Polling” means manually checking the source of the notifications for things to action. The obvious benefit of replacing interrupts with polling is you don’t get interrupted by a notification. The less obvious benefit is that when notifications are smeared throughout the day, it is easy for them to fall through the cracks. Something comes in when you’re busy, and you swipe it away, and forget about it, and realize days later you forgot to respond to an important message. Polling is focused: you’ve chosen a block of time, you’re committed to going through the notifications systematically. Instead of random islands of interruptions throughout the day, you have a few short, focused blocks of going through your notifications. Often I get an email while I’m on my phone and think, well, I can’t reply, typing on mobile is horrible, I’m on a train, etc. Polling usually happens at my desk so I have no excuses: I’m in the right environment and in the right mental state. This is so trite. “Put your phone on Do Not Disturb and silence notifications”. And yet it works. For a long time I resisted this because I aspire to be the kind of person who gets a message and replies within minutes. But I didn’t notice how much notifications were impairing my focus until one day I accidentally put the phone/desktop on DND and had a wonderfully productive, distraction-free day. Get someone to sit next to you while you work. If you’re struggling to work on something, work next to another person. Set a timer and tell them what you’re going to accomplish and when the timer ends tell them how you did. Just being around other people can make it easier to overcome aversion. This is why coworking spaces are useful. If you don’t have a person around, you might try Focusmate. It works for some people. Sometimes I’ll start a conversation with Claude, lay out my plans for the day, and update Claude as I do things. If I’m stuck, or if I need help overcoming procrastination, I can ask Claude for help, and it’s easier to do that in an on-going thread because Claude already has the necessary context, so I don’t have to describe what I’m struggling with ab initio. Separate planning from action, so if you get distracted while acting, you can return to the plans. Separating planning from doing can be useful. Firstly because planning/doing require different kinds of mental energy. When you’re too tired to do, you can often still plan. Secondly because by separating them you can look back and see how useful the plan was, how much you stuck to it, and then get better at planning. Thirdly, and most importantly, because for ADHD people doing can be a source of distractions that impair other tasks. From Driven to Distraction: The first item on the list referred to a cough drop. As I read it, I asked her about it. “Oh,” she answered, “that is about a cough drop someone left on the dashboard of our car. The other day I saw the cough drop and thought, I’ll have to throw that away.
When I arrived at my first stop, I forgot to take the cough drop to a trash can. When I got back into the car, I saw it and thought, I’ll throw it away at the gas station. The gas station came and went and I hadn’t thrown the cough drop away. Well, the whole day went like that, the cough drop still sitting on the dashboard. When I got home, I thought, I’ll take it inside with me and throw it out. In the time it took me to open the car door, I forgot about the cough drop. It was there to greet me when I got in the car the next morning. […] It was such a classic ADD story that I’ve come to call it the “cough drop sign” when a person habitually has trouble following through on plans on a minute-to-minute, even second-to-second, basis. This is not due to procrastination per se as much as it is due to the busyness of the moment interrupting or interfering with one’s memory circuits. You can get up from your chair, go into the kitchen to get a glass of water, and then in the kitchen forget the reason for your being there. Emphasis mine. When I notice a micro-task like this, my instinct is not to do it, but to put it in the todo list first. Then I try to do it immediately. And if I get distracted halfway through, it’s still there, in the todo list. A practical example is something I call the apartment survey. When I clean the apartment, I start by walking around, noticing everything that needs fixing, and creating a little task for it. Even something as simple as “move the book from the coffee table to the bookshelf”. But I don’t start anything until the survey is done. And when the survey is done, I execute it. And if I get distracted halfway through cleaning the apartment, I have the tasks in the list to go back to. Introspect to find the things that ruin your productivity and avoid them. Through introspection you can discover the behaviours that derail your productivity. Lifting in the morning derails the day. Cardio is fine, but if I lift weights in the morning, the rest of the day I’m running on -40 IQ points. The most cognitively demanding thing I can do is wash the dishes. I’m not sure what the physiology is: maybe it’s exhaustion of the glycogen stores, or fatigue byproducts floating around in my brain, or the CNS is busy rewiring the motor cortex. The point is that I try to do the cognitively-demanding things in the morning and lift in the evening. Motion also does this. I suppose it’s the H in ADHD: hyperactivity. I used to be a big pacer: put on headphones, pace my room back and forth daydreaming for hours and hours. Some days I would pace so much my legs were sore. To think, I have to be in motion. But sometimes I’ve thought enough, and it’s time to do. Music, too, derails me. If I start listening to music, very soon I start pacing the room, and it’s over. Music is almost like reverse methylphenidate: it makes me restless, mentally hyperactive, and inattentive. So, to be productive I have to not move too much, and be in silence, and not have fried my brain with exercise. If being organized makes you feel good, spend more effort on organizing your productivity system. In a sense, having a really complex productivity system is like trying to use neuroticism to defeat ADHD, to use high neuroticism to defeat low conscientiousness. There’s an element of truth to that, sure (see mastery of drudgery). But here’s the thing: you have to play to your strengths. You have to.
If you like order and systems and planning but you struggle with doing, then, yeah, it might work, for you, to spend more energy on the trappings of productivity (ensuring your todo list is properly formatted, organized, etc.) if that bleeds over into making it easier to do the real, meaningful things. For example: I like emojis in my todo list. The chores have a 🧼 emoji, the comms tasks have an ✉️ emoji. That kind of thing. Makes it easy to see at a glance what kind of things I have to do, to group them by category. But Todoist doesn’t support emoji icons on tasks, unlike Notion, so adding the emojis takes a bit more effort: I have to open Raycast and search for the emoji I want and paste it into the task title. It adds a little friction each time I create a task, but the benefit is I enjoy using the todo list more. Avoid spending too much productive time on worthless chores. A productivity antipattern: indulging too much in “quick wins”. There’s this running joke, or meme, online, about the kind of person who has this huge, colossal productivity system, but they get nothing done. They have five todo list apps and everything is categorized and indexed and sorted, but their material output is zero. They complete a hundred tasks a day and when you interrogate what those tasks are they are “brush my teeth” or “reorganize my bookshelf”. There’s a lot of truth to that. Every task falls into one of two categories: the quick wins, and everything else. Life is not made of quick wins. Creative, generative, open-ended work requires long periods of focused work. A lot of unpleasant, aversion-causing things have to be done. But the quick wins are infinite: there’s always some micro-chore to do around the house, for example. I don’t have advice specifically on avoiding this. But you should notice if you’re doing it and course-correct. Don’t let procrastination on one task derail everything else. A bad failure mode I have is: I have a task T that I have to do, but I can’t, because of some kind of aversion. But when I try to work on other things, the alarms are going off in my head, telling me to work on T, because I’ve been putting it off for so long and life is finite and the years are short and all that. The end result is that because one thing is blocked, everything grinds to a halt. It’s a very annoying state to be in. And I don’t have a perfect solution, but I try to manage it by applying a sense of proportionality, “render unto Caesar” etc. You can’t ignore T forever; dually, you probably won’t solve it in the next ten minutes. But you can timebox T: allocate some block of time every day to try to advance it, or at least to work around it: ask a friend for help, for example. And the rest of the day you can dedicate to moving other things forward. Calculate travel time ahead of time to avoid being late. I am chronically late. So if I have a calendar event like a party at someone’s home, I will go on Google Maps and measure the travel time (from my home or wherever I’m likely to be) to the destination, and make a time block for that: e.g., if it takes 30m to go to the dentist and back, I put 30m travel blocks in the calendar around the dentist appointment. This ensures I leave my home on time. If it’s something especially important I often add 15m to the travel block as a buffer. Use tools that are effective and that you like. What productivity app should I use? Reminders? Linear? Todoist? A bullet journal? Use something that feels good and works. That’s all. Personally I use Todoist.
A lot of people think todo list apps are commodities, but when you have an app open for 98% of your screentime, the little subtleties really add up. I’ve tried using Reminders and Linear as my todo lists, and building my own. My productivity always suffers and I always go back to Todoist. One app is better than two: the more disjoint things you have to pay attention to, the worse it is. If you’re a software engineer I strongly advise against building your own, which is a terrible form of procrastination for creative types. Thanks to Cameron Pinnegar for reviewing.
How To Do Things describes an ADHD-friendly version of the Pomodoro method. It’s a 50-page PDF with no fluff, so it’s worth buying to support writers who don’t waste the reader’s time. Getting Things Done has a lot of good advice (e.g. dump your entire brain into the todo list) but it’s somewhat neurotypical, in that it assumes you won’t have any problems actually executing the tasks.

Fernando Borretti 7 months ago

Inboxes are Underrated

I have a lot of communication apps. By volume: Twitter DMs, Signal, Whatsapp, iMessage, Discord, email. Because I have so many disjoint places where communication happens, I have a daily task on Todoist to go through each of these, and ensure that every conversation is handled, where “handled” means: if I can reply immediately, I do so; otherwise, I make a task to reply. Polling is better than interrupts. But this is imperfect, because often I get distracted, and I do neither. Sometimes I read the other person’s message, and mentally begin drafting a reply, but forget to make a task. Sometimes I check DMs outside of this timeblock, when I’m less disciplined about following the checklist. Sometimes I’m interrupted before I can create the task. And so on. And all of these systems have a concept of a conversation being read/unread, but it is fragile: touch it and it goes away. So if I don’t reply immediately, and I don’t make a task, I might never reply. And then new conversations pile up, burying the old ones. Email is where I get the least human communication, but it is the one system that has an inbox. And the inbox is invaluable for me, because it acts as a domain-specific todo list: it draws a hard line between the things that have been handled (archived), and the things that are not (inbox). Crossing this line requires an explicit act. With email, I can execute this algorithm. For each conversation in the inbox:
If it’s spam, delete it.
If it doesn’t need a reply, archive it.
If I can reply immediately, reply and archive the conversation 1 .
If I can’t reply immediately, make a task to reply.
Because archiving requires an explicit action, there’s no possibility of forgetting to handle a conversation. This is the utility of inbox zero: it has no false negatives! If the inbox is empty, I know that all of my correspondence has been handled. If the inbox is non-empty, I know there is work to do. Why do so few apps have inboxes? Probably because most people never archive their emails, they just keep everything in the inbox. And probably the concept of an inbox reminds them of email, and email feels old and corporate and spammy. Most of the email I get is transactional (e.g. login codes), notifications, and spam. For people like me who want to be conscientious about communication, and who need mechanical help to achieve that, the lack of an inbox is really, really frustrating. And inboxes could be entirely local to the client software: the protocol doesn’t have to implement the inbox/archive distinction. But communication protocols are increasingly locked down, so that you can’t bring your own client, with your own features. Tangentially: inbox zero is not an obvious practice at all. Rather than relying on the user to implement the inbox zero workflow, the client should make triaging a first-class workflow. Like spaced repetition: you open Anki, click “Study”, go through the flashcards due today, choosing either “Forgot” or “Remembered”. You open the email client, click “Triage”, and go through one conversation at a time, and choose either “Delete”, “Archive”, “Reply”, or “Skip”.
Usually I archive a conversation immediately after replying, but sometimes you need a reply from the other person. So I make a task on my todo list that says “Waiting for a reply from X”. The idea is from Getting Things Done. If the person doesn’t reply, the existence of the task reminds me to ping them again. Otherwise I will certainly forget about it. ↩

Fernando Borretti 8 months ago

You Can Choose Tools That Make You Happy

On Hacker News and Lobsters I often see blog posts with titles like:
Why I built my startup on Common Lisp and DragonflyBSD
Rewriting PyTorch in APL (year six update)
I will never, ever, ever learn Docker
The general form being: why Obscure Thing is better than Popular Thing. And always the justification is purportedly rational and technical. And always, always, it is complete sophistry. Why? Because people make technical decisions, in part, for affective reasons. They choose a technology because it feels good, or comfortable, or because it’s what they know. They choose obscure tech as a form of sympathetic magic, like the guy who uses NetBSD on a ThinkPad to feel like a William Gibson protagonist. They choose obsolete languages, like Lisp or Smalltalk, because they think of the heroic age of Xerox PARC, and they want to feel connected to that tradition. They find tools whose vibes align with theirs: Ada says “slow, conservative, baroque” while Rust says “fast-paced, unproven, parvenu”. They use Emacs because they read that Neal Stephenson essay and they feel VS Code is for normies and Emacs is Gnostic. But many people can’t admit this to themselves! Because it is contrary to their identity: that they are unfeeling Cartesian rationalist automata. And so they invent rationalizations. Once you read enough of these posts, you see the patterns. The arguments for the Obscure Thing downplay the downsides (“yeah I had to take a six-month detour to implement an HTTP server for Fortran 2023”) and invent not-even-wrong upsides. I once read someone argue Common Lisp is great because it has garbage collection, like the writer has some obscure form of agnosia where their brain doesn’t register the existence of Python. The arguments against the Popular Thing are vague (“Docker is too complex”) or rely on social shaming (“the community is toxic”) or claims about identity (“Rust makes you soft and weak, C++ keeps you on your toes”). And sometimes the arguments are true, but they would not tip the scales of a more dispassionate assessment. So let’s cut the knot. Emacs is a Gnostic cult. And you know what? That’s fine. In fact, it’s great. It makes you happy, what else is needed? You are allowed to use weird, obscure, inconvenient, obsolescent, undead things if it makes you happy. We are all going to die. If you’re lucky you get three gigaseconds and you’re up. Do what you are called to do. Put ZFS in your air fryer, do your taxes in Fortran. We use tools to embody their virtues. You use Tails because it’s cyberpunk? That’s beautiful, man. Go all in. Get a leather jacket. If you’re doing it for the aesthetics, go all in. Make your life a living work of art. Go backpacking in Bangkok and write a novel on a Gemini and take pictures for your LiveJournal on a 2003 digital camera. Move the family groupchat to Signal. Dial into standup from an ISDN payphone and tell your PM the feds are after you. And write a blog post about that. Just don’t bullshit me. Don’t look me in the eye and tell me SNOBOL is the language of the future. Don’t tell your boss it was a rational cost-benefit calculation that made you rewrite the frontend in Prolog. Above all, do not lie to yourself. Examine your motivations. If you pursue things out of pure obsession, and ignore reason, you might wake up and realize you’ve spent years labouring in obscurity on a dead-end.

Fernando Borretti 9 months ago

Two Years of Rust

I recently wrapped up a job where I spent the last two years writing the backend of a B2B SaaS product in Rust, so now is the ideal time to reflect on the experience and write about it. I didn’t learn Rust the usual way: by reading tutorials or books, or writing tiny projects. Rather, I would say that I studied Rust, as part of the research that went into building Austral. I would read papers about Rust, and the specification, and sometimes I’d go on the Rust playground and write a tiny program to understand how the borrow checker works on a specific edge case. So, when I started working in Rust, my knowledge was very lopsided: I had an encyclopedic knowledge of the minutiae of the borrow checker, and couldn’t have told you how to write “Hello, world!”. The largest Rust program I had written was maybe 60 lines of code and it was to empirically test how trait resolution works. This turned out fine. Within a day or two I was committing changes. The problem is when people ask me for resources to learn Rust, I draw a blank. The way I would summarize Rust is: it’s a better Go, or a faster Python. It’s fast and statically-typed, it has SOTA tooling, and a great ecosystem. It’s not hard to learn. It’s an industrial language, not an academic language, and you can be immensely productive with it. It’s a general-purpose language, so you can build backends, CLIs, TUIs, GUIs, and embedded firmware. The two areas where it’s not yet a good fit are web frontends (though you can try) and native macOS apps. Rust is fast. You can write slow code in any language: quadratic loops and n+1 queries and bad cache usage. But these are discrete bottlenecks. In Rust, when you fix the bottlenecks, the program is fast. In other languages performance problems are often pervasive, so e.g. in Python it’s very common to have a situation where you’ve fixed all the bottlenecks—and everything is still unacceptably slow. Why? Because in Python the primitives are 10x to 100x slower than in Rust, and the composition of slow primitives is a slow program. No matter how much you optimize within the program, the performance ceiling is set by the language itself. And when you find yourself in that situation, what is there to do? You can scale the hardware vertically, and end up like those people who spend five figures a month on AWS to get four requests per second. You can keep your dependencies up to date, and hope that the community is doing the work of improving performance. And you can use async as much as possible on the belief that your code is I/O-bound, and be disappointed when it turns out that actually you’re CPU-bound. By having a high performance ceiling, Rust lets you write programs that are fast by default, without thinking too much about optimization, and when you need to improve performance, you have a lot of room to optimize before you hit the performance ceiling. Cargo has the best DX of any build system+package manager I have used. Typically you praise the features of a program; with cargo you praise the absences: there are no gotchas, no footguns, no lore you have to learn in anger, no weirdness, no environment variables to configure, no virtualenvs to forget to activate. When you copy a command from the documentation and run it, it works; it doesn’t spit out a useless error message that serves only as a unique identifier to find the relevant StackOverflow/Discourse thread. Much of the DX virtue is downstream of the fact that cargo is entirely declarative rather than stateful.
An example: something that always trips me up with npm is when I update the dependencies in package.json, running the type-checker/build tool/whatever doesn’t pick up the change. I get an unexpected error and then I go, oh, right, I have to run npm install first. With cargo, if you update the dependencies in the Cargo.toml file, any subsequent command (build or test or run) will first resolve the dependencies, update Cargo.lock, download any missing dependencies, and then run the command. The state of (Cargo.toml, Cargo.lock, local dependency store) is always synchronized. Rust has a good type system: sum types with exhaustiveness checking, option types instead of null, no surprising type conversions. Again, as with tooling, what makes a type system good is a small number of features, and a thousand absences, mistakes that were not made. The practical consequence is you have a high degree of confidence in the robustness of your code. In e.g. Python the state of nature is you have zero confidence that the code won’t blow up in your face, so you spend your time writing tests (to compensate for the lack of a type system) and waiting for the tests to clear CI (because Python is slow as shit). In Rust you write the code and if it compiles, it almost always works. Writing tests can feel like a chore because of how rarely they surface defects. To give an example: I don’t really know how to debug Rust programs because I never had to. The only parts of the code I had to debug were the SQL queries, because SQL has many deficiencies. But the Rust code itself was overwhelmingly solid. When there were bugs, they were usually conceptual bugs, i.e., misunderstanding the specification. The type of bugs that you can make in any language and that testing would miss. There are two ways to do errors: traditional exception handling (as in Java or Python) keeps the happy path free of error-handling code, but makes it hard to know the set of errors that can be raised at a given program point. Errors-as-values, as in Go, makes error handling more explicit at the cost of being very verbose. Rust has a really nice solution where errors are represented as ordinary values, but there’s syntactic sugar that means you don’t have to slow down to write the same propagation boilerplate a thousand times over. In Rust, an error is any type that implements the Error trait. Then you have the Result type, which is either Ok, wrapping a successful value, or Err, wrapping an error. Functions which are fallible simply return a Result. The question mark operator, ?, makes it possible to write terse code that deals with errors: applying ? to a Result is transformed to the much more verbose code that matches on it, unwraps the success case, and returns early from the function on the error case. (There’s a concrete sketch of this machinery below, after the discussion of async.) When you need to explicitly handle an error, you omit the question mark operator and use the Result value directly. The borrow checker is Rust’s headline feature: it’s how you can have memory safety without garbage collection, it’s the thing that enables “fearless concurrency”. It’s also, for most people, the most frustrating part of learning and using Rust. Personally I didn’t have borrow checker problems, but that’s because before I started using Rust at work I’d designed and built my own borrow checker. I don’t know if that’s a scalable pedagogy. Many people report they have to go through a lengthy period of fighting the borrow checker, and slowly their brain discovers the implicit ruleset, and eventually they reach a point where they can write code without triggering inscrutable borrow checker errors. But that means a lot of people drop out of learning Rust because they don’t like fighting the borrow checker. So, how do you learn Rust more effectively, without building your own compiler, or banging your head against the borrow checker?
Firstly, it’s useful to understand the concepts behind the borrow checker, the “aliased XOR mutable” rule, the motivation behind linear types, etc. Unfortunately I don’t have a canonical resource that explains it ab initio . Secondly, a change in mindset is useful: a lot of people’s mental model of the borrow checker is as something bolted “on top” of Rust, like a static analyzer you can run on a C/C++ codebase, which just happens to be built into the compiler. This mindset leads to fighting the system, because you think: my code is legitimate, it type-checks, all the types are there, it’s only this final layer, the borrow checker, that objects. It’s better to think of the borrow checker as an intrinsic part of the language semantics. Borrow checking happens, necessarily, after type-checking (because it needs to know the types of terms), but a program that fails the borrow checker is as invalid as a program that doesn’t type-check. Rather than mentally implementing something in C/C++, and then thinking, “how do I translate this to Rust in a way that satisfies the borrow-checker?”, it’s better to think, “how can I accomplish the goal within the semantics of Rust, thinking in terms of linearity and lifetimes?”. But that’s hard, because it requires a high level of fluency. When you are comfortable with the borrow checker, life is pretty good. “Fighting the borrow checker” isn’t something that happens. When the borrow checker complains it’s either because you’re doing something where multiple orthogonal features impinge on each other (e.g. async + closures + borrowing) or because you’re doing something that’s too complex, and the errors are a signal you have to simplify. Often, the borrow checker steers you towards designs that have mechanical sympathy, that are aligned with how the hardware works. When you converge on a design that leverages lifetimes to have a completely -free flow of data, it is really satisfying. When you design a linearly-typed API where the linearity makes it really hard to misuse, you’re grateful for the borrow checker. Everyone complains about async. They complain that it’s too complex or they invoke that thought-terminating cliche about “coloured functions”. It’s easy to complain about something when comparing it to some vague, abstract, ideal state of affairs; but what, exactly, is the concrete and existing alternative to async? The binding constraint is that OS threads are slow. Not accidentally but intrinsically, because of the kernel, and having to swap the CPU state and stack on each context switch. OS threads are never going to be fast. If you want to build high-performance network services, it matters a lot how many concurrent connections and how much throughput you can get per CPU. So you need an alternative way to do concurrency that lets you maximize your hardware resources. And there are basically two alternatives. From the perspective of a language implementor, or someone who cares about specifying the semantics of programming languages, async is not a trivial feature. The intersection of async and lifetimes is hard to understand. From the perspective of a library implementor, someone who writes the building blocks of services and is down in the trenches with / / , it’s rough. But from the perspective of a user, async Rust is pretty good. It mostly “just works”. The user perspective is you put in front of function definitions that perform IO and you put at the call sites and that’s it. 
The only major area where things are unergonomic is calling async functions inside iterators.

Refactoring is paint by numbers. The type errors make refactoring extremely straightforward and safe.

Is it hard to hire Rust programmers? No. First, mainstream languages like Python and TypeScript are so easy to hire for that they wrap back around and become hard. To find a truly talented Python programmer you have to sift through a thousand resumes. Secondly, there's a selection effect for quality. "Has used Rust", "has written open-source code in Rust", or "wants to use Rust professionally" are huge positive signals about a candidate, because they say the candidate is curious and cares about improving their skills. Personally I've never identified as a "Python programmer" or a "Rust programmer". I'm just a programmer! When you learn enough languages you can form an orthogonal basis set of programming concepts and translate them across languages. And I think the same is true for the really talented programmers: they are able to learn the language quickly.

Enough about tech. Let's talk about feelings. When I worked with Python+Django, the characteristic feeling was anxiety. Writing Python feels like building a castle out of twigs, and the higher you go, the stronger the wind gets. I expected things to go wrong, I expected the code to be slow, I expected to watch things blow up for the most absurd reasons. I had to write the code defensively, putting type assertions everywhere. Rust feels good. You can build with confidence. You can build things that not only work as desired but which are also beautiful. You can be proud of the work that you do, because it's not slop.

This section describes the things I don't like.

In Rust, there are two levels of code organization:

- Modules are namespaces with visibility rules.
- Crates are a collection of modules, and they can depend on other crates. Crates can be either executables or libraries.

A project, or workspace, can be made up of multiple crates. For example a web application could have library crates for each orthogonal feature and an executable crate that ties them together and starts the server. What surprised me was learning that modules are not compilation units, and I learnt this by accident when I noticed you can have a circular dependency between modules within the same crate 1. Instead, crates are the compilation unit. When you change any module in a crate, the entire crate has to be recompiled. This means that compiling large crates is slow, and large projects should be broken down into many small crates, with their dependency DAG arranged to maximize parallel compilation.

This is a problem because creating a module is cheap, but creating a crate is slow. Creating a new module is just creating a new file and adding a `mod` declaration for it in the sibling module file. Creating a new crate requires running `cargo new`, remembering to set the right metadata in its `Cargo.toml`, and adding the name of that crate to the workspace-wide `Cargo.toml` so you can import it from other crates (a sketch of this boilerplate follows below). Importing a symbol within a crate is easy: you start typing the name, and the LSP can auto-insert the declaration; but this doesn't work across crates, where you have to manually open the `Cargo.toml` for the crate you're working on and manually add a dependency on the crate you want to import code from. This is very time-consuming.

Another problem with crate-splitting is that the compiler has a really nice feature that warns you when code is unused. It's very thorough and I like it because it helps to keep the codebase tidy. But it only works within a crate. In a multi-crate workspace, declarations that are exported publicly in a crate, but not imported by any other sibling crates, are not reported as unused. 2
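To give a feel for the make-work, here is a hypothetical workspace layout (all names invented):

```toml
# Workspace-wide Cargo.toml: every new crate must be registered here.
[workspace]
members = ["crates/db", "crates/email", "crates/api"]

# crates/api/Cargo.toml: importing code from a sibling crate means
# hand-editing this file to add a path dependency.
#
#   [package]
#   name = "api"
#   version = "0.1.0"
#   publish = false
#
#   [dependencies]
#   db = { path = "../db" }
#   email = { path = "../email" }
```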
So if you want builds to be fast, you have to completely re-arrange your architecture, manually massage the dependency DAG, and also do all this make-work around creating and updating crate metadata. And for that you gain… intra-crate circular imports, which are a horrible antipattern and make it much harder to understand the codebase. I would much prefer if modules were disjoint compilation units. I also think the module system is just a hair too complex, with re-exports and way too many ways to import symbols. It could be stripped down a lot.

The worst thing about the Rust experience is the build times. This is usually blamed on LLVM, which, fair enough, but I think part of it is just intrinsic features of the language, like the fact that modules are not independent compilation units, and of course monomorphization. There are various tricks to speed up the builds: caching, cargo chef, tweaking the configuration. But these are tricks, and tricks are fragile. When you notice a build performance regression, it could be for any number of reasons:

- The code is genuinely larger, and takes longer to build.
- You're using language features that slow down the frontend (e.g. complex type-level code).
- You're using language features that slow down the backend (e.g. excessive monomorphization).
- A proc macro is taking a very long time.
- The crate DAG has changed shape, and crates that used to be built in parallel are now being built serially.
- Any of the above, but in the transitive closure of your dependencies.
- You've added/updated an immediate dependency, which pulls in lots of transitive dependencies.
- You're caching too little, causing dependencies to be downloaded.
- You're caching too much, bloating the cache, which takes longer to download.
- The cache was recently invalidated (e.g. by updating dependencies) and has not settled yet.
- The CI runners are slow today, for reasons unknowable.
- The powerset of all of the above. (Insert Russell's paradox joke.)

It's not worth figuring out. Just pay for the bigger CI runners. Four or eight cores should be enough. Too much parallelism is waste: run `cargo build` with the `--timings` flag, open the report in your browser, and look at the value of "Max concurrency". This tells you how many crates can be built in parallel, and, therefore, how many cores you can buy before you hit diminishing returns. The main thing you can do to improve build performance is to split your workspace into multiple crates, arranging the crate dependencies such that as much of your workspace as possible can be built in parallel. This is easy to do at the start of a project, and very time-consuming afterwards.

Maybe this is a skill issue, but I have not found a good way to write code where components have swappable dependencies and can be tested independently of their dependencies. The central issue is that lifetimes impinge on late binding. Consider a workflow for creating a new user in a web application. The three external effects are: creating a record for the user in the database, sending them a verification email, and logging the event in an audit log. Testing this function requires spinning up a database and an email server. No good! We want to detach the workflow from its dependencies, so we can test it without transitively testing its dependencies. There are three ways to do this:

- Use traits to define the interface, and pass things at compile-time.
- Use traits to define the interface, and use dynamic dispatch to pass things at run-time.
- Use function types to define the interface, and pass dependencies as closures.

And all of these approaches work. But they require a lot of make-work. In TypeScript or Java or Python it would be painless, because those languages don't have lifetimes, and so dynamic dispatch or closures "just work". For example, say we're using traits and doing everything at compile-time. To minimize the work, let's just focus on the dependency that writes the user's email and password to the database. We can define a trait for it, parameterizing the type of database transactions (because the mock won't use a real database, and therefore won't have a way to construct a transaction type in the tests). The real implementation requires defining a placeholder type and implementing the trait for it; the mock implementation uses the unit type as the type of transactions; and the workflow is generic over the trait. In production we pass in the live implementation, while in the unit tests we instead create a mock and pass it in. A sketch follows.
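A minimal sketch of the compile-time, trait-based approach (all names invented):

```rust
use std::cell::RefCell;

// The transaction type is an associated type, so the mock doesn't need a
// way to construct a real database transaction.
trait UserStore {
    type Txn;
    fn insert_user(&self, txn: &mut Self::Txn, email: &str, password_hash: &str);
}

// Real implementation: a placeholder type that talks to the database.
struct PgUserStore;
struct PgTxn; // stand-in for a real transaction type

impl UserStore for PgUserStore {
    type Txn = PgTxn;
    fn insert_user(&self, _txn: &mut PgTxn, email: &str, _password_hash: &str) {
        // Execute the INSERT against the live database here.
        println!("INSERT INTO users (email, ...) VALUES ({email}, ...)");
    }
}

// Mock implementation: the unit type stands in for the transaction.
#[derive(Default)]
struct MockUserStore {
    inserted: RefCell<Vec<String>>,
}

impl UserStore for MockUserStore {
    type Txn = ();
    fn insert_user(&self, _txn: &mut (), email: &str, _password_hash: &str) {
        self.inserted.borrow_mut().push(email.to_string());
    }
}

// The workflow is generic over the store, so tests can pass the mock.
fn create_user<S: UserStore>(store: &S, txn: &mut S::Txn, email: &str, password_hash: &str) {
    store.insert_user(txn, email, password_hash);
    // ...send verification email, write audit log...
}

fn main() {
    // Production would call create_user with the live store and a real
    // transaction. In a unit test, the mock and a unit transaction suffice:
    let store = MockUserStore::default();
    create_user(&store, &mut (), "alice@example.com", "argon2:...");
    assert_eq!(store.inserted.borrow().len(), 1);
}
```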
Obviously this is a lot of typing. Using traits and dynamic dispatch would probably make the code marginally shorter. Using closures is probably the simplest approach (a function type with type parameters is, in a sense, a trait with a single method), but then you run into the ergonomics issues of closures and lifetimes. Again, this might be a skill issue, and maybe there's an elegant and idiomatic way to do this. Alternatively, you might deny the entire necessity of mocking, and write code without swappable implementations, but that has its own problems: tests become slower, because you have to spin up servers to mock things like API calls; tests require a lot of code to set up and tear down these dependencies; and tests are necessarily end-to-end, and the more end-to-end your tests, the more test cases you need to check every path, because of the combinatorial explosion of inputs.

It's easy to go insane with proc macros and trait magic and build an incomprehensible codebase where it's impossible to follow the flow of control or debug anything. You have to rein it in.

If modules were separate compilation units this wouldn't work. If module A depends on B, to compile A you need to first compile B to know what declarations it exports and what their types are. But if B also depends on A, you have an infinite regression. ↩

One way to fix this is to make extremely fine-grained crates, and rely on tooling to identify unused code at the dependency level. But this would take up way too much time. ↩

Fernando Borretti 9 months ago

My Backup Infrastructure, 2025 Edition

tl;dr: two portable SSDs, synced with rsync; and a Backblaze bucket synced with restic.

I'm finally satisfied with my infrastructure for backups, so I'm writing it up so others can benefit from it. My requirements for backup infrastructure are:

- Open source, to minimize the risk of backdoors.
- Fast, but only incrementally: an initial snapshot can be slow.
- Simple configuration, with little surface area to mess things up.
- Encryption with keys that I control and which never leave my device. Ideally, encryption should be mandatory, to prevent accidentally putting cleartext on backup media.
- Has to satisfy the 3-2-1 rule: at least three disjoint copies, in two different media, at least one off-site.
- There has to be a known (documented, memorable) path to recovery. It would be embarrassing if you went to restore your backups and suddenly realized there's a missing link that prevents you from e.g. recovering the encryption key.

The one non-criterion is portability. Because I only use macOS, I don't need a solution where I can restore the backups from different operating systems.

I have two portable SSDs, Chiron and Nessus, with encrypted APFS. The filesystem itself being encrypted is extremely convenient: I just plug them in, and the macOS keychain has the keys. There's no possibility of accidentally leaking cleartext onto the disk because the encryption is transparent. I use rsync to synchronize the laptop to the disks; the specific incantation is in the sketch below. It recursively copies the contents of the source directory into the target, preserving permissions, times, and the executable flag, using checksums rather than heuristics to see which files have changed, and deleting files that exist in the target but not the source. Note that in rsync, trailing slashes matter! `rsync src dest` creates a directory `src` inside `dest`, while `rsync src/ dest` syncs the contents of `src` into `dest`. Why two disks? No reason. Why have one when you can have two for twice the price?

Continuing with the centaur naming convention, I have a Backblaze bucket named Pholus, and I use restic to take snapshots of the laptop and upload them to the bucket. Why Backblaze? Because it's cheaper than S3, and less involved than S3 (no IAM/roles/policies/etc.), and it does one thing and does it well. I would use S3 if I already had other personal infrastructure on AWS, and latency was a problem (I'm in Australia, and Backblaze is not; with AWS I could have an S3 bucket with ~6ms latency to my home).

Why restic? Because everything else is worse. Duplicity requires using GnuPG for key management, which is like if to start a car you had to stab yourself with your keys. Borg is written in Python, which is usually a bad sign for performance and user experience. Rclone, by default, is just cloud rsync: it doesn't encrypt anything, and you have to use a two-level configuration where an encrypting backend acts as a proxy to the real storage backend, so if you misconfigure things, you could end up writing cleartext to the cloud.

restic is easy to learn. The ontology is: you have a thing called a repository, which could be a local directory or a remote object store, identified by a path and locked with a password. A repository has a list of snapshots, which are like Git commits: a snapshot of a directory at a point in time. You can list the contents of snapshots and even restore specific files, which is useful for checking that a snapshot has the data you want without restoring the whole thing. I recommend trying out the commands using local repositories, where the data is stored in a directory. That lets you get the hang of the ontology and the commands. Then you can create a repository backed by cloud storage. restic supports Backblaze directly, but the documentation recommends using Backblaze's S3-compatible API. To do this, when creating a bucket key you have to tick "Allow List All Bucket Names", and you will also have to know how to map the Backblaze key properties to the AWS environment variables. This is the only difficulty. Taking a snapshot is a single command (see the sketch below); you will then be asked to enter the repository password. For added peace of mind, you can list the snapshot and dump the contents of a few representative files.
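A plausible rendering of the incantations described above (paths, bucket endpoint, and keys are placeholders):

```sh
# Mirror the laptop onto a portable SSD: recursive; preserve permissions,
# times, and executability; detect changes by checksum; delete extras.
rsync --recursive --perms --times --executability --checksum --delete \
    ~/files/ /Volumes/Chiron/files/

# Map the Backblaze key properties to the AWS environment variables.
export AWS_ACCESS_KEY_ID="<backblaze-keyID>"
export AWS_SECRET_ACCESS_KEY="<backblaze-applicationKey>"

# Take a snapshot via Backblaze's S3-compatible API (this prompts for the
# repository password).
restic -r s3:s3.us-west-004.backblazeb2.com/pholus backup ~/files

# Peace of mind: list snapshots, inspect one, dump a representative file.
restic -r s3:s3.us-west-004.backblazeb2.com/pholus snapshots
restic -r s3:s3.us-west-004.backblazeb2.com/pholus ls latest
restic -r s3:s3.us-west-004.backblazeb2.com/pholus dump latest /Users/me/files/notes.txt
```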
I have a recurring task on my todo list whereby, once a week, I plug in the external drives, run the backup script, and also take a restic snapshot. I could leave the drives plugged in all the time, and run the backups automatically every day, but my MacBook Air doesn't have enough ports for that and, also, this risks propagating data loss to the backups, which defeats the purpose. By doing manual backups, if I lose data unintentionally, I have up to a week to notice and restore it from the SSD.

Fernando Borretti 9 months ago

We Live In a Golden Age of Interoperability

Yesterday I was reading Exploring the Internet, an oral history of the early Internet. The first part of the book describes the author's efforts to publish the ITU's Blue Book: 19 kilopages of standards documents for telephony and networks. What struck me was the description of the ITU's documentation stack:

A week spent trolling the halls of the ITU had produced documentation on about half of the proprietary, in-house text formatting system they had developed many years ago on a Siemens mainframe. The computer division had given me nine magnetic tapes, containing the Blue Book in all three languages. […] We had two types of files, one of which was known to be totally useless. The useless batch was several hundred megabytes of AUTOCAD drawings, furnished by the draftsmen who did the CCITT illustrations. Diagrams for the Blue Book were done in AUTOCAD, then manually assembled into the output from the proprietary text formatting system. […] Turned out that AUTOCAD was indeed used for the diagrams, with the exception of any text in the illustrations. The textless diagrams were sent over to the typing pool, where people typed on little pieces of paper ribbon and pasted the itsy-bitsy fragments onto the illustrations. Come publication time, the whole process would be repeated, substituting typeset ribbons for typed ribbons. A nice production technique, but the AUTOCAD files were useless.

The rationale for this bizarre document production technique was that each diagram needed text in each of the three official languages that the ITU published. While AUTOCAD (and typing) was still being used, the ITU was slowly moving over to another tool, MicroGrafx Designer. There, using the magical concept of layers, they were proudly doing "integrated text and graphics."

The second batch of DOS files looked more promising. Modern documents, such as the new X.800 recommendations, were being produced in Microsoft Word for Windows. My second batch of tapes had all the files that were available in the Word for Windows format, the new ITU publishing standard.

Proprietary tape drives with proprietary file systems. AutoCAD for vector graphics. Text documents in the proprietary, binary Word format. Note that the diagrams were being assembled physically, by pasting pieces of paper together. And then they were photographed. That's why it's called a "camera ready" copy. And this is 1991, so it's not a digital camera: it's film, silver-halide crystals in gelatin. It's astounding to think that this medieval process was happening as recently as the 90s. Compare this to today: you drag some images into Adobe FrameMaker and press print.

The ITU had documented the format we could expect the tapes to be in. Each file had a header written in the EBCDIC character set. The file itself used a character set seemingly invented by the ITU, known by the bizarre name of Zentec. The only problem was that the header format wasn't EBCDIC and the structure the ITU had told us would be on the tape wasn't present.

Proprietary character sets!

Next, we had to tackle TPS. This text formatting language was as complicated as any one could imagine. Developed without the desire for clarity and simplicity I had come to expect from the UNIX operating system and its tools, I was lost with the Byzantine, undocumented TPS. The solution was to take several physical volumes of the Blue Book and compare the text to hexadecimal dumps of the files.
I then went to the Trident Cafe and spent a week drinking coffee trying to make sense of the data I had, flipping between the four files that might be used on any given page of text, trying to map events in the one-dimensional HexWorld to two-dimensional events in the paper output. Finally, after pages and pages of PERL code, we had the beginnings of a conversion program. We had tried to use the software developed at the ITU to convert from TPS into RTF, but the code had been worse than useless.

A proprietary, in-house, (ironically) undocumented document-preparation system! Today this would be a Git repo with Markdown files and TikZ/Asymptote source files for the diagrams, and a Makefile to tie it all together with Pandoc. Maybe a few custom scripts for the things Markdown can't represent, like complex tables or asides. Maybe DITA if you really like XML.

This reminded me of a similar side quest I attempted many years ago: I tried to build a modern version of the Common Lisp HyperSpec from the source text of the ANSI Common Lisp draft (the draft being in the public domain, unlike the officially blessed version). The sources are in TeX, not "modern" LaTeX but 90's TeX. Parsing TeX is hard enough: the language is almost-but-not-quite context free, and it really is meant to be executed as it is parsed, rather than parsed, represented, and transformed. But even if you managed to parse the TeX sources using a very flexible and permissive TeX parser, you have to apply a huge long tail of corrections just to fix bad parses and obscure TeX constructs. In the end I gave up.

We live in much better times. For every medium, we have widely-used and widely-implemented open formats: Unicode and Markdown for text, JSON and XML for data exchange, JPEG/PNG/SVG for images, Opus for audio, WebM for videos.

Unicode is so ubiquitous it's easy to forget what an achievement it is. Essentially all text today is UTF-8, except the Windows APIs that were designed in the 90s for "wide characters", i.e. UTF-16. I remember when people used to link to the UTF-8 Everywhere manifesto. There was a time, not long ago, when "use UTF-8" was something that had to be said.

Rich text is often just Markdown. Some applications have more complex constructs that can't be represented in Markdown; in those cases you can usually get the document AST as JSON. The "worst" format most people ever have to deal with is XML, which is really not that bad. Data exchange happens through JSON, CSV, or Parquet. Every web API uses JSON as the transport layer, so instead of a thousand ad-hoc binary formats, we have one plain-text, human-readable format that can be readily mapped into domain objects. Nobody would think to share vector graphics in DWG format because we have SVG, an open standard. TeX is probably the most antediluvian text "format" in widespread use, and maybe Typst will replace it. Math is one area where we're stuck with embedding TeX (through KaTeX or equivalent), since MathML hasn't taken off (understandably, since nobody wants to write XML by hand). Filesystems are usually proprietary, but every operating system can read/write a FAT32/NTFS flash drive. In any case networking has made filesystems less important: if you have network access you have Google Drive or S3. And filesystems are a lot less diverse nowadays: except for extended attributes, any file tree can be mapped losslessly across ext4, NTFS, and APFS. This was not true in the past!
It took decades to converge on the definition of a filesystem as “a tree of directories with byte arrays at the leaf nodes”, e.g. HFS had resource forks , the VMS file system had versioning built in. File paths were wildly different. Open standards are now the default. If someone proposes a new data exchange format, a new programming language, or things of that nature, the expectation is that the spec will be readable online, at the click of a button, either as HTML or a PDF document. If implementing JSON required paying 300 CHF for a 900 page standards document, JSON would not have taken off. Our data is more portable than ever, not just across space (e.g. if you use a Mac and a Linux machine) but across time. In the mid-80s the BBC wanted to make a latter-day Domesday Book . It was like a time capsule: statistical surveys, photographs, newsreels, people’s accounts of their daily life. The data was stored on LaserDisc , but the formats were entirely sui generis , and could only be read by the client software, which was deeply integrated with a specific hardware configuration. And within a few years the data was essentially inaccessible, needing a team of programmer-archeologists to reverse engineer the software and data formats. If the BBC Domesday Book was made nowadays it would last forever: the text would be UTF-8, the images JPEGs, the videos WebM, the database records would be CSVs or JSON files, all packaged in one big ZIP container. All widely-implemented open standards. A century from now we will still have UTF-8 decoders and JSON parsers and JPEG viewers, if only to preserve the vast trove of the present; or we will have ported all the archives forward to newer formats. All this is to say: we live in a golden age of interoperability and digital preservation.

Fernando Borretti 9 months ago

Domain-Agnostic and Domain-Specific Tools

This post is, in a sense, a continuation of Unbundling Tools for Thought. It's an argument for why you shouldn't try to use a single tool to do everything, aimed at people who have spent too much time shoveling prose into a "second brain" and have little to show for it.

Software tools span a spectrum from domain-agnostic to domain-specific. Domain-agnostic tools are things like Obsidian. They have a small, spartan data model that can be made to represent most things. Obsidian's data model is just folders, pages, and links. Pages have a title and a body, and the body is text, and text is the universal interface. They have a small number of general workflows: creating a page, editing a page, viewing backlinks, text search. You can use them as a journal, a recipe app, a todo list, etc.

Domain-specific tools have a richer and more structured data model. Consider a CRM: there are first-class objects to represent people, companies, employment relations; these have rich attributes, and you can represent "this person worked for this company in this position for this span of time" natively within the data model. This structure allows you to have a large number of much more specific workflows, like "see everyone who worked with this person" or "find everyone who worked for this company in 2016". But you can't use them outside the domain: you can't use a CRM as a recipe app.

And here's the asymmetry: while the tools can be domain-agnostic or domain-specific, your use cases are always specific. You are always doing some concrete thing. And for any one specific use case, a specific tool can deliver a better ontology and a better UX than a general tool. Because when you implement a specific use case in a domain-agnostic tool, you are always building on top of the tool's data model. If you use e.g. Obsidian (or any other note-taking app) as, e.g., a CRM, there's an abstract concept of people, companies, employment, etc., but these concepts don't have a first-class existence; everything is concretely implemented as pages and links. You have a page to represent a person, a page to represent a company, and you use a link from the former to the latter to represent the "employed by" relation, and the corresponding backlink represents the "employs" relation. At the ontology level, your data is people and companies joined by typed, attribute-bearing employment relations; but in the concrete data model, it's just pages joined by untyped links, and all the domain-specific nuances are hidden in text, invisible to software automation.

Whereas in a domain-specific tool, you are building inside the data model: there's a table that implements the concept of "a person", and it has a fixed set of attributes. At every point in time, the database has a fixed schema: you know all the attributes a company object can have, you know all your entries are consistent and coherent. Instead of a generic notion of a bidirectional link, you have first-class objects that represent relations: e.g. an employment relation that links people to companies is represented by a table that points to the person and their employer and has metadata (the person's role, the start and end date of their employment).

When it comes to workflows, using a domain-agnostic tool means you either have to do most things by hand or through plugins. Doing it by hand is straightforwardly less efficient. But plugins never feel right. Often the UX feels janky because plugins are built to a lower standard.
But ultimately plugins mean you go from a coherent, unified vision to a cacophony of a hundred visions which are mutually suspicious and subtly out of alignment with one another. The main benefit of using a domain-agnostic app is that everything lives in the same data silo: you can cross-link data from many different disjoint use cases, e.g. journal entries and project documents and a reference library. Unlike web links, a single unified object graph can avoid dangling links, because the app can enforce link integrity. But this linking is hardly ever useful: do you actually need to have a bidirectional link between your journal entries and your recipes? Is there a benefit to this? You know where the link leads to. And links create a maintenance burden by making the entire graph structure more rigid. So why do people use domain-agnostic apps at all? Partly, because a lot of use-cases are too rare or ad-hoc to require specific software. If you have three entries in a spreadsheet of book reviews, it’s not necessary to go looking for a piece of software to manage them. This calculation will change as AI lowers the cost of software development. But part of the reason is ideological: a lot of people bemoan data silos and have this aspirational idea that computing would be so much better if everything was more deeply interlinked. If you let go of this monistic obsession that everything under the Sun should go in the one giant object graph, and instead let each piece of data live in its own silo, you can be more effective.

Fernando Borretti 11 months ago

Non-Fiction Has Bad Incentives

By “non-fiction” I mean mass-market non-fiction, those paperbacks with titles like “Sleep: Why We Need It, And Why We Don’t Get Enough of It”. Textbooks and technical books are a separate category. The problem with non-fiction, and the reason most non-fiction books are not worth reading, is the interests of the reader and writer are misaligned . The actual text of the book—the semantic content—doesn’t matter to the writer. If you read a few of these books, you inevitably notice the patterns: every chapter begins with an anecdote, or with a time and place and a person, to humanize the topic; the tone is didactic, condescending, similar to the voice of authority ; every book feels like it was written by the same bloodless person. The book’s real content is a one-page essay that has been mechanically expanded to publication length, and filled with irrelevance. The reader wants the essay: something novel and useful and brief, because few things are worth elaborating to 200 pages, but the writer is not incentivized to provide it. If the text doesn’t matter, what does? The press tour, the interviews, the readings in libraries followed by a Q&A, the excerpts of the book that are published in literary magazines, the reviews published in famous newspapers. Writing a non-fiction book is both a means to increase your social status and a way to become intellectually legitimate : someone who can be cited as an authority, someone whom journalists can quote. What separates a crackpot with a blog from an intellectual authority is the latter has had their work published by a legitimate publisher and reviewed by a legitimate newspaper. Dually, a writer who self-publishes non-fiction on Amazon or Gumroad, and whose reviews come exclusively from normal people, will accrue some social status, but they will not become an intellectual authority. Like the photoelectric effect : a million five-star Goodreads ratings are less than a blurb from The New York Times . Why does this matter? Because we have too many books already, and publishing as a status play pollutes the information environment.

Fernando Borretti 11 months ago

Composable SQL

SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.

This section explains two big pain points of SQL.

Testing SQL is impossible. Consider the simplest possible query: selecting a single column from a single table. What does this query depend on? It depends on that one column from that one table. In an ideal world, the smallest complete test dataset for this query would be a single row with just that column populated. But it's not enough to populate the columns a query depends on. You have to populate every column in the row. The values of these columns are completely causally disconnected from the query. They cannot influence the output. But you must populate them. And the process is explosive: to insert a row you have to insert every row that it points to, on and on recursively until you hit a root object (typically a user). And each of those rows must have every one of its columns populated.

Testing the simplest query requires building up a massive object graph, just to test an infinitesimal slice of it. Reams and reams of code have to be written to insert test data, of which only a few lines are causally relevant. More of your time will be spent writing test fixtures, factories, test helpers. Tests become too long to write ab initio, so a test suite becomes a giant file of copy-pasted functions, each of which differ in only a few lines. Shallow queries that retrieve objects near the root of the foreign key DAG are easy to test. Queries that involve joins across many tables, or which retrieve objects that are deep in the DAG, are catastrophically expensive to write tests for. So your test coverage is uneven: the queries that don't need tests have them, the queries that need them don't. And the performance of the test suite is really bad, which starts to hurt even in medium-sized projects. There are no good solutions to this:

- You can make every single FK in the database deferred, so it's checked at the end of a transaction rather than during an insert, but that solves half the problem (the non-null columns still need to be populated) and requires updating every FK in the schema.
- You can make all your columns nullable, ruining the data model.
- You can write all your tables in sixth normal form, which is the same as making everything nullable.

"Business logic" is usually thought of as imperative: in response to an event, we do X, Y, and Z. But if you have a fully-normalized database, a lot of your business logic is going to be implemented at read-time. This generally falls into two categories:

- The state of an object is determined dynamically from the state of its constituents, e.g. your score on an exam is the sum of the questions you got right.
- Reporting features that require OLAP queries specifically, and which query properties of the data which are computed dynamically.

Imagine you're working on a logistics system. We have boxes, which have mass, and boxes can go on pallets. Pallets have a dry mass, a maximum payload mass they can support, and they go on containers. Pallets also have a number of computed properties:

- Payload mass: the sum of the masses of all the boxes on the pallet.
- Clearance: if the pallet's payload mass is less than the maximum, the pallet is cleared to be moved.
- Wet mass: the sum of the dry mass and the payload mass.

Containers are analogous to pallets, but one level up.
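Concretely, the three tables might look like this (table and column names invented for illustration):

```sql
create table containers (
    container_id    bigint primary key,
    dry_mass_kg     numeric not null,
    max_payload_kg  numeric not null
);

create table pallets (
    pallet_id       bigint primary key,
    container_id    bigint references containers (container_id),
    dry_mass_kg     numeric not null,
    max_payload_kg  numeric not null
);

create table boxes (
    box_id          bigint primary key,
    pallet_id       bigint not null references pallets (pallet_id),
    mass_kg         numeric not null
);
```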
With just three tables, there are many possible questions we could ask about our data:

- Is this container cleared to load?
- How much does this container mass?
- What's the total mass of this set of containers (e.g., those being loaded onto a ship)?
- Are we packing boxes efficiently? What's the average percent utilization on our pallets?
- Which pallet should this box be packed into?
- Are any pallets in this container in excess of their maximum payload mass?

Each question corresponds to a different query. Each query depends on computed properties of the data, such as the clearance state of a container. On top of that, logic builds on logic: the logic for "is this pallet cleared?" depends on the logic for "what is the payload mass of this pallet?". We want to write our queries in a way that satisfies these properties:

- Performance: queries should be fast.
- Testability: queries should be testable.
- Comprehensibility: queries should be readable and understandable through local reasoning.
- Reusability: logic should be defined once, tested once, and used in many places.

Because SQL has such limited means of abstraction, we have only a choice of bad options:

- We can duplicate the business logic across queries, or
- We can give up on normalization and cache the computed properties in response to events, or
- We can implement the business logic in views.

The next sections explain why each option is bad.

The first option: write out the logic for computed properties in every query, and hope that changes to the business logic get applied to every place where the logic is defined. Testing would help here, but as discussed above, testing (especially for deep OLAP-type queries) is intractable because of the combinatorial explosion. Worse, if you duplicate the logic, but tailor it to the specifics of the query, it becomes much harder to actually find the other instances. There is a single, abstract concept of a relation that e.g. maps pallet IDs to payload masses, but the implementations are varied and can't easily be identified. While the logic here isn't too complex, there are enough degrees of freedom that, if the logic is duplicated, we will have drift. Ideally, the logic for the definition of "how heavy is this pallet?" and "is this container ready to be loaded?" should be defined once, and tested once, but used in many places.

The second option: we can denormalize the computed properties, adding payload mass and clearance columns to both the pallets and containers tables. Whenever an event enters the system which affects these properties, they are recomputed. The logic can be implemented in one place, at the application layer (where it is easier to test). The costs of denormalization are well-known, but it boils down to:

- There are now two definitions of the same concept: the declarative one and the imperative one.
- There is an implicit invariant: for every row in the table, the value of the denormalized column must equal the result of the declarative query on the normalized data model. Detecting violations of this invariant is both computationally expensive and requires building custom infrastructure.
- When writing a new mutation, you have to very carefully consider all the places in the database where denormalization is happening, to ensure the mutation doesn't violate implicit invariants. Dually, when introducing denormalization, you have to consider all existing mutations to patch the ones that relate to the denormalized data.
- Bugs in the code require identifying all affected data (potentially impossible!) and running a data migration (incredibly tiresome).
- Finally, there is the cost of physical storage. While storage is cheap, IaaS providers love to charge extra for database disks, as if only the finest iron oxides are fit for your Postgres cluster.

With denormalization, the individual queries are more testable, but now the system as a whole has to be tested, end to end, to ensure the key invariants are maintained.

The third option is an approach I experimented with. I call it the "tree of views". You write a view for each of these read-time properties, and then your queries can read from those views. It's a tree because views can query other views, since logic builds upon logic (e.g. the logic for how well a product line is selling depends on the logic for how well each product is selling). The result is that each view is a very focused, very atomic piece of business logic, and the top-level queries can read from the views as if they were reading denormalized data, so they are usually very short. Concretely, for this case, you would write views like the sketch below.
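A sketch, reusing the invented names from the schema above:

```sql
-- Each view is one atomic piece of read-time business logic;
-- views build on other views.
create view pallet_payload as
select p.pallet_id, coalesce(sum(b.mass_kg), 0) as payload_kg
from pallets p
left join boxes b on b.pallet_id = p.pallet_id
group by p.pallet_id;

create view pallet_clearance as
select p.pallet_id, (pp.payload_kg < p.max_payload_kg) as cleared
from pallets p
join pallet_payload pp on pp.pallet_id = p.pallet_id;

-- Top-level queries read from the views as if the data were denormalized:
select cleared from pallet_clearance where pallet_id = 123;
-- ...and you're relying on the planner to push `pallet_id = 123` down into
-- the views, rather than materializing them and filtering afterwards.
```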
And so on, with each view reading from the views beneath it. What you hope happens is that Postgres will recursively inline every view, merge them all together into a gigaquery, and shuffle the predicates up and down to maximally optimize the query. That was the conceit. And of course that never happens. The central problem is that views have to be written for the general case, and then you filter on the view's output. Views can't take parameters. And the optimizer is very conservative about pushing predicates down into the view. The `EXPLAIN` output shows these massive sequential scans, where only a tiny fraction of the data is ever needed, meaning Postgres is materializing the view and then filtering on it. An analogous situation is if you've ever written a query with lots of very general CTEs, with all the filtering at the end. These are often slow. Moving the predicates up into the CTEs themselves improves performance by forcing Postgres to do the filtering at earlier stages. But the fact that Postgres won't push predicates into the CTEs on its own means CTEs and views are a minefield of pessimization, and there's a performance upper bound to using them 1. If the query planner were sufficiently smart, this wouldn't be a problem. But the sufficiently smart query planner is always just one more heuristic away.

Imagine a programming language without functions. You can only write code that operates on concrete values, i.e. variables or literals. So instead of writing a function and calling it anywhere, you have to write these little code templates as comments, and every time you want to "call" the "function" you copy the template and do a search/replace. This would be tiresome. But that's what SQL is. The concrete values are the table names. The code is the queries. And the function templates you have to search-replace are your business logic, which must be inlined everywhere it is used.

This formulation suggests the solution: we need something like functions, for SQL. That is, we need a way to define composable query fragments with statically-typed interfaces. I'm calling these functors. The parameters to a functor are tables satisfying some interface; the return type is the return type of the body query. For example, a functor declaration 2 might take as a parameter any table that has at least an ID column and a column of type text 3. The functor's return type is the type of the rows returned by the query. Table types form a subtyping relationship, so any table with at least the required columns can be passed as an argument. This is analogous to row polymorphism in TypeScript.

The reason testing is hard is SQL queries depend on concrete tables. But functors can depend on interfaces instead. The business logic for "payload mass of a pallet" can be parameterized over the pallets and boxes tables, and then we can test it against fake tables satisfying the interface. While pallets point to containers, for this test we don't need to create a container. We also don't need to come up with values for the pallet's other columns. If a value is causally independent of the query, it doesn't need to be provided. Postgres also supports table literals, so there is actually a way to write tests without doing a single insert: test data can be loaded into a table literal, and the resulting CTE has the right row type, and therefore satisfies the interface.

So we have a functor that maps pallet IDs to their payload mass. The pallet clearance state depends on the pallet's maximum payload mass, and the pallet's actual payload mass, so we can implement it as a second functor. Even though logic builds on logic, the clearance functor doesn't need to be aware of the payload-mass functor: the results of the latter can just be passed in. This makes the functors more testable (since testing one functor won't call another) but also keeps the interfaces small. A sketch of all of this follows.
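The syntax here is invented for illustration (the hypothetical `create functor` form is not a real Postgres feature); it is only meant to convey the idea of table-parameterized, statically-typed query fragments:

```sql
-- "Payload mass of a pallet", parameterized over its input tables. Each
-- parameter is any table with *at least* the listed columns.
create functor pallet_payload(
    p table (pallet_id bigint, ...),
    b table (pallet_id bigint, mass_kg numeric, ...)
) as
select p.pallet_id, coalesce(sum(b.mass_kg), 0) as payload_kg
from p left join b on b.pallet_id = p.pallet_id
group by p.pallet_id;

-- Clearance builds on payload mass, but only through the interface: it
-- takes the *result* of the payload functor as a parameter.
create functor pallet_clearance(
    p table (pallet_id bigint, max_payload_kg numeric, ...),
    pp table (pallet_id bigint, payload_kg numeric, ...)
) as
select p.pallet_id, (pp.payload_kg < p.max_payload_kg) as cleared
from p join pp on pp.pallet_id = p.pallet_id;

-- Testing against fake tables: a VALUES-based CTE satisfies the interface,
-- so there are no inserts and no causally-irrelevant columns to populate.
with fake_pallets (pallet_id) as (values (1)),
     fake_boxes (pallet_id, mass_kg) as (values (1, 10.0), (1, 5.0))
select * from pallet_payload(fake_pallets, fake_boxes);

-- Filtering early: pass an already-filtered table to the functor, instead
-- of filtering the functor's output and hoping for predicate pushdown.
with one_pallet as (select * from pallets where pallet_id = 123)
select cleared
from pallet_clearance(one_pallet, pallet_payload(one_pallet, boxes));
```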
Say we want to query the clearance state of a specific pallet. How do we write this? The naive way is to apply the functors to the base tables and filter the result at the end. That query macroexpands into nested subqueries with the predicate at the very top, which is not satisfactory: it has the same problem as views, in that we're doing the filtering at the end, and relying on the query planner to push the predicate down as far as it will go. Anecdotally, Postgres is more aggressive about optimizing subqueries than views, but relying on query planner arcana does not inspire confidence. We want to be able to write queries that expand into what a talented DBA would write by hand. Can we do better? Yes. If we want to filter a table early, we just filter it early, and pass the result to the functor, as in the last query of the sketch above. And that's it. The functor can be applied to any table that satisfies the interface, including CTEs or expressions. You can't do this with native SQL, because SQL does not compose. The closest you could get is copying the business logic query manually into a CTE, renaming the table references (and hoping you didn't forget any), and now you have one more query duplicating business logic that has to be kept in sync with everything else. Note, also, that you can't do the above with views. That is, you can't define a CTE that filters a table early, and then pass that CTE to a view. We could keep going, and implement the rest of the functors for the logic of the logistics system. But these examples are enough to prove that functors solve the biggest pain points of SQL: we can write queries that are fast, testable, and which can be understood through entirely local reasoning.

Could this be built? Yes. You could implement it as a compiler that takes a schema definition, and compiles functor-augmented SQL down to bare SQL by recursively macroexpanding the functors. And also does typechecking, etc. The hardest part would be parsing and interpreting SQL syntax, which is a separate problem.

What follows are tangents, and brief sketches for extending the ideas in this post.

Generics. What if we want to factor out a fragment that works over any table, not just tables with a fixed set of columns? The problem is the return type: with concrete parameters there is nothing sensible to write there. The fix is to make the parameter a generic table type, with the functor's return type built from it (e.g. as the union of the generic table's row type and the columns the functor's body adds). With this, generic fragments can be written once and applied to any table. With functors, SQL can be short, simple, and understandable, without sacrificing performance.

Generalizing business logic. One aspect of the logistics platform example is that the business logic for pallets and containers is the same, but at different levels:

- Both pallets and containers have a notion of a payload mass, which is the sum of the (wet) masses of their contents.
- Both have a notion of a maximum payload mass, and a boolean property that indicates being in excess of that mass.
- Both have a notion of a wet mass, which is the sum of their dry mass and the payload mass.

We can implement functors in a way that is generic for both kinds of object, and use SQL renaming to map column names. For example, a functor can express the general concept of "map an object to the sum of the masses of its children". This works for pallets, whose children are boxes, and for containers, whose children are the result of joining pallets to the functor that calculates their wet mass. You can also generalize this further by making the ID type generic.

Naming. Why "functor"? Well, the alternatives aren't very good:

- "Function" is confusing because SQL already has functions.
- "Parameterized query" takes too long to say and is confusing because SQL queries can take scalar parameters.
- "Generic query" is too vague and also too many words.
- "Query component/transformer/operator" is too wordy and too vague.
- "Query template" sounds like C++, and de-emphasizes the static type-checking aspect.
- "Macro" sounds too untyped and stringly typed. dbt has macros, and they are stringly typed.

"Functor" is one word and conveys the notion that it's happening one level up from queries.

Global variables. Functors can specify the tables they depend on as parameters. A more interesting restriction is if functors can only query from tables explicitly listed as parameters. Why would this be useful? Because SQL tables are global variables. By banishing global variables, we automatically make every query fully testable.

It's strange to me how bad query planners are, given how limited SQL is in terms of expressivity. SQL isn't usefully Turing complete, but it's Turing complete enough that the query planner has to be extremely conservative to preserve soundness. Which is the worst of both worlds. ↩

I tried to keep the syntax in line with the SQL style, which means it is hideously verbose. ↩

For brevity, I'm omitting declarations. ↩
What’s the total mass of this set of containers (e.g., those being loaded onto a ship)? Are we packing boxes efficiently? What’s the average percent utilization on our pallets? Which pallet should this box be packed into? Are any pallets in this container in excess of their maximum payload mass? Performance: queries should be fast. Testability: queries should be testable. Comprehensibility: queries should be readable and understandable through local reasoning. Reusability: logic should be defined once, tested once, and used in many places. We can duplicate the business logic across queries, or We can give up on normalization and cache the computed properties in response to events, or We can implement the business logic in views. There are now two definitions of the same concept: the declarative one and the imperative one. There is an implicit invariant: for every row in the table, the value of must equal the result of the declarative query on the normalized data model. Detecting violations of this invariant is both computationally expensive and requires building custom infrastructure. When writing a new mutation, you have to very carefully consider all the places in the database where denormalization is happening, to ensure the mutation doesn’t violate implicit invariants. Dually, when introducing denormalization, you have to consider all existing mutations to patch the ones that relate to the denormalized data. Bugs in the code require identifying all affected data (potentially impossible!) and running a data migration (incredibly tiresome). Finally, there is the cost of physical storage. While storage is cheap, IaaS providers love to charge extra for database disks, as if only the finest iron oxides are fit for your Postgres cluster. Both pallets and containers have a notion of a payload mass, which is the sum of the (wet) masses of their contents. Both have a notion of a maximum payload mass, and a boolean property that indicates being in excess of that mass. Both have a notion of a wet mass, which is the sum of their dry mass and the payload mass. “Function” is confusing because SQL already has functions . “Parameterized query” takes too long to say and is confusing because SQL queries can take scalar parameters . “Generic query” is too vague and also too many words. “Query component/transformer/operator” is too wordy and too vague. “Query template” sounds like C++, and de-emphasizes the static type-checking aspect. “Macro” sounds too untyped and stringly typed. dbt has macros, and they are stringly typed. It’s strange to me how bad query planners are, given how limited SQL is in terms of expressivity. SQL isn’t usefully Turing complete, but it’s Turing complete enough that the query planner has to be extremely conservative to preserve soundness. Which is the worst of both worlds.  ↩ I tried to keep the syntax in line with the SQL style, which means it is hideously verbose.  ↩ For brevity, I’m omitting declarations.  ↩
