
Self-hosting my photos with Immich

For every cloud service I use, I want to have a local copy of my data, for backup purposes and independence. Unfortunately, the tool I was using stopped working in March 2025 when Google restricted the OAuth scopes, so I needed an alternative for my existing Google Photos setup. In this post, I describe how I have set up Immich, a self-hostable photo manager. Here is the end result, a few (live) photos from NixCon 2025:

I am running Immich on my Ryzen 7 Mini PC (ASRock DeskMini X600), which consumes less than 10 W of power in idle and has plenty of resources for VMs (64 GB RAM, 1 TB disk). You can read more about it in my blog post from July 2024: When I saw the first reviews of the ASRock DeskMini X600 barebone, I was immediately interested in building a home-lab hypervisor (VM host) with it. Apparently, the DeskMini X600 uses less than 10W of power but supports latest-generation AMD CPUs like the Ryzen 7 8700G! Read more →

I installed Proxmox, an Open Source virtualization platform, to divide this mini server into VMs, but you could of course also install Immich directly on any server. I created a VM (named “photos”) with 500 GB of disk space, 4 CPU cores and 4 GB of RAM. For the initial import, you could assign more CPU and RAM, but for normal usage, that’s enough.

I (declaratively) installed NixOS on that VM as described in this blog post: For one of my network storage PC builds, I was looking for an alternative to Flatcar Container Linux and tried out NixOS again (after an almost 10 year break). There are many ways to install NixOS, and in this article I will outline how I like to install NixOS on physical hardware or virtual machines: over the network and fully declaratively. Read more →

Afterwards, I enabled Immich with this exact configuration: At this point, Immich is available locally, but not over the network, because NixOS enables a firewall by default.
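A minimal NixOS configuration along these lines, assuming the `services.immich` module, might look like the following sketch; option names should be verified against your NixOS release:

```nix
# Sketch: enable Immich via the NixOS module.  These options come
# from the services.immich module; check them against your NixOS
# release before use.
{
  services.immich = {
    enable = true;
    host = "127.0.0.1";  # keep it off the network; the VPN fronts it
    port = 2283;
    mediaLocation = "/var/lib/immich";
  };
}
```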
I could enable the corresponding firewall option, but I actually want Immich to only be available via my Tailscale VPN, for which I don’t need to open firewall access; instead, I use Tailscale’s serve feature to forward traffic to Immich: Because I have Tailscale’s MagicDNS and TLS certificate provisioning enabled, I can now open https://photos.example.ts.net in my browser on my PC, laptop or phone.

At first, I tried importing my photos using the official Immich CLI: Unfortunately, the upload was not running reliably and had to be restarted manually a few times after running into a timeout. Later I realized that this was because the Immich server runs background jobs like thumbnail creation, metadata extraction or face detection, and these background jobs slow down the upload to the extent that it can fail with a timeout. The other issue was that, even after the upload was done, I realized that Google Takeout archives for Google Photos contain metadata in separate JSON files next to the original image files: Unfortunately, these files are not considered by the official CLI.

Luckily, there is a great third-party tool called immich-go, which solves both of these issues! It pauses background tasks before uploading and restarts them afterwards, which works much better, and it does its best to understand Google Takeout archives. I ran immich-go as follows and it worked beautifully:

My main source of new photos is my phone, so I installed the Immich app on my iPhone, logged into my Immich server via its Tailscale URL and enabled automatic backup of new photos via the icon at the top right. I am not 100% sure whether these settings are correct, but it seems like camera photos generally go into Live Photos, and Recent should cover other files…?! If anyone knows, please send an explanation (or a link!) and I will update the article.

I also strongly recommend disabling notifications for Immich, because otherwise you get a notification whenever it uploads images in the background.
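For reference, an immich-go invocation for a Google Photos takeout might look roughly like this sketch; the server URL and API key are placeholders, and flag spelling differs between immich-go releases, so check `immich-go --help` rather than copying this verbatim:

```shell
# Hypothetical sketch, not the exact command from the post.
immich-go upload from-google-photos \
  --server=https://photos.example.ts.net \
  --api-key=YOUR_API_KEY \
  ~/Downloads/takeout-*.zip
```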
These notifications are not required for background upload to work, as an Immich developer confirmed on Reddit. Open Settings → Apps → Immich → Notifications and un-tick the permission checkbox:

Immich’s documentation on backups contains some good recommendations. The Immich developers recommend backing up the entire contents of Immich’s state directory. One subdirectory contains SQL dumps, whereas three other directories contain all user-uploaded data. Hence, I have set up a systemd timer that copies this directory onto my PC, which is enrolled in a 3-2-1 backup scheme.

Immich (currently?) does not contain photo editing features, so to rotate or crop an image, I download the image and use GIMP. To share images, I still upload them to Google Photos (depending on who I share them with).

The two most promising options in the space of self-hosted image management tools seem to be Immich and Ente. I got the impression that Immich is more popular in my bubble, and Ente’s scope seems far larger than what I am looking for: Ente is a service that provides a fully open source, end-to-end encrypted platform for you to store your data in the cloud without needing to trust the service provider. On top of this platform, we have built two apps so far: Ente Photos (an alternative to Apple and Google Photos) and Ente Auth (a 2FA alternative to the deprecated Authy).

I don’t need an end-to-end encrypted platform. I already have encryption on the transit layer (Tailscale) and the disk layer (LUKS); no need for more complexity.

Immich is a delightful app! It’s very fast and generally seems to work well. The initial import is smooth, but only if you use the right tool. Ideally, the official CLI could be improved, or maybe immich-go could be made the official one. I think the auto backup is too hard to configure on an iPhone, so that could also be improved. But aside from these initial stumbling blocks, I have no complaints.

xenodium Yesterday

Rinku: CLI link previews

In my last Bending Emacs episode, I talked about overlays and used them to render link previews in an Emacs buffer. While the overlays merely render an image, the actual link preview image is generated by rinku, a tiny command line utility I built recently. It leverages macOS APIs to do the actual heavy lifting: rendering/capturing a view off screen and saving it to disk. Similarly, it can fetch preview metadata, also saving the related thumbnail to disk. In both cases, it outputs JSON.

By default, it fetches metadata for you. In this instance, the image looks a little something like this: On the other hand, the flag generates a preview, very much like the ones you see in native macOS and iOS apps. Similarly, the preview renders as follows:

While overlays are one way to integrate anywhere in Emacs, I had been meaning to look into what I can do for eshell in particular. Eshell is just another buffer, and while overlays could do the job, I wanted a shell-like experience. After all, I already knew we can echo images into an eshell buffer. Before getting to that, there's a related hack I'd been meaning to get to for some time… While we're all likely familiar with the cat command, I remember being a little surprised to find that eshell offers an alternative elisp implementation. Surprised too? Go check it!

Where am I going with this? Well, if eshell's cat command is an elisp implementation, we know its internals are up for grabs, so we can technically extend it to display images too. It is just another function, so we can advise it to add image superpowers. I was pleasantly surprised at how little code was needed. It basically scans for image arguments to handle within the advice and otherwise delegates to the original implementation. And with that, we can see our freshly powered-up command in action:

By now, you may wonder why the detour when the post was really about rinku? You see, this is Emacs, and everything compounds!
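The advice described above might look something like this sketch; the helper names are mine, not the post's actual code:

```elisp
;; Sketch only: advise `eshell/cat' so that image-file arguments are
;; rendered inline, while everything else is delegated to the
;; original elisp implementation.
(require 'seq)

(defun my/eshell-cat-maybe-image (orig-fn &rest args)
  "Render image files in ARGS inline; delegate the rest to ORIG-FN."
  (let* ((image-p (lambda (arg)
                    (and (stringp arg)
                         (string-match-p (image-file-name-regexp) arg))))
         (images (seq-filter image-p args))
         (rest (seq-remove image-p args)))
    (dolist (image images)
      ;; A space carrying a `display' property shows the image.
      (eshell-print
       (propertize " " 'display (create-image (expand-file-name image)))))
    (when rest
      (apply orig-fn rest))))

(advice-add 'eshell/cat :around #'my/eshell-cat-maybe-image)
```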
We can now leverage our revamped command to give similar superpowers to rinku, by merely adding a function. As we now know, rinku outputs things to JSON, so we can use Emacs's JSON support to parse the process output and subsequently feed the image path to our cat. rinku can also output link titles, so we can show those too whenever possible. With that, we can see the lot in action:

While non-Emacs users are often puzzled by how frequently we bring user flows and integrations on to our beloved editor, once you learn a little elisp, you start realising how relatively easily things can integrate with one another, and pretty much everything is up for grabs. Reckon rinku and these tips will be useful to you?

Enjoying this blog or my projects? I am an 👉 indie dev 👈. Help make it sustainable by ✨ sponsoring ✨. Need a blog? I can help with that. Maybe buy my iOS apps too ;)

xenodium 6 days ago

WhatsApp from you know where

While there are plenty of messaging alternatives out there, for better or worse, WhatsApp remains a necessity for some of us. With that in mind, I looked for ways to bring WhatsApp messaging to the comfort of my beloved text editor. As mentioned in my initial findings, WhatsApp on Emacs is totally doable with the help of wuzapi and whatsmeow, which offer a huge leg up. Today, I introduce a super early version of Wasabi, a native Emacs interface for WhatsApp messaging.

I wanted installation/setup to be as simple as possible. Ideally, you install a single Emacs package and off you go. While leveraging XMPP is rather appealing in reusing existing Emacs messaging packages, I felt setting up a WhatsApp gateway or related infrastructure to be somewhat at odds with Wasabi's simple-installation goal. Having said that, wuzapi / whatsmeow offer a great middle ground. You install a single binary dependency, along with the Emacs package, and you're ready to go. This isn't too different from the git + magit combo. As of now, the installation/setup boils down to two steps if you're on macOS: While you may try Homebrew on Linux, you're likely to prefer your native package manager. If that fails, building wuzapi from source is also an option.

While wuzapi runs as a RESTful API service + webhooks, I wanted to simplify the Emacs integration by using JSON-RPC over standard I/O, enabling us to leverage incoming notifications in place of webhooks. I floated the idea of adding JSON-RPC to wuzapi with its author Nicolas, and to my delight, he was keen on it. He's now merged my initial proof of concept, and I followed up with a handful of additional patches (all merged now):

With the latest Wasabi Emacs package and wuzapi binary, you now get the initial WhatsApp experience I've been working towards. At present, you can send/receive messages to/from 1:1 or group chats. You can also download/view images as well as videos. Viewing reactions is also supported.
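To get a feel for the stdio mode, here is a sketch of driving wuzapi over JSON-RPC from elisp. The method name and params are made up for illustration; only the `-mode=stdio` flag is from the post, so treat everything else as hypothetical:

```elisp
;; Sketch: speak JSON-RPC 2.0 to a wuzapi process over stdio.
;; The "SendMessage" method and its params are hypothetical.
(defvar my/wuzapi-process
  (make-process :name "wuzapi"
                :command '("wuzapi" "-mode=stdio")
                :connection-type 'pipe
                :filter (lambda (_proc output)
                          ;; Incoming notifications (e.g. new messages)
                          ;; arrive here as JSON lines.
                          (message "wuzapi: %s" output))))

(defun my/wuzapi-request (id method params)
  "Send a JSON-RPC request to the wuzapi process."
  (process-send-string
   my/wuzapi-process
   (concat (json-serialize `(:jsonrpc "2.0" :id ,id
                             :method ,method :params ,params))
           "\n")))

;; Hypothetical usage:
;; (my/wuzapi-request 1 "SendMessage" '(:chat "123@g.us" :text "hi"))
```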
Needless to say, you may find some initial rough edges in addition to missing features. Having said that, I'd love to hear your feedback and experience. As mentioned, Wasabi is currently available on GitHub.

I've now put in quite a bit of effort prototyping things, upstreaming changes to wuzapi, and building the first iteration of Wasabi. I gotta say, it feels great to be able to quickly message and catch up with different chats from the comfort of Emacs. Having said that, it's taken a lot of work to get here, and it will require plenty more to get to a polished and featureful experience. Since going full-time indie dev, I have the flexibility to work on projects of my choice, but only to an extent. If I cannot make the project sustainable, I'll eventually move on to work on something else that is. If you're keen on Wasabi's offering, please consider sponsoring the effort, and please reach out to voice your interest (Mastodon / Twitter / Reddit / Bluesky). Reckon a WhatsApp Emacs client would help you stay focused at work (less time on your phone)? Ask your employer to sponsor it too ;-)

For reference, the merged patches:
- Add JSON-RPC 2.0 stdio mode (via -mode=stdio) for communication
- Expose more HTTP endpoints as JSON-RPCs
- Enable setting a custom data directory via -datadir=/path/to/data
- Add Homebrew recipe/installation

Armin Ronacher 1 week ago

LLM APIs are a Synchronization Problem

The more I work with large language models through provider-exposed APIs, the more I feel like we have built ourselves into quite an unfortunate API surface area. It might not actually be the right abstraction for what’s happening under the hood. The way I like to think about this problem now is that it’s actually a distributed state synchronization problem.

At its core, a large language model takes text, tokenizes it into numbers, and feeds those tokens through a stack of matrix multiplications and attention layers on the GPU. Using a large set of fixed weights, it produces activations and predicts the next token. If it weren’t for temperature (randomization), it could in principle be a much more deterministic system. As far as the core model is concerned, there’s no magical distinction between “user text” and “assistant text”: everything is just tokens. The only difference comes from special tokens and formatting that encode roles (system, user, assistant, tool), injected into the stream via the prompt template. You can look at the system prompt templates on Ollama for the different models to get an idea.

Let’s ignore for a second which APIs already exist and just think about what usually happens in an agentic system. If I were to run my LLM locally on the same machine, there would still be state to maintain, but that state would be very local to me. You’d maintain the conversation history as tokens in RAM, and the model would keep a derived “working state” on the GPU: mainly the attention key/value cache built from those tokens. The weights themselves stay fixed; what changes per step are the activations and the KV cache.
From a mental-model perspective, caching means “remember the computation you already did for a given prefix so you don’t have to redo it.” Internally, that usually means storing the attention KV cache for those prefix tokens on the server and letting you reuse it, not literally handing you raw GPU state. There are probably some subtleties to this that I’m missing, but I think this is a pretty good model to think about it. The moment you’re working with completion-style APIs such as OpenAI’s or Anthropic’s, abstractions are put in place that make things a little different from this very simple system. The first difference is that you’re not actually sending raw tokens around. The way the GPU looks at the conversation history and the way you look at it are on fundamentally different levels of abstraction. While you could count and manipulate tokens on one side of the equation, extra tokens are being injected into the stream that you can’t see. Some of those tokens come from converting the JSON message representation into the underlying input tokens fed into the machine. But you also have things like tool definitions, which are injected into the conversation in proprietary ways. Then there’s out-of-band information such as cache points. And beyond that, there are tokens you will never see. For instance, with reasoning models you often don’t see any real reasoning tokens, because some LLM providers try to hide as much as possible so that you can’t retrain your own models with their reasoning state. On the other hand, they might give you some other informational text so that you have something to show to the user. Model providers also love to hide search results and how those results were injected into the token stream. Instead, you only get an encrypted blob back that you need to send back to continue the conversation. All of a sudden, you need to take some information on your side and funnel it back to the server so that state can be reconciled on either end. 
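The "remember the computation you already did for a given prefix" idea can be sketched with a toy cache. This is made-up bookkeeping to illustrate the shape of the optimization, not real attention state:

```python
# Toy sketch of prefix caching: store "derived state" keyed by token
# prefixes, so a follow-up request that shares a prefix only pays for
# the new suffix.  A real KV cache stores per-layer attention
# keys/values on the GPU, not a dict of tuples.
class PrefixCache:
    def __init__(self):
        self.cache = {}   # tuple(prefix tokens) -> derived state
        self.steps = 0    # count of per-token "computations"

    def run(self, tokens):
        tokens = tuple(tokens)
        # Find the longest prefix we have already computed.
        hit = 0
        for i in range(len(tokens), 0, -1):
            if tokens[:i] in self.cache:
                hit = i
                break
        state = self.cache.get(tokens[:hit], ())
        # Only the uncached suffix costs us work.
        for i in range(hit, len(tokens)):
            self.steps += 1
            state = state + (tokens[i],)
            self.cache[tokens[:i + 1]] = state
        return state

c = PrefixCache()
c.run([1, 2, 3, 4])        # 4 steps, cold start
c.run([1, 2, 3, 4, 5, 6])  # only 2 more steps: the prefix was cached
print(c.steps)             # 6
```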
In completion-style APIs, each new turn requires resending the entire prompt history. The size of each individual request grows linearly with the number of turns, but the cumulative amount of data sent over a long conversation grows quadratically because each linear-sized history is retransmitted at every step. This is one of the reasons long chat sessions feel increasingly expensive. On the server, the model’s attention cost over that sequence also grows quadratically in sequence length, which is why caching starts to matter. One of the ways OpenAI tried to address this problem was to introduce the Responses API, which maintains the conversational history on the server (at least in the version with the saved state flag). But now you’re in a bizarre situation where you’re fully dealing with state synchronization: there’s hidden state on the server and state on your side, but the API gives you very limited synchronization capabilities. To this point, it remains unclear to me how long you can actually continue that conversation. It’s also unclear what happens if there is state divergence or corruption. I’ve seen the Responses API get stuck in ways where I couldn’t recover it. It’s also unclear what happens if there’s a network partition, or if one side got the state update but the other didn’t. The Responses API with saved state is quite a bit harder to use, at least as it’s currently exposed. Obviously, for OpenAI it’s great because it allows them to hide more behind-the-scenes state that would otherwise have to be funneled through with every conversation message. Regardless of whether you’re using a completion-style API or the Responses API, the provider always has to inject additional context behind the scenes—prompt templates, role markers, system/tool definitions, sometimes even provider-side tool outputs—that never appears in your visible message list. 
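The quadratic retransmission described above is easy to make concrete. With an assumed fixed number of tokens per turn, the totals follow the familiar n*(n+1)/2 pattern:

```python
# Sketch: cumulative tokens sent over a chat with completion-style
# APIs, where every new turn resends the entire history.
def cumulative_tokens_sent(turns, tokens_per_turn=100):
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # one new turn of history
        total += history            # the whole history is resent
    return total

# 10 turns: 100 + 200 + ... + 1000 = 5500 tokens transmitted for only
# 1000 tokens of actual conversation.
print(cumulative_tokens_sent(10))  # 5500
```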
Different providers handle this hidden context in different ways, and there’s no common standard for how it’s represented or synchronized. The underlying reality is much simpler than the message-based abstractions make it look: if you run an open-weights model yourself, you can drive it directly with token sequences and design APIs that are far cleaner than the JSON-message interfaces we’ve standardized around. The complexity gets even worse when you go through intermediaries like OpenRouter or SDKs like the Vercel AI SDK, which try to mask provider-specific differences but can’t fully unify the hidden state each provider maintains. In practice, the hardest part of unifying LLM APIs isn’t the user-visible messages; it’s that each provider manages its own partially hidden state in incompatible ways. It really comes down to how you pass this hidden state around in one form or another.

I understand that from a model provider’s perspective, it’s nice to be able to hide things from the user. But synchronizing hidden state is tricky, and none of these APIs have been built with that mindset, as far as I can tell. Maybe it’s time to start thinking about what a state synchronization API would look like, rather than a message-based API. The more I work with these agents, the more I feel like I don’t actually need a unified message API. The core idea of it being message-based in its current form is itself an abstraction that might not survive the passage of time.

There’s a whole ecosystem that has dealt with this kind of mess before: the local-first movement. Those folks spent a decade figuring out how to synchronize distributed state across clients and servers that don’t trust each other, drop offline, fork, merge, and heal. Peer-to-peer sync and conflict-free replicated storage engines exist because “shared state but with gaps and divergence” is a hard problem that nobody could solve with naive message passing.
Their architectures explicitly separate canonical state, derived state, and transport mechanics — exactly the kind of separation missing from most LLM APIs today. Some of those ideas map surprisingly well to models: KV caches resemble derived state that could be checkpointed and resumed; prompt history is effectively an append-only log that could be synced incrementally instead of resent wholesale; provider-side invisible context behaves like a replicated document with hidden fields. At the same time though, if the remote state gets wiped because the remote site doesn’t want to hold it for that long, we would want to be in a situation where we can replay it entirely from scratch—which for instance the Responses API today does not allow. There’s been plenty of talk about unifying message-based APIs, especially in the wake of MCP (Model Context Protocol). But if we ever standardize anything, it should start from how these models actually behave, not from the surface conventions we’ve inherited. A good standard would acknowledge hidden state, synchronization boundaries, replay semantics, and failure modes — because those are real issues. There is always the risk that we rush to formalize the current abstractions and lock in their weaknesses and faults. I don’t know what the right abstraction looks like, but I’m increasingly doubtful that the status-quo solutions are the right fit.

Simon Willison 1 week ago

Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model

Hot on the heels of Tuesday's Gemini 3 Pro release, today it's Nano Banana Pro, also known as Gemini 3 Pro Image. I've had a few days of preview access and this is an astonishingly capable image generation model. As is often the case, the most useful low-level details can be found in the API documentation: Designed to tackle the most challenging workflows through advanced reasoning, it excels at complex, multi-turn creation and modification tasks. [...] These 14 images can include the following:

Max Woolf published the definitive guide to prompting Nano Banana just a few days ago. I decided to try his example prompts against the new model, requesting results in 4K. Here's what I got for his first test prompt, using Google's AI Studio: The result came out as a 24.1MB, 5632 × 3072 pixel PNG file. I don't want to serve that on my own blog, so here's a Google Drive link for the original. Then I ran his follow-up prompt: I'll note that it did put the plate-sized cookie on a regular plate. Here's the 24.9MB PNG.

The new model isn't cheap. Here's the API pricing: it's 24 cents for a 4K image and 13.4 cents for a 1K or 2K image. Image inputs are 0.11 cents (just over 1/10th of a cent) each; an earlier version of their pricing page incorrectly said 6.7 cents each, but that's now been fixed. Unlike most of Google's other models, it also isn't available for free via AI Studio: you have to configure an API key with billing in order to use the model there.

So this thing is great at following instructions. How about rendering text? I tried this prompt, this time using the Gemini consumer app in "thinking" mode (which now uses Nano Banana Pro for image generation). Here's a share link; my prompt was: Infographic explaining how the Datasette open source project works

This is a great opportunity to test its ability to run searches (aka "Grounding with Google Search").
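The pricing quoted above works out as follows; a quick sketch using those per-image rates (USD; check Google's pricing page for current numbers):

```python
# Sketch: per-request cost for Nano Banana Pro, using the rates
# quoted in the post: 24 cents per 4K output, 13.4 cents per 1K/2K
# output, 0.11 cents per input image.
PRICE_OUTPUT = {"1k": 0.134, "2k": 0.134, "4k": 0.24}  # per generated image
PRICE_INPUT_IMAGE = 0.0011                             # per input image

def cost(resolution, input_images=0):
    return PRICE_OUTPUT[resolution] + input_images * PRICE_INPUT_IMAGE

# One 4K generation that mixes in the full 14 reference images:
print(f"${cost('4k', input_images=14):.4f}")  # $0.2554
```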
Here's what it created based on that 9-word prompt: That's a pretty extraordinary result for such a simple prompt. The text is all spelled correctly and rendered without glitches. The content is solid too: it even included logos for the most popular publish platforms, and a tiny thumbnail of the Datasette UI which is close enough for an infographic. "Data Ingestion (Read-Only)" is a bit off. Everything else looks great to me.

Here's the thinking trace for that image generation process:

Initiating Datasette Infographic: I'm starting a detailed infographic about the Datasette project. I plan to use Google Search to gather specific information about its core processes, which I will then visualize in a multi-panel diagram.

Developing the Datasette workflow: I am visualizing the Datasette workflow as described in the search results. I'm focusing on the conversion of data from different formats into a SQLite database, which is then presented as a web application complete with a user interface and JSON output.

Visualizing Datasette Workflow: I've sketched out the initial structure for the infographic, focusing on a flowchart style. The key components include data input from diverse sources, conversion to SQLite, the Datasette core application, its main features (web UI, JSON API, SQL querying), ecosystem plugins, and finally, deployment options. I aim for clear icons and labels to represent each step. Now I will focus on refining the flow and clarifying the relationships between the parts.

Defining Core Datasette Stages: I've sketched out the initial infographic structure. I have identified the key components, focusing on visual representation of the data flow and the key benefits and features of the system. I have made updates to the flowchart with clear icons and labels. Now I'm working to expand the data input section and refine the visual representation of each step.
Constructing Datasette Infographic: I am translating the current Datasette workflow outline into visual elements. The input section will show icons for different data formats. I've sketched the data conversion stage with clear illustrations of the necessary tools. I will add a depiction of the Datasette core application, showcasing its interfaces and functions. The plugin ecosystem and deployment options will follow. Now, I am refining the visual representation.

Analyzing Datasette's Architecture: I'm currently working on representing the Datasette open-source project's architecture visually. I've broken it down into key components: raw data processing, SQLite conversion, core application functionalities, plugin integration, and deployment methods. I'm focusing on creating a clear, linear representation to show the workflow.

Mapping the Datasette Flow: I've crafted a comprehensive graphic to illustrate Datasette's operation. It spans the data ingestion, SQLite conversion, main application features, plugin interactions, and deployment stages, aiming for clarity and a logical flow.

This note from the announcement post caught my eye: We believe it's critical to know when an image is AI-generated. This is why all media generated by Google's tools are embedded with our imperceptible SynthID digital watermark. Today, we are putting a powerful verification tool directly in consumers' hands: you can now upload an image into the Gemini app and simply ask if it was generated by Google AI, thanks to SynthID technology. We are starting with images, but will expand to audio and video soon.

Last night I used Nano Banana Pro to generate a fake photograph of raccoons stealing our food delivery, then scrubbed out the little diamond icon using the Apple Photos "cleanup" tool. I uploaded that to the Gemini app and asked "Was this image created with AI?": It replied: Yes, it appears that all or part of this image was created with Google AI. SynthID detected a watermark in 25-50% of the image.
Presumably that 25-50% figure is because the rest of the photo was taken by me; it was just the raccoons that were added by Nano Banana Pro.

The headline capabilities from the API documentation:
- High-resolution output: Built-in generation capabilities for 1K, 2K, and 4K visuals.
- Advanced text rendering: Capable of generating legible, stylized text for infographics, menus, diagrams, and marketing assets.
- Grounding with Google Search: The model can use Google Search as a tool to verify facts and generate imagery based on real-time data (e.g., current weather maps, stock charts, recent events).
- Thinking mode: The model utilizes a "thinking" process to reason through complex prompts. It generates interim "thought images" (visible in the backend but not charged) to refine the composition before producing the final high-quality output.
- Up to 14 reference images: You can now mix up to 14 reference images to produce the final image. Up to 6 images of objects with high fidelity to include in the final image; up to 5 images of humans to maintain character consistency.

Filippo Valsorda 1 week ago

The 2025 Go Cryptography State of the Union

This past August, I delivered my traditional Go Cryptography State of the Union talk at GopherCon US 2025 in New York. It goes into everything that happened at the intersection of Go and cryptography over the last year. You can watch the video (with manually edited subtitles, for my fellow subtitles enjoyers) or read the transcript below (for my fellow videos not-enjoyers). The annotated transcript below was made with Simon Willison’s tool. All pictures were taken around Rome, the Italian countryside, and the skies of the Northeastern United States.

Welcome to my annual performance review. We are going to talk about all of the stuff that we did in the Go cryptography world during the past year. When I say "we," it doesn't mean just me; it means me, Roland Shoemaker, Daniel McCarney, Nicola Morino, Damien Neil, and many, many others, both from the Go team and from the Go community, who contribute to the cryptography libraries all the time. I used to do this work at Google, and I now do it as an independent, as part of and leading Geomys, but we'll talk about that later.

When we talk about the Go cryptography standard libraries, we talk about all of those packages that you use to build secure applications. That's what we make them for. We do it to provide you with encryption and hashes and protocols like TLS and SSH, to help you build secure applications.

The main headlines of the past year: We shipped post-quantum key exchanges, which is something that you will not have to think about and will just be solved for you. We have solved FIPS 140, which some of you will not care about at all and some of you will be very happy about. And the thing I'm most proud of: we did all of this while keeping an excellent security track record, year after year.

This is an update to something you've seen last year, the Go Security Track Record: the list of vulnerabilities in the Go cryptography packages.
We don't assign a severity, because it's really hard; instead they're graded on the "Filippo's unhappiness score." It goes shrug, oof, and ouch. Time goes from bottom to top, and you can see how, as time goes by, things have been getting better. People report more things, but they're generally more often shrugs than oofs, and there haven't been ouches. More specifically, we haven't had any oof since 2023. We didn't have any Go-specific oof since 2021. When I say Go-specific, I mean: well, sometimes the protocol is broken, and as much as we want to also be ahead of that by limiting complexity, you know, sometimes there's nothing you can do about that. And we haven't had ouches since 2019. I'm very happy about that.

But if this sounds a little informal, I'm also happy to report that we had the first security audit by a professional firm. Trail of Bits looked at all of the nuts and bolts of the Go cryptography standard library: primitives, ciphers, hashes, assembly implementations. They didn't look at the protocols, which is a lot more code on top of that, but they did look at all of the foundational stuff. And I'm happy to say that they found nothing. Two of a kind t-shirts, for me and Roland Shoemaker.

It is easy, though, to maintain a good security track record if you never add anything, so let's talk about the code we did add instead. First of all, post-quantum key exchanges. We talked about post-quantum last year, but as a very quick refresher: we focused on post-quantum key exchange because the key exchange defends against the most urgent risk, which is that somebody might be recording connections today, keeping them saved on some storage for the next 5-50 years, and then using the future quantum computers to decrypt those sessions. I'm happy to report that we now have ML-KEM, the post-quantum key exchange algorithm selected by the NIST competition, an international competition run in the open.
You can use it directly from the crypto/mlkem standard library package starting in Go 1.24, but you're probably not gonna do that. Instead, you're probably going to just use crypto/tls, which by default now uses a hybrid of X25519 and ML-KEM-768 for all connections with other systems that support it. Why hybrid? Because this is new cryptography. So we are still a little worried that somebody might break it. There was one that looked very good and had very small ciphertext, and we were all like, “yes, yes, that's good, that's good.” And then somebody broke it on a laptop. It was very annoying. We're fairly confident in lattices. We think this is the good one. But still, we are taking both the old stuff and the new stuff, hashing them together, and unless you have both a quantum computer to break the old stuff and a mathematician who broke the new stuff, you're not breaking the connection. crypto/tls can now negotiate that with Chrome and can negotiate that with other Go 1.24+ applications. Not only that, we also removed any choice you had in ordering of key exchanges because we think we know better than you and— that didn't come out right, uh. … because we assume that you actually want us to make those kind of decisions, so as long as you don't turn it off, we will default to post-quantum. You can still turn it off. But as long as you don't turn it off, we'll default to the post-quantum stuff to keep your connection safe from the future. Same stuff with x/crypto/ssh. Starting in v0.38.0. SSH does the same thing, they just put X25519 and ML-KEM-768 in a different order, which you would think doesn't matter—and indeed it doesn't matter—but there are rules where "no, no, no, you have to put that one first." And the other rule says "no, you have to put that one first." It's been a whole thing. I'm tired. OpenSSH supports it, so if you connect to a recent enough version of OpenSSH, that connection is post-quantum and you didn't have to do anything except update. 
Okay, but you said key exchanges and digital signatures are broken. What about the latter? Well, key exchanges are urgent because of the record-now-decrypt-later problem, but unless the physicists that are developing quantum computers also develop a time machine, they can't go back in time and forge a signature for you to verify today. So if you're verifying a signature today, I promise you it's not forged by a quantum computer. We have a lot more time to figure out post-quantum digital signatures. But if we can, why should we not start now? Well, it's different. Key exchange, we knew what hit we had to take. You have to do a key exchange, you have to do it when you start the connection, and ML-KEM is the algorithm we have, so we're gonna use it. Signatures, we developed a lot of protocols like TLS, SSH, back when it was a lot cheaper to put signatures on the wire. When you connect to a website right now, you get five signatures. We can't send you five 2KB blobs every time you connect to a website. So we are waiting to give time to protocols to evolve, to redesign things with the new trade-offs in mind of signatures not being cheap. We are intentionally slow-rolling the digital signature side because it's both not as urgent and not as ready to deploy. We can't do the same “ta-da, it's solved for you” show because signatures are much harder to roll out. Let's talk about another thing that I had mentioned last year, which is FIPS 140. FIPS 140 is a US government regulation for how to do cryptography. It is a list of algorithms, but it's not just a list of algorithms. It's also a list of rules that the modules have to follow. What is a module? Well, a module used to be a thing you would rack. All the rules are based on the idea that it's a thing you can rack. Then the auditor can ask “what is the module’s boundary?” And you're like, “this shiny metal box over here." And, you know, that works.
When people ask those questions of libraries, though, I do get a little mad every time. Like, what are the data input ports of your library? Ports. Okay. Anyway, it's an interesting thing to work with. To comply with FIPS 140 in Go, up to now, you had to use an unsupported GOEXPERIMENT, which would replace all of the Go cryptography standard library, all of the stuff I'm excited about, with the BoringCrypto module, which is a FIPS 140 module developed by the BoringSSL folks. We love the BoringSSL folks, but that means using cgo, and we do not love cgo. It has memory safety issues, it makes cross-compilation difficult, and it's not very fast. Moreover, the list of algorithms and platforms of BoringCrypto is tailored to the needs of BoringSSL and not to the needs of the Go community, and their development cycle doesn't match our development cycle: we don't decide when that module gets validated. Speaking of memory safety, I lied a little. Trail of Bits did find one vulnerability. They found it in Go+BoringCrypto, which was yet another reason to try to push away from it. Instead, we now have the FIPS 140-3 Go Cryptographic Module. Not only is it native Go, it's actually just a different name for the internal Go packages that all the regular Go cryptography packages use for the FIPS 140 algorithms. We just moved them into their own little bubble so that when they ask us “what is the module boundary” we can point at those packages. Then there's a runtime mode which enables some of the self-tests and slow stuff that you need for compliance. It also tells crypto/tls not to negotiate stuff that's not FIPS, but aside from that, it doesn't change any observable behavior. We managed to keep everything working exactly the same: you don't import a different package, you don't do anything different, your applications just keep working the same way. We're very happy about that.
Finally, you can at compile time select a GOFIPS140 frozen module, which is just a zip file of the source of the module as it was back when we submitted it for validation, which is a compliance requirement sometimes. By the way, that means we have to be forward compatible with future versions of Go, even for internal packages, which was a little spicy. You can read more in the upstream FIPS 140-3 docs . You might be surprised to find out that using a FIPS 140 algorithm from a FIPS 140 module is not actually enough to be FIPS 140 compliant. The FIPS 140 module also has to be tested for that specific algorithm. What we did is we just tested them all, so you can use any FIPS 140 algorithm without worrying about whether it's tested in our module. When I say we tested them all, I mean that some of them we tested with four different names. NIST calls HKDF alternatively SP 800-56C two-step KDF, SP 800-133 Section 6.3 CKG, SP 800-108 Feedback KDF, and Implementation Guidance D.P OneStepNoCounter KDF (you don't wanna know). It has four different names for the same thing. We just tested it four times, it's on the certificate, you can use it whatever way you want and it will be compliant. But that's not enough. Even if you use a FIPS 140 algorithm from a FIPS 140 module that was tested for the algorithm, it's still not enough, because it has to run on a platform that was tested as part of the validation. So we tested on a lot of platforms. Some of them were paid for by various Fortune 100s that had an interest in them getting tested, but some of them had no sponsors. We really wanted to solve this problem for everyone, once and for all, so Geomys just paid for all the FreeBSD, macOS, even Windows testing so that we could say “run it on whatever and it's probably going to be compliant.” (Don't quote me on that.) How did we test on that many machines? Well, you know, we have this sophisticated data center… Um, no. No, no. I got a bunch of stuff shipped to my place.
That's my NAS now. It's an Ampere Altra Q64-22, sixty-four arm64 cores, and yep, it's my NAS. Then I tested it on, you know, this sophisticated arm64 macOS testing platform. And then on the Windows one, which is my girlfriend's laptop. And then the arm one, which was my router. Apparently I own an EdgeRouter now? It's sitting in the data center which is totally not my kitchen. It was all a very serious and regimented thing, and all of it is actually on record, in recorded sessions with the accredited laboratories, so all this is now on file with the US government. You might or might not be surprised to hear that the easiest way to meet the FIPS 140 requirements is not to exceed them. That's annoying and a problem of FIPS 140 in general: if you do what everybody else does, which is just clearing the bar, nobody will ask questions, so there’s a strong temptation to lower security in FIPS 140 mode. We just refused to accept that. Instead, we figured out complex stratagems. For example, for randomness, the safest thing to do is to just take randomness from the kernel every time you need it. The kernel knows if a virtual machine was just cloned and we don't, so we risk generating the same random bytes twice. But NIST will not allow that. You need to follow a bunch of standards for how the randomness is generated, and the kernel doesn’t. So what we do is we do everything that NIST asks, and then every time you ask for randomness, we squirrel off, go to the kernel, get a little piece of extra entropy, and stir it into the pot before giving back the result. It's still NIST compliant because it's as strong as both the NIST and the kernel solution, but it took some significant effort to show it is compliant. We did the same for ECDSA. ECDSA is a digital signature mechanism. We've talked about it a few other times. It's just a way to take a message and a private key and generate a signature, here (s, r).
To make a signature, you also need a random number, and that number must be used only once with the same private key. You cannot reuse it. That number is k here. Why can you not reuse it? Because if you reuse it, then you can do this fun algebra thing and then, pop, the private key falls out, just by smashing two signatures together. Bad, really, really bad. How do we generate this number that must never be the same? Well, one option is we make it random. But what if your random number generator breaks and generates the same random number twice? That would leak the private key, and that would be bad. So the community came up with deterministic ECDSA . Instead of generating the nonce at random, we are going to hash the message and the private key. This is still actually a little risky though, because if there's a fault in the CPU, for example, or a bug (because, for example, you're taking the wrong inputs), you might still end up generating the same value but signing a slightly different message. How do we mitigate both of those? We do both. We take some randomness and the private key and the message, we hash them all together, and now it's really, really hard for the number to come out the same. That's called hedged ECDSA. The Go crypto library has been doing hedged ECDSA from way before it was called hedged and way before I was on the team. Except… random ECDSA has always been FIPS. Deterministic ECDSA has been FIPS since a couple of years ago. Hedged ECDSA is technically not FIPS. We really didn't want to make our ECDSA package less secure, so we found a forgotten draft that specifies a hedged ECDSA scheme, and we proceeded to argue that actually, if you read SP 800-90A Revision 1 very carefully, you realize that if you claim that the private key is just the DRBG entropy plus two-thirds of the DRBG nonce, you are allowed to use it because of SP 800-57 Part 1, etc. etc. etc.
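The hedging idea itself is simple to picture. A toy sketch, purely illustrative (this is not Go's internal construction, and a real implementation would reduce the result modulo the curve order, among other details): mix fresh randomness, the private key, and the message, so the nonce only repeats if the RNG fails *and* the exact same message is re-signed.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// hedgedNonce sketches hedged nonce derivation: hash entropy || key || message.
// Illustration only; not the scheme Go or NIST actually specify.
func hedgedNonce(priv, msg []byte) [32]byte {
	var entropy [32]byte
	if _, err := rand.Read(entropy[:]); err != nil {
		panic(err)
	}
	h := sha256.New()
	h.Write(entropy[:])
	h.Write(priv)
	h.Write(msg)
	var k [32]byte
	copy(k[:], h.Sum(nil))
	return k
}

func main() {
	priv := []byte("not-a-real-key")
	msg := []byte("hello")
	// Even for the same key and message, two nonces differ thanks to the entropy.
	fmt.Println(hedgedNonce(priv, msg) != hedgedNonce(priv, msg)) // prints true
}
```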
We basically just figured out a way to claim it was fine and the lab eventually said "okay, shut up." I'm very proud of that one. If you want to read more about this, check out the announcement blog post . If you know you need commercial services for FIPS 140, here’s the Geomys FIPS 140 commercial services page . If you don't know if you need them, you actually probably don't. It's fine, the standard library will probably solve this for you now. Okay, but who cares about this FIPS 140 stuff? "Dude, we've been talking about FIPS 140 for 10 minutes and I don't care about that." Well, I care because I spent my last year on it, and that apparently made me the top committer for the cycle to the Go repo, and that's mostly FIPS 140 stuff. I don't know how to feel about that. There have actually been a lot of positive side effects from the FIPS 140 effort. We took care to make sure that everything that we found, we would leave in a better state. For example, there are new packages that moved from x/crypto into the standard library: crypto/hkdf, crypto/pbkdf2, crypto/sha3. SHA-3 is faster and doesn't allocate anymore. HKDF has a new generic API which lets you pass in a function that returns either a concrete type that implements Hash or the Hash interface itself, which otherwise was a little annoying. (You had to make a little closure.) I like it. We restructured crypto/aes and crypto/cipher and in the process merged a contribution from a community member that made AES-CTR, the counter mode, between 2 and 9 times faster. That was a pretty good result. The assembly interfaces are much more consistent now. Finally, we finished cleaning up crypto/rsa. If you remember from last year, we made the crypto/rsa sign and verify operations not use math/big and use constant-time code. Now we also made key generation, validation, and pre-computation all not use math/big. That made loading keys that were serialized to JSON a lot faster, and made key generation much faster.
But how much faster? Benchmarking key generation is really hard because it's a random process: you take a random number and you check, is it prime? No. Toss. Is it prime? Nope. Toss. Is it prime? You keep doing this. If you're lucky, it’s very fast. If you are unlucky, very slow. It’s a geometric distribution, and if you want to average it out, you have to run for hours. Instead, I figured out a new way by mathematically deriving the average number of pulls you are supposed to do and preparing a synthetic run that gives exactly the expected mean number of checks, so that we get a representative sample to benchmark deterministically . That was a lot of fun. Moreover, we detect more broken keys, and we did a rare backwards compatibility break to stop supporting keys smaller than 1024 bits. 1024 is already pretty small, you should be using 2048 minimum, but if you're using less than 1024, it can be broken on the proverbial laptop. It's kind of silly that a production library lets you do something so insecure, and you can't tell them apart just by looking at the code. You have to know what the size of the key is. So we just took that out. I expected people to yell at me. Nobody yelled at me. Good job, community. Aside from adding stuff, you know that we are very into testing and that testing is how we keep that security track record that we talked about. I have one bug in particular that is my white whale. (You might say, "Filippo, well-adjusted people don't have white whales." Well, we've learned nothing new, have we?) My white whale is this assembly bug that we found at Cloudflare before I joined the Go team. I spent an afternoon figuring out an exploit for it with Sean Devlin in Paris, while the yellow jackets set fire to cop cars outside. That's a different story. It's an assembly bug where the carry—literally the carry like when you do a pen and paper multiplication—was just not accounted for correctly.
You can watch my talk Squeezing a Key through a Carry Bit if you are curious to learn more about it. The problem with this stuff is that it's so hard to get code coverage for it because all the code always runs. It's just that you don't know if it always runs with that carry at zero, and if the carry was one, it’d do the wrong math. I think we've cracked it, by using mutation testing. We have a framework that tells the assembler, "hey, anywhere you see an add-with-carry, replace it with a simple add that discards the carry." Then we run the tests. If the tests still pass, the test did not cover that carry. If that happens we fail a meta-test and tell whoever's sending the CL, “hey, no, no, no, you gotta test that.” Same for checking the case in which the carry is always set. We replace the add-with-carry with a simple add and then insert a +1. It's a little tricky. If you want to read more about it, it's in this blog post . I'm very hopeful that will help us with all this assembly stuff. Next, accumulated test vectors . This is a little trick that I'm very very fond of. Say you want to test a very large space. For example there are two inputs and they can both be 0 to 200 bytes long, and you want to test all the size combinations. That would be a lot of test vectors, right? If I checked in a megabyte of test vectors every time I wanted to do that, people eventually would yell at me. Instead what we do is run the algorithm with each size combination, and take the result and we put it inside a rolling hash. Then at the end we take the hash result and we check that it comes out right. We do this with two implementations. If it comes out to the same hash, great. If it comes out not to the same hash, it doesn't help you figure out what the bug is, but it tells you there's a bug. I'll take it. We really like reusing other people's tests. We're lazy. 
The BoringSSL people have a fantastic suite of tests for TLS called BoGo and Daniel has been doing fantastic work integrating that and making crypto/tls stricter and stricter in the process. It's now much more spec compliant on the little things where it goes like, “no, no, no, you're not allowed to put a zero here” and so on. Then, the Let's Encrypt people have a test tool for the ACME protocol called Pebble. (Because it's a small version of their production system called Boulder! It took me a long time to figure it out and eventually I was like ooooohhh.) Finally, NIST has this X.509 interoperability test suite, which just doesn't have a good name. It's good though. More assembly cleanups. There used to be places in assembly where—as if assembly was not complicated enough—instructions were just written down as raw machine code. Sometimes even the comment was wrong! Can you tell the comment changed in that patch? This is a thing Roland and Joel found. Now there's a test that will just yell at you if you try to commit a or instruction. We also removed all the assembly that was specifically there for speeding up stuff on CPUs that don't have AVX2. AVX2 came out in 2015 and if you want to go fast, you're probably not using the CPU generation from back then. We still run on it, just not as fast. More landings! I’m going to speed through these ones. This is all stuff that we talked about last year and that we actually landed. Stuff like data independent timing to tell the CPU, "no, no, I actually did mean for you to do that in constant time, goddammit." And server-side TLS Encrypted Client Hello, which is a privacy improvement. We had client side, now we have server side. crypto/rand.Read never fails. We promised that, we did that. Now, do you know how hard it is to test the failure case of something that never fails? I had to re-implement the seccomp library to tell the kernel to break the getrandom syscall to check what happens when it doesn’t work. 
There are tests all pointing guns at each other to make sure the fallback both works and is never hit unexpectedly. It's also much faster now because Jason Donenfeld added the Linux getrandom VDSO. Sean Liao added rand.Text like we promised. Then more stuff like hash.Cloner , which I think makes a lot of things a little easier, and more and more and more and more. The Go 1.24 and Go 1.25 release notes are there for you. x/crypto/ssh is also under our maintenance and some excellent stuff happened there, too. Better tests, better error messages, better compatibility, and we're working on some v2 APIs . If you have opinions, it’s time to come to those issues to talk about them! It’s been an exciting year, and I'm going to give you just two samples of things we're planning to do for the next year. One is TLS profiles. Approximately no one wants to specifically configure the fifteen different knobs of a TLS library. Approximately no one—because I know there are some people who do and they yell at me regularly. But instead most people just want "hey, make it broadly compatible." "Hey, make it FIPS compliant." "Hey, make it modern." We're looking for a way to make it easy to just say what your goal is, and then we do all the configuration for you in a way that makes sense and that evolves with time. I'm excited about this one. And maybe something with passkeys? If you run websites that authenticate users a bunch with password hashes and maybe also with WebAuthN, find me, email us, we want feedback. We want to figure out what to build here, into the standard library. Alright, so it's been a year of cryptography, but it's also been a year of Geomys. Geomys launched a year ago here at GopherCon. If you want an update, we went on the Fallthrough podcast to talk about it , so check that out. We are now a real company and how you know is that we have totes: it's the equivalent of a Facebook-official relationship. 
The best FIPS 140 side effect has been that we have a new maintainer. Daniel McCarney joined us to help with the FIPS effort and then we were working very well together so Geomys decided to just take him on as a permanent maintainer on the Go crypto maintenance team. I’m very excited about that. This is all possible thanks to our clients, and if you have any questions, here are the links. You might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected] . My work is made possible by Geomys , an organization of professional Go maintainers, which is funded by Smallstep , Ava Labs , Teleport , Tailscale , and Sentry . Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement .) Here are a few words from some of them! Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews. Ava Labs — We at Ava Labs , maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network ), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team. Post-quantum cryptography is about the future. We are worried about quantum computers that might exist… 5-50 (it's a hell of a range) years from now, and that might break all of asymmetrical encryption. 
(Digital signatures and key exchanges.) Post-quantum cryptography runs on classical computers. It's cryptography that we can do now that resists future quantum computers. Post-quantum cryptography is fast, actually. If you were convinced that for some reason it was slow, that's a common misconception. However, post-quantum cryptography is large. Which means that we have to send a lot more bytes on the wire to get the same results.

The Coder Cafe 1 week ago

Build Your Own Key-Value Storage Engine—Week 2

Agenda:

- Week 0: Introduction
- Week 1: In-Memory Store
- Week 2: LSM Tree Foundations

Before delving into this week’s tasks, it’s important to understand what you will implement. This week, you will implement a basic log-structured merge-tree (LSM tree). At its core, an LSM tree is a data structure that prioritizes write efficiency by trading off some read complexity. It buffers writes in memory and uses append-only files on disk, then rewrites data during compaction. It consists of two main components:

- A mutable in-memory data structure called a memtable, used to store recent writes.
- A set of immutable SSTables (Sorted String Tables) stored on disk.

Regularly, the current memtable is snapshotted, its entries are sorted by key, and a new immutable SSTable file is written. In addition, a MANIFEST file is an append-only list of SSTable filenames. It tells the engine which SSTable files exist and in which order to read them, newest to oldest. Why LSM trees shine for write-heavy workloads:

- Fast writes with sequential I/O: New updates are buffered in memory (memtable) and later written sequentially to disk during a flush (SSTable), which is faster than the random I/O patterns common with B-trees, for example.
- Writes are decoupled from read optimization: Writes complete against the memtable, while compaction work runs later (you will tackle that in a future week).
- Space and long-term efficiency: Compaction removes dead data and merges many small files into larger sorted files, which keeps space usage in check and sustains read performance over time.

For the memtable, you will start with a hashtable. In a future week, you will learn why a hashtable is not the most efficient data structure for an LSM tree, but it is a simple starting point.
For the SSTables, you will use JSON as the data format. Get comfortable with a JSON parser if you are not already. This week’s implementation is single-threaded. You will revisit that assumption later. Implement a hashtable to store requests (create or update). You can probably reuse a lot of code from Week 1. When your memtable contains 2,000 entries: Flush the memtable as a new immutable JSON SSTable file with keys sorted. The SSTable file is a JSON array of objects, each with two fields, and . Keys are unique within a file. For example, if your memtable contains the following entries: You need to create the following SSTable: Use a counter for the filename prefix, for example , , . After writing the new SSTable, append its filename to the MANIFEST (append only), then clear the memtable: For now, the flush is a stop-the-world operation. While the file is being written, do not serve reads or writes. You will revisit that later. Create an empty file if it doesn’t exist. Derive the next SSTable ID from the MANIFEST so you don't reuse the same filename. Check the memtable: If found, return the corresponding value. If not found, read the MANIFEST to list SSTable filenames: Scan SSTables from newest to oldest (for example , then , then ). Use a simple linear scan inside each file for now. Stop at the first hit and return the corresponding value. If still not found, return . There are no changes to the client you built in Week 1. Run it against the same file ( put.txt ) to validate that your changes are correct. Keep a small LRU cache of known-absent keys (negative cache) between the memtable and SSTables. This avoids repeated disk scans for hot misses: after the first miss, subsequent lookups are O(1). Implementation details are up to you.
Instead of parsing the MANIFEST file for each request, you can cache the content in-memory. That’s it for this week! You have built the first version of an LSM tree: a memtable in memory, SSTable files written by regular flushes, and a MANIFEST that lists those SSTables. For now, durability isn’t guaranteed. Data already flushed to SSTables will be read after a restart, but anything still in the memtable during a crash is lost. In two weeks, you will make sure that any request acknowledged to a client remains in your storage engine, even after a restart. The flush trigger you used was pretty simple: once the memtable contains 2,000 entries. In real systems, flushes can be triggered by various factors, for example: Some databases flush when the memtable reaches a target size in bytes, ensuring predictable memory usage. A flush can also occur after a period of time has passed. This occurs because the database eventually needs to release commit log segments. For tables with very low write activity, this can sometimes lead to data resurrection scenarios. Here’s an old issue from the ScyllaDB codebase that illustrates this behavior. Regarding the model, this series assumes a simple key–value one: every PUT stores the whole value, so a GET just finds the newest entry and returns it. If you need a richer model (e.g., rows with many fields or collections), writes are often partial (patches) rather than full replacements. Therefore, reads must reconstruct the result by scanning newest to oldest and merging changes until all required fields are found or a full-write record is encountered. Last but not least, in this series, you implicitly rely on client-side ordering: the validation client issues requests sequentially. Production KV databases typically attach a sequence number or a logical timestamp to each write to handle out-of-order arrivals, merging, and reconciling results. 
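A toy sketch of sequence-number reconciliation (all names here are illustrative, not from the series): each write carries a monotonically increasing sequence number, and reads keep the highest-sequence write per key, regardless of arrival order.

```go
package main

import "fmt"

// write carries a monotonically increasing sequence number so out-of-order
// arrivals can be reconciled without trusting wall clocks.
type write struct {
	Seq   uint64
	Key   string
	Value string
}

// merge keeps, per key, the write with the highest sequence number.
func merge(writes []write) map[string]write {
	latest := make(map[string]write)
	for _, w := range writes {
		if cur, ok := latest[w.Key]; !ok || w.Seq > cur.Seq {
			latest[w.Key] = w
		}
	}
	return latest
}

func main() {
	// The Seq=2 write arrives before Seq=1, but still wins.
	out := merge([]write{
		{Seq: 2, Key: "a", Value: "new"},
		{Seq: 1, Key: "a", Value: "old"},
	})
	fmt.Println(out["a"].Value) // prints new
}
```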
Pure wall-clock timestamps are convenient but brittle; see Kyle Kingsbury’s notes on clock pitfalls for a deeper dive. The Log-Structured Merge-Tree (LSM-Tree) // The original LSM tree whitepaper. Log Structured Merge Tree - ScyllaDB // LSM tree definition from the ScyllaDB technical glossary .

0 views
W. Jason Gilmore 1 week ago

Toggling Between Fullscreen Editor and Terminal In VSCode and Cursor

November 18, 2025: This article was originally published on June 4, 2024 and later updated for clarity after I returned to it and couldn't figure out which file to modify! I've been a Vim user for decades; however, a few years ago I switched to VS Code and then subsequently Cursor for my web development work. When building modern web apps you'll spend almost as much time running shell commands as coding, so I need to have a terminal within easy reach at all times. In fact I typically keep several terminal tabs open, including one opened to the local MySQL instance, one running a worker, and one to execute various shell commands, including those related to managing my Git repository. I want this transition between editor and terminal to be as seamless as possible, and so I set up two keyboard shortcuts to help me quickly move back and forth between the two. Furthermore, the transition will always open the terminal in fullscreen mode so I'm not fighting with screen real estate on a laptop. To configure these shortcuts, open the keyboard shortcuts ( ) file in JSON mode and add the following entries: I've used for the toggling shortcut; however, you can switch this to whatever you'd like. If you're running Windows I suppose you would change the shortcut to or something like that. Once defined, save the changes and then try using the keyboard shortcut to switch between the two. With the terminal maximized your VS Code environment will look like this: If you have any other VS Code screen optimization tips, I'd love to hear about them! Hit me up on Twitter at @wjgilmore .
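For reference, a pair of keybindings.json entries along these lines produces the behavior described. The `ctrl+m` key choice and the use of VS Code's built-in `runCommands` command are my assumptions, not necessarily the article's originals; the command IDs are standard VS Code commands:

```json
[
  {
    // Jump into the terminal and maximize the panel.
    "key": "ctrl+m",
    "command": "runCommands",
    "args": {
      "commands": [
        "workbench.action.terminal.focus",
        "workbench.action.toggleMaximizedPanel"
      ]
    },
    "when": "!terminalFocus"
  },
  {
    // Restore the panel and jump back to the editor.
    "key": "ctrl+m",
    "command": "runCommands",
    "args": {
      "commands": [
        "workbench.action.toggleMaximizedPanel",
        "workbench.action.focusActiveEditorGroup"
      ]
    },
    "when": "terminalFocus"
  }
]
```

The `when` clauses make a single key act as a toggle: one binding fires from the editor, the other from the terminal.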

0 views
Jim Nielsen 2 weeks ago

Data Storage As Files on Disk Paired With an LLM

I recently added a bunch of app icons from macOS Tahoe to my collection . Afterwards, I realized some of them were missing relational metadata. For example, I have a collection of iMovie icons through the years which are related in my collection by their App Store ID. However, the latest iMovie icon I added didn’t have this ID. This got me thinking, “Crap, I really want this metadata so I can see apps over time. Am I gonna have to go back through each icon I just posted and find their associated App Store ID?” Then I thought: “Hey, I bet AI could figure this out — right? It should be able to read through my collection of icons (which are stored as JSON files on disk), look for icons with the same name and developer, and see where I'm missing and .” So I formulated a prompt (in hindsight, a really poor one lol): look through all the files in and find any that start with and then find me any icons like iMovie that have a correlation to other icons in where it's missing and But AI did pretty good with that. I’ll save you the entire output, but Cursor thought for a bit, then asked to run this command: I was like, “Ok. I couldn’t write that myself, but that looks about right. Go ahead.” It ran the command, thought some more, then asked to run another command. Then another. It seemed unsatisfied with the results, so it changed course and wrote a node script and asked permission to run that. I looked at it and said, “Hey that’s probably how I would’ve approached this.” So I gave permission. It ran the script, thought a little, then rewrote it and asked permission to run again. Here’s the final version it ran: And with that, boom! It found a few newly-added icons with corollaries in my archive, pointed them out, then asked if I wanted to add the missing metadata. The beautiful part was I said “go ahead” and when it finished, I could see and review the staged changes in git.
This let me double check the LLM’s findings with my existing collection to verify everything looked right — just to make sure there were no hallucinations. Turns out, storing all my icon data as JSON files on disk (rather than a database) wasn’t such a bad idea. Part of the reason I’ve never switched from static JSON files on disk to a database is because I always figured it would be easier for future me to find and work with files on disk (as opposed to learning how to set up, maintain, and query a database). Turns out that wasn’t such a bad bet. I’m sure AI could’ve helped me write some SQL queries to do all the stuff I did here. But what I did instead already fit within a workflow I understand: files on disk, modified with scripting, reviewed with git, checked in, and pushed to prod. So hey, storing data as JSON files in git doesn’t look like such a bad idea now, does it future Jim? Reply via: Email · Mastodon · Bluesky

0 views
xenodium 2 weeks ago

Want a WhatsApp Emacs client? Will you fund it?

Like it or not, WhatsApp is a necessity for some of us. I wish it weren't the case, but here we are. Given the circumstances, I wish I could use WhatsApp a little more on my terms. And by that, I mean from an Emacs client, of course. Surely I'm not the only one who feels this way, right? Right?! Fortunately, I'm not alone . With that in mind, I've been hard at work prototyping, exploring what's feasible. Spoiler alert: it's totally possible, though it will require a fair bit of work. Thankfully, two wonderful projects offer a huge leg up: wuzapi and whatsmeow . wuzapi offers a REST API on top of whatsmeow , a Go library leveraging WhatsApp's multi-device web API. Last week, I prototyped sending a WhatsApp message using 's API. I got there fairly quickly by onboarding myself on to using its web interface and wiring shell-maker to send an HTTP message request via . While these two were enough for a quick demo, they won't cut it for a polished Emacs experience. While I can make REST work, I would like a simpler integration under the hood. REST is fine for outgoing messages, but then I need to integrate webhooks for incoming events. No biggie, it can be done, but now I have to deal with two local services opening a couple of ports. Can we simplify a little? Yes we can. You may have seen me talk about agent-shell , my Emacs package implementing Agent Client Protocol (ACP) … Why is this relevant, you may ask? Well, after building a native Emacs implementation, I learned a bit about json-rpc over standard I/O. The simplicity here is that we can bring bidirectional communication to an Emacs-owned process. No need for multiple channels handling incoming vs outgoing messages. So where's this all going? I've been prototyping some patches on top of wuzapi to expose over standard I/O (as an alternative to ). This prototype goes far beyond my initial experiment with sending messages, and yet the Emacs integration is considerably simpler, not to mention looking very promising.
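To make the contrast with REST-plus-webhooks concrete, here's roughly what a single bidirectional channel looks like: one JSON-RPC message per line over the subprocess's stdin/stdout. The method names here are invented for illustration, not wuzapi's actual API:

```json
{"jsonrpc": "2.0", "id": 1, "method": "message/send", "params": {"to": "+15551234567", "text": "hello from Emacs"}}
{"jsonrpc": "2.0", "method": "message/received", "params": {"from": "+15551234567", "text": "hi!"}}
```

The first line is a request written by Emacs to the process's stdin; the second is a notification (no `id`) pushed back on stdout whenever a message arrives — no webhook endpoint or extra port required.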
Here's a demo showing incoming WhatsApp messages, received via , all through a single Emacs-owned process. Look ma, no ports! These early prototypes are encouraging, but we've only scratched the surface. Before you can send and receive messages, you need to onboard users to the WhatsApp Emacs client. That is, you need to create a user, manage/connect to a session, authorize via a QR code, and more. You'll want this flow to be reliable, and that's just onboarding. From there, you'll need to manage contacts, chats, multiple message types, incoming notifications… the list goes on. That's just the Emacs side. As mentioned, I've also been patching . My plan is to upstream these changes , rather than maintaining a fork. I've prototyped quite a few things now, including the onboarding experience with QR code scanning. At this point, I feel fairly optimistic about feasibility, which is all pretty exciting! But there's a bunch of work needed. Since going full-time indie dev, I have the time available (for now), but it's hard to justify this effort without aiming for some level of sustainability. If you're interested in making this a reality, please consider sponsoring the effort , and please reach out to voice your interest ( Mastodon / Twitter / Reddit / Bluesky ). Reckon a WhatsApp Emacs client would help you stay focused at work (less time on your phone)? Ask your employer to sponsor it too ;-)

0 views
<antirez> 2 weeks ago

Scaling HNSWs

I’m taking a few weeks of pause on my HNSWs developments (now working on some other data structure, news soon). At this point, the new type I added to Redis is stable and complete enough, it’s the perfect moment to reason about what I learned about HNSWs, and turn it into a blog post. That kind of brain dump that was so common pre-AI era, and now has become, maybe, a bit more rare. Well, after almost one year of thinking and implementing HNSWs and vector similarity stuff, it is time for some writing. However this is not going to be an intro on HNSWs: too many are present already. This is the “extra mile” instead. If you know HNSWs, I want to share with you my more “advanced” findings, especially in the context of making them fast enough to allow for a “Redis” experience: you know, Redis is designed for low latency and high performance, and HNSWs are kinda resistant to that, so there were challenges to expose HNSWs as an abstract data structure. This blog post will be split into several sections. Think of them as pages of the same book, different chapters of the same experience. Oh and, by the way, I already wrote and subsequently lost this blog post :D [long, sad story about MacOS and bad habits – I hadn’t lost something like that since the 90s, during blackouts], so here most of the problem will be to recall what I wrote a few days ago and, while I’m at it, to better rephrase what I didn’t like very much. ## A few words about the state of HNSW Before digging into the HNSWs internals and optimizations, I want to say a few things about HNSWs. The original paper introducing HNSWs is a great piece of computer science literature, and HNSWs are amazing data structures, but: I don’t believe they are the last word for searching, in a greedy way, for nearby vectors according to a distance function. The paper gives the feeling it lacks some “pieces”, almost like if the researchers, given six months more, had a lot more to explore and say. 
For instance, I modified the paper myself, extending it in order to support removal of entries: actual removals, not just tombstone deletions where the element is marked as gone and collected later. Deleting items is totally missing from the paper. Similarly, there are, right now, efforts to really check if the “H” in HNSWs is really needed, and if instead a flat data structure with just one layer would perform more or less the same (I hope I’ll cover more about this in the future: my feeling is that the truth is in the middle, and that it makes sense to modify the level selection function to just have levels greater than a given threshold). All this to say that, if you are into data structures research, I believe that a great area is to imagine evolutions and refinements of HNSWs, without getting trapped within the idea that the evolutions are only in the sense of: let’s do it, but for disk (see Microsoft efforts), or the like. Ok, enough with the premise, let’s go to the actual low level stuff :)

## Scaling memory

Redis is an in-memory system, and both HNSWs and vectors have the unfortunate quality of being very space-hungry. There are three reasons for this:

1. HNSWs have a lot of pointers, like 16, 32 or more (this is a tunable parameter of HNSWs) to neighbor nodes.
2. HNSWs have many levels, being a skiplist-alike data structure. This exacerbates the first problem.
3. HNSW’s satellite data is a vector of floating point numbers: in the vanilla case, 4 bytes per component, and normally you have 300-3000 components, this is the usual range.

So, what are the lessons learned here? There are folks that compress pointers, since it is very likely that many pointers (8 bytes in 64 bit systems) will have the highest four bytes all the same. This is smart; I didn’t implement it yet, because in Redis I need to go fast, and this is a tradeoff between space and time: but maybe it is worth it, maybe not. I’ll dig more.
However, if you do the math, the fact that there are many layers is not *so* terrible as it looks. On average, the multiple layers per node make the situation worse by just ~1.3x (if the probability of level increase is 0.25 in the level selection function), since many nodes will be just at layer 0. But still 1.3 is more than 1, and if that “H” in HNSWs really is not *so* useful… [Spoiler: what I found is that the seek time if you have everything at layer 0 is greater; the main loop of the greedy search will start from less optimal places and will eventually reach the right cluster, but will take more computation time. However these are just early results.] So here the *real* low hanging fruit is: vector quantization. What I found is that if you use 8 bit quantization what you get is an almost 4x speedup, a 4x reduction of your vectors (but not a 4x reduction of the whole node: the pointers are still there, and they take a lot of space), and a recall that is virtually the same in real world use cases. This is the reason why Redis Vector Sets use 8 bit quantization by default. You can specify, via VADD options, that you want full precision vectors or binary quantized vectors, where we just take the sign, but I’m skeptical about using both full size vectors and binary quantized vectors. Before talking about them, let’s see what kind of quantization I used for 8 bit. What I do is to compute the maximum absolute value of the components of each vector (so quantization is per-vector), then I use signed 8 bit values to represent the quants from -127 to 127. This is not as good as storing both min and max value, but it is faster when computing cosine similarity, since I can do this:

    /* Each vector is quantized from [-max_abs, +max_abs] to [-127, 127]
     * where range = 2*max_abs. */
    const float scale_product = (range_a/127) * (range_b/127);

Then I multiply things together in the integer domain (actually in the code the main loop is unrolled and uses multiple accumulators, to keep modern CPUs busy):

    for (; i < dim; i++) dot0 += q_a[i] * q_b[i];

And finally we can return back to the floating point distance with:

    float dotf = dot0 * scale_product;

Check vectors_distance_q8() for more information, but I believe you got the idea: it is very simple to go from the integer quants domain back to the unquantized dot product with trivial operations. So, 8 bit quantization is a great deal, and full precision was a *needed* feature, because there will be people doing things with vectors generated in a way where each small amount makes a difference (no, with learned vectors this is not the case…) but, why binary quantization? Because I wanted users to have a simple way to not waste space when their *original* information is already binary. Imagine you have a set of users and they have yes/no properties, and you want to find similar users, items, whatever. Well: this is where binary quantization should be used; it’s just, again, an option of the VADD command.

## Scaling speed: threading and locality

Oh, you know, I have to tell you something about myself: I’m not a fan of threaded systems when it is possible to do a lot with a single core, and then use multiple cores in a shared-nothing architecture. But HNSWs are different. They are *slow*, and they are accessed almost always in read-only ways, at least in most use cases. For this reason, my Vector Sets implementation is fully threaded. Not just reads, even writes are partially threaded, and you may wonder how this is possible without it resulting in a mess, especially in a system like Redis, where keys can be accessed in different ways by the background saving process, the clients, and so forth. Well, to start, let’s focus on reads.
What happens is that as long as nobody is writing in the data structure, we can spawn threads that do the greedy collection of near vectors and return back the results to the blocked client. However, my implementation of HNSWs was written from scratch, I mean, from the empty C file opened with vim; it has 0% of shared code with the two implementations most other systems use, so there are a few “novelties”. One of such different things is that in order to avoid re-visiting already visited nodes, I use an integer stored in each node that is called “epoch”, instead of using another data structure (like a hash table) to mark nodes already visited, which is quite slow, I believe. The epoch instead is local to the node, and the global data structure increments its epoch for each search. So in the context of each search, we are sure that nodes whose stored epoch is smaller than the current search epoch have not been visited yet. But with threads, there are multiple searches occurring at the same time! And, yep, what I needed was an array of epochs:

    typedef struct hnswNode {
        uint32_t level;   /* Node's maximum level */
        /* ... many other fields ... */
        uint64_t visited_epoch[HNSW_MAX_THREADS];
    } hnswNode;

That’s what you can read in hnsw.h. This is, again, a space-time tradeoff, and again time won against space. So, how was it possible to have threaded writes? The trick is that in HNSW inserts, a lot of time is spent looking for neighbor candidates. So writes are split into a reading-half and a commit-half; only the second needs a write lock, and there are a few tricks to make sure that the candidates we accumulated during the first part are discarded if the HNSW changed in the meantime, since some nodes may no longer be valid. There is, however, another problem. What about the user deleting the key, while background threads are working on the value? For this scenario, we have a function that waits for background operations to return before actually reclaiming the object.
With these tricks, it is easy to get 50k ops/sec on real world vector workloads, and these are numbers I got from redis-benchmark itself, with all the overhead involved. The raw numbers of the flat HNSW library itself are much higher.

## Scaling memory: reclaiming it properly

Before talking about how to scale HNSWs into big use cases with multiple instances involved, and why Redis Vector Sets expose the actual data structure in the face of the user (I believe programmers are smart and don’t need babysitting, but it’s not *just* that), I want to go back and talk again about memory, because there is an interesting story to tell about this specific aspect. Most HNSW implementations are not able to reclaim memory directly when you delete a node from the graph. I believe there are two main reasons for that:

1. People misunderstand the original HNSW paper in a specific way: they believe links can be NOT reciprocal among neighbors. And there is a specific reason why they think so.
2. The paper does not say anything about deletion of nodes and how to fix the graph after nodes go away and we get missing links in the “web” of connections.

The first problem is a combination (I believe) of lack of clarity in the paper and the fact that, while implementing HNSWs, people face a specific problem: when inserting a new node, and good neighbors are searched among existing nodes, often the candidates already have the maximum number of outgoing links. What to do, in this case? The issue is often resolved by linking unidirectionally from the new node we are inserting to the candidates that are already “full” of outgoing links. However, when you need to delete a node, you can no longer resolve all its incoming links, so you can’t really reclaim memory. You mark it as deleted with a flag, and later sometimes there is some rebuilding of the graph to “garbage collect” stale nodes; sometimes memory is just leaked.
So, to start, my implementation in Redis does things differently, by forcing links to be bidirectional. If A links to B, B links to A. But how to do so, given that A may be busy? Well, this gets into complicated territory, but what happens is that heuristics are used in order to drop links from existing nodes whose other neighbors are well connected, when our new node is a better candidate even for the target node; and if this is not true, there are other ways to force a new node to have at least a minimal number of links, always trying to satisfy the small world property of the graph. This way, when Redis deletes a node from a Vector Set, it always has a way to remove all the pointers to it. However, what to do with the remaining nodes that now are missing a link? What I do is to create a distance matrix among them, in order to try to link the old node’s neighbors among themselves, trying to minimize the average distance. Basically, for each pair of i,j nodes in our matrix, we calculate how good their connection is (how similar their vectors are) and how badly linking them affects the *remaining* possible pairs (since there could be elements left without good pairs, if we link two specific nodes). After we build this matrix of scores, we then proceed with a greedy pairing step. This works so well that you can build a large HNSW with millions of elements, later delete 95% of all your elements, and the remaining graph still has good recall, no isolated nodes, and so forth. That is what I mean when I say that there is space in HNSWs for new papers to continue the work.

## Scaling HNSWs to multiple processes

When I started to work on Redis Vector Sets, there was already a vector similarity implementation in Redis-land, specifically as an index type of RediSearch, and this is how most people think of HNSWs: as a form of indexing of existing data. Yet I wanted to provide Redis with a new HNSW implementation exposed in a completely different way. Guess how?
As a data structure, of course. And this tells a story about how Redis-shaped my head is after so many years, or maybe it was Redis-shaped since the start, and it is Redis that is shaped after my head, since I immediately envisioned how to design a Redis data structure that exposed HNSWs to the users, directly, and I was puzzled that the work with vectors in Redis was not performed exactly like that. At the same time, when I handed my design document to my colleagues at Redis, I can’t say that they immediately “saw” it as an obvious thing. My reasoning was: vectors are like scores in Redis Sorted Sets, except they are not scalar scores where you have a total order. Yet you can VADD and VREM elements, and then you can call VSIM instead of ZRANGE in order to have *similar* elements. This made sense not just as an API, but I thought of HNSWs as strongly composable, and not linked to a specific use case (not specific to text embeddings, or image embeddings, or even *learned* embeddings necessarily). You do:

    VADD my_vector_set VALUES [… components …] my_element_string

So whatever is in your components, Redis doesn't care; when you call VSIM it will report similar elements. But this also means that, if you have different vectors about the same use case split across different instances / keys, you can issue VSIM with the same query vector to all the instances, add the WITHSCORES option (that returns the cosine distance), and merge the results client-side: you have magically scaled your hundreds of millions of vectors across multiple instances, splitting your dataset N ways. [One interesting thing about such a use case is that you can query the N instances in parallel using multiplexing, if your client library is smart enough.] Another very notable thing about HNSWs exposed in this raw way is that you can finally scale writes very easily. Just hash your element modulo N, and target the resulting Redis key/instance.
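For example, the read-side fan-out described above — the same VSIM against every shard, merged by score client-side — can be sketched like this. This is my illustration using redis-py's generic execute_command; the key name "vectors" and per-shard clients are assumptions:

```python
def merge_vsim_replies(replies, k=10):
    """Merge flat [elem, score, elem, score, ...] WITHSCORES replies
    from N shards, keeping the k most similar overall (higher = closer)."""
    merged = []
    for reply in replies:
        for elem, score in zip(reply[::2], reply[1::2]):
            merged.append((float(score), elem))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return merged[:k]

def vsim_sharded(clients, query_vec, k=10):
    # Same query vector to every shard; each shard returns its local top-k.
    args = ["VALUES", len(query_vec), *query_vec]
    replies = [r.execute_command("VSIM", "vectors", *args,
                                 "WITHSCORES", "COUNT", k)
               for r in clients]
    return merge_vsim_replies(replies, k)
```

Because each shard already returns its own top-k, the client only ever merges N·k small result lists, regardless of total dataset size.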
Multiple instances can absorb the (slow, but still fast for HNSW standards) writes at the same time, parallelizing an otherwise very slow process. This way of exposing HNSWs also scales down in a very significant way: sometimes you want an HNSW for each user / item / product / whatever you are working with. This is very hard to model if you have an index on top of something, but it is trivial if your HNSWs are data structures. You can just have a Vector Set key for each of your items, with just a handful of elements. And of course, like with any other Redis key, you can set an expiration time on the key, so that it will be removed automatically later. All this can be condensed into a rule that I believe should be more present in our industry: many programmers are smart, and if instead of creating a magic system they have no access to, you show them the data structure and the tradeoffs, they can build more things, and model their use cases in specific ways. And your system will be simpler, too.

## Scaling loading times

Without threading, my HNSW library can add word2vec vectors (300 components each) into an HNSW at 5000 elements/second on a single thread, and can query the resulting HNSW at 90k queries per second. As you can see there is a large gap. This means that loading back an HNSW with many millions of elements from a Redis dump file into memory would take a lot of time. And this time would impact replication as well. Not great. But this is true only if we add elements from the disk to the memory in the most trivial way, that is, storing “element,vector” on disk and then trying to rebuild the HNSW in memory. There is another lesson to learn here. When you use HNSWs, you need to serialize the nodes and the neighbors as they are, so you can rebuild everything in memory by just allocating stuff and turning neighbor IDs into pointers. This resulted in a 100x speedup. But do you really believe the story ends here? Hehe.
Recently Redis has stronger security features and avoids doing bad things even when the RDB file is corrupted by an attacker. So what I needed to do was to make sure the HNSW is valid after loading, regardless of the errors and corruption in the serialized data structure. This involved many tricks, but I want to take the freedom to just dump one comment I wrote here, as I believe the reciprocal check is particularly cool:

    /* Second pass: fix pointers of all the neighbors links.
     * As we scan and fix the links, we also compute the accumulator
     * register "reciprocal", that is used in order to guarantee that all
     * the links are reciprocal.
     *
     * This is how it works: we hash (using a strong hash function) the
     * following key for each link that we see from A to B (or vice versa):
     *
     *   hash(salt || A || B || link-level)
     *
     * We always sort A and B, so the same link from A to B and from B to A
     * will hash the same. Then we xor the result into the 128 bit accumulator.
     * If each link has its own backlink, the accumulator is guaranteed to
     * be zero at the end.
     *
     * Collisions are extremely unlikely to happen, and an external attacker
     * can't easily control the hash function output, since the salt is
     * unknown, and they would also need to control the pointers.
     *
     * This algorithm is O(1) for each node so it is basically free for
     * us, as we scan the list of nodes, and runs on constant and very
     * small memory. */

## Scaling use cases: JSON filters

I remember the day when the first working implementation of Vector Sets felt complete. Everything worked as expected and it was the starting point for the refinements and the extra features. However in the past weeks and months I internally received the feedback that most use cases need some form of mixed search: you want vectors near a given query vector (like the movies most similar to something) but also with some kind of filtering (only released between 2000 and 2010).
My feeling is that you need to query for different parameters less often than product people believe, and that most of the time you can obtain this more efficiently by adding, in this specific case, each year to a different vector set key (this is another instance of the composability of HNSWs expressed as data structures versus a kind of index). However I was thinking about the main loop of the HNSW greedy search, which is something like this:

    // Simplified HNSW greedy search algorithm. Don’t trust it too much.
    while(candidates.len() > 0) {
        c = candidates.pop_nearest(query);
        worst_distance = results.get_worst_dist(query);
        if (distance(query,c) > worst_distance) break;
        foreach (neighbor from c) {
            if (neighbor.already_visited()) continue;
            neighbor.mark_as_visited();
            if (results.has_space() OR neighbor.distance(query) < worst_distance) {
                candidates.add(neighbor);
                results.add(neighbor);
            }
        }
    }
    return results;

So I started to play with the idea of adding a JSON set of metadata to each node. What if, once I have things like {“year”: 1999}, this was enough to filter while I perform the greedy search? Sure, the search needs to be bound, but there is a key insight here: I want, to start, elements that are *near* the query vector, so I don’t really need to explore the whole graph if the condition on the JSON attributes is not satisfied by many nodes. I’ll let the user specify the effort, and anyway very far away results that match the filter are useless. So that’s yet another way my HNSW differs: it supports filtering by expressions similar to the ones you could write inside an “if” statement of a programming language. And your elements in the Vector Set can be associated with JSON blobs, expressing their properties. Then you can do things like:

    VSIM movies VALUES … your vector components here … FILTER '.year >= 1980 and .year < 1990'

## A few words on memory usage

HNSWs’ fatal issue is — in theory — that they are normally served from memory.
Actually, you can implement HNSWs on disk, even if there are better data structures from the point of view of disk access latencies. However, in the specific case of Redis and Vector Sets the idea is to provide something that is very fast and easy to work with: the flexibility of in-memory data structures helps with that. So the question boils down to: is the memory usage really so bad? Loading the 3 million Word2Vec entries into Redis with the default int8 quantization takes 3 GB of RAM, 1 KB for each entry. Many use cases have just a few tens of millions of entries, or a lot less. And what you get back from HNSWs, if well implemented and in memory, is very good performance, which is crucial in a data structure and in a workload that is in itself slow by definition. On my MacBook I get 48k ops per second with redis-benchmark and VSIM against this key (holding the word2vec dataset). My feeling is that the memory usage of in-memory HNSWs is very acceptable for many use cases. And even in the use cases where you want the bulk of your vectors on disk, even at the price of slower performance, your hot set should likely be served from RAM. This is one of the reasons why I believe that being active in HNSW research is a good idea: I don’t think they will be replaced anytime soon for most use cases. It seems more likely that we will continue to have different data structures that are ideal for RAM or for disk depending on the use cases and data size. Moreover, what I saw recently, even just scanning the Hacker News front page, is people with a few million items fighting with systems that are slower or more complicated than needed. HNSWs, carefully exposed in the right way, can avoid all that.

## Conclusions

I like HNSWs, and working on and implementing them was a real pleasure.
I believe vectors are a great fit for Redis, even in an AI-less world (for instance, a few months ago I used them in order to fingerprint Hacker News users, replicating an old work published on HN in the past). HNSWs are simply too cool and powerful for a number of use cases, and with AI, and learned embeddings, all this escalates to a myriad of potential use cases. However, like most features in Redis, I expect that a lot of time will pass before people realize they are useful and powerful and how to use them (no, it’s not just a matter of RAG). This happened also with Streams: finally there is mass adoption, after so many years. If instead you are more interested in HNSW and the implementation I wrote, I believe the code is quite accessible, and heavily commented: https://github.com/redis/redis/blob/unstable/modules/vector-sets/hnsw.c If you want to learn more about Redis Vector Sets, please feel free to read the README file I wrote myself. There is also the official Redis documentation, but I suggest you start from here: https://github.com/redis/redis/tree/unstable/modules/vector-sets Thanks for reading such a long blog post! And have a nice day. References. This is the paper about the "H" in HNSW and how useful it is -> https://arxiv.org/abs/2412.01940 Comments

Shayon Mukherjee 2 weeks ago

A hypothetical search engine on S3 with Tantivy and warm cache on NVMe

I’ve been curious about how far you can push object storage as a foundation for database-like systems. In previous posts, I explored moving JSON data from PostgreSQL to Parquet on S3 and building MVCC-style tables with constant-time deletes using S3’s conditional writes. These experiments showed that decoupling storage from compute unlocks interesting trade-offs: lower costs and simpler operations in exchange for higher cold-query latency. Search engines traditionally don’t fit this model.

Simon Willison 3 weeks ago

Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini, which they describe as "a more compact and cost-efficient version of GPT-5-Codex". It's currently only available via their Codex CLI tool and VS Code extension, with proper API access " coming soon ". I decided to use Codex to reverse engineer the Codex CLI tool and give me the ability to prompt the new model directly. I made a video talking through my progress and demonstrating the final results. OpenAI clearly don't intend for people to access this model directly just yet. It's available exclusively through Codex CLI which is a privileged application - it gets to access a special backend API endpoint that's not publicly documented, and it uses a special authentication mechanism that bills usage directly to the user's existing ChatGPT account. I figured reverse-engineering that API directly would be somewhat impolite. But... Codex CLI is an open source project released under an Apache 2.0 license. How about upgrading that to let me run my own prompts through its existing API mechanisms instead? This felt like a somewhat absurd loophole, and I couldn't resist trying it out and seeing what happened. The openai/codex repository contains the source code for the Codex CLI tool, which OpenAI rewrote in Rust just a few months ago. I don't know much Rust at all. I made my own clone on GitHub and checked it out locally: Then I fired up Codex itself (in dangerous mode, because I like living dangerously): And ran this prompt: Figure out how to build the rust version of this tool and then build it This worked. It churned away for a bit and figured out how to build itself. This is a useful starting point for a project like this - in figuring out the compile step the coding agent gets seeded with a little bit of relevant information about the project, and if it can compile that means it can later partially test the code it is writing while it works. 
Once the compile had succeeded I fed it the design for the new feature I wanted: Add a new sub-command to the Rust tool called "codex prompt" codex prompt "prompt goes here" - this runs the given prompt directly against the OpenAI API that Codex uses, with the same code path and authentication credentials as the rest of Codex. codex prompt -m <model_name> "prompt goes here" - same again but lets you set the model codex prompt "prompt goes here" -s/--system "system prompt goes here" - runs with a custom system/developer message codex prompt --models - this lists all available models that can be used with the "codex prompt" command My design for this new sub-command is lifted directly from my own llm command . Codex got to work. I've shared the full transcript here (using the tool I described here ), but this TODO list it made itself is a useful summary of what it decided to do: I like that it figured out the justfile in the repo and decided to use it to run formatting and linting commands without me needing to tell it to. I tried running the first version of the code it wrote like this: ... and it didn't quite work. I got this: Note that it DID think about SVG creation, but then decided it should look at the current directory. This isn't what I want - it appeared to be running in Codex's normal mode with a system prompt telling it to edit files on disk. I wanted it to respond to the prompt without acting as if it had a full workspace available to it. I prompted it again, pasting in the broken output: this did not seem to work - here is what happened - note that the reasoning summary comes out on a lot of lines when I would rather it streams out to stderr on the same line (flushing constantly) - and then despite the reasoning summary thinking about pelican on SVG it did not actually act on that and display the result, which is very surprising. 
[pasted content] It churned for a bit more, but from the output it looked like it was still trying to include Codex-specific tools and workspace information. So I said: Note that this new 'codex prompt' command should NOT be able to execute any tools at all, and should not have to think about the workspace or sandbox - it should send a single prompt with no tools and then stream and display a response and then stop. This got me a new build, but when I ran it: My guess is that OpenAI's custom backend API for Codex doesn't like it if you attempt to call it without passing in the expected instructions. Fair enough! Rather than paste that error in I got Codex to try the command itself, so it could see the error: You try running this and see what happens: ./target/debug/codex prompt 'Generate an SVG of a pelican riding a bicycle' Codex switched back to the default instructions. From its reply: now forces a tool-free request: we keep the workspace defaults, inject a lightweight system/developer message (“don’t run tools”), disable apply_patch/web search via config overrides, and build the outbound prompt as just that system span plus the user text. I guess that will have to do. I asked for one more feature: Add a --debug option to the codex prompt command which causes the full JSON request and response to be printed to stderr, plus the URL that is being accessed and the HTTP verb ... and we're ready to try this thing out! Notably I haven't written a single line of Rust myself here and paid almost no attention to what it was actually doing. My main contribution was to run the binary every now and then to see if it was doing what I needed yet. I've pushed the working code to a prompt-subcommand branch in my repo if you want to take a look and see how it all works. With the final version of the code built, I drew some pelicans. Here's the full terminal transcript , but here are some highlights. 
This is with the default GPT-5-Codex model: I pasted it into my tools.simonwillison.net/svg-render tool and got the following: I ran it again for GPT-5: And now the moment of truth... GPT-5 Codex Mini! I don't think I'll be adding that one to my SVG drawing toolkit any time soon. I had Codex add the --debug option to help me see exactly what was going on. The output starts like this: This reveals that OpenAI's private API endpoint for Codex CLI is . Also interesting is how the key (truncated above, full copy here ) contains the default instructions, without which the API appears not to work - but it also shows that you can send a message with in advance of your user prompt. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

Simon Willison 3 weeks ago

Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI

I'm upgrading various plugins for compatibility with the new Datasette 1.0a20 alpha release and I decided to record a video of the process. This post accompanies that video with detailed additional notes. I picked a very simple plugin to illustrate the upgrade process (possibly too simple). datasette-checkbox adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with names like or or ) and your current user has permission to update rows in that table it adds an inline checkbox UI that looks like this: I built the first version with the help of Claude back in August 2024 - details in this issue comment . Most of the implementation is JavaScript that makes calls to Datasette 1.0's JSON write API . The Python code just checks that the user has the necessary permissions before including the extra JavaScript. The first step in upgrading any plugin is to run its tests against the latest Datasette version. Thankfully makes it easy to run code in scratch virtual environments that include the different code versions you want to test against. I have a test utility called (for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory like this: And it will run the existing plugin tests against whatever version of Datasette I have checked out in my directory. You can see the full implementation of (and its friend described below) in this TIL - the basic version looks like this: I started by running in the directory, and got my first failure... but it wasn't due to permissions, it was because the for the plugin was pinned to a specific mismatched version of Datasette: I fixed this problem by swapping to and ran the tests again... and they passed! Which was a problem because I was expecting permission-related failures. It turns out when I first wrote the plugin I was lazy with the tests - they weren't actually confirming that the table page loaded without errors. 
I needed to actually run the code myself to see the expected bug. First I created myself a demo database using sqlite-utils create-table : Then I ran it with Datasette against the plugin's code like so: Sure enough, visiting produced a 500 error about the missing method. The next step was to update the test to also trigger this error: And now fails as expected. At this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically: runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it. I tell it to consult the subset of the Datasette upgrade documentation that talks about Datasette permissions and then get the command to pass its tests. This is an example of what I call designing agentic loops - I gave Codex the tools it needed ( ) and a clear goal and let it get to work on my behalf. The remainder of the video covers finishing up the work - testing the fix manually, committing my work using: Then shipping a 0.1a4 release to PyPI using the pattern described in this TIL . Finally, I demonstrated that the shipped plugin worked in a fresh environment using like this: Executing this command installs and runs a fresh Datasette instance with a fresh copy of the new alpha plugin ( ). It's a neat way of confirming that freshly released software works as expected. This video was shot in a single take using Descript , with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a closing slide from my previous video and exported it locally at 1080p, then uploaded it to YouTube.
Something I learned from the Software Carpentry instructor training course is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes! I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production. I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.

codedge 3 weeks ago

Managing secrets with SOPS in your homelab

Sealed Secrets, Ansible Vault, 1Password or SOPS - there are multiple ways, and places, to store your secrets. I went with SOPS and age for my ArgoCD GitOps environment. Managing secrets in your homelab, be it within a Kubernetes cluster or while deploying systems and tooling with Ansible, is a topic that arises with almost 100% certainty. In general, you need to decide whether you want secrets to be held and managed externally or internally:

- Externally managed: this includes either a self-hosted or externally hosted secrets solution like AWS KMS, a password manager like 1Password, or similar
- Internally managed: solutions where your secrets live next to your code; no external service is needed

One important advantage I see with internally managed solutions is that I do not need an extra service. No extra costs and connections, no chicken-and-egg problem when hosting your passwords inside your own Kubernetes cluster but not being able to reach it when the cluster is down. Therefore I went with SOPS for both: secrets for my Ansible scripts and secrets I need to set for my K8s cluster. While SOPS can be used with PGP, GnuPG and more, I settled on age for encryption. With SOPS your secrets live, encrypted, inside your repository and can be en-/decrypted on the fly whenever needed. The private key for encryption should, of course, never be committed into your git repository or made available to untrusted sources.

First, we need to install SOPS and age, and generate an age key. SOPS is available for all common operating systems via the package manager. I either use Mac or Arch Linux. Now we need to generate an age key and link it to SOPS as the default key to encrypt with. Our age key will live in . Now we tell SOPS where to find our age key. I put the next line in my . The last thing to do is to put a in the folder from where you want to encrypt your files. This file acts as a configuration regarding the age recipient (key) and how the data should be encrypted. My config file does the following:

- Two separate rules depending on the folder where the encrypted files are located
- Files ending with are targeted
- The age key that should be used for en-/decryption is specified

You might wonder about the first rule with . I will just quote the KSOPS docs here: "To make encrypted secrets more readable, we suggest using the following encryption regex to only encrypt data and stringData values. This leaves non-sensitive fields, like the secret's name, unencrypted and human readable." All the configuration details can be found in the SOPS docs.

Let's now look into the specifics of using our new setup with either Ansible or Kubernetes. Ansible can automatically process (decrypt) SOPS-encrypted files with the Community SOPS Collection. Additionally, in my I enabled this plugin (see docs). Now, taken from the official Ansible docs: "After the plugin is enabled, correctly named group and host vars files will be transparently decrypted with SOPS. The files must end with one of these extensions: .sops.yaml, .sops.yml, .sops.json." That's it. You can now encrypt your group or host vars files and Ansible can automatically decrypt them.

SOPS can be used with Kubernetes via the KSOPS Kustomize Plugin. The configuration is already prepared; we only need to apply KSOPS to our cluster. I use the following manifest - see more examples in my homelab repository:
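A KSOPS generator manifest typically takes roughly this shape. This is a sketch following the structure the KSOPS docs describe; the metadata name and the secret file path below are placeholders, not taken from my homelab repository:

```yaml
# Sketch of a KSOPS generator manifest; name and file path are placeholders.
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
files:
  - ./secret.sops.yaml
```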

sunshowers 3 weeks ago

`SocketAddrV6` is not roundtrip serializable

A few weeks ago at Oxide , we encountered a bug where a particular, somewhat large, data structure was erroring on serialization to JSON via . The problem was that JSON only supports map keys that are strings or numbers, and the data structure had an infrequently-populated map with keys that were more complex than that 1 . We fixed the bug, but a concern still remained: what if some other map that was empty most of the time had a complex key in it? The easiest way to guard against this is by generating random instances of the data structure and attempting to serialize them, checking that this operation doesn’t panic. The most straightforward way to do this is with property-based testing , where you define: Modern property-based testing frameworks like , which we use at Oxide, combine these two algorithms into a single strategy , through a technique known as integrated shrinking . (For a more detailed overview, see my monad tutorial , where I talk about the undesirable performance characteristics of monadic composition when it comes to integrated shrinking.) The library has a notion of a canonical strategy for a type, expressed via the trait . The easiest way to define instances for large, complex types is to use a derive macro . Annotate your type with the macro: As long as all the fields have defined for them—and the library defines the trait for most types in the standard library—your type has a working random generator and shrinker associated with it. It’s pretty neat! I put together an implementation for our very complex type, then wrote a property-based test to ensure that it serializes properly: And, running it: The test passed! But while we’re here, surely we should also be able to deserialize a , and then ensure that we get the same value back, right? We’ve already done the hard part, so let’s go ahead and add this test: The roundtrip test failed! Why in the world did the test fail? 
My first idea was to try and do a textual diff of the outputs of the two data structures. In this case, I tried out the library, with something like: And the output I got was: There’s nothing in the output! No or as would typically be printed. It’s as if there wasn’t a difference at all, and yet the assertion failing indicated the before and after values just weren’t the same. We have one clue to go by: the integrated shrinking algorithm in tries to shrink maps down to empty ones. But it looks like the map is non-empty . This means that something in either the key or the value was suspicious. A is defined as: Most of these types were pretty simple. The only one that looked even remotely suspicious was the , which ostensibly represents an IPv6 address plus a port number. What’s going on with the ? Does the implementation for it do something weird? Well, let’s look at it : Like a lot of abstracted-out library code it looks a bit strange, but at its core it seems to be simple enough: The is self-explanatory, and the is probably the port number. But what are these last two values? Let’s look at the constructor : What in the world are these two and values? They look mighty suspicious. A thing that caught my eye was the “Textual representation” section of the , which defined the representation as: Note what’s missing from this representation: the field! We finally have a theory for what’s going on: Why did this not show up in the textual diff of the values? For most types in Rust, the representation breaks out all the fields and their values. But for , the implementation (quite reasonably) forwards to the implementation . So the field is completely hidden, and the only way to look at it is through the method . Whoops. How can we test this theory? The easiest way is to generate random values of where is always set to zero, and see if that passes our roundtrip tests. The ecosystem has pretty good support for generating and using this kind of non-canonical strategy. 
Let’s try it out: Pretty straightforward, and similar to how lets you provide custom implementations through . Let’s test it out again: All right, looks like our theory is confirmed! We can now merrily be on our way… right? This little adventure left us with more questions than answers, though: The best place to start looking is in the IETF Request for Comments (RFCs) 2 that specify IPv6. The Rust documentation for helpfully links to RFC 2460, section 6 and section 7 . The field is actually a combination of two fields that are part of every IPv6 packet: Section 6 of the RFC says: Flow Labels The 20-bit Flow Label field in the IPv6 header may be used by a source to label sequences of packets for which it requests special handling by the IPv6 routers, such as non-default quality of service or “real-time” service. This aspect of IPv6 is, at the time of writing, still experimental and subject to change as the requirements for flow support in the Internet become clearer. […] And section 7: Traffic Classes The 8-bit Traffic Class field in the IPv6 header is available for use by originating nodes and/or forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets. At the point in time at which this specification is being written, there are a number of experiments underway in the use of the IPv4 Type of Service and/or Precedence bits to provide various forms of “differentiated service” for IP packets […]. Let’s look at the Traffic Class field first. This field is similar to IPv4’s differentiated services code point (DSCP) , and is meant to provide quality of service (QoS) over the network. (For example, prioritizing low-latency gaming and video conferencing packets over bulk downloads.) The DSCP field in IPv4 is not part of a , but the Traffic Class—through the field—is part of a . Why is that the case? Rust’s definition of mirrors the defined by RFC 2553, section 3.3 : Similarly, Rust’s mirrors the struct. 
There isn’t a similar RFC for ; the de facto standard is Berkeley sockets , designed in 1983. The Linux man page for defines it as: So , which includes the Traffic Class, is part of , but the very similar DSCP field is not part of . Why? I’m not entirely sure about this, but here’s an attempt to reconstruct a history: (Even if could be extended to have this field, would it be a good idea to do so? Put a pin in this for now.) RFC 2460 says that the Flow Label is “experimental and subject to change”. The RFC was written back in 1998, over a quarter-century ago—has anyone found a use for it since then? RFC 6437 , published in 2011, attempts to specify semantics for IPv6 Flow Labels. Section 2 of the RFC says: The 20-bit Flow Label field in the IPv6 header [RFC2460] is used by a node to label packets of a flow. […] Packet classifiers can use the triplet of Flow Label, Source Address, and Destination Address fields to identify the flow to which a particular packet belongs. The RFC says that Flow Labels can potentially be used by routers for load balancing, where they can use the triplet source address, destination address, flow label to figure out that a series of packets are all associated with each other. But this is an internal implementation detail generated by the source program, and not something IPv6 users copy/pasting an address generally have to think about. So it makes sense that it isn’t part of the textual representation. RFC 6294 surveys Flow Label use cases, and some of the ones mentioned are: But this Stack Exchange answer by Andrei Korshikov says: Nowadays […] there [are] no clear advantages of additional 20-bit QoS field over existent Traffic Class (Differentiated Class of Service) field. So “Flow Label” is still waiting for its meaningful usage. In my view, putting in was an understandable choice given the optimism around QoS in 1998, but it was a bit of a mistake in hindsight. 
The Flow Label field never found widespread adoption, and the Traffic Class field is more of an application-level concern. In general, I think there should be a separation between types that are losslessly serializable and types that are not, and violates this expectation. Making the Traffic Class (QoS) a socket option, like in IPv4, avoids these serialization issues. What about the other additional field, ? What does it mean, and why does it not have to be zeroed out? The documentation for a says that in its textual representation, the scope identifier is included after the IPv6 address and a character, within square brackets. So, for example, the following code sample: prints out . What does this field mean? The reason exists has to do with link-local addressing . Imagine you connect two computers directly to each other via, say, an Ethernet cable. There isn’t a central server telling the computers which addresses to use, or anything similar—in this situation, how can the two computers talk to each other? To address this issue, OS vendors came up with the idea to just assign random addresses on each end of the link. The behavior is defined in RFC 3927, section 2.1 : When a host wishes to configure an IPv4 Link-Local address, it selects an address using a pseudo-random number generator with a uniform distribution in the range from 169.254.1.0 to 169.254.254.255 inclusive. (You might have seen these 169.254 addresses on your home computers if your router is down. Those are link-local addresses.) Sounds simple enough, right? But there is a pretty big problem with this approach: what if a computer has more than one interface on which a link-local address has been established? When a program tries to send some data over the network, the computer has to know which interface to send the data out on. But with multiple link-local interfaces, the outbound one becomes ambiguous. 
This is described in section 6.3 of the RFC: Address Ambiguity Application software run on a multi-homed host that supports IPv4 Link-Local address configuration on more than one interface may fail. This is because application software assumes that an IPv4 address is unambiguous, that it can refer to only one host. IPv4 Link-Local addresses are unique only on a single link. A host attached to multiple links can easily encounter a situation where the same address is present on more than one interface, or first on one interface, later on another; in any case associated with more than one host. […] The IPv6 protocol designers took this lesson to heart. Every time an IPv6-capable computer connects to a network, it establishes a link-local address starting with . (You should be able to see this address via on Linux, or your OS’s equivalent.) But if you’re connected to multiple networks, all of them will have addresses beginning with . Now if an application wants to establish a connection to a computer in this range, how can it tell the OS which interface to use? That’s exactly where comes in: it allows the to specify which network interface to use. Each interface has an index associated with it, which you can see on Linux with . When I run that command, I see: The , , and listed here are all the indexes that can be used as the scope ID. Let’s try pinging our address: Aha! The warning tells us that for a link-local address, the scope ID needs to be specified. Let’s try that using the syntax: Success! What if we try a different scope ID? This makes sense: the address is only valid for scope ID 2 (the interface). When we told to use a different scope, 3, the address was no longer reachable. This neatly solves the 169.254 problem with IPv4 addresses. Since scope IDs can help disambiguate the interface on which a connection ought to be made, it does make sense to include this field in , as well as in its textual representation. 
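The standard library's own types can demonstrate this asymmetry directly. The sketch below uses plain std; the flowinfo value 7 and scope ID 2 are arbitrary picks for illustration. The scope ID survives a textual roundtrip, while flowinfo is silently dropped:

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

// Serialize an address to its textual form and parse it back,
// which is what a string-based (e.g. JSON) roundtrip amounts to.
fn roundtrip(addr: SocketAddrV6) -> SocketAddrV6 {
    addr.to_string().parse().unwrap()
}

fn main() {
    // Arbitrary illustrative values: flowinfo = 7, scope ID = 2.
    let before = SocketAddrV6::new(
        Ipv6Addr::new(0xfe80, 0, 0, 0, 0, 0, 0, 1), 8080, 7, 2);
    // The scope ID appears after '%', but flowinfo is nowhere in the text.
    assert_eq!(before.to_string(), "[fe80::1%2]:8080");
    let after = roundtrip(before);
    assert_eq!(after.scope_id(), 2); // survives the textual roundtrip
    assert_eq!(after.flowinfo(), 0); // silently reset to zero
    assert_ne!(before, after);       // hence the failing property test
    println!("{after}");
}
```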
The keen-eyed among you may have noticed that the commands above printed out an alternate representation: . The at the end is the network interface that corresponds to the numeric scope ID. Many programs can handle this representation, but Rust’s can’t. Another thing you might have noticed is that the scope ID only makes sense on a particular computer. A scope ID such as means different things on different computers. So the scope ID is roundtrip serializable, but not portable across machines. In this post we started off by looking at a somewhat strange inconsistency and ended up deep in the IPv6 specification. In our case, the instances were always for internal services talking to each other without any QoS considerations, so was always zero. Given that knowledge, we were okay adjusting the property-based tests to always generate instances where was set to zero. ( Here’s the PR as landed .) Still, it raises questions: Should we wrap in a newtype that enforces this constraint? Should provide a non-standard alternate serializer that also includes the field? Should not forward to when hides fields? Should Rust have had separate types from the start? (Probably too late now.) And should Berkeley sockets not have included at all, given that it makes the type impossible to represent as text without loss? The lesson it really drives home for me is how important the principle of least surprise can be. Both and have lossless textual representations, and does as well. By analogy it would seem like would, too, and yet it does not! IPv6 learned so much from IPv4’s mistakes, and yet its designers couldn’t help but make some mistakes of their own. This makes sense: the designers could only see the problems they were solving then, just as we can only see those we’re solving now—and just as we encounter problems with their solutions, future generations will encounter problems with ours. Thanks to Fiona , and several of my colleagues at Oxide, for reviewing drafts of this post. 
Discuss on Hacker News and Lobsters .

This is why our Rust map crate where keys can borrow from values, , serializes its maps as lists or sequences.  ↩︎

The Requests for Discussion we use at Oxide are inspired by RFCs, though we use a slightly different term (RFD) to convey the fact that our documents are less set in stone than IETF RFCs are.  ↩︎

The two fields sum up to 28 bits, and the field is a , so there’s four bits remaining. I couldn’t find documentation for these four bits anywhere—they appear to be unused padding in the . If you know about these bits, please let me know!  ↩︎

- a way to generate random instances of a particular type, and, given a failing input, a way to shrink it down to a minimal failing value
- generate four values: an , a , a , and another, then pass them in to
- a left square bracket ( ); the textual representation of an IPv6 address; optionally, a percent sign ( ) followed by the scope identifier encoded as a decimal integer; a right square bracket ( ); a colon ( ); the port, encoded as a decimal integer
- generated a with a non-zero field; when we went to serialize this field as JSON, we used the textual representation, which dropped the field; when we deserialized it, the field was set to zero; as a result, the before and after values were no longer equal
- What does this field mean? A is just an plus a port ; why is a different? Why is the not part of the textual representation?
- , , and are all roundtrip serializable. Why is not? Also: what is the field?
- a 20-bit Flow Label, and an 8-bit Traffic Class 3
- QoS was not originally part of the 1980s Berkeley sockets specification. DSCP came about much later ( RFC 2474 , 1998). Because C structs do not provide encapsulation, the definition was set in stone and couldn’t be changed. So instead, the DSCP field is set as an option on the socket, via . By the time IPv6 came around, it was pretty clear that QoS was important, so the Traffic Class was baked into the struct.
- as a pseudo-random value that can be used as part of a hash key for load balancing, or as extra QoS bits on top of the 8 bits provided by the Traffic Class field

Ahmad Alfy 1 month ago

Your URL Is Your State

A couple of weeks ago, when I was publishing The Hidden Cost of URL Design, I needed to add SQL syntax highlighting. I headed to the PrismJS website, trying to remember whether it should be added as a plugin or what. I was overwhelmed by the number of options on the download page, so I headed back to my code. I checked the file for PrismJS, and at the top of the file I found a comment containing a URL: I had completely forgotten about this. I clicked the URL, and it was the PrismJS download page with every checkbox, dropdown, and option pre-selected to match my exact configuration. Themes chosen. Languages selected. Plugins enabled. Everything, perfectly reconstructed from that single URL. It was one of those moments where something you once knew suddenly clicks again with fresh significance. Here was a URL doing far more than just pointing to a page. It was storing state, encoding intent, and making my entire setup shareable and recoverable. No database. No cookies. No localStorage. Just a URL. This got me thinking: how often do we, as frontend engineers, overlook the URL as a state management tool? We reach for all sorts of abstractions to manage state, such as global stores, contexts, and caches, while ignoring one of the web’s oldest and most elegant features: the humble URL. In my previous article, I wrote about the hidden costs of bad URL design. Today, I want to flip that perspective and talk about the immense value of good URL design. Specifically, how URLs can be treated as first-class state containers in modern web applications. Scott Hanselman famously said “URLs are UI”, and he’s absolutely right. URLs aren’t just technical addresses that browsers use to fetch resources. They’re interfaces. They’re part of the user experience. But URLs are more than UI. They’re state containers. Every time you craft a URL, you’re making decisions about what information to preserve, what to make shareable, and what to make bookmarkable. 
Think about what URLs give us for free: URLs make web applications resilient and predictable. They’re the web’s original state management solution, and they’ve been working reliably since 1991. The question isn’t whether URLs can store state. It’s whether we’re using them to their full potential. Before we dive into examples, let’s break down how URLs encode state. Here’s a typical stateful URL: For many years, these were considered the only components of a URL. That changed with the introduction of Text Fragments, a feature that allows linking directly to a specific piece of text within a page. You can read more about it in my article Smarter than ‘Ctrl+F’: Linking Directly to Web Page Content. Different parts of the URL encode different types of state: Sometimes you’ll see multiple values packed into a single key using delimiters like commas or plus signs. It’s compact and human-readable, though it requires manual parsing on the server side. Developers often encode complex filters or configuration objects into a single query string. A simple convention uses key–value pairs separated by commas, while others serialize JSON or even Base64-encode it for safety. For flags or toggles, it’s common to pass booleans explicitly or to rely on the key’s presence as truthy. This keeps URLs shorter and makes toggling features easy. Another old pattern is bracket notation, which represents arrays in query parameters. It originated in early web frameworks like PHP, where appending to a parameter name signals that multiple values should be grouped together. Many modern frameworks and parsers (like Node’s library or Express middleware) still recognize this pattern automatically. However, it’s not officially standardized in the URL specification, so behavior can vary depending on the server or client implementation. Notice how it even breaks the syntax highlighting on my website. The key is consistency. Pick patterns that make sense for your application and stick with them. 
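These conventions are straightforward to implement with the standard `URLSearchParams` API. A minimal sketch of comma-delimited multi-values and presence-as-truthy flags (the `tags`/`sale`/`sort` parameter names are illustrative, not from the article):

```typescript
// Encode a filter object using the conventions above: comma-delimited
// lists for multi-values, bare key presence for boolean flags.
type Filters = { tags: string[]; sale: boolean; sort?: string };

function encodeFilters(f: Filters): string {
  const params = new URLSearchParams();
  if (f.tags.length > 0) params.set("tags", f.tags.join(","));
  if (f.sale) params.set("sale", "");        // presence alone means "true"
  if (f.sort) params.set("sort", f.sort);
  return params.toString();
}

function decodeFilters(qs: string): Filters {
  const params = new URLSearchParams(qs);
  return {
    tags: params.get("tags")?.split(",").filter(Boolean) ?? [],
    sale: params.has("sale"),                // key presence is truthy
    sort: params.get("sort") ?? undefined,
  };
}
```

Note that `URLSearchParams` percent-encodes the comma (`tags=a%2Cb`); it decodes transparently on the way back, so the roundtrip still works.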
Let’s look at real-world examples of URLs as state containers:

**PrismJS Configuration.** The entire syntax highlighter configuration encoded in the URL. Change anything in the UI, and the URL updates. Share the URL, and someone else gets your exact setup. This one uses the anchor and not query parameters, but the concept is the same.

**GitHub Line Highlighting.** It links to a specific file while highlighting lines 108 through 136. Click this link anywhere, and you’ll land on the exact code section being discussed.

**Google Maps.** Coordinates, zoom level, and map type all in the URL. Share this link, and anyone can see the exact same view of the map.

**Figma and Design Tools.** Before shareable design links, finding an updated screen or component in a large file was a chore. Someone had to literally show you where it lived, scrolling and zooming across layers. Today, a Figma link carries all that context: canvas position, zoom level, selected element. Literally everything needed to drop you right into the workspace.

**E-commerce Filters.** This is one of the most common real-world patterns you’ll encounter. Every filter, sort option, and price range preserved. Users can bookmark their exact search criteria and return to it anytime. Most importantly, they can come back to it after navigating away or refreshing the page.

Before we discuss implementation details, we need to establish a clear guideline for what should go into the URL. Not all state belongs in URLs. Here’s a simple heuristic: Good candidates for URL state: Poor candidates for URL state: If you are not sure whether a piece of state belongs in the URL, ask yourself: if someone else clicked this URL, should they see the same state? If so, it belongs in the URL. If not, use a different state management approach. The modern API makes URL state management straightforward: The event fires when the user navigates with the browser’s Back or Forward buttons. 
It lets you restore the UI to match the URL, which is essential for keeping your app’s state and history in sync. Usually your framework’s router handles this for you, but it’s good to know how it works under the hood. React Router and Next.js provide hooks that make this even cleaner: Now that we’ve seen how URLs can hold application state, let’s look at a few best practices that keep them clean, predictable, and user-friendly. Don’t pollute URLs with default values: Use defaults in your code when reading parameters: For high-frequency updates (like search-as-you-type), debounce URL changes: When deciding between and , think about how you want the browser history to behave. creates a new history entry, which makes sense for distinct navigation actions like changing filters, pagination, or navigating to a new view; users can then use the Back button to return to the previous state. On the other hand, updates the current entry without adding a new one, making it ideal for refinements such as search-as-you-type or minor UI adjustments where you don’t want to flood the history with every keystroke. When designed thoughtfully, URLs become more than just state containers. They become contracts between your application and its consumers. A good URL defines expectations for humans, developers, and machines alike. A well-structured URL draws the line between what’s public and what’s private, client and server, shareable and session-specific. It clarifies where state lives and how it should behave. Developers know what’s safe to persist, users know what they can bookmark, and machines know what’s worth indexing. URLs, in that sense, act as interfaces: visible, predictable, and stable. Readable URLs explain themselves. Consider the difference between the two URLs below. The first one hides intent. The second tells a story. A human can read it and understand what they’re looking at. A machine can parse it and extract meaningful structure. 
Jim Nielsen calls these “examples of great URLs”: URLs that explain themselves. URLs are cache keys. Well-designed URLs enable better caching strategies: You can even visualize a user’s journey without any extra tracking code: Your analytics tools can track this flow without additional instrumentation. Every URL parameter becomes a dimension you can analyze. URLs can communicate API versions, feature flags, and experiments: This makes gradual rollouts and backwards compatibility much more manageable. Even with the best intentions, it’s easy to misuse URL state. Here are common pitfalls: The classic single-page app mistake: if your app forgets its state on refresh, you’re breaking one of the web’s fundamental features. Users expect URLs to preserve context. I remember a viral video from years ago where a Reddit user vented about an e-commerce site: every time she hit “Back,” all her filters disappeared. Her frustration summed it up perfectly. If users lose context, they lose patience. This one seems obvious, but it’s worth repeating: URLs are logged everywhere: browser history, server logs, analytics, referrer headers. Treat them as public. Choose parameter names that make sense. Future you (and your team) will thank you. If you need to base64-encode a massive JSON object, the URL probably isn’t the right place for that state. Browsers and servers impose practical limits on URL length (usually between 2,000 and 8,000 characters), but the reality is more nuanced. As this detailed Stack Overflow answer explains, limits come from a mix of browser behavior, server configurations, CDNs, and even search engine constraints. If you’re bumping against them, it’s a sign you need to rethink your approach. Respect browser history: if a user action should be “undoable” via the back button, use . If it’s a refinement, use . That PrismJS URL reminded me of something important: good URLs don’t just point to content. 
They describe a conversation between the user and the application. They capture intent, preserve context, and enable sharing in ways that no other state management solution can match. We’ve built increasingly sophisticated state management libraries like Redux, MobX, Zustand, Recoil and others. They all have their place, but sometimes the best solution is the one that’s been there all along. In my previous article, I wrote about the hidden costs of bad URL design. Today, we’ve explored the flip side: the immense value of good URL design. URLs aren’t just addresses. They’re state containers, user interfaces, and contracts all rolled into one. If your app forgets its state when you hit refresh, you’re missing one of the web’s oldest and most elegant features.

What URLs give us for free:
- Shareability: Send someone a link, and they see exactly what you see
- Bookmarkability: Save a URL, and you’ve saved a moment in time
- Browser history: The back button just works
- Deep linking: Jump directly into a specific application state

Path Segments ( ). Best used for hierarchical resource navigation:
- User 123’s posts
- Documentation structure
- Application sections

Query Parameters ( ). Perfect for filters, options, and configuration:
- UI preferences
- Pagination
- Data filtering
- Date ranges

Anchor ( ). Ideal for client-side navigation and page sections:
- GitHub line highlighting
- Scroll to section
- Single-page app routing (though it’s rarely used these days)

Good candidates for URL state:
- Search queries and filters
- Pagination and sorting
- View modes (list/grid, dark/light)
- Date ranges and time periods
- Selected items or active tabs
- UI configuration that affects content
- Feature flags and A/B test variants

Poor candidates for URL state:
- Sensitive information (passwords, tokens, PII)
- Temporary UI states (modal open/closed, dropdown expanded)
- Form input in progress (unsaved changes)
- Extremely large or complex nested data
- High-frequency transient states (mouse position, scroll position)

URLs as cache keys:
- Same URL = same resource = cache hit
- Query params define cache variations
- CDNs can cache intelligently based on URL patterns
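The cache-key idea above only works if equivalent states always serialize to the same URL. A minimal sketch of one way to canonicalize (the sorting-and-pruning policy here is my own choice, not from the article):

```typescript
// Canonicalize a URL so equivalent states map to one cache key:
// drop empty parameters and sort the rest alphabetically.
function canonicalUrl(url: string): string {
  const u = new URL(url);
  const entries = [...u.searchParams.entries()]
    .filter(([, v]) => v !== "")                // prune empty values
    .sort(([a], [b]) => a.localeCompare(b));    // stable parameter order
  u.search = new URLSearchParams(entries).toString();
  return u.toString();
}
```

With this in front of a cache, `?size=m&color=red` and `?color=red&size=m` hit the same entry instead of being stored twice.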

マリウス 1 month ago

Zeit v1

Zeit began nearly five years ago as a pet project. I needed a lightweight, user-friendly tool to track time, with the added capability to export data for integration with other, mostly home-brewed software used for accounting and invoicing. At the time, I had only a basic set of features in mind and no clear long-term plan for the tool. Little did I know that I seemingly wasn’t alone in my need for a time tracker that stays out of the way and doesn’t come with an attached (paid) cloud service. Whenever users requested new features or options, I either implemented them myself or accepted their pull requests without much hesitation. My guiding principle was simple: if a small enhancement could make the software more useful to even one other person, I was happy to introduce it. Nearly five years after its initial release, Zeit has stood the test of time (hah) quite well and continues to grow in popularity on GitHub. What began as a minimal command-line time-tracking utility has evolved into a more feature-rich program. Depending on your perspective, you might say it now has a few too many features. Fast forward to today, and the first version of Zeit (referred to as zeit v0) has strayed far from its original goal of being a clean, minimal command-line tool. Instead, it has grown into an unwieldy user experience, cluttered with features that are neither intuitive nor well thought out. From a code standpoint, many of the decisions that made sense a few years ago now seem suboptimal, especially as we look to the future. While I could have sifted through the original v0 codebase to clean it up and remove features that were added by contributors who eventually stopped maintaining them, I opted to rewrite Zeit from scratch. The new version is built on more modern dependencies, with a cleaner, more streamlined codebase, and is free of the “one-off” features added for individual users who no longer appear to use Zeit. 
Over the past five years, I’ve learned a great deal from user feedback. With Zeit v1, I’ve implemented the most practical and useful feature requests in a way that feels cohesive and polished, rather than like an afterthought. Starting with the database, Zeit v1 replaces the old storage engine, BuntDB, with BadgerDB v4. BadgerDB is an embeddable, persistent, and fast key-value (KV) database written in pure Go. This new database not only stores time entries but also user-specific configurations, eliminating the need for a separate config file and reducing external dependencies. This shift addresses past issues with unnecessary dependencies, such as Viper, which eventually caused the project more headaches than they brought benefits. “Why not use SQLite?” you might ask. The answer is simple: cross-compiling. Unfortunately, there is no fully compliant SQLite implementation written in pure Go. Using the official SQLite implementation requires , which complicates cross-compilation for various platforms. Additionally, the data Zeit generates fits well into a key-value store and doesn’t require the complexity of a relational database on the client side. Next, I cleaned up the overall project structure and organized it into distinct areas for the database layer, internal business logic, the command-line interface (CLI), and output. Unlike v0, Zeit now uses Charm’s lipgloss v2 library to render CLI output, taking advantage of the terminal’s default theme for colors. This results in a more seamless integration of Zeit into your terminal user interface (TUI) aesthetics. Additionally, most Zeit commands (e.g., , , etc.) now support JSON output alongside the standard CLI output. This makes it easier to integrate Zeit with other tools. For example, to build your own project/task picker, you can leverage the JSON output of the command and use to create a list of project/task entries. 
You can then feed that list into your favorite dmenu launcher, simplifying the process of managing your time-tracking data: One change that will break compatibility with existing integrations is the new command-line interface, which adopts a similar approach to many of my newer tools, such as whats. In the past, Zeit users had to learn and memorize command-line flags like , , , and even less intuitive ones like or . While Zeit v1 still supports similar flags, its primary focus now shifts to a more natural way of using command-line arguments: As demonstrated by this otherwise complex example, which tracks a new block of time with a note on the personal project and knowledge task, starting four hours ago and ending ten minutes ago, the use of a more natural approach to command-line arguments significantly enhances a user’s understanding of the command. However, because Zeit still supports flags, the same command can also be executed using those: The structure is kept (almost) identical across various commands and can hence also be used for filters: This command lists all tracked time blocks for the personal project and knowledge task, from last week (at this time) until two hours ago today. As shown, the need for a detailed explanation is minimal, as the command’s purpose is easily understood just by looking at it. Similarly, as demonstrated in the previous example, the same flags can also be used with the command: If you use Zeit daily, you may find the natural-arguments interface more intuitive and enjoyable than working with flags. However, if you’re building a tool that interacts with to inject or extract data, you’ll likely prefer sticking to the more programmatically robust flags. 
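To illustrate the general idea of natural arguments, here is a toy parser for relative-time phrases. This is a sketch of the technique only; the grammar, tokens, and units are mine, not Zeit’s actual syntax:

```typescript
// Parse a tiny "natural" argument list like:
//   ["since", "4", "hours", "ago", "until", "10", "minutes", "ago"]
// into millisecond timestamps relative to `now`.
const UNIT_MS: Record<string, number> = {
  minute: 60_000, minutes: 60_000,
  hour: 3_600_000, hours: 3_600_000,
};

function parseRange(args: string[], now: number): { from: number; to: number } {
  let from = now, to = now;
  for (let i = 0; i < args.length; i++) {
    if (args[i] === "since" || args[i] === "until") {
      const amount = Number(args[i + 1]);
      const unit = UNIT_MS[args[i + 2]];
      if (Number.isNaN(amount) || unit === undefined || args[i + 3] !== "ago") {
        throw new Error(`cannot parse time phrase at "${args[i]}"`);
      }
      const t = now - amount * unit;
      if (args[i] === "since") from = t; else to = t;
      i += 3; // consume the whole "<n> <unit> ago" phrase
    }
  }
  return { from, to };
}
```

The appeal of this style is that the command line reads like the sentence describing it, while a flag-based form of the same range stays available for scripts.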
However, as I explained in a previous status update here, these organizations were founded in a different era and, in my view, have failed to adapt to the realities of today. One glaring example of this is their incoherent stance on freedom, particularly when it comes to freedom of speech. It’s curious that many advocates of the GNU/OSI philosophies call for limitations on free speech while insisting that software must be usable without restriction in order to qualify as free and open source. To put it simply, Zeit v1 is no longer published under the GNU GPL or any of the OSI-approved licenses. Instead, it is now licensed under a partially modified HL3 license, which I’ve dubbed the SEGV license. This is not an open source license in the traditional (and, in my opinion, flawed) sense, but rather a source-available license. That said, I reject the taxonomy imposed by the FSF and the OSI and will continue to call my software open source, as the license change won’t have any practical impact for the average user. However, it is designed to ideally prevent certain groups whose goals I consider morally wrong from using the software. I’ve completed the first release of Zeit v1, marking the official debut of this complete rewrite, now with version number v1.0.0. Along with the new version, Zeit also has an official website: zeit.observer While the site currently serves as a simple landing page, it will grow in functionality over time, as indicated by the features listed as coming “soon”. Please note, however, that this new version is a full rewrite and not compatible with existing Zeit v0 databases. If you’re currently using Zeit v0, worry not: you can export your entries using , and then import them into v1 with the new command. Just make sure you first export the database using Zeit v0 and only then upgrade to Zeit v1 and run the import command. 
If you’re looking for a command-line utility for time tracking, especially if you’re already using another tracker, I’d love for you to give Zeit v1 a try and share your thoughts. Let me know your top three missing features and which platforms you typically use for time tracking.

Farid Zakaria 1 month ago

Nix derivation madness

I’ve written a bit about Nix, and I still face moments where foundational aspects of the package system confound and surprise me. Recently I hit an issue that stumped me, as it broke some basic comprehension I had of how Nix works. I wanted to produce the build and runtime graph for the Ruby interpreter. I have Ruby, but I don’t seem to have the derivation, , file present on my machine. No worries, I think I can it and download it from the NixOS cache. I guess the NixOS cache doesn’t seem to have it. 🤷 This was actually perplexing me at this moment. In fact, there are multiple discourse posts about it. My mental model of Nix, however, is that I must have first evaluated the derivation (drv) in order to determine the output path to even substitute. How could the NixOS cache not have it present? Is this derivation wrong somehow? Nope. This is the derivation Nix believes produced this Ruby binary, according to the database. 🤨 What does the binary cache itself say? Even the cache itself thinks this particular derivation, , produced this particular Ruby output. What if I try a different command? So I seem to have a completely different derivation, , that resulted in the same output, which is not what the binary cache announces. WTF? 🫠 Thinking back to a previous post, I remember touching on modulo fixed-output derivations. Is that what’s going on? Let’s investigate from first principles. 🤓 Let’s first create , which is our fixed-output derivation. ☝️ Since this is a fixed-output derivation (FOD), the produced path will not be affected by changes to the derivation beyond the contents of . Now we will create a derivation that uses this FOD. The for the output of this derivation will change on changes to the derivation, except if the derivation path for the FOD changes. This is in fact what makes it “modulo” the fixed-output derivations. Let’s test this all out by changing our derivation. Let’s do this by just adding some garbage attribute to the derivation. What happens now? 
The path of the derivation itself, , has changed, but the output path remains consistent. What about the derivation that leverages it? It also got a new derivation path, but the output path remained unchanged. 😮 That means changes to fixed-output derivations didn’t cause new outputs in either derivation, but they did create a completely new tree of files. 🤯 That means that in nixpkgs, changes to fixed-output derivations can cause them to have new store paths for their but result in dependent derivations with the same output path. If the output path had already been stored in the NixOS cache, then we lose the link between the new and this output path. 💥 The amount of churn that we are creating in derivations was unbeknownst to me. It can get even weirder! This example came from @ericson2314. We will duplicate the to another file whose only difference is the value of the garbage. Let’s now use both of these in our derivation. We can now instantiate and build this as normal. What is weird about that? Well, let’s take the JSON representation of the derivation and remove one of the inputs. We can do this because although there are two input derivations, we know they both produce the same output! Let’s load this modified derivation back into our and build it again! We got the same output. Not only can multiple derivations map to the same output path, but we can also take certain derivations and completely change them by removing inputs and still get the same output! 😹 The road to Nix enlightenment is no joke and full of dragons.
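The “modulo” behavior can be modeled in a few lines. This is a toy model, not Nix’s real algorithm (Nix’s `hashDerivationModulo` operates on actual ATerm-encoded derivations), but it captures why editing a fixed-output derivation changes its own drv path without rippling into dependents’ output paths:

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Toy derivation: a name, some attributes, input derivations, and an
// optional declared output hash (which is what makes it fixed-output).
type Drv = {
  name: string;
  attrs: Record<string, string>;
  inputs: Drv[];
  fixedOutputHash?: string;
};

// Plain derivation hash: depends on everything, including inputs' hashes.
// This models the .drv store path, which churns on any edit.
function drvHash(d: Drv): string {
  const inputHashes = d.inputs.map(drvHash).join(",");
  return sha256(JSON.stringify([d.name, d.attrs, inputHashes, d.fixedOutputHash ?? ""]));
}

// Hash "modulo" fixed-output derivations: a FOD input is replaced by a
// hash of its declared output alone, so edits to the FOD's other
// attributes never reach the dependents' output paths.
function drvHashModulo(d: Drv): string {
  if (d.fixedOutputHash !== undefined) {
    return sha256(`fixed:${d.name}:${d.fixedOutputHash}`);
  }
  const inputHashes = d.inputs.map(drvHashModulo).join(",");
  return sha256(JSON.stringify([d.name, d.attrs, inputHashes]));
}
```

Adding a garbage attribute to the FOD changes `drvHash` (a new drv tree) while `drvHashModulo` of the dependent stays put (the same output path), which is exactly the churn-plus-stable-output behavior observed above.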

Dayvster 1 month ago

AI’s Trap: Settling for Boilerplate Over Elegant Code

We are all familiar with Picasso's "The Bull" series, in which he progressively simplifies the image of a bull down to its most basic, yet still recognizable, form. Steve Jobs was famously inspired by this concept, leading him to advocate for simplicity and elegance in design and technology above countless features and excessive complexity. Distill a concept, even one as complex as software or UX, down to its essence, and what you are left with is something beautiful and elegant that fulfills its purpose with minimal fuss.

## So Why Do We Accept Ugly Code Then?

I've noticed a worrying trend in programming: as the tools around a programming language improve and automate more of our work, we increase our tolerance for boilerplate, repetitive, and, frankly, ugly code. We accept it and we tell ourselves it's OK: the linter will fix it, the formatter will fix it, the compiler will optimize it, and in the end it's all ones and zeroes anyway, right, so why would any of this matter? Since AI has entered the equation, tolerance for boilerplate and excuses for ugly code have only increased. We tell ourselves that as long as the AI can generate the code for us, it doesn't matter whether it's elegant or not. After all, we didn't write it, the AI did. So why should we care about the quality of the code? After all, we relinquished ownership of the code the moment we asked the AI to generate it for us. Now, you may not be someone who uses AI to generate code, and kudos to you, welcome to the club. However, even as someone who noticed relatively early that AI does not produce something I could proudly and confidently sign off on as my own work, I have still used AI for some of the more tedious tasks, such as taking a JSON response from an API and asking Copilot, ChatGPT, or Grok to generate a type definition in the language I am currently working with. 
I work on many personal and professional projects, and I encounter different types of people and teams; some embrace AI, others shun it. However, I have noticed that teams and projects where AI is embraced, encouraged, or even mandated as part of the development process tend to produce a lot of boilerplate and a lot of very ugly, inelegant code that few wish to take real ownership of. It is not their creation; they handed ownership of it to the AI, and therefore abandoned the developer-code relationship that is so essential to producing quality software.

## Developers Should Love Their Code

I harp on this a lot: development is a one-of-a-kind blend of engineering and creative expression, and those two aspects of our craft should not be at odds with each other, but should rather complement each other. When you write code, you should love it, you should be proud of it, and you should want to show it off to others who can understand and appreciate it; your code is, in essence, an expression of how you think and how you tackle problems. I've briefly touched upon the fact that handing off ownership of your code to AI means abandoning that relationship between you, the developer, and your code. I want to expand on that a bit more. If you don't care about the code you write, if you don't love what you are doing or the creative process of solving a specific problem, you will forever lack understanding of the problem you are solving. It will get solved for you by the AI, and you will rob yourself of the opportunity to learn and understand that problem on a deeper level. 
This is something that has kicked me in the ass a fair few times in the past: I'd get a problem to solve, think "oh, this is easy," draft up an initial solution that solves the problem at a very superficial level, and then in the coming weeks get 10 QA tickets filed against that problem, because there is more to the problem than meets the eye, and there are often things that you miss or do not even consider during your first implementation. AI will do exactly the same thing every single time. The difference, though, is that every time I was given a problem to solve, it resulted in fewer and fewer QA tickets, because I understood and learnt from my past experiences and mistakes and knew how to approach problem-solving more effectively. AI will not do any of that; it will always solve the problem given to it at face value, without any deeper understanding or context. It will not learn from past mistakes, grow, or adapt its mindset to shift its approach based on new information or insights. It will always solve the problem, and worst of all, it will never learn to understand the problem so well that it can simplify the solution down to its essence and produce something elegant. There is a certain beauty to solving a problem and all of its potential side effects in a very minimal and easily readable way. That is an ideal that I strive for in my own work, and I encourage you to do the same. Love your code, be proud of it, and strive for elegance over boilerplate.

## The Cost of Boilerplate

Boilerplate code is not just an eyesore; it comes with real costs. It increases the cognitive load on developers, makes the project less enjoyable to work on, increases the time to onboard new team members, and increases the time to fix or resolve issues, which directly impacts the experience of the end user of your product. 
As a third-order effect, it also increases the soft requirement for more tooling and more AI assistance to manage and maintain your codebase, which is a dangerous spiral best avoided altogether. A question I've been asking myself is how much of this is by happenstance and how much is by design. I don't want to get all conspiracy-theorist on you, but it does make me wonder whether there are vested interests in making us accept boilerplate and ugly code as the norm, because it increases the demand for AI tools and services that can help us manage and maintain our codebases. I'm not confident enough to attach any weight to this thought, but it's something worth pondering.

## Aesthetic Responsibility

It's very easy for us as developers to simply dismiss aesthetics as something superficial and unimportant in the grand scheme of things. After all, we are not building art; we are building software that solves problems and delivers value to users. However, I would argue that aesthetics play a crucial role in the quality and maintainability of our code. When we write code that is elegant and beautiful, we are not just making it easier for ourselves to read and understand; we are also making it easier for others to read and understand. We are creating a shared language and a shared understanding of the problem we are solving and the solution we are implementing. When we write ugly, boilerplate code, we are creating barriers and obstacles for ourselves and others. We are making it harder to read and understand, harder to maintain and evolve, and harder to collaborate and share knowledge. We're increasing the friction instead of reducing it. It always feels nice, as a human, to look at something beautiful, whether it's a piece of art, a well-designed product, a sports car, a building with stunning architecture, and so on and so forth. 
But when it comes to code, we often dismiss aesthetics as unimportant; after all, it's a means to an end, right? I would argue that as developers, we have an aesthetic responsibility to ourselves and to others to write code that is elegant and beautiful, not just for the sake of aesthetics, but for the sake of quality and maintainability. But it is very easy lately to just let the tools we have been given, and AI, do the heavy lifting for us, and in doing so, we risk losing sight of our aesthetic responsibility as developers.

## How Does AI Increase the Tolerance for Boilerplate?

You know the drill: you are working away on a piece of a project, and you're on a tight deadline with a backlog full of features to implement and bugs to fix. You think to yourself, "I just need to get this done quickly, I can refactor it later" (a mindset I generally encourage). So you ask the AI to generate the code for you, and it spits out a solution that does not work, so you refine your prompt and try again, and again, and again, until you get something that works. Now you have a working solution, great! One quick glance at your `git status`: **HOLY SH.. WHY ARE THERE 32 FILES CHANGED? AND WHY ARE MOST OF THEM NEW FILES?** OK, OK, you have a working solution. It's best to open up a PR and let the Copilot PR bot handle the code review, and maybe a couple of co-workers will spot some things and suggest improvements. Once you have more time on your hands, you will absolutely go back and refactor this mess down to something elegant and beautiful. You just need to get this next task done first, oh, and it's on a tight deadline as well... Before you know it, you have a codebase full of boilerplate and code that could easily be done in half as many lines or fewer. You will never go back and refactor it, because there is always something more urgent to do, and the cycle continues. Maintenance? Not a problem, we can just use AI to generate documentation for us or explain code blocks to us. 
Testing? AI can generate tests for us, no need to think about edge cases or test coverage. Performance? AI can optimize our code for us, no need to understand the underlying algorithms or data structures.

It creeps up on you slowly but surely. AI becomes an excuse not to care about the quality of your code, only about the quality or functionality of the outcome.

There's a lot to be said and discussed about this topic. You might be rightfully asking yourself: does it even matter if the code is ugly and I feel no pride in it, as long as the end user gets the functionality they need?

## "Good Enough" Is Not Good Enough

I always liked the saying, "How you do anything is how you do everything." It has many interpretations, but to me, it boils down to this: if you accept mediocrity in one aspect of your life, you will accept mediocrity in all aspects of your life. Or: how you do the little things is how you do the big things.

If you accept "good enough" code, you will eventually have a good enough product on your hands that might have to compete with products that were built with care and pride. Do you want to deliver good enough products, or do you want to deliver great products that you can be proud of?

Completely ignoring the market and financial viability of this approach: will you be happy with the work you do if mediocrity is the standard you hold yourself to? Or worse, if mediocrity is the modus operandi of your team or company?

If you truly believe that "good enough" is good enough, then by all means continue down that path. But I urge you to test that belief. Challenge it. Start a project that you may never finish or get paid for, and attempt to take a complex concept and distill it down to its essence in the most elegant way possible. Do what Picasso did with his bull series, and see how far you can push yourself to create something beautiful out of something complex while still maintaining its core functionality and purpose.
## Conclusion

AI is a powerful tool that can help us be more productive and efficient, but it should not be used as an excuse to accept boilerplate and ugly code. As developers, we should strive for elegance and simplicity in our code, and we should take pride in the work we do. We should love our code and our craft, and we should never settle for "good enough" when it comes to the quality of our work.

**Write code you would sign your name under.**

If you've enjoyed this article and made it this far, thank you sincerely for your time. I hope it sparked some thoughts and reflections on your own approach to coding and craftsmanship.

If you have any thoughts or feedback on this article, please feel free to reach out to me on [Twitter](https://twitter.com/dayvsterdev). I'm always open to discussions and feedback, or just general chit-chat about just about anything I find interesting.
