Latest Posts (20 found)

I feel your pain Sara

I stumbled on this piece of code recently that made me laugh, cry, sigh in despair, and think of poor Sara doing her best to make the web a better place. I guess people have forgot that is a thing that exists.


Bliki: Vibe Coding

Vibe coding is building a software application by prompting an LLM, telling it what to build, trying it out, prompting for changes - but without looking at any of the code that the LLM generates. This technique can be used by people without any knowledge of programming. However the resulting software often shows problems with maintainability, correctness, and security - so is best used for disposable software written for a limited audience. The term was coined in February 2025 by Andrej Karpathy, an experienced programmer, in a post on X: There's a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I'm too lazy to find it. I “Accept All” always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. -- Andrej Karpathy The key point about vibe coding is “forget that the code even exists” . This is what gives it much of its usefulness, but also its limitations. Since the November Inflection many programmers are getting LLMs to write all their code, commenting that they may never write a line of code directly again. However they do care about this code, reviewing it, paying attention to its internal structure. 
In that case, they aren't forgetting the code exists, so it's really a different thing that I call Agentic Programming. Sadly the term “vibe coding” really caught on, so many people use it to mean agentic programming. However I feel that despite this rapid Semantic Diffusion, it's worth trying to keep the concepts of vibe coding and agentic programming separate, as they are both different to use and different in their consequences. Because a vibe coder doesn't look at the code, they don't need programming skills, so it's perfect for someone with no programming knowledge to build applications for their own use. Experienced programmers may also find it handy for rapid development of disposable software or prototypes. Vibe coding is still new, so we are exploring its limitations, and those limitations change as the sophistication of models and their harnesses change. These limitations do introduce considerable risks, particularly if the vibed software is used widely or has access to sensitive information. Perhaps the most serious risk is that of security. LLMs are inherently vulnerable as they provide a large attack surface for predators. Vibe coded applications can often expose sensitive information or worse, credentials to attack deeper into an organization's systems. Even non-programmers need to be aware of the Lethal Trifecta. With little attention to the code, vibe coding can rapidly produce many lines of very low-quality code. Such code makes it difficult, even for an LLM, to modify and enhance the software in the future. While it's possible that growing LLM capabilities will allow them to work with even the largest bowls of spaghetti software, thus far it seems clear that well-structured software makes life easier for LLMs too. LLMs are famous for their habit of hallucinating incorrect facts and presenting these with great confidence. This habit also leads them to create software that behaves incorrectly - and those errors may not be manifest to the user.
Furthermore the non-determinism of LLMs means that asking an LLM to enhance some software could easily lead it to introduce errors, even in parts of the code that the new request shouldn't touch. We should thus treat LLM-generated software with skepticism: it can still be useful, but we need to be aware of the risks. On the whole, vibe coding is best used for disposable software that's only used by its author or a close group of collaborators who understand and accept the risks involved. Code that is more complex, more widely-used, and with more consequences to its risks should not be forgotten about.

Unsung Today

Chrome’s abnormal tab search

Chrome’s find option, like every search coming from a good home, does something clever with accented characters – it normalizes them. No matter whether you search with a proper accented character, or with its basic Latin equivalent, all the same stuff matches: the “ø” letter is treated the same as “o” both in the input field, and then in the search itself. Yet, Chrome’s tab search inexplicably doesn’t do that, which confused me when working on a post about diacritics earlier this week. Here, it should match all four open tabs. Tab search was introduced years ago; the Occam’s Razor says this isn’t a recent bug, but that the feature has always behaved like this. I filed the bug, but even if it gets fixed quickly, I think this doesn’t reflect well on Chrome’s team. If the right code already exists for ⌘F, why not reuse it? If it cannot be reused, why not repurpose at least its unit tests or the QA process to make sure this doesn’t fall through the cracks? Normalization should be treated as a core property of any search, rather than an optional “nice to have.” But, Marcin, didn’t you just invalidate your assertion that diacritics actually matter?
After all, wouldn’t you input “nestlé” instead of “nestle” if they did? To this, I have a few answers:
Input is not output. This is no different than autocorrect, autocomplete, or other IME helpers.
The very fact that on many keyboards accented characters are hard to input is itself a sign of anglo-centrism of companies that made early typewriters (Remington, which established a lot of European layouts like QWERTZ and AZERTY, employed a person who bragged he didn’t actually speak any languages in a “how hard could it be” way) and then most microcomputers.
There is this really interesting rule, also known as Postel’s Law: “be conservative in what you output, but liberal in what you accept as input.” It’s not universally applicable – sometimes it’s better to teach the user to be more explicit if it benefits them in the longer run – but it feels appropriate to me here.
Why does it matter specifically for the ⌘F and the tab search experience? I have this personal theory: the simpler the search, the more the users will blame themselves if it doesn’t work, and assume the tab or the string just isn’t there, rather than rewrite their query. That’s what happened to me. I assumed that the tab wasn’t open and tried to get to it again, wasting time and effort. The rule might be universally true for any UI surface – the tighter it gets, the less likely we assume it can break. After all, there is a manual for a typewriter, but there isn’t one for the pencil! And these UIs do feel positively basic; they are small windows with basically one input field and an immediate as-you-type reaction.
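To make “normalization” concrete, here is a minimal sketch of accent-insensitive matching. It is not how Chrome implements ⌘F; the names `fold`, `matches`, and `_EXTRA_FOLDS` are made up for illustration. One detail worth noticing: “ø” has no Unicode decomposition, so stripping combining marks alone is not enough and a small explicit table is assumed here.

```python
import unicodedata

# Letters with no Unicode decomposition need an explicit mapping.
# This table is illustrative and far from exhaustive.
_EXTRA_FOLDS = str.maketrans(
    {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}
)


def fold(text: str) -> str:
    """Reduce text to a case- and accent-insensitive form for comparison."""
    # Split accented letters into base letter + combining mark ("é" -> "e" + U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Handle the letters that did not decompose, then ignore case.
    return stripped.translate(_EXTRA_FOLDS).casefold()


def matches(query: str, title: str) -> bool:
    """True if the folded query occurs anywhere in the folded title."""
    return fold(query) in fold(title)
```

Both the query and the candidate text go through the same fold, which is what lets “nestle” find “Nestlé” and “nestlé” find “Nestle”.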


The famous o3 "GeoGuessr" prompt did not work

In April last year, Kelsey Piper discovered that OpenAI’s o3 model was surprisingly good at figuring out where a photo was taken from. Like human “geoguessr” pros , o3 could sometimes take a nondescript photo of a beach and tell you exactly where it is. Here’s the example Kelsey gave: Several people reproduced this with good results: not a 100% success rate, but clearly far better than you’d do with a random human guess. The lesson here is that model capabilities can surprise us . The o3 model had been released for two weeks before Kelsey’s tweet without anyone noticing how good it was at geolocation. What obscure capabilities did we never find? What capabilities of current models are we missing today? Some people drew another lesson from this: that “prompt engineering” can unlock brand-new capabilities. This is because Kelsey had a magic prompt that she built over time. When o3 got something wrong, she would ask it how it could have avoided the mistake, and then included that in the prompt. Here’s the first 10% of that prompt, so you get the idea: You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google’s Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone’s backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview). Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country… This prompt impressed a lot of people, who tried it out and reported that it correctly identified a lot of images. But of course, o3 correctly identified a lot of images with just a basic “think carefully about where this picture was taken?” prompt. Did the prompt actually help? It’d be tough to figure that out just from playing around in ChatGPT. 
You’d need to build an evaluation set of images and run o3 against them twice: once with the fancy prompt and once without it. So that’s what I did . I pulled 200 images from Wikimedia Commons, Geograph Britain and Ireland, and iNaturalist for the benchmark. You can read the AI-generated summary here , but here’s the key table: In general, the basic prompt did better on average. It consistently guessed closer to the actual location. Both prompts did pretty well, actually. Despite the fancy prompt being 10x larger, it only caused o3 to think for slightly longer (about one second on average, though the max was about double, at 10 minutes instead of 5 minutes). The images in my benchmark were fairly generic geoguessr-style outdoor images, with twelve indoor images thrown in for an extra challenge (the fancy prompt also did slightly worse on these). What’s going on? I think this shows how easy it is to fool yourself about the quality of prompting . When the model is already pretty good at a task, you can give it a very elaborate prompt without impacting performance. It’ll still be pretty good, except this time it’s good because of what you did . This is particularly true if you’re iterating with the model and asking it “what should I add to the prompt” for each mistake. Models will happily make up stories for you about their own reasoning processes, and will almost always say “yes, that helped a lot!” when you ask them if a particular prompt tweak made things better. The only way to actually know is by constructing some kind of benchmark 1 . It’s also interesting to me that nobody checked this at the time. It took me about six hours of fairly-distracted work and about $15 to construct and run this benchmark. Why didn’t anyone do this when they were writing articles about how good the o3 prompt was? One charitable reason might be that the story was more about o3’s real geolocation ability than about the magic prompt. 
o3 pricing also used to be about five times higher (though a benchmark of 40 images instead of 200 would still have thrown doubt on how much water the prompt was carrying). Also, AI just moves so fast. Geolocation was only the story for about a week: after that, GPT-4o’s sycophancy was what people were talking about. Another reason is that AI tooling wasn’t as good then. The benchmark was so easy for me to run because GPT-5.5 did most of the heavy lifting. Prior to strong agents, you would have had to write the (simple) benchmark yourself. I can’t point the finger too hard: I didn’t bother at the time either. Maybe my benchmark isn’t very good? The photos look reasonable enough: a wide variety of geoguessr-like shots of roads and landscapes, mostly. I could have tried to gather a few thousand photos instead of a few hundred, but if the magic prompt really was a big improvement you’d still expect to see that manifest on a benchmark this size. If someone wants to go and build a hundred-dollar geolocation benchmark instead of my fifteen-dollar one, I think that’d be an interesting project. Finally, let’s use the benchmark to answer a question I’ve had for a while: do gpt-5.4 and gpt-5.5 have o3’s geolocation abilities? The answer, apparently, is no. Whatever o3 had that made it good at this task hasn’t transferred to newer models.
1. Benchmarks can mislead as well, but they’re better than just vibes.
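The post does not show how guesses were scored, so the following is only a sketch of one plausible way to compare two prompts on “how close was the guess”. It assumes each ground truth and each guess is a (latitude, longitude) pair in degrees; `haversine_km` and `median_error_km` are hypothetical helper names, and great-circle distance is an assumed metric rather than the benchmark's confirmed one.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_error_km(truths, guesses):
    """Median distance between paired true and guessed locations."""
    return median(haversine_km(*t, *g) for t, g in zip(truths, guesses))
```

Running `median_error_km` once on the guesses from the basic prompt and once on the guesses from the elaborate prompt, over the same images, gives two numbers that can be compared directly.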


LSM Trees Explained

☕ Welcome to The Coder Cafe! Some people reached out after I published the Build Your Own Key-Value Storage Engine series to say they hadn’t gone through all eight posts, but they were curious about the core ideas. So I distilled everything into a single post. No implementation, no exercises, just the core concepts behind LSM trees. Get cozy, grab a coffee, and let’s begin! Fundamental Insights To understand LSM trees, we first need to understand why writes are hard. A B-tree-based database updates data in place . When we write a key, the engine finds the right page on disk and modifies it. This is a random write: the disk head has to seek to an arbitrary location before writing. On spinning disks, that seek takes time. But even on SSDs, random writes cause problems: they wear out cells unevenly and trigger expensive internal garbage collection. LSM trees take a completely different approach. Instead of writing data where it ultimately belongs, they write data sequentially . Writes are recorded in memory and appended to a log file for durability. When the in-memory buffer fills up, its contents are streamed to a new file in one sequential pass. Sequential writes are dramatically faster than random writes because there is no seeking involved. The disk just keeps writing forward. The price of this design is complexity. Data doesn’t live in one place. It accumulates across multiple files over time, and those files need to be periodically merged and reorganized in the background to stay manageable. That background work is what every piece of an LSM tree is built around. The in-memory buffer is called the memtable . The sorted files on disk are called SSTables . We’ll look at each in detail. Every write in an LSM tree starts in memory, in a structure called the memtable. The memtable is a mutable, in-memory store . 
When a write request arrives, the engine records the key-value pair in the memtable and appends it to a sequential log file on disk (called the write-ahead log, or WAL, which we’ll cover in the next section). The WAL write is a sequential append, so it is fast. There is no random I/O, no page lookup, no in-place modification. This is why LSM trees can sustain very high write throughput.
Because the memtable is eventually flushed to disk as a sorted file, the choice of data structure matters. A hashtable works for lookups but not for in-order iteration: its entries would have to be sorted at flush time, which takes O(n log n). A better choice is an ordered data structure. The most common in practice is a skip list; for example, LevelDB and RocksDB both use one as their default. A radix trie is another elegant option: it keeps keys in lexicographic order naturally, so iterating in order is just a depth-first traversal, and flushing becomes a simple stream with no sorting step needed. A balanced BST works too. Production implementations typically attach a monotonic sequence number to each entry, so the engine can always determine which version of a key is the most recent, regardless of arrival order.
The memtable doesn’t grow forever. At some point, it gets flushed to disk, and a new empty memtable takes its place. What triggers that flush depends on the implementation: it can be a size limit (a number of entries or a memory threshold), elapsed time, or memory pressure, for example. That flush produces a sorted file on disk called an SSTable, which we’ll look at after the WAL.
There is a problem with keeping writes in memory: if the process crashes, everything in the memtable is gone. Any write the client received an acknowledgment for is now lost. That breaks a core database guarantee: durability. The solution is a Write-Ahead Log, or WAL. Before writing to the memtable, the engine appends the operation to the WAL, an append-only file on disk. Only after the WAL entry is safely persisted does the engine update the memtable and acknowledge the client.
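The log-first write path can be sketched in a few lines. This is a toy under stated assumptions, not a production design: `TinyEngine` and `replay` are made-up names, the WAL is one JSON object per line, the memtable is a plain dict rather than an ordered structure, and “safely persisted” is taken to mean flushed and synced with `os.fsync`.

```python
import json
import os


class TinyEngine:
    """Toy write path: log first, then update memory."""

    def __init__(self, wal_path: str):
        self.memtable = {}  # a real engine would use an ordered structure
        self.wal = open(wal_path, "a", encoding="utf-8")

    def put(self, key: str, value: str) -> None:
        # 1. Append the operation to the write-ahead log.
        self.wal.write(json.dumps({"key": key, "value": value}) + "\n")
        # 2. Push Python's buffer to the OS, then ask the OS to persist it.
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 3. Only now change the in-memory state (and acknowledge the client).
        self.memtable[key] = value

    def close(self) -> None:
        self.wal.close()


def replay(wal_path: str) -> dict:
    """Rebuild a memtable after a restart by re-applying the log in order."""
    memtable = {}
    with open(wal_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            memtable[entry["key"]] = entry["value"]
    return memtable
```

Because step 3 never runs before step 2 completes, anything visible in the memtable is also recoverable from the log.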
This ordering is what the “write-ahead” in the name refers to: the log is always written before the in-memory state changes. The WAL is not the final home for data; it’s a safety net. If the engine crashes and restarts, it replays the WAL from the beginning to reconstruct the memtable, recovering any writes that hadn’t been flushed to disk yet.
One subtlety: writing to a file is not the same as persisting it. Operating systems buffer writes in memory before flushing to disk. To guarantee durability, the engine must call fsync after each WAL entry, forcing the OS to flush its buffers to physical storage. This is not free, though: fsync adds latency to every write. Production systems often use fdatasync instead, which persists the data without flushing unnecessary file metadata, keeping WAL appends faster. Many also use a technique called group commit to amortize this cost further: instead of syncing after every write, they batch multiple WAL entries and sync once for the group.
The WAL introduces write amplification: the ratio of data written to disk versus data actually requested by a client. Every byte we write to the database gets written to disk twice: once to the WAL immediately, and once to an SSTable when the memtable is eventually flushed. That cost buys us durability.
As we said, when the memtable fills up, it gets written to disk as a Sorted String Table, or SSTable. An SSTable is an immutable, sorted file. Immutable means it is never modified after creation. Sorted means keys are stored in lexicographic order. Both properties matter: immutability makes SSTables safe to read concurrently without locking, and sorted order makes lookups inside a file efficient. In a simple implementation, an SSTable is just a JSON array of key-value pairs, sorted by key. Production systems use a binary block-based format instead. The SSTable is divided into fixed-size blocks, typically 4 KB, though the exact size varies by implementation. Data blocks hold the actual key-value entries.
The SSTable also contains an index block storing the first key of each data block, which makes it possible to binary search for the right block without reading the entire file. In most implementations, the index block is written at the end of the file, since block boundaries are only known after all data blocks have been streamed out. To look up a key, we read the index block, binary search it to find the right data block, fetch that single block from disk, verify its integrity with a checksum, and then binary search within the block. When the index block is not cached, this means most lookups read two disk pages: the index block and one data block. In practice, index blocks are typically kept in memory, so most lookups require only one disk read . Each data block also carries a checksum computed over the block’s bytes. Before using the data, the engine verifies the checksum. If they don’t match, the block is corrupted, and the read fails safely rather than returning garbage. As SSTables accumulate, the engine maintains a catalog file (often called a MANIFEST in systems like RocksDB), which is an append-only log listing all existing SSTables in order of creation. This catalog is the engine’s source of truth for what files exist on disk. On startup, the engine reads it to know which files are live, and replays the WAL to restore the memtable. After a successful flush, the old WAL can be discarded. The data is now safely in an SSTable. Production systems also compress data blocks , typically with a fast algorithm like Snappy, LZ4, or zstd. Compression reduces disk footprint and I/O at the cost of CPU, and it interacts with block sizing: a compressed block may be smaller than a disk page, so implementations often track both logical and physical block sizes. LSM trees are optimized for writes. Reads are where the trade-off shows . To look up a key, the engine searches in order of recency : first the memtable, then SSTables from newest to oldest. The first match wins. 
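The recency-ordered lookup can be sketched as follows. This is a simplified model: each SSTable is an in-memory sorted list of `(key, value)` string pairs rather than a block-based file, and `search_sstable` and `get` are hypothetical names.

```python
from bisect import bisect_left


def search_sstable(sstable, key):
    """Binary search one sorted list of (key, value) pairs."""
    # (key,) sorts just before any (key, value) pair with the same key.
    i = bisect_left(sstable, (key,))
    if i < len(sstable) and sstable[i][0] == key:
        return sstable[i][1]
    return None


def get(key, memtable, sstables_newest_first):
    """Return the most recent value for key, or None if it is not found."""
    # The memtable holds the freshest writes, so it is checked first.
    if key in memtable:
        return memtable[key]
    # Then walk the SSTables from newest to oldest; the first match wins.
    for sstable in sstables_newest_first:
        value = search_sstable(sstable, key)
        if value is not None:
            return value
    return None
```

Note that a missing key forces the loop to visit every SSTable before giving up, which is the cost the rest of the design works to reduce.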
This ordering matters because the same key can appear multiple times across different SSTables. Each write to a key produces a new entry rather than updating the existing one. The newest version is the correct one. The problem becomes clear as SSTables accumulate. A key that was written once and never updated might still require the engine to search through dozens of SSTables before finding it, or confirming it doesn’t exist. Each SSTable search is a disk read. This is called read amplification: a single logical read triggers multiple physical reads. For a key that doesn’t exist at all, the engine must check every SSTable before returning a not-found error. That’s the worst case for read amplification, and it gets worse the more SSTables there are.
This is a fundamental tension in LSM trees, and it reflects a deeper principle known as the RUM conjecture: a storage engine can excel at two of reads, updates, and memory efficiency, but not all three at once. LSM trees make a deliberate choice: optimize for updates, accept read amplification as the cost. The sorted structure also enables efficient range scans. To retrieve all keys between two bounds, the engine scans the memtable in order, then merges sorted streams from the relevant SSTables.
The answer to accumulating SSTables is compaction. Compaction is a background process that takes multiple SSTables, merges them into fewer, cleaner ones, and discards the originals. The result is fewer files to search through, which directly reduces read amplification. It also reclaims disk space consumed by redundant entries: if the same key appears in three different SSTables, compaction keeps only the newest version and discards the rest. One common algorithm is a k-way merge. The engine opens iterators over all SSTables being compacted, each positioned at the first entry. It uses a min-heap to always pull the smallest key across all iterators.
When the same key appears in multiple SSTables, the engine picks the version from the newest SSTable and discards the older ones. The merged output is streamed into new SSTable files. In practice, real systems limit the number of SSTables that can participate in a single compaction run to keep resource consumption under control. Updating the catalog after compaction requires care . The engine must not delete the old SSTables before the new ones are safely written to disk. The safe sequence is: write new SSTables, fsync, write a new catalog pointing to the new files, fsync, then delete the old SSTables. A crash at any point leaves the engine in a recoverable state: either the old files are still referenced by the old catalog, or the new files are referenced by the new catalog. Compaction is not free . It consumes I/O and CPU in the background, competing with foreground reads and writes. Every byte of data gets rewritten multiple times across its lifetime, adding to write amplification. Tuning when compaction triggers (and how aggressively it runs) is one of the main knobs in LSM tree performance. We might expect deletion to be straightforward: find the key, remove it. In an LSM tree, it is anything but straightforward . SSTables are immutable. We cannot reach into an existing SSTable and remove an entry. So when a key is deleted, the engine writes a special marker to the memtable called a tombstone , an entry that says “ this key is deleted ”. It eventually gets flushed to an SSTable like any other write. During reads, the engine respects tombstones. If a tombstone for a key is found before a value for that key, scanning newest to oldest, the key is treated as deleted, and a not-found error is returned. The tombstone shadows any older value. The tricky part is knowing when it is safe to discard a tombstone during compaction. 
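The k-way merge described above can be sketched with Python's `heapq.merge`, which keeps a min-heap over the input streams. The assumptions here are illustrative: SSTables are in-memory sorted lists of `(key, value)` pairs with unique keys per table, a deletion is represented by a `None` value, and `compact` is a hypothetical name. The sketch always keeps tombstones; deciding when one can be dropped is a separate question.

```python
import heapq

TOMBSTONE = None  # assumption: a deletion is stored as a None value


def compact(sstables_newest_first):
    """Merge sorted (key, value) lists into one, keeping the newest version."""

    def tagged(table, age):
        # Tag entries with their table's age (0 = newest) so that, for equal
        # keys, the heap yields the newest version first.
        for key, value in table:
            yield key, age, value

    streams = [tagged(t, age) for age, t in enumerate(sstables_newest_first)]
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _age, value in heapq.merge(*streams):
        if key == last_key:
            continue  # an older version of a key we already emitted
        merged.append((key, value))  # tombstones are kept, not dropped
        last_key = key
    return merged
```

In a real engine the merged output would be streamed to new SSTable files rather than collected in a list.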
Consider this situation: a tombstone for some key exists in a newer SSTable, and an old value for that key exists in an older SSTable that hasn’t been compacted yet. If we drop the tombstone during compaction without also removing the old value, the old value becomes visible again. Deleted data reappears. This is called data resurrection, and it is a correctness bug.
NOTE: Correctness here means the engine returns what was actually written, not a stale or deleted value. This is different from consistency in the distributed systems sense, which describes the guarantees clients have about which version of data they see across replicas.
The rule is strict: a tombstone can only be dropped when the engine can guarantee that no older value for that key exists anywhere below it on disk. In practice, this means the compaction must include the oldest SSTables that could still hold a shadowed value. This is one of those details that seems minor until we get it wrong. A storage engine that resurrects deleted data is not a storage engine we can trust. Getting this right requires knowing exactly where older values can hide, which brings us to how SSTables are organized on disk.
Basic compaction, merging all SSTables into one flat pool, works but doesn’t scale. As the dataset grows, a flat pool of SSTables means reads still have to check many files. Leveling is the structural answer. In a leveled LSM tree, SSTables are organized into levels: L0, L1, L2, and so on. Each level has different rules. L0 is the landing zone. When the memtable flushes, the resulting SSTable lands in L0. L0 files can have overlapping key ranges: two L0 files might both contain entries for the same key. This is acceptable because L0 files are small and short-lived. L1 and deeper levels are different. Each level maintains non-overlapping key ranges across all its files. A given key can exist in at most one file per level. This is the critical property that makes reads efficient: to look up a key in L1, we don’t scan all files.
We use the key ranges to jump directly to the one file that could contain it. When L0 accumulates enough files, a compaction runs to merge L0 into L1. This merge enforces the non-overlapping invariant: L0 files (which may overlap) get merged with the relevant L1 files (which define the ranges), producing new files with clean, non-overlapping ranges. Similarly, when L1 grows too large, a compaction merges part of L1 into L2. Each deeper level is typically larger by a fixed ratio, for example, 10x: successive levels might hold 10 MB, 100 MB, 1 GB, and so on. Most data ends up in the deepest level. Most compaction work happens between levels.
The benefit is controlled read amplification. To look up a key, we check the memtable, scan all L0 files, then do one binary search per deeper level. The number of deeper levels grows logarithmically with data size. For a dataset with a few levels, that’s a small, bounded number of disk reads, regardless of how many total SSTables exist. When compaction falls behind and L0 accumulates too many files, the engine may trigger a write stall: new writes are paused until compaction catches up and L0 is drained. This is one of the more painful operational issues in LSM-based systems.
Leveled compaction is also not the only strategy. Tiered compaction, used by Cassandra, for example, takes a different approach: instead of enforcing non-overlapping ranges per level, it groups SSTables of similar size and merges them when a tier grows too large. Tiered compaction generates less write amplification but more read amplification. The right choice depends on the workload.
Leveling helps with reads, but there is still one painful case: looking up a key that doesn’t exist. For a missing key, the engine checks the memtable (not there), checks each L0 file (not there), then checks one file per deeper level (not there). Each check is a disk read. Even with leveling, this adds up. Bloom filters solve this.
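Before turning to Bloom filters, the leveled read path described above can be sketched to show which files a lookup has to touch. The representation is an assumption for illustration: each file is a `(min_key, max_key, name)` tuple, L0 files are listed newest first, and `find_file` and `files_to_check` are hypothetical names.

```python
from bisect import bisect_right


def find_file(level_files, key):
    """Pick the single file in a non-overlapping level that could hold key.

    level_files is a list of (min_key, max_key, name), sorted by min_key.
    """
    min_keys = [f[0] for f in level_files]
    # Last file whose min_key is <= key; it is the only possible candidate.
    i = bisect_right(min_keys, key) - 1
    if i >= 0 and level_files[i][0] <= key <= level_files[i][1]:
        return level_files[i][2]
    return None


def files_to_check(key, l0_files, deeper_levels):
    """List the files a lookup must consult, in search order."""
    # L0 ranges may overlap, so every L0 file covering the key is a candidate.
    candidates = [name for lo, hi, name in l0_files if lo <= key <= hi]
    # Deeper levels never overlap: at most one file per level.
    for level in deeper_levels:
        name = find_file(level, key)
        if name is not None:
            candidates.append(name)
    return candidates
```

The length of the returned list is a rough stand-in for read amplification: it grows with the number of L0 files and the number of levels, not with the total number of SSTables.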
A Bloom filter is a probabilistic data structure that can answer one question: Is this key definitely not in this SSTable? It has no false negatives: if the key is in the SSTable, the filter will say so. It can have false positives (occasionally it says a key might be present when it isn’t), but in practice, the false positive rate is tunable and kept very low. Many implementations attach a Bloom filter to each SSTable, built at creation time from all the keys it contains. The filters are small, a few kilobytes per SSTable, so they can be loaded into memory at startup and kept there. How does it work? A Bloom filter is a bitset. When a key is added, several hash functions are applied to it, each producing an index into the bitset. The bit at each index is set to 1. To check if a key is in the filter, the same hash functions are applied. If any of the resulting bits is 0, the key is definitely not in the SSTable. No disk read needed. If all bits are 1, the key might be there, and the engine proceeds to read the SSTable. The practical impact is significant. For a key that doesn’t exist (the worst case), the engine skips almost every SSTable without a single disk read . Only the rare false positive triggers an unnecessary disk read. Read amplification for missing keys drops dramatically. Some engines take this further and attach Bloom filters not just per SSTable but per data block within an SSTable, enabling even more precise filtering before fetching a block from disk. Everything described so far assumes a single thread. In reality, a storage engine needs to handle concurrent reads and writes, while flush and compaction run in the background . This is where things get subtle. The core problem: a flush operation replaces the current memtable with a new one and registers a new SSTable in the catalog. A compaction operation removes old SSTables and registers new ones. 
If a read is in the middle of searching an SSTable that gets deleted by a concurrent compaction, that’s a crash. One common solution is a versioned catalog. A catalog is a snapshot of the engine’s state at a point in time: a reference to the current memtable, the current WAL path, and the current catalog file. Every incoming request acquires the latest catalog version, pins it by incrementing a reference count, performs its work, then releases it by decrementing the reference count. Background workers (the flush worker and the compaction worker) never modify an existing catalog. Instead, when a flush or compaction completes, they create a new catalog version pointing to the updated memtable and SSTable set. From that moment, new requests acquire the new catalog. Old requests that pinned the previous catalog continue reading from it safely. An old catalog version is only cleaned up (its SSTables deleted, its WAL file discarded) when its reference count drops to zero. No reader is using it anymore, so it is safe to remove. This approach keeps foreground reads and writes lock-free in the hot path. Background operations never block requests, and requests never block background operations. They operate on independent catalog versions and only synchronize at the moment of catalog swap, which in many implementations is a single atomic pointer update. The versioned catalog is also what makes crash recovery clean. On startup, the engine reads the latest catalog file on disk, which always reflects a consistent state: either from before the last flush/compaction, or after. Any SSTables on disk not referenced by the catalog are orphans from an incomplete operation and can be safely deleted.
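Going back to Bloom filters for a moment, the bitset mechanism described earlier fits in a few lines. This is a sketch rather than what any particular engine ships: the sizes are arbitrary defaults, and the “several hash functions” are simulated by salting SHA-256 with a counter, which is an assumption made for simplicity (real engines use much cheaper hashes).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over string keys."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit indexes by salting one hash with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # Any unset bit proves the key was never added (no false negatives).
        # All bits set means "maybe": the caller still has to read the SSTable.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A lookup would call `might_contain` on each SSTable's filter and skip the disk read whenever it returns `False`.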
LSM trees optimize for write throughput by turning random disk writes into sequential ones, at the cost of more complex reads. The memtable absorbs writes in memory; an ordered structure like a skip list, balanced BST, or radix trie keeps keys sorted for efficient flushing. The WAL provides durability: every write is logged to disk before the memtable is updated, enabling crash recovery. SSTables are immutable, sorted files produced by flushing the memtable; a binary block format with checksums makes point lookups efficient and reads safe. A catalog file tracks which SSTables are live and is updated atomically to ensure the engine always has a consistent view of disk state. Read amplification is the fundamental trade-off: finding a key may require searching multiple SSTables, one per level, plus all files. Compaction merges SSTables, eliminates redundant entries, and reclaims space, at the cost of write amplification and background I/O. Tombstones handle deletions in an immutable structure; they can only be discarded when no older value they shadow still exists on disk. Leveling organizes SSTables into levels with non-overlapping key ranges, bounding read amplification to one file lookup per level. Tiered compaction is an alternative strategy that trades less write amplification for more read amplification. Bloom filters allow the engine to skip SSTable reads for missing keys with near certainty, eliminating the worst-case read scenario. A versioned catalog is one common approach to enabling lock-free concurrent reads and background operations by letting each request pin a consistent snapshot of engine state. CRDTs Explained Availability Models Explained The PACELC Theorem Explained The Log-Structured Merge-Tree (LSM-Tree) // The original LSM tree whitepaper. Log Structured Merge Tree - ScyllaDB // LSM tree definition from ScyllaDB technical glossary . 
Fundamental Insights

To understand LSM trees, we first need to understand why writes are hard. A B-tree-based database updates data in place. When we write a key, the engine finds the right page on disk and modifies it. This is a random write: the disk head has to seek to an arbitrary location before writing. On spinning disks, that seek takes time. But even on SSDs, random writes cause problems: they wear out cells unevenly and trigger expensive internal garbage collection.

LSM trees take a completely different approach. Instead of writing data where it ultimately belongs, they write data sequentially. Writes are recorded in memory and appended to a log file for durability. When the in-memory buffer fills up, its contents are streamed to a new file in one sequential pass. Sequential writes are dramatically faster than random writes because there is no seeking involved. The disk just keeps writing forward.

The price of this design is complexity. Data doesn’t live in one place. It accumulates across multiple files over time, and those files need to be periodically merged and reorganized in the background to stay manageable. That background work is what every piece of an LSM tree is built around.

The in-memory buffer is called the memtable. The sorted files on disk are called SSTables. We’ll look at each in detail.

The Memtable

Every write in an LSM tree starts in memory, in a structure called the memtable. The memtable is a mutable, in-memory store. When a write request arrives, the engine records the key-value pair in the memtable and appends it to a sequential log file on disk (called the write-ahead log, or WAL, which we’ll cover in the next section). The WAL write is a sequential append, so it is fast. There is no random I/O, no page lookup, no in-place modification. This is why LSM trees can sustain very high write throughput.

The choice of data structure matters, because the memtable is eventually flushed to disk in key order. A hashtable works for lookups but not for in-order iteration.
Sorting a hashtable’s entries takes extra work at flush time (O(n log n) for a comparison sort). A better choice is an ordered data structure. The most common in practice is a skip list; for example, LevelDB and RocksDB both use one as their default. A radix trie is another elegant option: it keeps keys in lexicographic order naturally, so iterating in order is just a depth-first traversal, and flushing becomes a simple stream with no sorting step needed. A balanced BST works too.

Production implementations typically attach a monotonic sequence number to each entry, so the engine can always determine which version of a key is the most recent, regardless of arrival order.

The memtable doesn’t grow forever. At some point, it gets flushed to disk, and a new empty memtable takes its place. What triggers that flush depends on the implementation: it can be a size limit (a number of entries or a memory threshold), elapsed time, or memory pressure, for example. That flush produces a sorted file on disk called an SSTable, which we’ll look at after the WAL.

The Write-Ahead Log

There is a problem with keeping writes in memory: if the process crashes, everything in the memtable is gone. Any write the client received an acknowledgment for is now lost. That breaks a core database guarantee: durability.

The solution is a Write-Ahead Log, or WAL. Before writing to the memtable, the engine appends the operation to the WAL, an append-only file on disk. Only after the WAL entry is safely persisted does the engine update the memtable and acknowledge the client. This ordering is what the “write-ahead” in the name refers to: the log is always written before the in-memory state changes.

The WAL is not the final home for data; it’s a safety net. If the engine crashes and restarts, it replays the WAL from the beginning to reconstruct the memtable, recovering any writes that hadn’t been flushed to disk yet.

One subtlety: writing to a file is not the same as persisting it. Operating systems buffer writes in memory before flushing to disk.
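To make the ordering concrete, here is a minimal Python sketch of the write path and of replay on startup. The `TinyEngine` class, the JSON-lines record format, and the plain dict standing in for the memtable are all assumptions made for illustration, not how any particular engine does it. It syncs after every append, which is the simplest (and slowest) way to make sure an acknowledged write has left the OS buffers.

```python
import json
import os


class TinyEngine:
    """Toy write path: append to the WAL, sync, then update the memtable."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}  # a real engine would use an ordered structure
        self._replay()
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        # On startup, rebuild the memtable from whatever the WAL contains.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Log first: a sequential append to the WAL.
        self._wal.write(json.dumps({"key": key, "value": value}) + "\n")
        # 2. Push Python's buffer to the OS, then ask the OS to persist it.
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 3. Only now change in-memory state and acknowledge the caller.
        self.memtable[key] = value

    def close(self):
        self._wal.close()
```

A production WAL would also checksum its records so that a torn final entry can be detected and skipped during replay; this sketch assumes every line was written completely.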
To guarantee durability, the engine must issue a sync call (fsync on POSIX systems) after each WAL entry, forcing the OS to flush its buffers to physical storage. This is not free, though: the sync adds latency to every write. Production systems often use a data-only variant such as fdatasync instead, which persists the data without flushing unnecessary file metadata, keeping WAL appends faster. Many also use a technique called group commit to amortize this cost further: instead of syncing after every write, they batch multiple WAL entries and sync once for the group.

The WAL introduces write amplification: the ratio of data written to disk versus data actually requested by a client. Every byte we write to the database gets written to disk twice: once to the WAL immediately, and once to an SSTable when the memtable is eventually flushed. That cost buys us durability.

SSTables

As we said, when the memtable fills up, it gets written to disk as a Sorted String Table, or SSTable. An SSTable is an immutable, sorted file. Immutable means it is never modified after creation. Sorted means keys are stored in lexicographic order. Both properties matter: immutability makes SSTables safe to read concurrently without locking, and sorted order makes lookups inside a file efficient.

L0 is the landing zone. When the memtable flushes, the resulting SSTable lands in L0. L0 files can have overlapping key ranges: two L0 files might both contain entries for the same key. This is acceptable because L0 files are small and short-lived.

L1 and deeper levels are different. Each level maintains non-overlapping key ranges across all its files. A given key can exist in at most one file per level. This is the critical property that makes reads efficient: to look up a key in one of these levels, we don’t scan all its files. We use the key ranges to jump directly to the one file that could contain it.

Each deeper level is typically larger by a fixed ratio, for example, 10x. L1 might hold 10 MB, L2 100 MB, L3 1 GB, and so on. Most data ends up in the deepest level. Most compaction work happens between levels.
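The jump from a key to the one candidate file in a non-overlapping level can be sketched as a binary search over the files’ key ranges. This is only an illustration: the `find_file` function and the `(min_key, max_key, file_name)` tuples are hypothetical, and a real engine would keep this metadata in its catalog rather than in a plain list.

```python
from bisect import bisect_left


def find_file(level, key):
    """Return the name of the one file in `level` whose range may hold `key`.

    `level` is a list of (min_key, max_key, file_name) tuples, sorted by
    min_key, with non-overlapping ranges (the property deeper levels keep).
    """
    max_keys = [max_key for _, max_key, _ in level]
    # First file whose max_key is >= key; every earlier file ends before it.
    i = bisect_left(max_keys, key)
    if i < len(level) and level[i][0] <= key:
        return level[i][2]
    return None  # key falls in a gap between files, or outside the level
```

Because the ranges never overlap, at most one file can match, so the cost per level is one binary search over the file list plus one lookup inside that file. The same function would not work for L0, where overlapping ranges force the engine to consider every file.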
The benefit is controlled read amplification. To look up a key, we check the memtable, scan all L0 files, then do one binary search per deeper level. The number of deeper levels grows logarithmically with data size. For a dataset with a few levels, that’s a small, bounded number of disk reads, regardless of how many total SSTables exist.

When compaction falls behind and L0 accumulates too many files, the engine may trigger a write stall: new writes are paused until compaction catches up and L0 is drained. This is one of the more painful operational issues in LSM-based systems.

Leveled compaction is also not the only strategy. Tiered compaction, used by Cassandra, for example, takes a different approach: instead of enforcing non-overlapping ranges per level, it groups SSTables of similar size and merges them when a tier grows too large. Tiered compaction generates less write amplification but more read amplification. The right choice depends on the workload.

Bloom Filters

Leveling helps with reads, but there is still one painful case: looking up a key that doesn’t exist. For a missing key, the engine checks the memtable (not there), checks each L0 file (not there), then checks one file per deeper level (not there). Each check is a disk read. Even with leveling, this adds up.

Bloom filters solve this. A Bloom filter is a probabilistic data structure that can answer one question: Is this key definitely not in this SSTable? It has no false negatives: if the key is in the SSTable, the filter will say so. It can have false positives (occasionally it says a key might be present when it isn’t), but in practice, the false positive rate is tunable and kept very low.

Many implementations attach a Bloom filter to each SSTable, built at creation time from all the keys it contains. The filters are small, a few kilobytes per SSTable, so they can be loaded into memory at startup and kept there.

How does it work? A Bloom filter is a bitset.
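Before walking through the mechanics, here is a minimal Python sketch of such a bitset. The `BloomFilter` class, its default sizes, and the trick of deriving several hash functions by salting SHA-256 with a counter are all assumptions chosen for readability; real engines use much cheaper hashes and size the filter from a target false positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, key):
        # Derive several indexes by salting one hash with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for idx in self._indexes(key):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, key):
        # Any zero bit proves the key was never added.
        return all(
            self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(key)
        )
```

On the read path, an engine would call something like `might_contain(key)` before touching an SSTable and skip the file entirely on `False`. A `True` only means "maybe", so the SSTable still has to be read to confirm.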
When a key is added, several hash functions are applied to it, each producing an index into the bitset. The bit at each index is set to 1. To check if a key is in the filter, the same hash functions are applied. If any of the resulting bits is 0, the key is definitely not in the SSTable. No disk read needed. If all bits are 1, the key might be there, and the engine proceeds to read the SSTable.

The practical impact is significant. For a key that doesn’t exist (the worst case), the engine skips almost every SSTable without a single disk read. Only the rare false positive triggers an unnecessary disk read. Read amplification for missing keys drops dramatically.

Some engines take this further and attach Bloom filters not just per SSTable but per data block within an SSTable, enabling even more precise filtering before fetching a block from disk.

Concurrency

Everything described so far assumes a single thread. In reality, a storage engine needs to handle concurrent reads and writes, while flush and compaction run in the background. This is where things get subtle.

The core problem: a flush operation replaces the current memtable with a new one and registers a new SSTable in the catalog. A compaction operation removes old SSTables and registers new ones. If a read is in the middle of searching an SSTable that gets deleted by a concurrent compaction, that’s a crash.

One common solution is a versioned catalog. A catalog is a snapshot of the engine’s state at a point in time: a reference to the current memtable, the current WAL path, and the current catalog file. Every incoming request acquires the latest catalog version, pins it by incrementing a reference count, performs its work, then releases it by decrementing the reference count.

Background workers (the flush worker and the compaction worker) never modify an existing catalog. Instead, when a flush or compaction completes, they create a new catalog version pointing to the updated memtable and SSTable set.
From that moment, new requests acquire the new catalog. Old requests that pinned the previous catalog continue reading from it safely. An old catalog version is only cleaned up (its SSTables deleted, its WAL file discarded) when its reference count drops to zero. No reader is using it anymore, so it is safe to remove.

This approach keeps foreground reads and writes lock-free in the hot path. Background operations never block requests, and requests never block background operations. They operate on independent catalog versions and only synchronize at the moment of catalog swap, which in many implementations is a single atomic pointer update.

The versioned catalog is also what makes crash recovery clean. On startup, the engine reads the latest catalog file on disk, which always reflects a consistent state: either from before the last flush/compaction, or after. Any SSTables on disk not referenced by the catalog are orphans from an incomplete operation and can be safely deleted.

Summary

LSM trees optimize for write throughput by turning random disk writes into sequential ones, at the cost of more complex reads.
The memtable absorbs writes in memory; an ordered structure like a skip list, balanced BST, or radix trie keeps keys sorted for efficient flushing.
The WAL provides durability: every write is logged to disk before the memtable is updated, enabling crash recovery.
SSTables are immutable, sorted files produced by flushing the memtable; a binary block format with checksums makes point lookups efficient and reads safe.
A catalog file tracks which SSTables are live and is updated atomically to ensure the engine always has a consistent view of disk state.
Read amplification is the fundamental trade-off: finding a key may require searching multiple SSTables, one per level, plus all L0 files.
Compaction merges SSTables, eliminates redundant entries, and reclaims space, at the cost of write amplification and background I/O.
Tombstones handle deletions in an immutable structure; they can only be discarded when no older value they shadow still exists on disk.
Leveling organizes SSTables into levels with non-overlapping key ranges, bounding read amplification to one file lookup per level.
Tiered compaction is an alternative strategy that trades less write amplification for more read amplification.
Bloom filters allow the engine to skip SSTable reads for missing keys with near certainty, eliminating the worst-case read scenario.
A versioned catalog is one common approach to enabling lock-free concurrent reads and background operations by letting each request pin a consistent snapshot of engine state.

CRDTs Explained
Availability Models Explained
The PACELC Theorem Explained
The Log-Structured Merge-Tree (LSM-Tree) // The original LSM tree whitepaper.
Log Structured Merge Tree - ScyllaDB // LSM tree definition from ScyllaDB technical glossary.
Build Your Own Key-Value Storage Engine
IO devices and latency
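As a closing illustration, the pin/release protocol from the Concurrency section can be sketched in a few lines of Python. Everything here is hypothetical: the `VersionedCatalog` and `CatalogVersion` names, the `deleted` list that stands in for removing files from disk, and above all the single `threading.Lock`, which replaces the atomic pointer swap and atomic reference counts a real engine would use to stay lock-free. The sketch only shows the lifetime rule: a file is removed once neither the current version nor any still-pinned version references it.

```python
import threading


class CatalogVersion:
    """Immutable snapshot of the live SSTable set, plus a pin count."""

    def __init__(self, sstables):
        self.sstables = frozenset(sstables)
        self.refs = 0


class VersionedCatalog:
    def __init__(self, sstables=()):
        self._lock = threading.Lock()
        self._current = CatalogVersion(sstables)
        self._retired = []  # old versions that may still be pinned
        self.deleted = []   # stand-in for deleting files on disk

    def acquire(self):
        # Pin the latest version so its files outlive concurrent compactions.
        with self._lock:
            version = self._current
            version.refs += 1
            return version

    def release(self, version):
        with self._lock:
            version.refs -= 1
            self._collect()

    def install(self, sstables):
        # Background workers publish a new version; they never edit an old one.
        with self._lock:
            self._retired.append(self._current)
            self._current = CatalogVersion(sstables)
            self._collect()

    def _collect(self):
        # Keep every file referenced by the current or any still-pinned version.
        pinned = [v for v in self._retired if v.refs > 0]
        needed = set(self._current.sstables)
        for v in pinned:
            needed |= v.sstables
        for v in self._retired:
            if v.refs == 0:
                for name in sorted(v.sstables - needed):
                    if name not in self.deleted:
                        self.deleted.append(name)
        self._retired = pinned
```

A reader would wrap its work in `acquire()` and `release()`, ideally in a `try`/`finally`, so that a compaction installing a new version in the meantime cannot pull files out from under it.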


On people writing about their use of AI

I find the trend of people posting about the way they use generative AI to be fascinating at an anthropological level. I do not remember the last time a piece of technology pushed so many different people into writing about the way they use it, or not use it, or abuse it, or misuse it. To me, this is way more interesting and intriguing than the technology itself. I obviously do not know why so many people are doing so, and I suspect they must all have their own specific reasons, but I currently have three main theories but I’m sure there are more than that. The first theory is that a good percentage is trying to capitalize on the trend in an attempt to become some sort of AI thought leader. Those people are insufferable. They usually hang out on LinkedIn, but sometimes they escape containment, and they remember that they do have a blog (and that’s often a Substack, unsurprisingly) where they can post these generic-looking blog posts filled with lists and it’s-not-this-it's-that statements. The second theory is that techies are gonna tech. A lot of the people who have blogs are also into tech, and gen AI is an interesting piece of tech and so it’s natural that those people will end up writing about how they use AI. The third and final theory is that there’s a group of people who feel the need to distance themselves from what AI represents. So those posts are not really about the technology itself, but rather a statement on the state of the world around them, and they want to make it clear if and how they participate in it. This final group is to me the interesting one. Now, if you’re a techie, don’t be mad at me, I’m not saying you’re not interesting, because you are (if instead you’re an AI bro, click here . You're welcome.) I’m saying the last group is the interesting one because to me, it’s fascinating how people feel compelled to justify or explain to strangers on the Internet how they interact with a piece of technology. 
And it’s especially fascinating because it’s a completely pointless exercise in my opinion. Let’s pretend you just landed on my blog for the first time (hi, welcome, nice to have you here) and you have no idea who I am. For all you know, I might not even be a real person. This entire website could be a psyop run by the Italian government. With that in mind, what’s the value of a post in which I tell you how I use or not use AI from a moral perspective? Would it make a difference if I were to tell you that I don’t use it? Or that I use it maybe once a day to answer a coding-related question? What if I told you that I don’t use AI at all, but in reality, this post was entirely generated by a swarm of AI agents while I was outside walking the dog, enjoying life? Unless you have prior knowledge of me and this blog, a post like that, in a vacuum, would be meaningless. How about the opposite case, though? Let's now pretend you weren’t new here, and you had, in fact, been following this blog since 2017. If that was the case, you wouldn't even need me to write that blog post, because by this point, you’d have all the necessary information to make an informed judgment. And you’d also know that you could ping me via email or via DM and ask me directly if you had any doubt about anything related to this topic. In both cases, a post stating my use of AI would have pretty much zero value. Which genuinely makes me wonder why so many people feel compelled to write about this stuff. If you wrote one of these posts, can I ask you why? Why do you feel the need to explain how you use this technology? Is there a specific reason? I’d love to know.

Martin Fowler Yesterday

Three more static code analysis sensors

Birgitta Böckeler adds discussion of three more sensors for static code analysis, focusing on checking and enforcing better modularity. Computational sensors for dependency checks were good at enforcing rules, but the rules were limited. Building a computational sensor for coupling data proved lackluster. Prompting an inferential sensor to review modularity was more effective.

マリウス Yesterday

Photography Workflow with ~~Darktable on Linux~~ Lightroom on GrapheneOS

Disclaimer: I had initially prepared this post under the title Photography Workflow with Darktable on Linux , but after endless fights with Darktable I eventually decided to scrap that workflow altogether and look for an alternative. The workflow documented herein is unfortunately very far from the result I was striving for, yet it is sadly the best I can put together given the current state of open-source RAW development and photo editing software. After I gave Adobe the finger back in 2019 and moved my photography workflow to Capture One on a MacBook , I eventually had to reconsider this approach when I moved back to Linux on the desktop and replaced the device with a Linux laptop . I briefly tried running Capture One in a Windows VM on my laptop , but decided against it, as it was a huge PITA and lacked proper hardware acceleration. Initially I considered a fork of what is probably the best-known open-source RAW developer and photography workflow application out there, Darktable , called Ansel , but ultimately decided against it. The points that Ansel ’s author, Aurélien, brought up seemed like valid criticisms and demonstrated both his knowledge of and his passion for making Darktable a better tool. However, reading further through his website and his GitHub account, it became apparent that he might be the kind of misunderstood genius who has great ideas and ambition, but who would ultimately struggle to operate within, let alone lead the kind of community required to successfully maintain a fork of a piece of software this large. I therefore didn’t have high hopes of this lone cowboy keeping up with, let alone surpassing, the development efforts the Darktable community is currently putting in. Given that Ansel was explicitly billed as a hard-fork that would not remain compatible with the official Darktable release, going down that path felt too risky. 
Ansel would ultimately have to provide a migration path for existing Darktable users, as otherwise there would be little to no incentive for anyone with a functioning Darktable workflow already in place to put up with the effort. Instead, I decided to stick with Darktable . For about a year I tried to build a new workflow on top of it. The things I would miss the most from Capture One were the VSCO presets that I had brought over from Lightroom , and for which there didn’t seem to be any way to convert them into a format compatible with Darktable while producing roughly similar results. Luckily, João, a developer and photographer, made what he calls t3mujinpack , a collection of film emulation presets for Darktable . In a blog post , he provides details on which film stocks are included and how to make use of them in Darktable . His pack includes the presets I almost exclusively use from VSCO : Kodak’s Portra 160, 400 and 800. While the results aren’t 100% identical to what Capture One produces with the converted VSCO packs, neither are those exports identical to what Lightroom originally produced. Every piece of software has slight differences in its inner workings, so this is to be expected and can be adjusted for. During my travel through all of Spain in 2024 I decided to rely exclusively on Darktable for developing and editing the photos that I would ultimately upload to this site. That was a big mistake. I rarely say bad things about truly open-source software, because ultimately it is open-source, it’s driven by a community of volunteers, and everyone should be happy that these people do what they do. Also, given that it’s open-source, anyone is free to go ahead and improve what they deem worth improving. However, Darktable is, in my opinion, one of the few exceptions that seem to have derailed so badly that it’s fair to say it has reached a point of no return in terms of usability and jankiness . 
Let me explain by starting with one of the most annoying things: More often than not, Darktable crashes in the middle of editing sessions, apparently due to Wayland-related issues. However, since I’m also running GIMP and Blender , which I would argue do similar, or even slightly more complex things than Darktable , yet don’t run into such issues, I’d assume that this is not a problem with my Wayland setup specifically. I didn’t try to debug the issue further, as I was mainly focused on testing and establishing a workflow. Had Darktable otherwise worked perfectly fine for me and only run into this issue every once in a while, I would have dug deeper to find the root cause. Unfortunately, this was only one of many things that kept me from continuing to use Darktable . Besides the random crashes, Darktable is unbearably janky and slow. The UI feels like it’s about to fall over at any moment, regardless of whether ROCm acceleration is enabled or not. UI elements feel hacked together, the overall navigation is hostile towards regular users, and it’s impossible to find anything just by looking, because everything is hidden behind collapsed modules, tabs and a gazillion sliders and buttons. To give a single example of the sheer UI craziness that is Darktable : To rotate an image to the right (clockwise), you need to drag a slider to the left (counterclockwise). While on a touchscreen interface this might be more intuitive, when using a touchpad on a laptop or even a mouse it definitely doesn’t feel natural. After all, maybe a slider isn’t the best UI element for this operation to begin with? Another issue that I experienced was related to organizing photos. With over 4000 (RAW) photos in the library, Darktable becomes unbearable to work with. Aside from the spontaneous crashes and overall slow UI, finding specific photos in a library of that size is an excruciatingly painful task. 
Unlike Capture One and Lightroom , Darktable doesn’t easily support a workflow based on individual, smaller libraries, e.g. organized by location or event. There are ways to sort photos within Darktable ’s main library, but I couldn’t find an easy way to split them out into multiple small libraries. Assuming that you managed to find and edit the photos you were looking for, the headaches continue when you try to export them. It appears that Darktable is unable to export photos with pixel-perfect adherence to the crop aspect ratio . The implementation details and the proposed solution appear to be just as janky as everything else, and a quick search for in the Darktable GitHub repository uncovers a lot more of that same jankiness. I ended up running the following command over every photo exported by Darktable , just to obtain a properly shaped image, meaning I’d lose a few pixels here and there: As mentioned a long time back in an update , I ended up with a broken Darktable library, meaning that I lost all the adjustments that I did manage to export up until that point . Short story long, I eventually ditched Darktable for a plan B . After Darktable broke my library and I lost months’ worth of edits, I found myself back at square one. The idea of returning to Adobe felt like defeat, but when I looked at what was actually available for my setup, which is a Google Pixel Tablet running GrapheneOS , Adobe Lightroom for Android turned out to be the only realistic option that could handle RAW files and offer a non-destructive editing workflow. Adobe Lightroom Mobile is, on paper, a reasonably capable RAW editor for Android. It supports a wide range of camera RAW formats and offers the familiar tone curve, HSL sliders, color grading, masking, and healing tools that anyone coming from desktop Lightroom will recognize. 
It can read photos directly from the device’s own storage, edit them locally without an internet connection, and export to JPEG with full control over quality and output dimensions. In short, the feature set is there. The physical side of the workflow is straightforward. I attach a USB-C SD card reader to the Pixel Tablet, open a file manager, and copy the RAW files from the card into a dedicated folder on the tablet’s internal storage. From there I open Lightroom , import the photos from that folder into a local album, and work through them one by one. Once a photo is where I want it, I export it as a JPEG into the folder on the tablet’s storage. That folder is monitored by Syncthing , which synchronizes the finished exports to my other devices in the background. The performance of Adobe Lightroom on Android is, to put it mildly, terrible. Rendering a RAW preview after entering edit mode takes long enough that you find yourself staring at a loading indicator more often than at the actual photo. Scrolling through a grid of thumbnails is a choppy, stuttering affair that makes you wonder whether the application is doing something computationally expensive or is just poorly written. I acknowledge that the Pixel Tablet is an older budget device, yet Lightroom treats it as if it were running on hardware from 2005. Lightroom on Android is every bit as buggy as Adobe products traditionally are on macOS and Windows, but somehow worse, because the interface is also frequently broken in ways that make the application essentially unusable without restarting it. The UI will routinely enter a state where confirmation and action buttons either stop responding to taps, as if the touch layer has fallen out of sync with whatever is rendered on screen, or simply disappear altogether. The only resolution is to quit the app and reopen it, at which point you hope that the edit you were in the middle of survived. Entire features will similarly go dark without warning. 
The auto-straighten function, which should detect the horizon in a photo and level it, simply grays out and stops working at some point. No error, no indication as to why it has become unavailable, nothing. Again, restart the app, try again, maybe it works this time. These are not edge cases or exotic scenarios, but rather the normal operating experience of Adobe Lightroom Mobile. One of the things I was most concerned about before committing to this workflow was the prospect of Adobe silently uploading my photos to their cloud infrastructure. The desktop version of Lightroom has a long and well-documented history of syncing content to Adobe’s servers in ways that are easy to miss and difficult to fully disable. On Android, GrapheneOS gives you a tool that the desktop doesn’t: Per-application network permission revocation. I first disabled the cloud sync option within Lightroom’s own settings, then went into GrapheneOS’s permission manager and removed the network permission from the Lightroom app entirely. It continues to function as a local RAW editor without any network access whatsoever. Photos stay on the device. Nothing leaves without my explicit say-so via Syncthing. Note: To keep things simple, I did not go into the fact that Lightroom is running inside an Android 16 Private Space, which also contains a sandboxed instance of Google Play Services and lets me create a virtual barrier between the rest of the FOSS apps on the Pixel Tablet and this ~~spyware~~ ~~malware~~ ~~crap~~ proprietary software. With this setup, however, importing data becomes slightly more tedious, as it requires the Google Files app to be able to read an attached USB-C storage device (SD card) from within the Private Space. The Google Files app is a giant UX disaster all by itself, into which, for the sake of both our time and mental health, I won’t dive.
One pleasant surprise was that I managed to import the VSCO Lightroom presets I purchased well over a decade ago into Lightroom Mobile on Android. The preset files still work, and the film emulations I had relied on for years, in particular the Kodak Portra series, show up in the presets panel and can be applied to photos. With Adobe being Adobe, however, this had to come with a catch. Lightroom Mobile is apparently incapable of remembering which preset was applied to a given photo. Open an edited photo that had a VSCO preset applied, and Lightroom will display a warning telling you it cannot find the preset, even though the preset is sitting right there in the presets list, available and functioning, ready to be applied to new photos. The edit itself is intact… well… at least sometimes. Other times, Lightroom simply loses the edits altogether. It’s the kind of bug that suggests the feature was never properly tested beyond the initial happy path, which is about what you’d expect from Adobe. To be frank, this workflow sucks compared to the one I had on macOS using Capture One . Lightroom is still the terrible POS it had always been, and paying money to a company like Adobe feels like funding a criminal organization. Unfortunately, there doesn’t appear to be a viable alternative, especially not one that’s libre . The remaining options would be to either pay into Apple’s walled garden by purchasing one of their newer iAmtheproduct devices and subscribing to Capture One Mobile , or to rely exclusively on Fuji’s in-camera film simulations (which sadly won’t work for the Sony ). Judging by the reviews of Capture One Mobile , however, the former option doesn’t appear too promising either. Looking at the situation in a more positive light, I nevertheless managed to replace the underlying stack on which my photography workflow runs with more privacy-respecting software ( GrapheneOS ). 
That’s at least something , although it seems this workflow won’t live that long either, given that Google keeps locking down their Pixel devices and GrapheneOS appears to be pivoting to Motorola-made hardware , who might not release a GrapheneOS-compatible Moto Pad anytime soon. Oh well. Pro tip: A USI 2.0 pen makes using Lightroom on a device like the Pixel Tablet significantly less painful, at least as long as the USI pen actually works properly, which sadly isn’t always the case with the Renaisser pen I own. If you’re looking for a more general review of the Google Pixel Tablet with GrapheneOS, look here .

0 views
David Bushell Yesterday

Google just spat in my face

It’s Google I/O week and this year’s theme is performative slop. Budding Googlers battle it out on stage vying for executive eyeballs. The prize? Exemption from the next culling. As you might know AI isn’t my cup of tea and my AI policy explains why. AI peddlers like Google have made one thing abundantly clear: their product will take your skills. It will take your profession. It will dehumanise you and you’ll pay for it. I figured Google’s Prompt API would be the most offensive attack on an open web I’d witness this month. Nope! Google’s new microsite has sent me apoplectic. Modern Web Guidance is a set of evergreen and expert-vetted skills that guide your AI coding agents across many common use cases to build modern web experiences that are accessible, performant, and secure. Build with Modern Web Guidance At first glance this is nothing more than an advertisement for the AI industrial complex. I made the mistake of engaging my brain for a closer look. Brain engagement is discouraged so I only have myself to blame for the ensuing rage. Google spits in the face of professional web development. Where do I even start? The repeated use of “modern web” implies that current development practices are out of date. Throw away all established knowledge because Google has changed the game. The entire chat-box-driven-development craze has been a long series of “you’re prompting it wrong” arguments. Are we to understand that Google’s new magic incantations have settled the debate once and for all? Which experts? Google, I assume. You are no longer an expert. You are token consumer number six. Expertise is not a privilege extended to consumers. Forgive my ignorance but I struggle to understand how AI addicts define “skills”. From what I can understand these “skills” are text prompts? “Skills” used to refer to the trained abilities required to do a professional job. I’m no prescriptivist but this is slopaganda.
Google’s idea of “modern web” is a deskilling effort that should deeply offend developers to their core. It should also offend the AI apologists. Google thinks you’re too stupid to articulate your prayers coherently so just copy-paste the ten commandments. Defer to the almighty bullshitter in the cloud! What do you think a fair wage is for a professional developer who has less agency than Butter Bot? They’ll say this “democratises” web development alongside all and every profession in which AI has been violently forced . And what is the end goal? To deskill you so far down the ladder you’ll be forced into token servitude. To make a handful of billionaires even richer. Prompt boxes are not “just a tool” they are the end of your career. Implement a starter Content Security Policy (CSP) without breaking my app. Don’t break it bro! Pinky promise? Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

0 views
Stratechery Yesterday

Google I/O, World Models, I/O Spaghetti

Google I/O put AI everywhere, for better and for worse. Meanwhile, is DeepMind aligned with Google's business objectives?

0 views

How many app SDKs did Publicis add with LiveRamp acquisition?

I decided to check AppGoblin to see how many apps LiveRamp had when Publicis acquired it for $2b. The answer seems to be ~300 mobile apps out of the top 200k apps. This is based on apps found with the LiveRamp SDKs. The apps also have a particularly older vibe to them: among the larger ones we see apps like Flipboard, Badoo and Skout using LiveRamp. The monthly installs here are likely even too high, as they are still based on some adjustments Google Play made to install counts in April that boosted many apps’ lifetime installs. Looking at the trend of market share for LiveRamp, as AppGoblin has crawled more and more apps over the past year, the market share for LiveRamp seems to have remained quite small and steady at ~0.13%. There was a ‘high growth’ phase, but it was at the beginning of the data period, so it was unlikely to be a true high-growth period for LiveRamp. Overall I was surprised that the LiveRamp footprint was so small, though given the name brand of some of these apps, perhaps the LiveRamp deal is much more about online sites than mobile properties.
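As a rough sanity check, the two figures quoted above can be compared directly. This is only back-of-the-envelope arithmetic on the rounded numbers from the post (~300 apps, top 200k apps, ~0.13% share); the variable names are made up for illustration and nothing here comes from AppGoblin's actual data.

```python
# Rounded figures taken from the post; both are approximations.
apps_with_liveramp_sdk = 300
apps_in_sample = 200_000
reported_share = 0.0013  # the ~0.13% market share quoted in the post

# Share implied by the raw app count.
implied_share = apps_with_liveramp_sdk / apps_in_sample
print(f"{implied_share:.2%}")  # 0.15%

# App count implied by the reported share, for comparison with "~300".
implied_apps = round(reported_share * apps_in_sample)
print(implied_apps)  # 260
```

The two numbers land in the same ballpark (0.15% versus ~0.13%, or 300 versus 260 apps), which is about as close as rounded inputs can be expected to agree.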

0 views
(think) Yesterday

nREPL Forever

Last week I announced Port , a small prepl client for Emacs. That post focused on Port itself, but writing it left me with the itch to do a follow-up on the bigger picture, because the socket REPL / prepl story is one I’ve been meaning to write up for years. If you’ve been around Clojure long enough, you remember the chatter. Socket REPL landed in Clojure 1.8 (January 2016), prepl in Clojure 1.10 (December 2018), and for a couple of years there was a steady stream of posts, tweets, and Slack threads to the effect of “this is what we should be building tools on. nREPL is on the way out.” Some serious people put their weight behind that idea, and some of them went and built tools to prove it. Now it’s 2026 and we can take stock. The pitch was good. Socket REPL is just the Clojure REPL exposed on a TCP port. prepl wraps it with a structured printer so the bytes coming back are EDN-tagged maps ( , , , ) instead of a human-readable prompt. Both ship with Clojure itself. No external server library, no middleware, no third-party namespaces. You start a JVM, you bind a port, you’re done. The intellectual case for moving off nREPL had been made by Rich Hickey himself, most clearly in a March 2015 clojure-dev post that’s worth reading in full. Rich didn’t actually attack nREPL by name in that message. What he did was argue carefully for what a REPL is : a thing that reads characters, evaluates forms, prints results, and loops, with those streams available to user code so that things like nested REPLs and debuggers compose naturally. The money line: While framing and RPC orientation might make things easier for someone who just wants to implement an eval window, it makes the resulting service strictly less powerful than a REPL. His proposal, in the same post, was that tools should open multiple connections to the running program: one for the human-facing stream, and dedicated channels for IDE operations. 
The socket REPL (which landed in 1.8 the following January) and prepl (which arrived in 1.10) were the official implementation of that worldview. A handful of editor projects took the cue and built clients: It was real momentum. If you were following Clojure tooling in 2018-2020, it genuinely felt like nREPL might be the past, and the future would be some combination of socket REPL plus a thin self-installing protocol on top of it. You can find a fair number of “RIP nREPL” hot takes from that period if you go looking. I went and surveyed each of those projects recently while working on Port. The pattern is depressingly consistent: Tutkain started on prepl. In November 2021, its v0.11 release explicitly stopped using prepl message framing and switched to a hand-rolled EDN-RPC protocol that Tutkain boots onto the raw socket REPL by sending it a base64-encoded blob. The new protocol has request ids, op dispatch ( , , , , , , …), and server-managed thread bindings. In other words: Tutkain grew into nREPL, just spelled differently. Chlorine never used prepl directly. It used socket REPL plus an -style upgrade blob. Its author’s successor project, Lazuli , abandoned the whole approach in favor of nREPL. The post-mortem is worth reading and is fairly blunt: tools that attempted prepl went back to nREPL because, honestly, it’s simply better. Conjure had a prepl client in its early Rust days. The current Lua/Fennel rewrite ships only an nREPL client. The author’s reasoning in the release notes was that nREPL “has complete ecosystem adoption and brilliant ClojureScript support.” Clojure-Sublimed technically still talks to a raw socket REPL, but only after sending it an EDN-printing prelude that upgrades the REPL to a structured protocol of tonsky’s own design. 
His post on the topic is one of the most thoughtful pieces I’ve read on Clojure REPL design, and his conclusion is roughly: the bare socket REPL is more useful than prepl because you can install your own protocol on top of it. Which is true. But notice that everyone who reached that conclusion ended up reinventing the same wheel: ids, ops, request/response correlation, completion support, lookup, interrupts. You know, the things nREPL has had since 2010. So the trajectory looks roughly like this: Pure prepl clients are nearly extinct in the wild. The one I found that qualifies is propel by Oliver Caldwell (of Conjure fame), which is delightful, about 70 lines of Clojure, and explicitly synchronous (one outstanding eval at a time). That works! But it’s not a foundation for the kind of feature set people expect from an editor. Here’s where I land. Rich isn’t wrong that prepl is closer to a “real” REPL in the strict sense. prepl genuinely is a more faithful encoding of read-eval-print: each form goes in, each result comes out, and the semantics match what you’d get at the standard REPL prompt. The thing is, “real REPL” is not the property you optimize for when you’re building editor tooling. The properties editor tooling actually needs are: nREPL was explicitly designed for those properties. The ops, middleware, and transport abstractions exist precisely because the people building it knew the consumers are not humans typing at a prompt, they’re programs negotiating a session. Calling nREPL “not a real REPL” is technically defensible and practically beside the point. Nobody on the consuming end is confused about what nREPL is for . I wrote about nREPL’s revival in 2018 . At that point I had just finished migrating the project out of Clojure Contrib, and the goal was to give it a real home and a working development process. It was a lot of work, but in hindsight things played out pretty well. 
Looking at where things ended up: Meanwhile prepl is, as best as I can tell, mostly a curiosity. It got me a side project I had fun with. It did not displace nREPL. The history of tooling protocols is full of cases where “purer”, “simpler”, or “more elegant” lost to “shipped, documented, and battle-tested.” LSP beat fifteen ad-hoc language protocols. DAP beat the same fifteen debuggers. nREPL beat prepl in the (Clojure) editor space. It’s not that the simpler thing is bad. prepl is a fine, elegant little protocol, and there’s a real case for embedding it in CI scripts, ops automation, deployment pipelines, or anywhere you want to drive a Clojure VM programmatically without pulling in a server library. Use it there. But for editor tooling? The Clojure community made an enormous, multi-year, multi-tool investment in nREPL. We have the protocol, the middleware, the manual, the books, the conference talks. nREPL works, it’s actively maintained, it’s increasingly portable across Clojure dialects, and the design decisions that Rich called out as un-REPL-like are the exact ones that make it a good substrate for editors. So I’ll say what I felt awkward saying back in 2018: nREPL forever. It’s the right abstraction for the job, and it’s not going anywhere. One more thing. After finishing Port I got curious what a minimal nREPL client would look like by comparison, so I went and built one. As you can imagine, it turned out to be significantly simpler. If that sounds interesting, take a look at neat , a small, language-agnostic nREPL client for Emacs. Keep hacking! Tutkain for Sublime Text Chlorine for Atom Conjure for Neovim (in its early Rust incarnation) Clojure-Sublimed by Nikita Tonsky a steady drip of smaller experiments around , , and friends Editor decides nREPL is too heavy or an undesirable external dependency and starts on prepl. 
Editor discovers prepl has no ids, no ops, no interrupts, no server-side completion, no namespace tracking, no test runner integration, etc. Editor rolls a custom protocol on top of socket REPL, or… Editor gives up and goes to nREPL. A way to correlate a request with its response when output and results are interleaved. A way to multiplex – one connection, several logical conversations. Server-side hooks for the operations every IDE expects: completion, lookup, go-to-definition, find-references, test running, stacktrace structuring, interrupt. A protocol stable enough that ten different editors can target it without each one inventing its own dialect. nREPL itself is healthier than it has ever been. Active maintainers, a proper manual , a steady release cadence, an actual ecosystem organization on GitHub. Most popular Clojure editors support it. CIDER , Calva , Cursive (via its own client), Conjure, vim-iced , you name it. babashka ships with nREPL built in. You boot a and you get an nREPL server, no extra dependencies. That’s how a lot of people use nREPL in scripting contexts today, and it’s been a hit. basilisp (the Clojure dialect on Python) has nREPL support . nREPL running on Python, talking to Emacs, evaluating Clojure. Nice. ClojureCLR has a working nREPL story now, and jank (the C++ Clojure) has nREPL on its roadmap too. The middleware ecosystem ( , , , , , …) is alive, well, and continues to add features.
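The first requirement in the list above, correlating a request with its response when output and results are interleaved, can be sketched in a few lines. This is a minimal illustration in Python rather than a real client: it assumes nREPL-style messages represented as plain dicts, where each request carries an "id", every response echoes that id, and the final response for a request has "done" in its "status" list. The Correlator class and its method names are hypothetical.

```python
from itertools import count


class Correlator:
    """Groups interleaved response messages by the id of their request."""

    def __init__(self):
        self._ids = count(1)
        self._pending = {}   # request id -> responses received so far
        self._finished = {}  # request id -> complete list of responses

    def new_request(self, op, **fields):
        """Build a request message with a fresh id and start tracking it."""
        msg_id = str(next(self._ids))
        self._pending[msg_id] = []
        return {"id": msg_id, "op": op, **fields}

    def feed(self, response):
        """Route one incoming response to the request it belongs to."""
        msg_id = response["id"]
        self._pending.setdefault(msg_id, []).append(response)
        # A "done" status marks the last message for this request.
        if "done" in response.get("status", []):
            self._finished[msg_id] = self._pending.pop(msg_id)

    def result(self, msg_id):
        """Return all responses for a finished request, or None if still open."""
        return self._finished.get(msg_id)


# Two evals in flight at once, with their responses interleaved on the wire.
c = Correlator()
a = c.new_request("eval", code="(println 1)")
b = c.new_request("eval", code="(+ 1 2)")
for response in [
    {"id": a["id"], "out": "1\n"},
    {"id": b["id"], "value": "3"},
    {"id": b["id"], "status": ["done"]},
    {"id": a["id"], "value": "nil"},
    {"id": a["id"], "status": ["done"]},
]:
    c.feed(response)

values_for_b = [r["value"] for r in c.result(b["id"]) if "value" in r]
```

Without the id there would be no reliable way to tell which request the "1\n" output or the "3" value belongs to, which is the gap the post says every prepl-based editor ended up filling by hand.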

0 views
(think) Yesterday

neat: a language-agnostic nREPL client for Emacs

I think I’ll take my REPL neat My parens black and my bed at three CIDER’s too sweet for me… Last week I announced Port , a small prepl client for Emacs. Today I’m following it up with another small Emacs package. Meet neat , a tiny, deliberately language-agnostic nREPL client. For years I’ve been hearing some version of the same request: “could CIDER work with my non-Clojure nREPL server?”. Babashka, Basilisp, nREPL-CLR, even some homegrown servers people built on top of nREPL for languages I’d never heard of. 1 The answer was always the same kind of squishy “sort of, in theory, with caveats”, because while bare nREPL is genuinely language-agnostic, CIDER is not. CIDER was built for Clojure and assumes Clojure pretty much everywhere. I always thought the right answer was “let’s gradually make CIDER more language-agnostic.” That’s the kind of plan that sounds reasonable until you actually try it. The thing that pushed me over the edge was, oddly enough, building Port. Port is small, focused, and doesn’t try to be CIDER. Working on it for a couple of weeks reminded me how (deceptively) productive it is to start from a clean slate when the new requirements don’t match the assumptions baked into a mature codebase. Trying to retrofit CIDER into a language-agnostic shape would have meant fighting with every helper that ever assumed exists, every middleware contract defines, every project-type heuristic that knows about and and nothing else. A whole lot of “is the server Clojure, or is it the other thing?” branches. The Port experience reaffirmed that the right move for a genuinely different client is a new project , not a thousand cuts to an existing one. So was born. The name is short, says what it does (it’s neat, both in the small-and-tidy sense and in the “no deps, no special assumptions, just the protocol” sense), and conveniently leaves room for puns I haven’t fully committed to yet. I might land on a backronym one day. For now it’s just “neat”. 
neat is a small Emacs nREPL client. The code is split across four files: It only uses Emacs builtins. There are no external runtime dependencies, not even on , because neat doesn’t assume Clojure on the other end. If you write , , , or anything else that talks nREPL, you turn on in that buffer and it just works. The connection routing is also intentionally library-friendly. There’s a buffer-local override so downstream packages can implement their own routing logic, plus a global default for the simple “one server at a time” case that most people will want. Capability discovery is done at connect time via the nREPL op. neat doesn’t hardcode “this server has completions, this one doesn’t” assumptions. If the server reports a op, the CAPF backend lights up (with type annotations next to each candidate, when the server provides them). If it reports , eldoc starts working and jumps to definitions via an xref backend. If neither is there, you still get a perfectly serviceable raw REPL. Start an nREPL server. Anything that speaks the protocol will do. For a Clojure server: Then in Emacs: A REPL buffer pops up, the prompt follows the server’s reported namespace, and you can type expressions at it. Multi-line input works because only submits when the form parses as balanced under (Emacs Lisp syntax by default, which is close enough for any Lisp). Input history is persisted across sessions. If there’s a file in the project, the prompt defaults to its contents, so is enough to connect. To evaluate from a source buffer, turn on the minor mode: The familiar bindings are there, intentionally compatible with what CIDER users expect: ships the buffer contents as an op; uses the standard op instead, so the server can attribute file and line numbers to errors. Use the latter when you’re actually loading a file from disk and care about good diagnostics. sets the buffer-local , which gets sent as the field on every op from that buffer. 
For languages where the namespace is declared in the source (Clojure’s , etc.), swap in a parser via . For juggling multiple connections, opens a tabulated-list buffer with one row per live connection, where you can set the default or disconnect interactively. That’s roughly the whole user-facing surface today. There’s no jack-in command, no inspector, no debugger, no test runner. Likely there will never be, but if you need those you should probably be using CIDER anyways… If you write Clojure and CIDER works for you, keep using CIDER. It’s mature, full-featured, and supported, and I’m going to keep working on it for as long as people use it. Nothing about neat changes that. But if you find yourself in one of these situations: then neat might be a better fit. It’s small enough that you can read the whole thing in an afternoon, and the library/UI split ( and are perfectly usable from other packages) is genuinely designed for downstream consumers. neat is part of a broader push I’ve been chewing on for a while now: making nREPL a healthy multi-language ecosystem rather than a Clojure-only protocol. That push has three legs: This is also why I keep teasing a “reference CLI client” in conversations. An editor client is one thing, but a small command-line nREPL client written in a non-Lisp language would be a much sharper test of how language-agnostic the protocol really is. neat is plausibly a precursor to that. Time will tell how far I push this; for now I just wanted to get the Emacs side moving. As always, big thanks to Clojurists Together and everyone supporting my open source work. You make it possible for me to keep tweaking and improving CIDER, nREPL, clj-refactor, and friends, and occasionally try something “neat” on the side. isn’t replacing any of the existing Clojure tooling for Emacs. It’s just another tool in the box for the people who want it. Feedback, ideas, and contributions are most welcome over at the issue tracker . Keep hacking! 
https://github.com/clojure-emacs/cider/issues/3905 ↩︎ For a long time I planned to extract CIDER’s nREPL client code into a reusable package, but now that we have I probably will finally abandon this idea. ↩︎ : bencode encode/decode. : TCP connections, request dispatch, the standard nREPL ops. : a comint-derived REPL buffer. : the entry point, customization group, and minor mode for source buffers. you write a non-Clojure language whose runtime ships an nREPL server, and you’ve been muddling through with a half-supported CIDER setup, you write Clojure but you value minimalism and don’t need the full CIDER feature set, you’re building an Emacs package that needs to talk nREPL and you want a small, dependency-free library to build on, 2 An actual nREPL specification. The spec.nrepl.org draft is (will be) the formal version of what today is “whatever nREPL the project does”. Reference clients. neat is one. The point of building a deliberately Clojure-free client is that it stress-tests the spec. Anywhere neat ends up needing to special-case the server, the spec has a gap. A compatibility test suite. The parameterised integration suite in neat already runs the same assertions against multiple servers and surfaces real divergences (Clojure batching into a single message where Basilisp emits two, for example). I’d like to grow this into a portable suite that any nREPL server can self-check against.
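The file list above mentions a bencode encode/decode layer, which is the wire format nREPL's default transport uses. To give a feel for how small that layer can be, here is a minimal sketch in Python (not neat's Emacs Lisp code, and not taken from it). The function names are made up for illustration; the sketch assumes dictionary keys are strings and leaves decoded strings as raw bytes.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts into bencode."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are emitted as byte strings in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data: bytes, pos: int = 0):
    """Decode one value starting at pos; return (value, position after it)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key.decode("utf-8")] = value
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected or truncated input at offset {pos}")


request = {"op": "eval", "code": "(+ 1 2)", "id": "1"}
wire = bencode(request)
decoded, consumed = bdecode(wire)
```

Returning the position alongside the value lets a caller decode several messages back to back from one receive buffer, which matters on a stream socket where message boundaries are not preserved.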

0 views
Marc Brooker Yesterday

Agentic software development hypothesis

This is the quality content you come here for, right? Agentic Software Development Hypothesis: First objection: Few meaningful tasks have a complete specification. Second objection: Most oracles aren’t deterministic. Weak form : Any coding task for which a complete specification is available will become trivial. Strong form : Any coding task for which a deterministic oracle is available will become trivial. Strongest form: Any coding task for which a non-adversarial ( pythic? ) oracle exists will become trivial.

0 views
Xe Iaso Yesterday

"No way to prevent this" say users of only language where this regularly happens

In the hours following the release of CVE-2026-45584 for the project Microsoft Windows , site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a memory safety vulnerability resulting in arbitrary code execution inside the virus scanner Windows Defender. This is due to the affected components being written in C++, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Dr. Annabelle Connelly, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

0 views
Sean Goedecke Yesterday

Prompts are technical debt too

It’s common and correct to say that “all code is technical debt”. Adding code is a necessary evil for developing new features: you almost always have to do it, but each line of code adds to the complexity and maintenance burden of the system. All future changes to the system have to work with the existing code, or at least avoid breaking it. Once systems accumulate enough code, they become impossible for a single person to understand: instead of reading the code and understanding what it does, you must rely on guesses, theories and heuristics 1 . Sensible engineers write as little code as possible. They write a lot of prompts, though! Many large projects now have a set of codebase-specific prompt files: AGENTS.md, CLAUDE.md, those same files in sub-directories, and skills . If you’re building a program that uses AI 2 , you’ll have separate prompts for capabilities and for each tool , as well as a whole set of system prompts . Prompts are important. Minor tweaks to a LLM’s prompt can unlock significant performance improvements. If the same model feels different across Codex, Cursor, OpenCode, and Copilot, it’s almost certainly due to subtle differences in prompting. AI companies spend a lot of time testing and tweaking their prompts, so it makes sense why engineers would spend a lot of time tweaking their AGENTS.md files 3 for their projects. I’d even call switching tools or workflows to be a form of prompting. If I start wrapping my agents in a Ralph loop , pull in a new skill file, or install an MCP server, that’s still a change to my prompts even though I’m not the one who wrote it. I think it is a bad idea to spend a ton of time tweaking a bespoke agentic coding setup. Why is that, given that prompt adjustments can deliver a lot of value? Because prompt adjustments are model-specific . Earlier I said that AI companies spend a lot of time tweaking their prompts. In fact, they spend that amount of time for each new model release. 
A prompt that worked great for GPT-5.4 won’t necessarily work as well for GPT-5.5. You have to “learn how to hold the model” each time. In other words, a set of prompts that you carefully crafted in January this year might be out of date or actively harmful by February. Worse still, you might not even notice. Model capabilities are already so hard to pin down (unless you’re running every problem through different models and tools), and even weak AI systems are surprisingly good at some problems. You might just think “huh, the new Anthropic model isn’t as impressive as the hype”, or “wow, Claude Code has gotten worse recently”. In this sense, prompts are a worse form of technical debt than code . When technical debt blows up, it usually causes errors or a tangible slowdown as you try to understand the code. Prompts will decay silently. Also, even janky code tends to be relatively stable when untouched, but every single model upgrade could turn a functional prompt into a non-functional one. Could you simply decide not to upgrade models? Some people are trying this, but the pace of improvement is fast enough that that isn’t really practical. A delicately-prompted agentic harness built around GPT-4.1 is always going to underperform a bare-bones harness built around Opus 4.7. This might be a sensible strategy at some point in the future, when the rate of model improvement slows down (or when models are so capable that you don’t need the extra intelligence for normal engineering tasks), but I don’t believe it’s a good strategy today. In my view, most people should just be picking an AI coding tool maintained by a third-party company (Claude Code, Codex, Cursor, Copilot, etc) and leaving it as unconfigured as possible, so they can piggyback on the work of teams of engineers who are evaluating and tweaking prompts with each new model. Avoid MCP and skills unless absolutely necessary, and keep them off by default. 
At least this way if one of those teams gets it badly wrong, users will notice eventually and complain about it. When you write AGENTS.md files, try to avoid behavior steering (like the now-outdated “think step by step”, “you are a skilled engineer”, or “if you get a task right I will tip you $200”). Keep them limited to specific, concrete facts about the project. Don’t let models fill your AGENTS.md with pages of barely-reviewed text, for the same reason that you wouldn’t let them fill your codebase with pages of barely-reviewed code. Write your prompts yourself, and delete them whenever you get the chance. Almost every system you might get paid to work on is in this category (if not in the code of the system itself, then in its dependencies and libraries). Instead of just using AI to build a program. This distinction was a real pain when I was working on GitHub Models .

0 views

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Today at Google I/O, Google released Gemini 3.5 Flash . This one skipped the modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products: 3.5 Flash is available today to billions of people globally: As usual with Gemini, the most interesting details are tucked away in the What's new in Gemini 3.5 Flash developer documentation. It mostly has the same set of platform features as the previous Gemini 3.x series, albeit with no computer use . The model ID is . The knowledge cut-off is January 2025, and it supports 1,048,576 input tokens and 65,536 maximum output tokens. Google are also pushing a new Interactions API , currently in beta, which looks to me like their version of the patterns introduced by OpenAI Responses - in particular server-side history management. Gemini 3.5 Flash is accompanied by a notable price bump. The previous models in the "Flash" family were Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite . The new 3.5 Flash is 3x the price of 3 Flash Preview and 6x the price of 3.1 Flash-Lite (see price comparison here ). At $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12. The Gemini team promise that 3.5 Pro will roll out "next month" - presumably at an even higher price. This fits a trend: OpenAI's GPT-5.5 was 2x the price of GPT-5.4, and Claude Opus 4.7 is around 1.46x the price of 4.6 when you take the new tokenizer into account . Given the price increase it's interesting to see Google roll it out for so many of their own free-to-consumer products. It feels like all three of the major AI labs are starting to probe the price tolerance of their API customers. Artificial Analysis publish the cost to run their proprietary benchmark against models, which is a useful way to take things like tokenization and increased volume of reasoning tokens into account. 
Some numbers worth comparing: Running the benchmark for 3.5 Flash (high) cost significantly more than 3.1 Pro Preview! Here are some numbers from other vendors: I ran "Generate an SVG of a pelican riding a bicycle" against the Gemini API and got back this pelican, which is a lot : From the code comments: hedgehog on Hacker News : That pelican looks like it's in Miami for a crypto conference. That one cost me 11 input tokens and 14,403 output tokens, for a total cost of just under 13 cents . You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . For everyone via the Gemini app and AI Mode in Google Search For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise. Gemini 3.5 Flash (high) : $1,551.60 Gemini 3.1 Pro Preview : $892.28 Gemini 3 Flash Preview (Reasoning) : $278.26 Gemini 3.1 Flash-Lite Preview : $93.60 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) : $5,117.14 Claude Opus 4.7 (Non-reasoning, High Effort) : $1,217.23 GPT-5.5 (xhigh) : $3,357.00 GPT-5.5 (medium) : $1,199.14
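The 13-cent figure for the pelican can be checked against the prices quoted earlier in the post. A minimal sketch, assuming flat per-token pricing at $1.50 per million input tokens and $9 per million output tokens, with no caching discounts or tiering; the function name is made up for illustration.

```python
# Prices quoted in the post for Gemini 3.5 Flash, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 1.50
OUTPUT_PRICE_PER_MILLION = 9.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the quoted prices."""
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    )
    return total / 1_000_000


# Token counts reported in the post for the pelican SVG request.
pelican_cost = request_cost(input_tokens=11, output_tokens=14_403)
print(f"${pelican_cost:.4f}")  # $0.1296
```

Almost all of the cost comes from the output side: the 11 input tokens contribute well under a hundredth of a cent, while the 14,403 output tokens account for essentially the full amount.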

1 views
Unsung Yesterday

“Some say it sounds like an alto saxophone.”

I witnessed this Siemens locomotive depart yesterday and for a second I thought I was losing my mind. Then, I smiled so hard.

Turns out, the startup melody was intentional in this particular model. The power converters have to convert the current from the overhead line for the locomotive’s three-phase motors, and that generates a rising tone. The engineers decided to change the logic to increment the tone in a few precise steps resembling a musical scale, rather than allowing it to rise continuously.

I debated whether to include this on Unsung. I guess it is software, even if it’s attached to the hardest of hardware. And sure, it’s “just” delightful, but it is still kind of nice to see someone go extra, adding a human touch atop a technical process that had to happen anyway.

But then, it reminded me of something. No, not the poor CSIRAC trying (and similarly struggling) to become a musician. Rather, a “musical road” built in Lancaster, California, where the engineers messed up the execution, creating a truly unpleasant, atonal melody. David Simmons-Duffin wrote a fun essay in 2008 analyzing the “bug” thoroughly, including useful visuals, and even replicating the problem. Tom Scott visited the road and made a video about it ten years later. It won’t surprise you that the cause of the bug was bad hand-off between designers and engineers, but there can be no software patch for grooves you cut in asphalt – and so at least as of last year, the embarrassing-sounding road was still there.

I think I prefer my out-of-tune musical scale performed by a train. Given it’s easy to find compilation videos of Siemens locomotives booting up, it seems I’m not alone.

#bugs #real world #sound design #youtube

0 views
Unsung Yesterday

Shallow breathing

Turns out that the breathing light survives, sort of, not really, in an Apple product today: The AirPods Pro case does this when charging – right at the start, or when you tap it later. But it disappears after a while, the pace is now 28 breaths a minute (over twice as fast as the original iteration), and the light is orange. Is it still the same thing, reflecting on how smaller organisms breathe faster? Or is it mostly an unrelated idea, with the light fading in and out indicating activity rather than lack of it? My money is on the latter – the light turns white when pairing, too, and it cycles even faster then – but it was nice to imagine the return of the old feature for a second or two… or 2.1, to be precise.

#apple #details #hardware #motion design

0 views
flowtwo.io Yesterday

PHP's Oddities

I've been coding in PHP at work for the last 5 years. My org's entire backend is written in PHP, a decision made in 2007 when the company first started. It's not a language I ever imagined myself using prior to working there, but life takes you in all sorts of directions you don't expect.

PHP gets a bad rep in the industry, despite being a mature and commonly used language. But that reputation is mostly based on an out-of-date understanding of what PHP can do. Recent versions have caught up with most other languages in terms of features; by this point it's a pretty versatile general-purpose language. Certainly not just for serving HTML, as it was originally designed.

I'm no longer working at the aforementioned company, so I'm reflecting on my experience with PHP after all these years, and there are some things I've always found odd about it. And more than just odd, some of its language features are really unintuitive and have been prone to cause bugs. This comes from personal experience and many previous headaches at work. I'll explain two of the biggest offenders in this post. In short:

- Arrays are weird and overloaded
- The type system is clunky

PHP's standard library basically only has one data structure: the array. This was intentional; it was designed to be a general-purpose, flexible data structure that can cover a variety of use cases. It's technically an ordered key-value dictionary, not an array in the traditional sense.

Unfortunately, with flexibility comes complexity. If you want to create a collection of fixed-size objects in an allocated memory block, you can't really do that. PHP pretends to support them, but the illusion breaks down in unexpected ways.

Let's say I have a bunch of fruits. PHP lets me define a fruits "array" and I can do normal array things with it. Everything looks fine, but you get into trouble whenever you perform a mutation on this "simple" array; it will be exposed as being a key-value store. When you use one of PHP's built-in functions for standard array operations like sorting or filtering, it will operate on the keys AND values of your array. Whether the function mutates the array in place or returns a new one, the keys will likely end up inconsistent with the order of the elements.

why can't I hold all these indices???

The only way to put these arrays back into a naturally indexed state is to use the array_values function. You just have to know that, or else you end up with subtle bugs. It's just strange to me that PHP doesn't support simple collections of objects. It's annoying to have to manage these arbitrary numeric keys when all you really want is ordinal indexing like 99% of the time. It feels like a leaky abstraction.

In PHP5, a native type system was added to the language. It was expanded over time, and as of PHP 7.4 you can declare the types for your class's properties. Although PHP is a scripting language, type declarations will help catch bugs during testing, or even during development with the use of static analysis tools like PHPStan.

But PHP's type system has some quirks since it was built on an existing dynamically typed language. The rules had to be designed after the behaviour was already there. For class properties, there's a hidden uninitialized state that can pop up if you're not careful.

Let's define a class with three properties. Here, I'm illustrating all the ways of declaring the type for a string property:

- It's a string
- It's a nullable string

Before PHP 7.4, all class properties were (1): untyped. Since the type system is optional, it has to live alongside the "legacy" behaviour, which has weird consequences. For example, what do you think the values of these three properties will be after we instantiate an object of this class?

Trick question! Only the untyped property will have a value, and that value is null. That seems fine and is roughly in line with how I'd expect a language to use a null value. But the other two properties will NOT have a value because they don't exist, or rather they could exist but haven't been initialized yet.

This example exposes the "uninitialized" state that a property can be in, which is NOT the same as null. This distinction frustratingly comes up when you try to do a null check on these properties. Not a warning: a FATAL error occurs if you try to access an uninitialized property. This comes up a lot in cases where you try to deserialize data into a PHP object. If a field's data isn't present you might not initialize the property at all.

ahh yes, NULL...who was that by again?

This lax behaviour for property definitions makes writing code around them harder, especially when you take into account that any object can have properties dynamically added to it.

So I feel like the class property type system does little to help you understand what a given object is composed of, and in some respects has made it less clear because it's introduced this new uninitialized state. As a developer, it's hard to write defensive code because you're never sure which checks to do for all these situations: , (), , ... it's not obvious which functions cover which states.

I'd argue that uninitialized did not need to be a state at all. For nullable typed properties, just default them to null, the way untyped properties are. And for non-nullable types, require them to be defined as constructor promoted parameters OR require a default value at declaration. Similar requirements already exist for the attribute, so it's certainly feasible for the PHP execution engine to enforce it. But there's probably some nuance or historical reason I'm missing here. Let me know in the comments if you know.

Despite all the critiquing I've done in this article, I still think the amount of hate PHP gets is undeserved. Like any language, it has its quirks and tradeoffs, but you can still accomplish any task using PHP that you could in another language. The more you know about a language, the better you can structure things to work "with the grain" and write more idiomatic code.

Some things I do enjoy about PHP:

- It's a scripting language, so development friction is low. Make a file change and it instantly takes effect.
- Laravel is a solid web framework with tons of extensible functionality. It's opinionated and definitely leans into the "auto-magical" framework style, but it was designed well so you don't mind.
- All the $ signs help remind you what you're doing it all for at the end of the day 🤑

Thanks for reading!
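To make the array complaint in this post concrete, here is a minimal sketch of how filtering or sorting a "simple" list leaves its keys behind. The `$fruits` data and variable names are assumptions made up for illustration, not the post's original listing; the functions themselves (`array_filter`, `asort`, `array_values`, `json_encode`) are standard PHP.

```php
<?php
// Hypothetical data; not the post's original listing.
$fruits = ['banana', 'apple', 'cherry'];   // implicit keys 0, 1, 2

// array_filter() returns a new array but preserves the original keys.
$kept = array_filter($fruits, fn (string $f): bool => $f !== 'apple');
// $kept is [0 => 'banana', 2 => 'cherry']: there is no index 1 any more.

// asort() sorts in place by value and keeps each value's original key.
$sorted = $fruits;
asort($sorted);
// $sorted is [1 => 'apple', 0 => 'banana', 2 => 'cherry'].

// The gap leaks out as soon as the array is serialized.
echo json_encode($fruits), PHP_EOL; // ["banana","apple","cherry"]
echo json_encode($kept), PHP_EOL;   // {"0":"banana","2":"cherry"}

// array_values() discards the keys and reindexes from 0.
$reindexed = array_values($kept);
echo json_encode($reindexed), PHP_EOL; // ["banana","cherry"]
```

Not every built-in behaves this way: `sort()`, `rsort()` and `usort()` reindex from 0, while `asort()`, `arsort()`, `uasort()` and `array_filter()` preserve keys, which is part of why this is easy to get wrong.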
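The uninitialized-property behaviour described in this post can be sketched as follows. The class name `Fruit` and its property names are hypothetical, chosen only to mirror the three declaration styles the post lists (untyped, string, nullable string); typed properties need PHP 7.4 or later.

```php
<?php
// Hypothetical class; the name and properties are assumptions for illustration.
class Fruit
{
    public $untyped;         // (1) no type: implicitly initialized to null
    public string $name;     // (2) string: starts out uninitialized
    public ?string $colour;  // (3) nullable string: also uninitialized, NOT null
}

$fruit = new Fruit();

var_dump($fruit->untyped);                    // NULL
var_dump(isset($fruit->name));                // bool(false), and no error
var_dump(property_exists($fruit, 'colour'));  // bool(true): declared, just not set

// Reflection can tell "uninitialized" apart from "null".
$prop = new ReflectionProperty(Fruit::class, 'colour');
var_dump($prop->isInitialized($fruit));       // bool(false)

// A plain null check reads the property, and reading is what fails.
$caught = null;
try {
    $isNull = $fruit->colour === null;
} catch (Error $e) {
    $caught = $e;
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}

// An explicit assignment (even of null) initializes the property.
$fruit->colour = null;
var_dump($fruit->colour === null);            // bool(true)
```

Of the checks shown here, `isset()` is the only one that is safe on an uninitialized property without a try/catch, but it also returns false for a property that was explicitly set to null, so it cannot distinguish the two states; `ReflectionProperty::isInitialized()` can.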

0 views