
Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI

I'm upgrading various plugins for compatibility with the new Datasette 1.0a20 alpha release and I decided to record a video of the process. This post accompanies that video with detailed additional notes.

I picked a very simple plugin to illustrate the upgrade process (possibly too simple). datasette-checkbox adds just one feature to Datasette: if you are viewing a table with boolean columns (detected as integer columns with boolean-style names) and your current user has permission to update rows in that table, it adds an inline checkbox UI that looks like this:

I built the first version with the help of Claude back in August 2024 - details in this issue comment. Most of the implementation is JavaScript that makes calls to Datasette 1.0's JSON write API. The Python code just checks that the user has the necessary permissions before including the extra JavaScript.

The first step in upgrading any plugin is to run its tests against the latest Datasette version. Thankfully uv makes it easy to run code in scratch virtual environments that include the different code versions you want to test against. I have a test utility (short for "test against development Datasette") which I use for that purpose. I can run it in any plugin directory and it will run the existing plugin tests against whatever version of Datasette I have checked out in my development directory. You can see the full implementation of that utility (and its companion described below) in this TIL.

I started by running it in the datasette-checkbox directory, and got my first failure... but it wasn't due to permissions, it was because the plugin's Datasette dependency was pinned to a specific mismatched version. I fixed this by updating that pin and ran the tests again... and they passed! Which was a problem, because I was expecting permission-related failures. It turns out when I first wrote the plugin I was lazy with the tests - they weren't actually confirming that the table page loaded without errors.

I needed to actually run the code myself to see the expected bug. First I created myself a demo database using sqlite-utils create-table, then I ran it with Datasette against the plugin's code. Sure enough, visiting the table page produced a 500 error about a missing method. The next step was to update the test to also trigger this error, and now the test fails as expected.

At this point I could have manually fixed the plugin itself - which would likely have been faster given the small size of the fix - but instead I demonstrated a bash one-liner I've been using to apply these kinds of changes automatically. It runs OpenAI Codex in non-interactive mode - it will loop until it has finished the prompt you give it. I tell it to consult the subset of the Datasette upgrade documentation that talks about Datasette permissions and then get the plugin's tests to pass. This is an example of what I call designing agentic loops - I gave Codex the tools it needed and a clear goal and let it get to work on my behalf.

The remainder of the video covers finishing up the work - testing the fix manually, committing my work, then shipping a 0.1a4 release to PyPI using the pattern described in this TIL. Finally, I demonstrated that the shipped plugin worked in a fresh environment by using uv to install and run a fresh Datasette instance with a fresh copy of the new alpha plugin. It's a neat way of confirming that freshly released software works as expected.
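For illustration, here is roughly the kind of assertion that was missing from the test suite. This is a hedged sketch rather than the plugin's actual test: the table schema and actor are invented, and it assumes pytest-asyncio is installed.

```python
# Hedged sketch of the missing "table page renders without a 500" assertion.
# Table name, schema and actor are made up; this is not the plugin's real test.
import pytest
from datasette.app import Datasette


@pytest.mark.asyncio
async def test_table_page_renders_for_privileged_actor():
    ds = Datasette(memory=True)
    await ds.invoke_startup()
    db = ds.add_memory_database("demo")
    await db.execute_write(
        "create table items (id integer primary key, is_done integer)"
    )
    # Simulate a signed-in actor using Datasette's signed ds_actor cookie
    cookies = {"ds_actor": ds.sign({"a": {"id": "root"}}, "actor")}
    response = await ds.client.get("/demo/items", cookies=cookies)
    # The plugin's permission check runs while this page renders,
    # so the 500 described above would surface right here.
    assert response.status_code == 200
```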
This video was shot in a single take using Descript, with no rehearsal and perilously little preparation in advance. I recorded through my AirPods and applied the "Studio Sound" filter to clean up the audio. I pasted in a closing slide from my previous video and exported it locally at 1080p, then uploaded it to YouTube. Something I learned from the Software Carpentry instructor training course is that making mistakes in front of an audience is actively helpful - it helps them see a realistic version of how software development works and they can learn from watching you recover. I see this as a great excuse for not editing out all of my mistakes! I'm trying to build new habits around video content that let me produce useful videos while minimizing the amount of time I spend on production. I plan to iterate more on the format as I get more comfortable with the process. I'm hoping I can find the right balance between production time and value to viewers.


How Would You Like Your Iceberg Sir? Stream or Batch Ordered?

Today I want to talk about stream analytics, batch analytics and Apache Iceberg. Stream and batch analytics work differently, but both can be built on top of Iceberg, and due to their differences there can be a tug-of-war over the Iceberg table itself. In this post I am going to use two real-world systems, Apache Fluss (streaming tabular storage) and Confluent Tableflow (Kafka-to-Iceberg), as a case study for these tensions between stream and batch analytics.

Apache Fluss uses zero-copy tiering to Iceberg. Recent data is stored on Fluss servers (using the Kafka replication protocol for high availability and durability) but is then moved to Iceberg for long-term storage. This results in one copy of the data. Confluent Kora and Tableflow use internal topic tiering and Iceberg materialization, copying Kafka topic data to Iceberg, such that we have two copies (one in Kora, one in Iceberg). This post will explain why both have chosen different approaches and why both are totally sane, defensible decisions.

First we should understand the concepts of stream-order and batch-order. A streaming Flink job typically assumes its sources come with stream-order. For example, a simple SELECT * Flink query assumes the source is (loosely) temporally ordered, as if it were a live stream. It might be historical data, such as starting at the earliest offset of a Kafka topic, but it is still loaded in a temporal order. Windows and temporal joins also depend on the source being stream-ordered to some degree, to avoid needing large/infinite window sizes which blow up the state. A Spark batch job typically hopes that the data layout of the Iceberg table is batch-ordered (say, partitioned and sorted by business values like region, customer, etc.), thus allowing it to efficiently prune data files that are not relevant, and to minimize costly shuffles.

If Flink is just reading a Kafka topic from start to end, it’s nothing special. But we can also get fancy by reading from two data sources: one historical and one real-time. The idea is that we can unify historical data from Iceberg (or another table format) and real-time data from some kind of event stream. We call reading from the historical source bootstrapping. Streaming bootstrap refers to running a continuous query that reads historical data first and then seamlessly switches to live streaming input. To switch from the historical to the real-time source, we need to make that switch at a specific offset. The notion of a “last tiered offset” is a correctness boundary that ensures that the bootstrap and the live stream blend seamlessly without duplication or gaps. This offset can be mapped to an Iceberg snapshot.

Fig 1. Bootstrap a streaming Flink job from historical then switch to real-time.

However, if the historical Iceberg data is laid out with a batch-order (partitioned and sorted by business values like region, customer, etc.) then the bootstrap portion of a SELECT * will appear completely out-of-order relative to stream-order. This breaks the expectations of the user, who wants to see data in the order it arrived (i.e., stream-order), not a seemingly random one. We could sort the data from batch-order back to stream-order in the Flink source before it reaches the Flink operator level, but this can get really inefficient.

Fig 2. Sort batch-ordered historical data in the Flink source task.
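To make the bootstrap-then-switch idea concrete, here is a deliberately simplified Python sketch (not Flink or Fluss code): the historical source is consumed up to the last tiered offset, then the live source takes over from the next offset, so there are no gaps or duplicates.

```python
# Simplified illustration of streaming bootstrap (not Flink/Fluss code).
# Both sources yield (offset, record) pairs; the handover happens at a single
# agreed offset, which in practice maps to an Iceberg snapshot.

def bootstrap_then_stream(historical_source, live_source, last_tiered_offset):
    # Phase 1: bootstrap from the historical store, in stream-order,
    # up to and including the last tiered offset.
    for offset, record in historical_source:
        if offset > last_tiered_offset:
            break
        yield offset, record

    # Phase 2: switch to the live stream, starting right after the boundary.
    for offset, record in live_source(start_offset=last_tiered_offset + 1):
        yield offset, record

# Toy wiring. If the historical data were batch-ordered instead, phase 1 would
# first need an expensive re-sort back into offset order - the tension above.
history = [(i, f"old-{i}") for i in range(5)]
live = lambda start_offset: ((i, f"new-{i}") for i in range(start_offset, start_offset + 3))
offsets = [o for o, _ in bootstrap_then_stream(history, live, last_tiered_offset=2)]
assert offsets == [0, 1, 2, 3, 4, 5]
```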
If the table has been partitioned by region and sorted by customer, but we want to sort it by the time it arrived (such as by timestamp or Kafka offset), this will require a huge amount of work and data shuffling (in a large table). The result is not only a very expensive bootstrap, but also a very slow one (after all, we expect fast results with a streaming query). So we hit a wall: Flink wants data ordered temporally for efficient streaming bootstrap, while batch workloads want data ordered by value (e.g., columns) for effective pruning and scan efficiency. These two data layouts are orthogonal. Temporal order preserves ingest locality; value order preserves query locality. You can’t have both in a single physical layout.

Fluss is a streaming tabular storage layer built for real-time analytics which can serve as the real-time data layer for lakehouse architectures. I did a comprehensive deep dive into Apache Fluss recently, going right into the internals, if you are interested. Apache Fluss takes a clear stance. It’s designed as a streaming storage layer for data lakehouses, so it optimizes Iceberg for streaming bootstrap efficiency. It does this by maintaining stream-order in the Iceberg table.

Fig 3. Fluss stores real-time and historical data in stream-order.

Internally, Fluss uses its own offset (akin to the Kafka offset) as the Iceberg sort order. This ensures that when Flink reads from Iceberg, it sees a temporally ordered sequence. The Flink source can literally stream data from Iceberg without a costly data shuffle.

Let’s take a look at a Fluss log table. A log table can define:
- Optional partitioning keys (based on one or more columns). Without them, a table is one large partition.
- The number of buckets per partition. The bucket is the smallest logical subdivision of a Fluss partition.
- An optional bucketing key for hash-bucketing. Otherwise rows are assigned to buckets randomly or round-robin.
The partitioning and buckets are both converted to an Iceberg partition spec.

Fig 4. An example of the Iceberg partition spec and sort order

Within each of these Iceberg partitions, the sort order is the Fluss offset. For example, we could partition by a date field, then spread the data randomly across the buckets within each partition.

Fig 5. The partitions of an Iceberg table visualized.

Inside Flink, the source will generate one “split” per table bucket, routing them by bucket id to split readers. Due to the offset sort order, each Parquet file should contain contiguous blocks of offsets after compaction. Therefore each split reader naturally reads Iceberg data in offset order until it switches to the Fluss servers for real-time data (also in offset order).

Fig 6. Flink source bootstraps from Iceberg visualized

Once the lake splits have been read, the readers start reading from the Fluss servers for real-time data. This is great for Flink streaming bootstrap (it is just scanning the data files as a cheap sequential scan). Primary key tables are similar but have additional limitations on the partitioning and bucketing keys (as they must be subsets of the primary key). A primary key, such as device_id, is not a good partition column as it’s too fine-grained, leading us to use an unpartitioned table.

Fig 7. Unpartitioned primary key table with 6 buckets.

If we want Iceberg partitioning, we’ll need to add another column (such as a date) to the primary key and then use the date column for the partitioning key (and device_id as a bucket key for hash-bucketing). This makes the device_id non-unique though.
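As a toy illustration of that layout (not Fluss’s actual hashing or an Iceberg API), the sketch below maps each row to a (partition, bucket) pair, with the offset acting as the sort key inside each partition:

```python
# Toy illustration of a date-partitioned, hash-bucketed layout with an
# offset sort order. Column names, bucket count and hashing are made up.
import hashlib

NUM_BUCKETS = 6

def bucket_for(key: str) -> int:
    # Stable hash so the same device_id always lands in the same bucket
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big") % NUM_BUCKETS

def partition_of(row: dict) -> tuple:
    # (event_date, bucket) plays the role of the Iceberg partition;
    # files inside it are sorted by offset, giving stream-order scans.
    return (row["event_date"], bucket_for(row["device_id"]))

rows = [
    {"event_date": "2025-11-01", "device_id": "d-1", "offset": 101},
    {"event_date": "2025-11-01", "device_id": "d-2", "offset": 102},
    {"event_date": "2025-11-02", "device_id": "d-1", "offset": 103},
]
for row in sorted(rows, key=lambda r: (partition_of(r), r["offset"])):
    print(partition_of(row), row["offset"])
```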
In short, Fluss is a streaming storage abstraction for tabular data in lakehouses and stores both real-time and historical data in stream-order. This layout is designed for streaming Flink jobs. But if you have a Spark job trying to query that same Iceberg table, pruning is almost useless as it does not use a batch-optimized layout. Fluss may well decide to support Iceberg custom partitioning and sorting (batch-order) in the future, but it will then face the same challenges of supporting streaming bootstrap from batch-ordered Iceberg.

Confluent’s Tableflow (the Kafka-to-Iceberg materialization layer) took the opposite approach. It stores two copies of the data: one stream-ordered and one optionally batch-ordered. Kafka/Kora internally tiers log segments to object storage, which is a historical data source in stream-order (good for streaming bootstrap). Iceberg is a copy, which allows for either stream-order or batch-order; it’s up to the customer. Custom partitioning and sort order is not yet available at the time of writing, but it’s coming.

Fig 8. Tableflow continuously materializes a copy of a Kafka topic as an Iceberg table.

I already wrote why I think zero-copy Iceberg tiering is a bad fit for Kafka specifically. Much of that also applies to Kora, which is why Tableflow is a separate distributed component from Kora brokers. So if we’re going to materialize a copy of the data for analytics, we have the freedom to allow customers to optimize their tables for their use case, which is often batch-based analytics.

Fig 9. Copy 1 (original): Kora maintains stream-ordered live and historical Kafka data. Copy 2 (derived): Tableflow continuously materializes Kafka topics as Iceberg tables.

If the Iceberg table is also stored in stream-order then Flink could do an Iceberg streaming bootstrap and then switch to Kafka. This is not available right now in Confluent, but it could be built. There are also improvements that could be made to historical data stored by Kora/Kafka, such as using a columnar format for log segments (something that Fluss does today). Either way, the materialization design provides the flexibility to execute a streaming bootstrap using a stream-order historical data source, allowing the customer to optimize the Iceberg table according to their needs.

Batch jobs want value locality (data clustered by common predicates), aka batch-order. Streaming jobs want temporal locality (data ordered by ingestion), aka stream-order. With a single Iceberg table, once you commit to one, the other becomes inefficient. Given this constraint, we can understand the two different approaches:

Fluss chose stream-order in its Iceberg tables to support stream analytics constraints and avoid a second copy of the data. That’s a valid design decision; after all, Fluss is a streaming tabular storage layer for real-time analytics that fronts the lakehouse. But it does mean giving up the ability to use Iceberg’s layout levers of partitioning and sorting to tune batch query performance.

Confluent chose stream-order in Kora and one optionally batch-ordered Iceberg copy (via Tableflow materialization), letting the customer decide the optimum Iceberg layout. That’s also a valid design decision, as Confluent wants to connect systems of all kinds, be they real-time or not. Flexibility to handle diverse systems and diverse customer requirements wins out. But it does require a second copy of the data (causing higher storage costs).

As the saying goes, the opposite of a good idea can be a good idea.
It all depends on what you are building and what you want to prioritize. The only losing move is pretending you can have both (stream-optimized and batch-optimized workloads) in one Iceberg table without a cost. Once you factor in the compute cost of using one format for both workloads, the storage savings disappear. If you really need both, build two physical views and keep them in sync.

Some related blog posts that are relevant to this one:
- Beyond Indexes: How Open Table Formats Optimize Query Performance
- Why I’m not a fan of zero-copy Apache Kafka-Apache Iceberg
- Understanding Apache Fluss

The Coder Cafe 3 days ago

Build Your Own Key-Value Storage Engine—Week 1

Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It’s hosted by ScyllaDB, the monstrously fast and scalable database.

Agenda
Week 0: Introduction
Week 1: In-Memory Store

Welcome to week 1 of Build Your Own Key-Value Storage Engine! Let’s start by making sure what you’re about to build in this series makes complete sense: what’s a storage engine? A storage engine is the part of a database that actually stores, indexes, and retrieves data, whether on disk or in memory. Think of the database as the restaurant, and the storage engine as the kitchen that decides how food is prepared and stored. Some databases let you choose the storage engine. For example, MySQL uses InnoDB by default (based on B+-trees). Through plugins, you can switch to RocksDB, which is based on LSM trees.

This week, you will build an in-memory storage engine and the first version of the validation client that you will reuse throughout the series.

💬 If you want to share your progress, discuss solutions, or collaborate with other coders, join the community Discord server: Join the Discord

Assumptions:
- Keys are lowercase ASCII strings.
- Values are ASCII strings.
NOTE: Assumptions persist for the rest of the series unless explicitly discarded.

PUT endpoint: The request body contains the value. If the key exists, update its value and return success. If the key doesn’t exist, create it and return success. Keep all data in memory.

GET endpoint: If the key exists, return 200 OK with the value in the body. If the key does not exist, return 404.

Implement a client to validate your server (a hedged starting-point sketch for the server appears at the end of this post):
- Read the testing scenario from this file: put.txt.
- Run an HTTP request for each line: send the PUT or GET request it describes and confirm that the expected response is returned. If not, something is wrong with your implementation.
- Each request must be executed sequentially, one line at a time; otherwise, out-of-order responses may fail the client’s assertions.

If you want to generate an input file with a different number of lines, you can use this Go generator: it takes the format to generate and the number of lines as arguments. At this stage, you need a file in the put format, for example with one million lines.

Add basic metrics for latency:
- Record start and end time for each request.
- Keep a small histogram of latencies in milliseconds.
- At the end, print summary statistics (for example, the median and tail percentiles).
This work is optional as there is no latency target in this series. However, it can be an interesting point of comparison across weeks to see how your changes affect latency.

That’s it for this week! You have built a simple storage engine that keeps everything in memory. In two weeks, we will level up. You will delve into a data structure widely used in key-value databases: LSM trees.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time. ❤️ If you enjoyed this post, please hit the like button.
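Here is the hedged starting-point sketch referenced above: a minimal in-memory key-value server in Python using only the standard library. The /keys/<key> route shape and the port are assumptions, since the post leaves the API details up to you.

```python
# Minimal week-1 in-memory key-value server (standard library only).
# The /keys/<key> route and port 8080 are assumptions, not part of the post.
from http.server import BaseHTTPRequestHandler, HTTPServer

store = {}  # everything lives in memory this week


class KVHandler(BaseHTTPRequestHandler):
    def _key(self):
        parts = self.path.strip("/").split("/")
        return parts[1] if len(parts) == 2 and parts[0] == "keys" else None

    def do_PUT(self):
        key = self._key()
        if key is None:
            self.send_response(400)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        store[key] = self.rfile.read(length).decode("ascii")  # create or update
        self.send_response(200)
        self.end_headers()

    def do_GET(self):
        key = self._key()
        if key is not None and key in store:
            body = store[key].encode("ascii")
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)  # key does not exist
            self.end_headers()


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), KVHandler).serve_forever()
```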

Simon Willison 3 days ago

A new SQL-powered permissions system in Datasette 1.0a20

Datasette 1.0a20 is out with the biggest breaking API change on the road to 1.0, improving how Datasette's permissions system works by migrating permission logic to SQL running in SQLite. This release involved 163 commits, with 10,660 additions and 1,825 deletions, most of which was written with the help of Claude Code.

Datasette's permissions system exists to answer the following question: Is this actor allowed to perform this action, optionally against this particular resource? An actor is usually a user, but might also be an automation operating via the Datasette API. An action is a thing they need to do - things like view-table, execute-sql, insert-row. A resource is the subject of the action - the database you are executing SQL against, the table you want to insert a row into.

Datasette's default configuration is public but read-only: anyone can view databases and tables or execute read-only SQL queries but no-one can modify data. Datasette plugins can enable all sorts of additional ways to interact with databases, many of which need to be protected by a form of authentication. Datasette 1.0 also includes a write API, which needs a way to configure who can insert, update, and delete rows or create new tables. Actors can be authenticated in a number of different ways provided by plugins using the actor_from_request() plugin hook. datasette-auth-passwords, datasette-auth-github and datasette-auth-existing-cookies are examples of authentication plugins.

The previous implementation included a design flaw common to permissions systems of this nature: each permission check involved a function call which would delegate to one or more plugins and return a True/False result. This works well for single checks, but has a significant problem: what if you need to show the user a list of things they can access, for example the tables they can view? I want Datasette to be able to handle potentially thousands of tables - tables in SQLite are cheap! I don't want to have to run 1,000+ permission checks just to show the user a list of tables. Since Datasette is built on top of SQLite we already have a powerful mechanism to help solve this problem. SQLite is really good at filtering large numbers of records.

The biggest change in the new release is that I've replaced the previous plugin hook - which let a plugin determine if an actor could perform an action against a resource - with a new permission_resources_sql(actor, action) plugin hook. Instead of returning a True/False result, this new hook returns a SQL query producing rules that help determine the resources the current actor can execute the specified action against. The example in the documentation grants the actor with ID "alice" permission to view the "sales" table in the "accounting" database - a rough sketch of that kind of hook is shown below. The returned query should always produce four columns: a parent, a child, an allow flag (1 or 0), and a reason string for debugging. When you ask Datasette to list the resources an actor can access for a specific action, it will combine the SQL returned by all installed plugins into a single query that joins against the internal catalog tables and efficiently lists all the resources the actor can access. This query can then be limited or paginated to avoid loading too many results at once.
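To make the shape of the new hook concrete, here is a heavily hedged sketch of an implementation along those lines. The return value is shown as a plain SQL string; the exact contract of the real 1.0a20 hook (its return type, parameter handling and any additional arguments) may differ, so treat this as an illustration of the "rules as SQL, not True/False" idea rather than as documentation.

```python
from datasette import hookimpl


# Hedged sketch, not the documentation's example: grant actor "alice"
# permission to view the accounting/sales table. The real hook may expect a
# richer return object and support SQL parameters; a bare string is used here
# purely to illustrate returning rules as SQL.
@hookimpl
def permission_resources_sql(actor, action):
    if action != "view-table" or not actor or actor.get("id") != "alice":
        return None  # no opinion: let other plugins and the defaults decide
    # Four columns: parent (database), child (table), allow (1 or 0), reason
    return """
    select
      'accounting' as parent,
      'sales' as child,
      1 as allow,
      'alice can view accounting/sales' as reason
    """
```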
Datasette has several additional requirements that make the permissions system more complicated. Datasette permissions can optionally act against a two-level hierarchy. You can grant a user the ability to insert-row against a specific table, or every table in a specific database, or every table in every database in that Datasette instance. Some actions can apply at the table level, others at the database level and others only make sense globally - enabling a new feature that isn't tied to tables or databases, for example. Datasette currently has ten default actions but plugins that add additional features can register new actions to better participate in the permission system.

Datasette's permission system has a mechanism to veto permission checks - a plugin can return a deny for a specific permission check which will override any allows. This needs to be hierarchy-aware - a deny at the database level can be outvoted by an allow at the table level.

Finally, Datasette includes a mechanism for applying additional restrictions to a request. This was introduced for Datasette's API - it allows a user to create an API token that can act on their behalf but is only allowed to perform a subset of their capabilities - just reading from two specific tables, for example. Restrictions are described in more detail in the documentation.

That's a lot of different moving parts for the new implementation to cover. Since permissions are critical to the security of a Datasette deployment it's vital that they are as easy to understand and debug as possible. The new alpha adds several new debugging tools, including this page that shows the full list of resources matching a specific action for the current user: And this page listing the rules that apply to that question - since different plugins may return different rules which get combined together:

This screenshot illustrates two of Datasette's built-in rules: there is a default allow for read-only operations such as view-table (which can be overridden by plugins) and another rule that says the root user can do anything (provided Datasette was started with the --root option). Those rules are defined in the datasette/default_permissions.py Python module.

There's one question that the new system cannot answer: provide a full list of actors who can perform this action against this resource. It's not possible to provide this globally for Datasette because Datasette doesn't have a way to track what "actors" exist in the system. SSO plugins such as datasette-auth-github mean a new authenticated GitHub user might show up at any time, with the ability to perform actions despite the Datasette system never having encountered that particular username before. API tokens and actor restrictions come into play here as well. A user might create a signed API token that can perform a subset of actions on their behalf - the existence of that token can't be predicted by the permissions system. This is a notable omission, but it's also quite common in other systems. AWS cannot provide a list of all actors who have permission to access a specific S3 bucket, for example - presumably for similar reasons.

Datasette's plugin ecosystem is the reason I'm paying so much attention to ensuring Datasette 1.0 has a stable API. I don't want plugin authors to need to chase breaking changes once that 1.0 release is out. The Datasette upgrade guide includes detailed notes on upgrades that are needed between the 0.x and 1.0 alpha releases. I've added an extensive section about the permissions changes to that document. I've also been experimenting with dumping those instructions directly into coding agent tools - Claude Code and Codex CLI - to have them upgrade existing plugins for me.
This has been working extremely well. I've even had Claude Code update those notes itself with things it learned during an upgrade process! This is greatly helped by the fact that every single Datasette plugin has an automated test suite that demonstrates the core functionality works as expected. Coding agents can use those tests to verify that their changes have had the desired effect.

I've also been leaning heavily on uv to help with the upgrade process. I wrote myself two new helper scripts to help test the new plugins; their implementations can be found in this TIL. Some of my plugin upgrades have become a one-liner that runs OpenAI Codex CLI with a prompt, without entering interactive mode. There are still a bunch more to go - there's a list in this tracking issue - but I expect to have the plugins I maintain all upgraded pretty quickly now that I have a solid process in place.

This change to Datasette core is by far the most ambitious piece of work I've ever attempted using a coding agent. Last year I agreed with the prevailing opinion that LLM assistance was much more useful for greenfield coding tasks than working on existing codebases. The amount you could usefully get done was greatly limited by the need to fit the entire codebase into the model's context window. Coding agents have entirely changed that calculation. Claude Code and Codex CLI still have relatively limited token windows - albeit larger than last year - but their ability to search through the codebase, read extra files on demand and "reason" about the code they are working with has made them vastly more capable. I no longer see codebase size as a limiting factor for how useful they can be.

I've also spent enough time with Claude Sonnet 4.5 to build a weird level of trust in it. I can usually predict exactly what changes it will make for a prompt. If I tell it "extract this code into a separate function" or "update every instance of this pattern" I know it's likely to get it right. For something like permission code I still review everything it does, often by watching it as it works since it displays diffs in the UI. I also pay extremely close attention to the tests it's writing. Datasette 1.0a19 already had 1,439 tests, many of which exercised the existing permission system. 1.0a20 increases that to 1,583 tests. I feel very good about that, especially since most of the existing tests continued to pass without modification.

I built several different proof-of-concept implementations of SQL permissions before settling on the final design. My research/sqlite-permissions-poc project was the one that finally convinced me of a viable approach. That one started as a free-ranging conversation with Claude, at the end of which I told it to generate a specification which I then fed into GPT-5 to implement. You can see that specification at the end of the README. I later fed the POC itself into Claude Code and had it implement the first version of the new Datasette system based on that previous experiment. This is admittedly a very weird way of working, but it helped me finally break through on a problem that I'd been struggling with for months.

Now that the new alpha is out my focus is upgrading the existing plugin ecosystem to use it, and supporting other plugin authors who are doing the same. The new permissions system unlocks some key improvements to Datasette Cloud concerning finely-grained permissions for larger teams, so I'll be integrating the new alpha there this week.
This is the single biggest backwards-incompatible change required before Datasette 1.0. I plan to apply the lessons I learned from this project to the other, less intimidating changes. I'm hoping this can result in a final 1.0 release before the end of the year!

Some miscellaneous tips I picked up along the way:

My first helper script ("test against datasette dev") runs a plugin's existing test suite against the current development version of Datasette checked out on my machine, passing extra options through to pytest as needed. Its companion ("run against datasette dev") runs the latest development version of the datasette command with the plugin installed. When working on anything relating to plugins it's vital to have at least a few real plugins that you upgrade in lock-step with the core changes. Those two shortcuts were invaluable for productively working on those plugins while I made changes to core.

Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.

Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.

Claude writes good commit messages now! I finally gave in and let it write these - previously I've been determined to write them myself. It's a big time saver to be able to say "write a tasteful commit message for these changes". Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you're still working in a branch.

A really great way to review Claude's changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude to address the review comments on that PR. This is a very quick way to apply little nitpick changes - rename this function, refactor this repeated code, add types here etc.

The code I write with LLMs is higher quality code. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed - almost any improvement is worth applying, no matter how trivial, because the time cost is so low.

Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.

That trick with a Markdown file full of upgrade instructions works astonishingly well - it's the same basic idea as Claude Skills.
I maintain over 100 Datasette plugins now and I expect I'll be automating all sorts of minor upgrades in the future using this technique.


High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance

High-Performance Query Processing with NVMe Arrays: Spilling without Killing Performance
Maximilian Kuschewski, Jana Giceva, Thomas Neumann, and Viktor Leis. SIGMOD'25

In database vernacular, spilling is the process of writing intermediate data to disk in order to evaluate a query with a finite amount of main memory. It goes without saying that database folks are control freaks (sorry, performance conscious) and don’t want to rely on generic OS paging mechanisms to handle working sets which are larger than main memory. This tidbit about Snowflake is fascinating: Only 5% of analytical queries in Snowflake’s 2018 workload trace spill data, but those 5% contribute 45% of the overall CPU time and 29% of the total execution time [77].

One fundamental problem in this area is that it is very hard to predict up-front exactly how much working memory will be needed to efficiently execute a query. Databases have to estimate based on statistics of the relations used in a query, or assume no spilling will be needed, and then gracefully fall back to spilling if necessary. The two operators which this paper deals with are joins and aggregations. Both involve key columns, and the cardinality of the key columns is critical in determining if spilling is necessary.

One obvious spilling mechanism is to use a partitioning approach for joins and aggregations. I’ve described partitioned joins in summaries of these papers: Efficiently Processing Joins and Grouped Aggregations on GPUs. The reason why partitioning nicely solves the problem is that the working set requirements for both steps of a partitioned join are modest. The partitioning step only requires a small amount of memory per partition (e.g., accumulate 64KiB per partition before appending partitioned tuples to on-disk storage). The join step only needs enough working memory to join a single partition. Section 4.1 of the paper claims that partitioning slows down TPC-H queries by 2-5x. My spider sense is tingling, but let’s take this as an axiom for now.

Here is the premise of the paper: partitioning is better for queries that must spill, but worse for queries that can be completed without spilling. What is an efficient and simple design given that a database cannot perfectly predict up-front if it will need to spill?

Prior work along similar lines introduced the hybrid hash join. A hybrid hash join partitions the left (build) input of the join and dynamically decides what percentage of build partitions must be spilled. A hash table is built containing all non-spilled build partitions. Next the right (probe) input to the join is processed. For each probe tuple, the database determines if the associated partition was spilled. If the partition was spilled, then the probe tuple is spilled. If not, then the probe tuple is processed immediately via a lookup in the hash table. Finally, all spilled partitions are processed. The downside of this approach is that it always partitions the build side, even when that is unnecessary.
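As a rough illustration of the hybrid hash join logic described above, here is a hedged Python sketch. It is not the paper's implementation: the partition count, spill decisions and storage layer are simplified stand-ins.

```python
# Hedged sketch of hybrid hash join (not the paper's implementation).
# Partition count, spill decisions and the storage layer are stand-ins.
from collections import defaultdict

NUM_PARTITIONS = 8

def partition_of(key):
    return hash(key) % NUM_PARTITIONS

def hybrid_hash_join(build_rows, probe_rows, spilled_partitions, spill):
    """build_rows/probe_rows: iterables of (key, payload) pairs.
    spilled_partitions: set of build partition ids that were spilled to disk.
    spill(side, pid, row): callback that appends a row to per-partition storage."""
    hash_table = defaultdict(list)

    # Build side: keep in-memory partitions in one hash table, spill the rest.
    for key, payload in build_rows:
        pid = partition_of(key)
        if pid in spilled_partitions:
            spill("build", pid, (key, payload))
        else:
            hash_table[key].append(payload)

    # Probe side: rows whose build partition was spilled are spilled too;
    # everything else is joined immediately via a hash lookup.
    for key, payload in probe_rows:
        pid = partition_of(key)
        if pid in spilled_partitions:
            spill("probe", pid, (key, payload))
        else:
            for build_payload in hash_table.get(key, []):
                yield (key, build_payload, payload)

    # Spilled build/probe partition pairs are then joined one pair at a time,
    # bounding the working memory to a single partition (not shown here).

# Toy usage: partition 3 was spilled, so build key 11 lands in `out`
# instead of the in-memory hash table.
out = []
matches = list(hybrid_hash_join(
    build_rows=[(1, "b1"), (11, "b2")],
    probe_rows=[(1, "p1"), (2, "p2")],
    spilled_partitions={3},
    spill=lambda side, pid, row: out.append((side, pid, row)),
))
print(matches)  # [(1, 'b1', 'p1')]
```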
This paper proposes a join implementation that only pays the cost of partitioning when spilling is required. It is a two-phase process. In the materialization phase, the build table is scanned (and pushed-down filters are applied). The resulting tuples are stored in a list of pages. At first, the system optimistically assumes that no spilling is necessary and appends each tuple to the current page. If a memory limit is reached, then partitioning is enabled. Each tuple processed after that point is partitioned, and per-partition lists of pages are allocated. If a further memory limit is reached, then some partitions are spilled to disk.

Next, the build and probe phase executes in a manner similar to hybrid hash join. However, there is a fast path for the case where no spilling occurred. In this case, the tuples produced by the materialization phase are inserted into one large hash table, and then the probe tuples are streamed, with one hash table lookup per tuple. If partitioning (but no spilling) occurred, then the hash table inserts will have high locality (assuming a chained hash table). If spilling did occur, then the build and probe phase operates like a hybrid hash join, spilling probe tuples if and only if the associated build partition was spilled. The paper isn’t clear on what happens to non-partitioned build tuples once the system decides to start spilling build partitions. My assumption is that in this case, probe tuples must probe both the spilled build tuples and these build tuples that were processed before partitioning was enabled.

The implementation of aggregation described in this paper follows a similar two-phase approach. The materialization phase performs pre-aggregation: the system starts by aggregating into an in-memory hash table. If the hash table grows too large, then partitioning kicks in. Tuples from in-memory hash tables are evicted into per-partition page lists, and subsequent tuples are directly stored in these per-partition page lists. These per-partition pages can be spilled if further memory limits are reached. The subsequent phase then performs any necessary aggregation. If no eviction occurred, then this phase has no work to do.

The paper describes an adaptive compression algorithm that improves spilling performance. Some interesting numbers from the paper:
- The number of CPU cycles per input byte for executing an in-memory TPC-H query ranges from 3.3 to 60.3.
- The number of CPU cycles required to write and read a byte to/from SSD is 11.1.
The adaptive nature of this scheme is driven by the fact that queries are diverse, and compressors have knobs which trade off speed for compression ratio, as illustrated by Fig. 3:

Source: https://dl.acm.org/doi/10.1145/3698813

When spilling occurs, the system dynamically balances CPU usage and IO bandwidth by adjusting these knobs. Fig. 5 shows in-memory throughput while Fig. 6 shows throughput when spilling occurs:

Source: https://dl.acm.org/doi/10.1145/3698813

Dangling Pointers

The design of the unified hash join hinges on the fact that partitioning is a bad idea for the in-memory case. This is in contrast to papers describing in-memory partitioning join implementations on other types of chips like SPID-Join and Efficiently Processing Joins and Grouped Aggregations on GPUs. I imagine there is a lot of literature about this topic that I haven’t read. Leave a comment with your experience.

Armin Ronacher 5 days ago

Absurd Workflows: Durable Execution With Just Postgres

It’s probably no surprise to you that we’re building agents somewhere. Everybody does it. Building a good agent, however, brings back some of the historic challenges involving durable execution. Entirely unsurprisingly, a lot of people are now building durable execution systems. Many of these, however, are incredibly complex and require you to sign up for another third-party service. I generally try to avoid bringing in extra complexity if I can avoid it, so I wanted to see how far I can go with just Postgres. To this end, I wrote Absurd 1 , a tiny SQL-only library with a very thin SDK to enable durable workflows on top of just Postgres - no extension needed.

Durable execution (or durable workflows) is a way to run long-lived, reliable functions that can survive crashes, restarts, and network failures without losing state or duplicating work. Durable execution can be thought of as the combination of a queue system and a state store that remembers the most recently seen execution state. Because Postgres is excellent at queues (thanks to FOR UPDATE SKIP LOCKED), you can use it for the queue (e.g., with pgmq). And because it’s a database, you can also use it to store the state.

The state is important. With durable execution, instead of running your logic in memory, the goal is to decompose a task into smaller pieces (step functions) and record every step and decision. When the process stops (whether it fails, intentionally suspends, or a machine dies) the engine can replay those events to restore the exact state and continue where it left off, as if nothing happened.

Absurd at its core is a single SQL file which needs to be applied to a database of your choice. That SQL file’s goal is to move the complexity of SDKs into the database. SDKs then make the system convenient by abstracting the low-level operations in a way that leverages the ergonomics of the language you are working with. The system is very simple: a task dispatches onto a given queue from where a worker picks it up to work on. Tasks are subdivided into steps, which are executed in sequence by the worker. Tasks can be suspended or fail, and when that happens, they execute again (a run). The result of a step is stored in the database (a checkpoint). To avoid repeating work, checkpoints are automatically loaded from the state storage in Postgres again. Additionally, tasks can sleep or suspend for events and wait until they are emitted. Events are cached, which means they are race-free.

What is the relationship of agents with workflows? Normally, workflows are DAGs defined by a human ahead of time. AI agents, on the other hand, define their own adventure as they go. That means they are basically a workflow with mostly a single step that iterates over changing state until it determines that it has completed. Absurd enables this by automatically counting up steps if they are repeated: you end up with a single task that has just a single step function. The return value is the changed state, but the current state is passed in as an argument. Every time the step function is executed, the data is looked up first from the checkpoint store, and each repeated execution gets its own numbered checkpoint. Each state only stores the new messages it generated, not the entire message history. If a step fails, the task fails and will be retried. And because of checkpoint storage, if you crash in step 5, the first 4 steps will be loaded automatically from the store. Steps are never retried, only tasks.
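To make the checkpoint-replay idea concrete, here is a minimal Python sketch of the general technique. This is not the Absurd SDK or its API; it just shows the pattern of loading a stored step result instead of re-running the step, with repeated step names counted up.

```python
# Minimal checkpoint-replay sketch (generic pattern, not Absurd's SDK).
# A real system would keep checkpoints in Postgres; a dict stands in here.
import json

class TaskRun:
    def __init__(self, checkpoint_store):
        self.checkpoints = checkpoint_store  # e.g. rows keyed by (task, step key)
        self.counters = {}

    def step(self, name, fn, *args):
        # Repeated step names are counted up: "iterate#1", "iterate#2", ...
        n = self.counters.get(name, 0) + 1
        self.counters[name] = n
        key = f"{name}#{n}"
        if key in self.checkpoints:              # replay: reuse the stored result
            return json.loads(self.checkpoints[key])
        result = fn(*args)                       # first run: execute and checkpoint
        self.checkpoints[key] = json.dumps(result)
        return result

# If the process crashes after the fourth iteration, re-running the task with
# the same store replays iterations 1-4 from checkpoints and resumes at 5.
store = {}
run = TaskRun(store)
state = {"messages": []}
for _ in range(3):
    state = run.step("iterate", lambda s: {"messages": s["messages"] + ["tick"]}, state)
print(state["messages"])  # ['tick', 'tick', 'tick']
```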
How do you kick it off? Simply enqueue it. And like Temporal and other solutions, you can yield if you want: a task can sleep and come back to a problem in 7 days, or suspend to wait for an event which someone else emits. Really, that’s it. There is really not much to it. It’s just a queue and a state store - that’s all you need. There is no compiler plugin and no separate service or whole runtime integration. Just Postgres.

That’s not to throw shade on these other solutions; they are great. But not every problem necessarily needs to scale to that level of complexity, and you can get quite far with much less. Particularly if you want to build software that other people should be able to self-host, that might be quite appealing.

It’s named Absurd because durable workflows are absurdly simple, but have been overcomplicated in recent years. ↩


Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt?

Low-Latency Transaction Scheduling via Userspace Interrupts: Why Wait or Yield When You Can Preempt?
Kaisong Huang, Jiatang Zhou, Zhuoyue Zhao, Dong Xie, and Tianzheng Wang. SIGMOD'25

Say you are a database, and your job is to execute two kinds of queries (both from different TPC benchmarks): high-priority New-Order queries from TPC-C (OLTP) and low-priority Q2 queries from TPC-H (OLAP). Congratulations, you are a hybrid transaction/analytical processing (HTAP) database! You would like OLTP transactions to experience low tail latency, while OLAP transactions run at high throughput. How can transactions be scheduled to achieve these goals?

A Non-Preemptive FIFO policy runs transactions to completion in the order they came. This is easy but has a high tail latency for OLTP transactions that have to sit in line behind a long queue of OLAP transactions. A cooperative policy involves yield points at specific places in the database code. At each yield point, an OLAP transaction can realize that an OLTP transaction is waiting and yield the CPU. It is a hassle to insert yield points, and it is hard to find the Goldilocks frequency. Checking for high-priority transactions too often adds overhead, but checking infrequently increases tail latency. A preemptive policy allows high-priority OLTP transactions to borrow a CPU core as soon as possible. In the past, the only practical way to do this involved OS context switching, which is expensive. Fig. 2 illustrates these policies:

Source: https://dl.acm.org/doi/abs/10.1145/3725319

Enter userspace interrupts. These allow preemption without OS kernel involvement. Section 4.2 of the paper makes it clear that it isn’t totally easy to implement userspace context switching on top of userspace interrupts. An idiomatic use case for userspace interrupts is for an interrupt handler to quickly save some data and then return back to the code that was running. The context switch case is not idiomatic. For each CPU core, two pthreads are created, and there are two stacks. Say the CPU core is running a low-priority (OLAP) transaction and a userspace interrupt is delivered to the core. The userspace interrupt handler is invoked, which mucks around with the CPU registers (including the stack pointer), and then returns. But it doesn’t return to the code that was running the low-priority transaction, it returns to code which runs the high-priority (OLTP) transaction. Once the high-priority transaction finishes, it calls a voluntary context switch function, which again mucks around with CPU registers and the stack pointer in just the correct manner so that it returns back to the code running the low-priority transaction.

There are some nitty-gritty details to get this working correctly. Tricky cases have to be handled such as:
- A userspace interrupt occurring in the middle of the context switch function
- Support for database code which uses thread-local storage (e.g., the thread_local modifier in C++)
- Avoiding deadlocks associated with a userspace interrupt occurring while a database lock is acquired

As seen with xUI, while userspace interrupts are cheap, they still incur a cost. This paper proposes firing a single interrupt to execute a batch of high-priority transactions. Section 5 also describes a starvation avoidance mechanism to ensure that low-priority transactions eventually finish. Note that when a low-priority transaction is preempted, it is not automatically aborted.
The paper assumes the underlying database uses multi-versioning and optimistic concurrency control.

Fig. 10 has the headline results, comparing FIFO scheduling, a cooperative configuration where the OLAP query can occasionally yield the CPU core, and the preemptive approach described in this paper. Tail latencies for OLTP queries are significantly reduced, while performance of OLAP queries does not change much.

Source: https://dl.acm.org/doi/abs/10.1145/3725319

Dangling Pointers

It would be nice to see a comparison against a version which uses traditional OS thread synchronization rather than userspace interrupts. The details of userspace context switching are tricky and seem orthogonal to databases. A library or OS functionality which provides a robust implementation seems like a useful thing to exist. The paper doesn’t mention what happens if the work associated with a single query is parallelized across multiple CPU cores. I imagine this complicates the scheduling policy.

The Coder Cafe 1 week ago

Build Your Own Key-Value Storage Engine

Welcome to The Coding Corner! This is our new section at The Coder Cafe, where we build real-world systems together, one step at a time. Next week, we will launch the first post series: Build Your Own Key-Value Storage Engine. Are you interested in understanding how key-value databases work? Tackling challenges like durability, partitioning, and compaction? Exploring data structures like LSM trees, Bloom filters, and tries? Then this series is for you.

Build Your Own Key-Value Storage Engine focuses on the storage engine itself; we will stay single-node. Topics such as replication and consensus are out of scope. Yet, if this format works, we may cover them in a future series. The structure of each post will be as follows:
- Introduction: The theory for what you are about to build that week.
- Your tasks: A list of tasks to complete the week’s challenges. Note that you can complete the series in any programming language you want.
- Further notes: Additional perspective on how things work in real systems.
If you’re not going to implement things yourself but are interested in databases, you may still want to read sections 1 and 3 at least. A new post in the series will be released every other week.

Last but not least, I’m delighted to share that this series was written in collaboration with ScyllaDB. They reviewed the content for accuracy and shared practical context from real systems, providing a clearer view of how production databases behave and the problems they solve. Huge thanks to Felipe Cardeneti Mendes and ScyllaDB. By the way, they host a free virtual conference called Monster Scale Summit, and the content is always excellent. If you care about scaling challenges, it’s absolutely worth registering! Also, if you’re interested in giving a talk, the CFP closes in two days.

Curious how leading engineers tackle extreme scale challenges with data-intensive applications? Join Monster Scale Summit (free + virtual). It’s hosted by ScyllaDB, the monstrously fast and scalable database.

On a personal note, this has been the most time-consuming project I have done for The Coder Cafe. I really hope you will enjoy it! See you this Friday for a special post for Halloween and next Wednesday for the first post of the series.

Missing direction in your tech career? At The Coder Cafe, we serve timeless concepts with your coffee to help you master the fundamentals. Written by a Google SWE and trusted by thousands of readers, we support your growth as an engineer, one coffee at a time. ❤️ If you enjoyed this post, please hit the like button.
Further notes : Additional perspective on how things work in real systems.

Alex Jacobs 1 week ago

The Case Against pgvector

If you've spent any time in the vector search space over the past year, you've probably read blog posts explaining why pgvector is the obvious choice for your vector database needs. The argument goes something like this: you already have Postgres, vector embeddings are just another data type, why add complexity with a dedicated vector database when you can keep everything in one place? It's a compelling story. And like most of the AI influencer bullshit that fills my timeline, it glosses over the inconvenient details.

I'm not here to tell you pgvector is bad. It's not. It's a useful extension that brings vector similarity search to Postgres. But after spending some time trying to build a production system on top of it, I've learned that the gap between "works in a demo" and "scales in production" is… significant.

What bothers me most: the majority of content about pgvector reads like it was written by someone who spun up a local Postgres instance, inserted 10,000 vectors, ran a few queries, and called it a day. The posts are optimistic, the benchmarks are clean, and the conclusions are confident. They're also missing about 80% of what you actually need to know. I've read through dozens of these posts:

- Understanding Vector Search and HNSW Index with pgvector
- HNSW Indexes with Postgres and pgvector
- Understand Indexes in pgvector
- External Indexing for pgvector
- Exploring Postgres pgvector HNSW Index Storage
- pgvector v0.5.0: Faster semantic search with HNSW indexes
- Early Look at HNSW Performance with pgvector
- Vector Indexes in Postgres using pgvector: IVFFlat vs HNSW
- Vector Database Basics: HNSW Index
- PostgreSQL Vector Indexing with HNSW

They all cover the same ground: here's how to install pgvector, here's how to create a vector column, here's a simple similarity search query. Some of them even mention that you should probably add an index. What they don't tell you is what happens when you actually try to run this in production.

Let's start with indexes, because this is where the tradeoffs start. pgvector gives you two index types: IVFFlat and HNSW. The blog posts will tell you that HNSW is newer and generally better, which is… technically true but deeply unhelpful.

IVFFlat (Inverted File with Flat quantization) partitions your vector space into clusters. During search, it identifies the nearest clusters and only searches within those. (Image source: IVFFlat or HNSW index for similarity search? by Simeon Emanuilov)

Pros:

- Lower memory footprint during index creation
- Reasonable query performance for many use cases
- Index creation is faster than HNSW

Cons:

- Requires you to specify the number of lists (clusters) upfront
- That number significantly impacts both recall and query performance
- The commonly recommended formula ( ) is a starting point at best
- Recall can be… disappointing depending on your data distribution
- New vectors get assigned to existing clusters, but clusters don't rebalance without a full rebuild

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure for search. (Image source: IVFFlat or HNSW index for similarity search? by Simeon Emanuilov)

Pros:

- Better recall than IVFFlat for most datasets
- More consistent query performance
- Scales well to larger datasets

Cons:

- Significantly higher memory requirements during index builds
- Index creation is slow—painfully slow for large datasets
- The memory requirements aren't theoretical; they are real, and they'll take down your database if you're not careful

None of the blogs mention that building an HNSW index on a few million vectors can consume 10+ GB of RAM or more (depending on your vector dimensions and dataset size). On your production database. While it's running. For potentially hours.
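For reference, the "happy path" those posts walk through looks roughly like the sketch below. It assumes psycopg2, an illustrative items table, and 1536-dimension embeddings; none of that is prescribed by pgvector itself.

```python
# Minimal pgvector "happy path" sketch: extension, table, HNSW index, one query.
# Table name, dimension, and connection string are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("dbname=demo")  # assumed local database
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id        bigserial PRIMARY KEY,
        content   text,
        embedding vector(1536)   -- dimension must match your embedding model
    );
""")
# The step most posts at least mention: an HNSW index using cosine distance.
cur.execute("""
    CREATE INDEX IF NOT EXISTS items_embedding_idx
    ON items USING hnsw (embedding vector_cosine_ops);
""")
conn.commit()

# A simple similarity query: top 10 nearest neighbours by cosine distance (<=>).
query_embedding = [0.0] * 1536  # placeholder; normally produced by your embedding model
literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
cur.execute(
    "SELECT id, content FROM items ORDER BY embedding <=> %s::vector LIMIT 10;",
    (literal,),
)
rows = cur.fetchall()
```

On 10,000 rows this works, returns quickly, and makes for a tidy blog post. Everything that follows is about what this snippet hides.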
In a typical application, you want newly uploaded data to be searchable immediately. User uploads a document, you generate embeddings, insert them into your database, and they should be available in search results. Simple, right?
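The ingest path itself really is that simple, which is why the demos look so convincing. A minimal sketch, assuming the items table from above and a placeholder embed() function standing in for whatever embedding model you call:

```python
# Sketch of the "user uploads a document" path: embed the text, insert the row.
# embed() is a placeholder for a real embedding model or API, and the items
# table matches the earlier sketch; neither is defined by pgvector itself.

def embed(text: str) -> list:
    # Placeholder: a real implementation would call an embedding model here.
    return [0.0] * 1536

def index_document(cur, content: str) -> int:
    vector = embed(content)
    literal = "[" + ",".join(str(x) for x in vector) + "]"  # pgvector's text format
    cur.execute(
        "INSERT INTO items (content, embedding) VALUES (%s, %s::vector) RETURNING id;",
        (content, literal),
    )
    return cur.fetchone()[0]
```

The insert is the easy part. The question is what the index does with it.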
When you insert new vectors into a table with an index, one of two things happens:

- IVFFlat: The new vectors are inserted into the appropriate clusters based on the existing structure. This works, but it means your cluster distribution gets increasingly suboptimal over time. The solution is to rebuild the index periodically. Which means downtime, or maintaining a separate index and doing an atomic swap, or accepting degraded search quality.
- HNSW: New vectors are added to the graph structure. This is better than IVFFlat, but it's not free. Each insertion requires updating the graph, which means memory allocation, graph traversals, and potential lock contention.

Neither of these is a deal-breaker in isolation. But here's what happens in practice: you're inserting vectors continuously throughout the day. Each insertion is individually cheap, but the aggregate load adds up. Your database is now handling your normal transactional workload, analytical queries, AND maintaining graph structures in memory for vector search.

Let's say you're building a document search system. Users upload PDFs, you extract text, generate embeddings, and insert them. The user expects to immediately search for that document. Here's what actually happens:

- With no index: The insert is fast, the document is immediately available, but your searches do a full sequential scan. This works fine for a few thousand documents. At a few hundred thousand? Your searches start taking seconds. Millions? Good luck.
- With IVFFlat: The insert is still relatively fast. The vector gets assigned to a cluster. But whoops, a problem. Those initial cluster assignments were based on the data distribution when you built the index. As you add more data, especially if it's not uniformly distributed, some clusters get overloaded. Your search quality degrades. You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?
- With HNSW: The graph gets updated on each insert through incremental insertion, which sounds great. But updating an HNSW graph isn't free—you're traversing the graph to find the right place to insert the new node and updating connections. Each insert acquires locks on the graph structure. Under heavy write load, this becomes a bottleneck. And if your write rate is high enough, you start seeing lock contention that slows down both writes and reads.

Here's the real nightmare: you're not just storing vectors. You have metadata—document titles, timestamps, user IDs, categories, etc. That metadata lives in other tables (or other columns in the same table). You need that metadata and the vectors to stay in sync. In a normal Postgres table, this is easy—transactions handle it. But when you're dealing with index builds that take hours, keeping everything consistent gets complicated.

For IVFFlat, periodic rebuilds are basically required to maintain search quality. For HNSW, you might need to rebuild if you want to tune parameters or if performance has degraded. The problem is that index builds are memory-intensive operations, and Postgres doesn't have a great way to throttle them. You're essentially asking your production database to allocate multiple (possibly dozens of) gigabytes of RAM for an operation that might take hours, while continuing to serve queries.

You end up with strategies like:

- Write to a staging table, build the index offline, then swap it in (but now you have a window where searches miss new data)
- Maintain two indexes and write to both (double the memory, double the update cost)
- Build indexes on replicas and promote them
- Accept eventual consistency (users upload documents that aren't searchable for N minutes)
- Provision significantly more RAM than your "working set" would suggest

None of these are "wrong" exactly. But they're all workarounds for the fact that pgvector wasn't really designed for high-velocity real-time ingestion.
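For illustration, here is roughly what the "build a replacement index and swap it in" variant of the first strategy could look like. This is a sketch only: it assumes Postgres's CREATE INDEX CONCURRENTLY works for your pgvector index, and the index and table names are made up.

```python
# Sketch: rebuild an HNSW index without blocking writes, then swap names.
# Index/table names are illustrative. This still costs a lot of RAM and time;
# it just avoids holding a long exclusive lock on the table while it runs.
import psycopg2

conn = psycopg2.connect("dbname=demo")
conn.autocommit = True  # CREATE/DROP INDEX CONCURRENTLY cannot run inside a transaction block
cur = conn.cursor()

# 1. Build the replacement index alongside the old one (potentially hours).
cur.execute("""
    CREATE INDEX CONCURRENTLY items_embedding_idx_new
    ON items USING hnsw (embedding vector_cosine_ops);
""")

# 2. Swap: drop the old index and take over its name.
cur.execute("DROP INDEX CONCURRENTLY IF EXISTS items_embedding_idx;")
cur.execute("ALTER INDEX items_embedding_idx_new RENAME TO items_embedding_idx;")
```

Note what this does not solve: the memory spike still happens on the same instance, and you still have to decide what search quality looks like while the new index is being built.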
Okay but let's say you solve your index and insert problems. Now you have a document search system with millions of vectors. Documents have metadata—maybe they're marked as , , or . A user searches for something, and you only want to return published documents. Simple enough.

But now you have a problem: should Postgres filter on status first (pre-filter) or do the vector search first and then filter (post-filter)? This seems like an implementation detail. It's not. It's the difference between queries that take 50ms and queries that take 5 seconds. It's also the difference between returning the most relevant results and… not.

Pre-filter works great when the filter is highly selective (1,000 docs out of 10M). It works terribly when the filter isn't selective—you're still searching millions of vectors. Post-filter works when your filter is permissive. Here's where it breaks: imagine you ask for 10 results with . pgvector finds the 10 nearest neighbors, then applies your filter. Only 3 of those 10 are published. You get 3 results back, even though there might be hundreds of relevant published documents slightly further away in the embedding space. The user searched, got 3 mediocre results, and has no idea they're missing way better matches that didn't make it into the initial k=10 search.

You can work around this by fetching more vectors (say, ) and then filtering, but now:

- You're doing way more distance calculations than needed
- You still don't know if 100 is enough
- Your query performance suffers
- You're guessing at the right oversampling factor

With pre-filter, you avoid this problem, but you get the performance problems I mentioned. Pick your poison.

Now add another dimension: you're filtering by user_id AND category AND date_range. What's the right strategy now?

- Apply all filters first, then search? (Pre-filter)
- Search first, then apply all filters? (Post-filter)
- Apply some filters first, search, then apply remaining filters? (Hybrid)
- Which filters should you apply in which order?

The planner will look at table statistics, index selectivity, and estimated row counts and come up with a plan. That plan will probably be wrong, or at least suboptimal, because the planner's cost model wasn't built for vector similarity search.

And it gets worse: you're inserting new vectors throughout the day. Your index statistics are outdated. The plans get increasingly suboptimal until you ANALYZE the table. But ANALYZE on a large table with millions of rows takes time and resources. And it doesn't really understand vector data distribution in a meaningful way—it can tell you how many rows match , but not how clustered those vectors are in the embedding space, which is what actually matters for search performance.

You end up with hacks: query rewriting for different user types, partitioning your data into separate tables, CTE optimization fences to force the planner's hand, or just fetching way more results than needed and filtering in application code. None of these are sustainable at scale.
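To make the tradeoff concrete, here are the two query shapes side by side as a sketch. The status column, table name, and the k=100 oversampling factor are illustrative assumptions, and which plan Postgres actually chooses depends on your data and statistics.

```python
# Two hedged query shapes for "nearest published documents".
# Pass the query embedding as the single parameter, e.g. cur.execute(POST_FILTER, (literal,)).

# Post-filter with oversampling: take 100 nearest neighbours, then keep published ones.
# You are guessing that 100 candidates are enough to yield 10 published results.
POST_FILTER = """
    SELECT id, content
    FROM (
        SELECT id, content, status
        FROM items
        ORDER BY embedding <=> %s::vector
        LIMIT 100
    ) candidates
    WHERE status = 'published'
    LIMIT 10;
"""

# Pre-filter: logically restrict to published rows, then rank by distance.
# Great when the filter is selective; painful when it leaves millions of rows to scan.
PRE_FILTER = """
    SELECT id, content
    FROM items
    WHERE status = 'published'
    ORDER BY embedding <=> %s::vector
    LIMIT 10;
"""
```

Neither string is wrong; the problem is that you have to pick one per query pattern and keep re-checking that choice as the data shifts.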
Dedicated vector databases have solved this. They understand the cost model of filtered vector search and make intelligent decisions:

- Adaptive strategies: Some databases dynamically choose pre-filter or post-filter based on estimated selectivity
- Configurable modes: Others let you specify the strategy explicitly when you know your data distribution
- Specialized indexes: Some build indexes that support efficient filtered search (like filtered HNSW)
- Query optimization: They track statistics specific to vector operations and optimize accordingly

OpenSearch's k-NN plugin, for example, lets you specify pre-filter or post-filter behavior. Pinecone automatically handles filter selectivity. Weaviate has optimizations for common filter patterns. With pgvector, you get to build all of this yourself. Or live with suboptimal queries. Or hire a Postgres expert to spend weeks tuning your query patterns.

Oh, and if you want hybrid search—combining vector similarity with traditional full-text search—you get to build that yourself too. Postgres has excellent full-text search capabilities. pgvector has excellent vector search capabilities. Combining them in a meaningful way? That's on you. You need to:

- Decide how to weight vector similarity vs. text relevance
- Normalize scores from two different scoring systems
- Tune the balance for your use case
- Probably implement Reciprocal Rank Fusion or something similar

Again, not impossible. Just another thing that many dedicated vector databases provide out of the box.

Timescale has released pgvectorscale, which addresses some of these issues. It adds:

- StreamingDiskANN, a new search backend that's more memory-efficient
- Better support for incremental index builds
- Improved filtering performance

This is great! It's also an admission that pgvector out of the box isn't sufficient for production use cases. pgvectorscale is still relatively new, and adopting it means adding another dependency, another extension, another thing to manage and upgrade. For some teams, that's fine. For others, it's just more evidence that maybe the "keep it simple, use Postgres" argument isn't as simple as it seemed.

Oh, and if you're running on RDS, pgvectorscale isn't available. AWS doesn't support it. So enjoy managing your own Postgres instance if you want these improvements, or just… keep dealing with the limitations of vanilla pgvector. The "just use Postgres" simplicity keeps getting simpler.

I get the appeal of pgvector. Consolidating your stack is good. Reducing operational complexity is good. Not having to manage another database is good. But here's what I've learned: for most teams, especially small teams, dedicated vector databases are actually simpler. With a managed vector database (Pinecone, Weaviate, Turbopuffer, etc.), you typically get:

- Intelligent query planning for filtered searches
- Hybrid search built in
- Real-time indexing without memory spikes
- Horizontal scaling without complexity
- Monitoring and observability designed for vector workloads

Yes, it's another service to pay for. But compare:

- The cost of a managed vector database for your workload
- vs. the cost of over-provisioning your Postgres instance to handle index builds
- vs. the engineering time to tune queries and manage index rebuilds
- vs. the opportunity cost of not building features because you're fighting your database

Turbopuffer starts at $64/month with generous limits. For a lot of teams, the managed service is actually cheaper.

pgvector is an impressive piece of technology. It brings vector search to Postgres in a way that's technically sound and genuinely useful for many applications. But it's not a panacea. Understand the tradeoffs. If you're building a production vector search system:

- Index management is hard. Rebuilds are memory-intensive, time-consuming, and disruptive. Plan for this from day one.
- Query planning matters. Filtered vector search is a different beast than traditional queries, and Postgres's planner wasn't built for this.
- Real-time indexing has costs. Either in memory, in search quality, or in engineering time to manage it.
- The blog posts are lying to you (by omission). They're showing you the happy path and ignoring the operational reality.
- Managed offerings exist for a reason. There's a reason that Pinecone, Weaviate, Qdrant, and others exist and are thriving. Vector search at scale has unique challenges that general-purpose databases weren't designed to handle.

The question isn't "should I use pgvector?" It's "am I willing to take on the operational complexity of running vector search in Postgres?" For some teams, the answer is yes. You have database expertise, you need the tight integration, you're willing to invest the time. For many teams—maybe most teams—the answer is probably no. Use a tool designed for the job. Your future self will thank you.


Fast and Scalable Data Transfer Across Data Systems

Fast and Scalable Data Transfer Across Data Systems. Haralampos Gavriilidis, Kaustubh Beedkar, Matthias Boehm, and Volker Markl. SIGMOD'25.

We live in exciting times: unimaginably large language models getting better each day, and a constant stream of amazing demos. And yet, efficiently transferring a table between heterogeneous systems is an open research problem! An example from the paper involves transferring data from PostgreSQL to pandas. Optimizing this transfer time is important and non-trivial.

The paper describes a system named XDBC. XDBC software runs on both the source and the destination data management systems (DMS), as illustrated by Fig. 4. (Source: https://dl.acm.org/doi/10.1145/3725294)

The XDBC client/server processes are organized as a pipeline. Data parallelism within a stage is exploited by assigning 1 or more workers (e.g., cores) to each stage. There are a lot of knobs which can affect end-to-end throughput:

- Number of workers assigned to each task
- Data interchange format (row-major, column-major, Arrow)
- Compression (zstd, snappy, lzo, lz4)

Section 4.1 of the paper claims the search space is so large that brute force search will not work, so a heuristic algorithm is used. The heuristic algorithm assumes accurate performance models which can estimate the performance of each pipeline stage given a specific configuration. This model is based on real-world single-core performance measurements, and Gustafson's law to estimate multi-core scaling.

The algorithm starts by assigning 1 worker to each pipeline stage (in both the client and server). An iterative process then locates the pipeline stage which is estimated to be the slowest and assigns additional workers to it until it is no longer the bottleneck. This process continues until no more improvement can be found, due to one of the following reasons:

- All available CPU cores have been assigned
- Network bandwidth is the bottleneck

If the process ends with more CPU cores available, then a hard-coded algorithm determines the best compression algorithm given the number of cores remaining. The data interchange format is determined based on which formats the source and destination DMSs support, and which compression algorithm was chosen.
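The paper describes the optimizer at a high level, so the following is only a rough sketch of that greedy loop as summarized above. The stage names, single-core rates, and near-linear scaling model are stand-ins, not the paper's actual model or code.

```python
# Rough sketch of the greedy worker-assignment heuristic described above.
# Stage names, single-core rates, and the scaling model are illustrative stand-ins.

SINGLE_CORE_RATE = {   # estimated MB/s with one worker per stage (made-up numbers)
    "read": 400, "deserialize": 250, "compress": 150,
    "decompress": 200, "write": 350,
}
NETWORK_LIMIT = 300    # MB/s; the network is a stage we cannot add workers to
TOTAL_CORES = 16

def stage_rate(stage: str, workers: int) -> float:
    """Estimated stage throughput; assume near-linear scaling for the sketch."""
    return SINGLE_CORE_RATE[stage] * workers

def plan_workers() -> dict:
    workers = {stage: 1 for stage in SINGLE_CORE_RATE}   # start with 1 worker each
    while sum(workers.values()) < TOTAL_CORES:
        # Estimated end-to-end throughput is capped by the slowest stage.
        bottleneck = min(workers, key=lambda s: stage_rate(s, workers[s]))
        if stage_rate(bottleneck, workers[bottleneck]) >= NETWORK_LIMIT:
            break                                        # network-bound: more cores won't help
        workers[bottleneck] += 1                         # give the bottleneck one more core
    # Any leftover cores would then inform the compression choice, as in the paper.
    return workers

print(plan_workers())
```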
The XDBC optimizer has a lot of similarities with the Alkali optimizer. Here are some differences:

- Alkali does not require tasks to be executed on separate cores. For example, Alkali would allow a single core to execute both the and pipeline stages.
- Alkali uses an SMT solver to determine the number of cores to assign to each stage.
- The Alkali performance model explicitly takes into account inter-core bandwidth requirements.
- Alkali doesn't deal with compression.

Fig. 7(a) shows results from the motivating example (PostgreSQL→Pandas). Fig. 7(b) compares XDBC vs built-in Pandas functions to read CSV data over HTTP. connector-x is a more specialized library which supports reading data into Python programs specifically. (Source: https://dl.acm.org/doi/10.1145/3725294)

Dangling Pointers: There are many search spaces which are too large for brute force. Special-case heuristic algorithms are one fallback, but as the Alkali paper shows, there are other approaches (e.g., LP solvers, ILP solvers, SMT solvers, machine learning models). It would be great to see cross-cutting studies comparing heuristics to other approaches.

Phil Eaton 2 weeks ago

Transaction pooling for Postgres with pgcat

This is an external post of mine.

Jack Vanlightly 2 weeks ago

A Fork in the Road: Deciding Kafka’s Diskless Future

“ The Kafka community is currently seeing an unprecedented situation with three KIPs ( KIP-1150 , KIP-1176 , KIP-1183) simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones. ” — Luke Chen, The Path Forward for Saving Cross-AZ Replication Costs KIPs At the time of writing the Kafka project finds itself at a fork in the road where choosing the right path forward for implementing S3 topics has implications for the long-term success of the project. Not just the next couple of years, but the next decade. Open-source projects live and die by these big decisions and as a community, we need to make sure we take the right one. This post explains the competing KIPs, but goes further and asks bigger questions about the future direction of Kafka. Before comparing proposals, we should step back and ask what kind of system we want Kafka to become. Kafka now faces two almost opposing forces. One force is stabilizing: the on-prem deployments and the low latency workloads that depend on local disks and replication. Kafka must continue to serve those use cases. The other force is disrupting: the elastic, cloud-native workloads that favor stateless compute and shared object storage. Relaxed-latency workloads such as analytics have seen a shift in system design with durability increasingly delegated to shared object storage, freeing the compute layer to be stateless, elastic, and disposable. Many systems now scale by adding stateless workers rather than rebalancing stateful nodes. In a stateless compute design, the bottleneck shifts from data replication to metadata coordination. Once durability moves to shared storage, sequencing and metadata consistency become the new limits of scalability. That brings us to the current moment, with three competing KIPs defining how to integrate object storage directly into Kafka. While we evaluate these KIPs, it’s important to consider the motivations for building direct-to-S3 topics. Cross-AZ charges are typically what are on people’s minds, but it’s a mistake to think of S3 simply as a cheaper disk or a networking cheat. The shift is also architectural, providing us an opportunity to achieve those operational benefits such as elastic stateless compute.  The devil is in the details: how each KIP enables Kafka to leverage object storage while also retaining Kafka’s soul and what made it successful in the first place.  With that in mind, while three KIPs have been submitted, it comes down to two different paths: Revolutionary : Choose a direct-to-S3 topic design that maximizes the benefits of an object-storage architecture, with greater elasticity and lower operational complexity. However, in doing so, we may increase the implementation cost and possibly the long-term code maintenance too by maintaining two very different topic-models in the same project (leader-based replication and direct-to-S3). Evolutionary : Shoot for an evolutionary design that makes use of existing components to reduce the need for large refactoring or duplication of logic. However, by coupling to the existing architecture, we forfeit the extra benefits of object storage, focusing primarily on networking cost savings (in AWS and GCP). Through this coupling, we also run the risk of achieving the opposite: harder to maintain code by bending and contorting a second workload into an architecture optimized for something else. 
In this post I will explain the two paths in this forked road, how the various KIPs map onto those paths, and invite the whole community to think through what they want for Apache Kafka for the next decade. Note that I do not include KIP-1183 as it looks dead in the water, and not a serious contender. The KIP proposes AutoMQ’s storage abstractions without the accompanying implementation. Which perhaps cynically, seems to benefit AutoMQ were it ever adopted, leaving the community to rewrite the entire storage subsystem again. If you want a quick summary of the three KIPs (including KIP-1183), you can read Luke Chen’s The Path Forward for Saving Cross-AZ Replication Costs KIPs or Anton Borisov’s summary of the three KIPs .  This post is structured as follows: The term “Diskless” vs “Direct-to-S3” The Common Parts. Some approaches are shared across multiple implementations and proposals. Revolutionary: KIPs and real-world implementations Evolutionary: Slack’s KIP-1176 The Hybrid: balancing revolution with evolution Deciding Kafka’s future I used the term “diskless” in the title as that is the current hype word. But it is clear that not all designs are actually diskless in the same spirit as “serverless”. Serverless implies that users no longer need to consider or manage servers at all, not that there are no servers.  In the world of open-source, where you run stuff yourself, diskless would have to mean literally “no disks”, else you will be configuring disks as part of your deployment. But all the KIPs (in their current state) depend on disks to some extent, even KIP-1150 which was proposed as diskless. In most cases, disk behavior continues to influence performance and therefore correct disk provisioning will be important. So I’m not a fan of “diskless”, I prefer “direct-to-S3”, which encompasses all designs that treat S3 (and other object stores) as the only source of durability. The main commonality between all Direct-to-S3 Kafka implementations and design proposals is the uploading of objects that combine the data of multiple topics. The reasons are two-fold: Avoiding the small file problem . Most designs are leaderless for producer traffic, allowing for any server to receive writes to any topic. To avoid uploading a multitude of tiny files, servers accumulate batches in a buffer until ready for upload. Before upload, the buffer is sorted by topic id and partition, to make compaction and some reads more efficient by ensuring that data of the same topic and same partition are in contiguous byte ranges. Pricing . The pricing of many (but not all) cloud object storage services penalize excessive requests, so it can be cheaper to roll-up whatever data has been received in the last X milliseconds and upload it with a single request. In the leader-based model, the leader determines the order of batches in a topic partition. But in the leaderless model, multiple brokers could be simultaneously receiving produce batches of the same topic partition, so how do we order those batches? We need a way of establishing a single order for each partition and we typically use the word “sequencing” to describe that process. Usually there is a central component that does the sequencing and metadata storage, but some designs manage the sequencing in other ways. WarpStream was the first to demonstrate that you could hack the metadata step of initiating a producer to provide it with broker information that would align the producer with a zone-local broker for the topics it is interested in. 
The Kafka client is leader-oriented, so we just pass it a zone-local broker and tell the client “this is the leader”. This is how all the leaderless designs ensure producers write to zone-local brokers. It’s not pretty, and we should make a future KIP to avoid the need for this kind of hack. Consumer zone-alignment heavily depends on the particular design, but two broad approaches exist: Leaderless: The same way that producer alignment works via metadata manipulation or using KIP-392 (fetch from follower) which can be used in a leaderless context. Leader-based:  Zone-aware consumer group assignment as detailed in KIP-881: Rack-aware Partition Assignment for Kafka Consumers . The idea is to use consumer-to-partition assignment to ensure consumers are only assigned zone-local partitions (where the partition leader is located). KIP-392 (fetch-from-follower) , which is effective for designs that have followers (which isn’t always the case). Given almost all designs upload combined objects, we need a way to make those mixed objects more read optimized. This is typically done through compaction, where combined objects are ultimately separated into per-topic or even per-partition objects. Compaction could be one-shot or go through multiple rounds. The “revolutionary” path draws a new boundary inside Kafka by separating what can be stateless from what must remain stateful. Direct-to-S3 traffic is handled by a lightweight, elastic layer of brokers that simply serve producers and consumers. The direct-to-S3 coordination (sequencing/metadata) is incorporated into the stateful side of regular brokers where coordinators, classic topics and KRaft live. I cover three designs in the “revolutionary” section: WarpStream (as a reference, a kind of yardstick to compare against) KIP-1150 revision 1 Aiven Inkless (a Kafka-fork) Before we look at the KIPs that describe possible futures for Apache Kafka, let’s look at a system that was designed from scratch with both cross-AZ cost savings and elasticity (from object storage) as its core design principles. WarpStream was unconstrained by an existing stateful architecture, and with this freedom, it divided itself into: Leaderless, stateless and diskless agents that handle Kafka clients, as well as compaction/cleaning work.  Coordination layer : A central metadata store for sequencing, metadata storage and housekeeping coordination. Fig WS-A. The WarpStream stateless/stateful split architecture. As per the Common Parts section, the zone alignment and sorted combined object upload strategies are employed. Fig WS-B. The WarpStream write path. On the consumer side, which again is leaderless and zone-local, a per-zone shared cache is implemented (which WarpStream dubbed distributed mmap). Within a zone, this shared cache assigns each agent a portion of the partition-space. When a consumer fetch arrives, an agent will download the object byte range itself if it is responsible for that partition, else it will ask the responsible agent to do that on its behalf. That way, we ensure that multiple agents per zone are not independently downloading the same data, thus reducing S3 costs. Fig WS-C. Shared per-zone read cache to reduce S3 throughput and request costs. WarpStream implements agent roles (proxy, job) and agent groups to separate the work of handling producer, consumer traffic from background jobs such as compaction, allowing for independent scaling for each workload. 
The proxy role (producer/consumer Kafka client traffic) can be further divided into proxy-producer and proxy-consumer.  Agents can be deployed as dedicated agent groups, which allows for further separation of workloads. This is useful for avoiding noisy neighbor issues, running different groups in different VPCs and scaling different workloads that hit the same topics. For example, you could use one proxy group for microservices, and a separate proxy-consumer group for an analytics workload. Fig WS-D. Agent roles and groups can be used for shaping traffic, independent scaling and deploying different workloads into separate VPCs. Being a pure Direct-to-S3 system allowed WarpStream to choose a design that clearly separates traffic serving work into stateless agents and the coordination logic into one central metadata service. The traffic serving layer is highly elastic, with relatively simple agents that require no disks at all. Different workloads benefit from independent and flexible scaling via the agent roles and groups. The point of contention is the metadata service, which needs to be carefully managed and scaled to handle the read/write metadata volume of the stateless agents. Confluent Freight Clusters follow a largely similar design principle of splitting stateless brokers from central coordination. I will write about the Freight design sometime soon in the future. Apache Kafka is a stateful distributed system, but next we’ll see how KIP-1150 could fulfill much of the same capabilities as WarpStream. KIP-1150 has continued to evolve since it was first proposed, undergoing two subsequent major revisions between the KIP itself and mailing list discussion . This section describes the first version of the KIP, created in April 2025. KIP-1150 revision 1 uses a leaderless architecture where any broker can serve Kafka producers and consumers of any topic partition. Batch Coordinators (replicated state machines like Group and Transaction Coordinators), handle coordination (object sequencing and metadata storage). Brokers accumulate Kafka batches and upload shared objects to S3 (known as Shared Log Segment Objects, or SLSO). Metadata that maps these blocks of Kafka batches to SLSOs (known as Batch Coordinates) is then sent to a Batch Coordinator (BC) that sequences the batches (assigning offsets) to provide a global order of those batches per partition. The BC acts as sequencer and batch coordinate database for later lookups, allowing for the read path to retrieve batches from SLSOs. The BCs will also apply idempotent and transactional producer logic (not yet finalized). Fig KIP-1150-rev1-A. Leaderless brokers upload shared objects, then commit and sequence them via the Batch Coordinator. The Batch Coordinator (BC) is the stateful component akin to WarpStream’s metadata service. Kafka has other coordinators such as the Group Coordinator (for the consumer group protocol), the Transaction Coordinator (for reliable 2PC) and Share Coordinator (for queues aka share groups). The coordinator concept, for centralizing some kind of coordination, has a long history in Kafka. The BC is the source of truth about uploaded objects, Direct-to-S3 topic partitions, and committed batches. This is the component that contains the most complexity, with challenges around scaling, failovers, reliability as well as logic for idempotency, transactions, object compaction coordination and so on. A Batch Coordinator has the following roles: Sequencing . 
Chooses the total ordering for writes, assigning offsets without gaps or duplicates.
Metadata storage. Stores all metadata that maps partition offset ranges to S3 object byte ranges.
Serving lookup requests. Serving requests for log offsets, and requests for batch coordinates (S3 object metadata).
Partition CRUD operations. Serving requests for atomic operations (creating partitions, deleting topics, records, etc.).
Data expiration. Managing data expiry and soft deletion, and coordinating physical object deletion (performed by brokers).

The Group and Transaction Coordinators use internal Kafka topics to durably store their state and rely on the KRaft Controller for leader election (coordinators are highly available with failovers). This KIP does not specify whether the Batch Coordinator will be backed by a topic, or use some other option such as a Raft state machine (based on KRaft state machine code). It also proposes that it could be pluggable. For my part, I would prefer all future coordinators to be KRaft state machines, as the implementation is rock solid and can be used like a library to build arbitrary state machines inside Kafka brokers.

The produce path works as follows:

1. When a producer starts, it sends a Metadata request to learn which brokers are the leaders of each topic partition it cares about. The Metadata response contains a zone-local broker.
2. The producer sends all Produce requests to this broker.
3. The receiving broker accumulates batches of all partitions and uploads a shared log segment object (SLSO) to S3.
4. The broker commits the SLSO by sending the metadata of the SLSO (known as the batch coordinates) to the Batch Coordinator. The coordinates include the S3 object metadata and the byte ranges of each partition in the object.
5. The Batch Coordinator assigns offsets to the written batches, persists the batch coordinates, and responds to the broker.
6. The broker sends all associated acknowledgements to the producers.

Brokers can parallelize the uploading of SLSOs to S3, but commit them serially to the Batch Coordinator. When batches are first written, a broker does not assign offsets as would happen with a regular topic, as this is done after the data is written, when it is sequenced and indexed by the Batch Coordinator. In other words, the batches stored in S3 have no offsets stored within the payloads as is normally the case. On consumption, the broker must inject the offsets into the Kafka batches, using metadata from the Batch Coordinator.

The consume path works as follows:

1. The consumer sends a Fetch request to the broker.
2. The broker checks its cache and, on a miss, queries the Batch Coordinator for the relevant batch coordinates.
3. The broker downloads the data from object storage.
4. The broker injects the computed offsets and timestamps into the batches (the offsets being part of the batch coordinates).
5. The broker constructs and sends the Fetch response to the consumer.

Fig KIP-1150-rev1-B. The consume path of KIP-1150 (ignoring any caching logic here).
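As a thought experiment, the broker-side produce path described above boils down to something like the following sketch. These are not the KIP's interfaces; the buffer, coordinate format, and the s3_put/bc_commit callables are hypothetical stand-ins.

```python
# Hedged sketch of the KIP-1150 rev1 produce path: accumulate batches, upload one
# shared log segment object (SLSO), then commit its batch coordinates to the
# Batch Coordinator, which assigns offsets. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class PendingBatch:
    topic_id: str
    partition: int
    payload: bytes

@dataclass
class SlsoBuffer:
    batches: list = field(default_factory=list)

    def add(self, batch: PendingBatch) -> None:
        self.batches.append(batch)

    def to_object(self):
        # Sort by (topic, partition) so each partition occupies a contiguous byte range.
        ordered = sorted(self.batches, key=lambda b: (b.topic_id, b.partition))
        blob, coords, offset = b"", [], 0
        for b in ordered:
            coords.append({"topic": b.topic_id, "partition": b.partition,
                           "byte_start": offset, "byte_len": len(b.payload)})
            blob += b.payload
            offset += len(b.payload)
        return blob, coords

def flush(buffer: SlsoBuffer, s3_put, bc_commit):
    """s3_put and bc_commit stand in for the S3 client and the coordinator RPC."""
    blob, coords = buffer.to_object()
    object_key = s3_put(blob)                 # 1. upload the SLSO to object storage
    assigned = bc_commit(object_key, coords)  # 2. the BC sequences batches and assigns offsets
    return assigned                           # 3. use the returned offsets to ack producers
```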
The KIP notes that broker roles could be supported in the future. I believe that KIP-1150 revision 1 starts becoming really powerful with roles akin to WarpStream. That way we can separate out direct-to-S3 topic serving traffic and object compaction work on proxy brokers, which become the elastic serving and background job layer. Batch Coordinators would remain hosted on standard brokers, which are already stateful.

With broker roles, we can see how Kafka could implement its own WarpStream-like architecture, which makes full use of disaggregated storage to enable better elasticity.

Fig KIP-1150-rev1-C. Broker roles would bring the stateless/stateful separation that would unlock elasticity in the high throughput workload of many Direct-to-S3 deployments.

Things like supporting idempotency, transactions, caching and object compaction are left to be decided later. While taxing to design, these things look doable within the basic framework of the design. But as I already mentioned in this post, this will be costly in effort to develop and may also come with a long-term code maintenance overhead if complex parts such as transactions are maintained twice. It may also be possible to refactor rather than do wholesale rewrites.

Inkless is Aiven's direct-to-S3 fork of Apache Kafka. The Inkless design shares the combined object upload part and metadata manipulation that all designs across the board are using. It is also leaderless for direct-to-S3 topics. Inkless is firmly in the revolutionary camp. If it were to implement broker roles, it would make this Kafka fork much closer to a WarpStream-like implementation (albeit with some issues concerning the coordination component, as we'll see further down). While Inkless is described as a KIP-1150 implementation, we'll see that it actually diverges significantly from KIP-1150, especially the later revisions (covered later).

Inkless eschewed the Batch Coordinator of KIP-1150 in favor of a Postgres instance, with coordination being executed through Table Valued Functions (TVF) and row locking where needed.

Fig Inkless-A. The write path.

On the read side, each broker must discover what batches exist by querying Postgres (using a TVF again), which returns the next set of batch coordinates as well as the high watermark. Now that the broker knows where the next batches are located, it requests those batches via a read-through cache. On a cache miss, it fetches the byte ranges from the relevant objects in S3.

Fig Inkless-B. The read path.

Inkless bears some resemblance to KIP-1150 revision 1, except the difficult coordination bits are delegated to Postgres. Postgres does all the sequencing and metadata storage, as well as coordination for compaction and file cleanup. For example, compaction is coordinated via Postgres, with a TVF that is periodically called by each broker, which finds a set of files which together exceed a size threshold and places them into a merge work order (tables file_merge_work_items and file_merge_work_item_files) that the broker claims. Once carried out, the original files are marked for deletion (which is another job that can be claimed by a broker).

Fig Inkless-C. Direct-to-S3 traffic uses a leaderless broker architecture with coordination owned by Postgres.

Inkless doesn't implement transactions, and I don't think Postgres could take on the role of transaction coordinator, as the coordinator does more than sequencing and storage. Inkless will likely have to implement some kind of coordinator for that.

The Postgres data model is based on the following tables:

- logs. Stores the Log Start Offset and High Watermark of each topic partition, with the primary key of topic_id and partition.
- files. Lists all the objects that host the topic partition data.
- batches. Maps Kafka batches to byte ranges in files.
- producer_state. All the producer state needed for idempotency.
- Some other tables for housekeeping, and the merge work items.

Fig Inkless-D. The Postgres data model.
The Commit File TVF, which sequences and stores the batch coordinates, works as follows:

1. A broker opens a transaction and submits a table as an argument, containing the batch coordinates of the multiple topics uploaded in the combined file.
2. The TVF logic creates a temporary table (logs_tmp) and fills it via a SELECT on the logs table, with the FOR UPDATE clause, which obtains a row lock on each topic partition row in the logs table that matches the list of partitions being submitted. This ensures that other brokers that are competing to add batches to the same partition(s) queue up behind this transaction. This is a critical barrier that avoids inconsistency. These locks are held until the transaction commits or aborts.
3. Next, inside a loop, partition-by-partition, the TVF:
   - Updates the producer state.
   - Updates the high watermark of the partition (a row in the logs table).
   - Inserts the batch coordinates into the batches table (sequencing and storing them).
4. Commits the transaction.

Apache Kafka would not accept a Postgres dependency of course, and KIP-1150 has not proposed centralizing coordination in Postgres either. But the KIP has suggested that the Batch Coordinator be pluggable, which might leave it open for using Postgres as a backing implementation.

As a former database performance specialist, the Postgres locking does concern me a bit. It blocks on the logs table rows scoped to the topic id and partition. An ORDER BY prevents deadlocks, but given the row locks are maintained until the transaction commits, I imagine that given enough contention, it could cause a convoy effect of blocking. This blocking is fair, that is to say, First Come First Serve (FCFS) for each individual row.

For example, with 3 transactions: T1 locks rows 11–15. T2 wants to lock rows 6–11, but only manages 6–10 as it blocks on row 11. Meanwhile T3 wants to lock rows 1–6, but only manages 1–5 as it blocks on row 6. We now have a dependency tree where T1 blocks T2 and T2 blocks T3. Once T1 commits, the others get unblocked, but under sustained load, this kind of locking and blocking can quickly cascade, such that once contention starts, it rapidly expands. This contention is sensitive to the number of concurrent transactions and the number of partitions per commit. A common pattern with this kind of locking is that up until a certain transaction throughput everything is fine, but at the first hint of contention, the whole thing slows to a crawl. Contention breeds more contention. I would therefore caution against the use of Postgres as a Batch Coordinator implementation.
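To illustrate the locking pattern being discussed (and not Inkless's actual TVF), a stripped-down commit function with the same FOR UPDATE barrier might look like this. The table layout, column names, and the omission of producer-state handling are all simplifications.

```python
# Simplified sketch of a commit step using per-partition row locks in Postgres.
# Lock the logs rows in a deterministic order, then advance watermarks and
# insert batch coordinates inside one transaction. Schema and keys are assumed.
import psycopg2

def commit_file(conn, object_key: str, coords: list) -> None:
    # coords: dicts with topic_id, partition, byte_start, byte_len, record_count (assumed shape)
    partitions = sorted({(c["topic_id"], c["partition"]) for c in coords})
    with conn:                      # transaction scope: commit on success, rollback on error
        with conn.cursor() as cur:
            for topic_id, partition in partitions:
                # Deterministic lock order across transactions helps avoid deadlocks;
                # competing brokers queue up behind these row locks until commit/abort.
                cur.execute(
                    "SELECT high_watermark FROM logs "
                    "WHERE topic_id = %s AND partition = %s FOR UPDATE;",
                    (topic_id, partition),
                )
                hwm = cur.fetchone()[0]
                for c in (x for x in coords
                          if (x["topic_id"], x["partition"]) == (topic_id, partition)):
                    cur.execute(
                        "INSERT INTO batches (topic_id, partition, base_offset, "
                        "object_key, byte_start, byte_len) VALUES (%s, %s, %s, %s, %s, %s);",
                        (topic_id, partition, hwm, object_key,
                         c["byte_start"], c["byte_len"]),
                    )
                    hwm += c["record_count"]     # assign offsets by advancing the watermark
                cur.execute(
                    "UPDATE logs SET high_watermark = %s "
                    "WHERE topic_id = %s AND partition = %s;",
                    (hwm, topic_id, partition),
                )
```

Even in this toy form you can see the issue: every hot partition becomes a single row that every concurrent commit must queue behind.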
The following is a very high-level look at Slack's KIP-1176, in the interests of keeping this post from getting too detailed. There are three key points to this KIP's design:

- Maintain leader-based topic partitions (producers continue to write to leaders), but replace the Kafka replication protocol with a per-broker S3-based write-ahead log (WAL).
- Try to preserve existing partition replica code for idempotency and transactions.
- Reuse existing tiered storage for long-term S3 data management.

Fig Slack-KIP-1176-A. Leader-based architecture retained, replication replaced by an S3 WAL. Tiered storage manages long-term data.

The basic idea is to preserve the leader-based architecture of Kafka, with each leader replica continuing to write to an active local log segment file, which it rotates periodically. A per-broker write-ahead log (WAL) replaces replication.

A WAL Combiner component in the Kafka broker progressively (and aggressively) tiers portions of the local active log segment files (without closing them), combining them into multi-topic objects uploaded to S3. Once a Kafka batch has been written to the WAL, the broker can send an acknowledgment to its producer. This active log segment tiering does not change how log segments are rotated. Once an active log segment is rotated out (by closing it and creating a new active log segment file), it can be tiered by the existing tiered storage component, for the long term.

Fig Slack-KIP-1176-B. Produce batches are written to the page cache as usual, but active log segment files are aggressively tiered to S3 (possibly Express One Zone) in combined log segment files. The WAL acts as write-optimized S3 storage and the existing tiered storage uploads closed log segment files for long-term storage.

Once all data of a given WAL object has been tiered, it can be deleted. The WAL only becomes necessary during topic partition leader failovers, where the new leader replica bootstraps itself from the WAL. Alternatively, each topic partition can have one or more followers which actively reconstruct local log segments from the WAL, providing a faster failover.

The general principle is to keep as much of Kafka unchanged as possible, only changing from the Kafka replication protocol to an S3 per-broker WAL. The priority is to avoid the need for heavy rework or reimplementation of logic such as idempotency, transactions and share groups integration. But it gives up elasticity and the additional architectural benefits that come from building on disaggregated storage.

Having said all of the above, there are a lot of missing or hacky details that currently detract from the evolutionary goal. There is a lot of hand-waving when it comes to correctness too. It is not clear that this KIP will be able to deliver a low-disruption evolutionary design that is also correct, highly available and durable. Discussion in the mailing list is ongoing. Luke Chen remarked: "the current availability story is weak… It's not clear if the effort is still small once details on correctness, cost, cleanness are figured out.", and I have to agree.

The second revision of KIP-1150 replaces the future object compaction logic by delegating long-term storage management to the existing tiered storage abstraction (like Slack's KIP-1176). The idea is to:

- Remove Batch Coordinators from the read path.
- Avoid separate object compaction logic by delegating long-term storage management to tiered storage (which already exists).
- Rebuild per-partition log segments from combined objects in order to submit them for long-term tiering (which works as a form of object compaction too) and to serve consumer fetch requests.

The entire write path becomes a three-stage process:

- Stage 1 – Produce path, synchronous. Uploads multi-topic WAL Segments to S3 and sequences the batches, acknowledging to producers once committed. This is unchanged except SLSOs are now called WAL Segments.
- Stage 2 – Per-partition log segment file construction, asynchronous. Each broker is assigned a subset of topic partitions. The brokers download WAL segment byte ranges that host these assigned partitions and append to on-disk per-partition log segment files.
- Stage 3 – Tiered storage, asynchronous. The tiered storage architecture tiers the locally cached topic partition log segment files as normal.

Stage 1 – The produce path

The produce path is the same, but SLSOs are now called WAL Segments.
Fig KIP-1150-rev2-A. Write path stage 1 (the synchronous part). Leaderless brokers upload multi-topic WAL Segments, then commit and sequence them via the Batch Coordinator.

Stage 2 – Local per-partition segment caching

The second stage is preparation for both:

- Stage 3, segment tiering (tiered storage).
- The read pathway for tailing consumers.

Fig KIP-1150-rev2-B. Stage 2 of the write path, where assigned brokers download WAL segments and append to local log segments.

Each broker is assigned a subset of topic partitions. Each broker polls the BC to learn of new WAL segments. Each WAL segment that hosts any of the broker's assigned topic partitions will be downloaded (at least the byte range of its assigned partitions). Once the download completes, the broker will inject record offsets as determined by the batch coordinator, and append the finalized batch to a local (per topic partition) log segment on disk.

At this point, a log segment file looks like a classic topic partition segment file. The difference is that they are not a source of durability, only a source for tiering and consumption. WAL segments remain in object storage until all batches of a segment have been tiered via tiered storage. Then WAL segments can be deleted.

Stage 3 – Tiered storage

Tiered storage continues to work as it does today (KIP-405), based on local log segments. It hopefully knows nothing of the Direct-to-S3 components and logic. Tiered segment metadata is stored in KRaft, which allows for WAL segment deletion to be handled outside of the scope of tiered storage also.

Fig KIP-1150-rev2-C. Tiered storage works as-is, based on local log segment files.

Data is consumed from S3 topics from either:

- The local segments on disk, populated from stage 2.
- Tiered log segments (the traditional tiered storage read pathway).

End-to-end latency of any given batch is therefore based on:

- Produce batch added to buffer.
- WAL Segment containing that batch written to S3.
- Batch coordinates submitted to the Batch Coordinator for sequencing.
- Producer request acknowledged.

Then, tail, untiered (the fast path for tailing consumers):

- Replica downloads the WAL Segment slice.
- Replica appends the batch to a local (per topic partition) log segment.
- Replica serves a consumer fetch request from the local log segment.

Or tiered (the slow path for lagging consumers):

- Remote Log Manager downloads the tiered log segment.
- Replica serves a consumer fetch request from the downloaded log segment.

Avoiding excessive reads to S3 will be important in stage 2 (when a broker is downloading WAL segment files for its assigned topic partitions). This KIP should standardize how topic partitions are laid out inside every WAL segment and perform partition-broker assignments based on that same order:

- Pick a single global topic partition order (based on a permutation of a topic partition id).
- Partition that order into contiguous slices, giving one slice per broker as its assignment (a broker may get multiple topic partitions, but they must be adjacent in the global order).
- Lay out every WAL segment in that same global order.

That way, each broker's partition assignment will occupy one contiguous block per WAL Segment, so each read broker needs only one byte-range read per WAL segment object (possibly empty if none of its partitions appear in that object). This reduces the number of range reads per broker when reconstructing local log segments.
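A sketch of that layout idea under the stated assumptions: a single fixed global order, contiguous per-broker slices, and one byte-range read per WAL segment. The data structures here are mine, not the KIP's.

```python
# Sketch: a global topic-partition order, contiguous per-broker slices, and the
# single byte range each broker needs from a WAL segment laid out in that order.
# Partition IDs, broker count, and the layout records are illustrative.

def global_order(partitions: list) -> list:
    """One fixed permutation of all topic partitions, shared by writers and readers."""
    return sorted(partitions)                      # any stable, agreed-upon order works

def assign_slices(order: list, brokers: int) -> dict:
    """Split the global order into contiguous slices, one per broker."""
    size = -(-len(order) // brokers)               # ceiling division
    return {b: order[b * size:(b + 1) * size] for b in range(brokers)}

def byte_range_for(broker_slice: list, layout: dict):
    """Given a WAL segment layout {partition: (start, length)} written in global order,
    a broker's contiguous slice maps to at most one contiguous byte range."""
    present = [layout[p] for p in broker_slice if p in layout]
    if not present:
        return None                                # none of this broker's partitions are in the object
    start = min(s for s, _ in present)
    end = max(s + length for s, length in present)
    return (start, end - start)

# Example: 2 brokers, a segment containing three partitions laid out in global order.
order = global_order([("orders", 0), ("orders", 1), ("payments", 0), ("payments", 1)])
slices = assign_slices(order, brokers=2)
layout = {("orders", 0): (0, 100), ("orders", 1): (100, 50), ("payments", 0): (150, 200)}
print(byte_range_for(slices[0], layout))           # broker 0 reads one contiguous range
```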
By integrating with tiered storage and reconstructing log segments, revision 2 moves to a more diskful design, where disks form a step in the write path to long term storage. It is also more stateful and sticky than revision 1, given that each topic partition is assigned a specific broker for log segment reconstruction, tiering and consumer serving. Revision 2 remains leaderless for producers, but leaderful for consumers. Therefore to avoid cross-AZ traffic for consumer traffic, it will rely on KIP-881: Rack-aware Partition Assignment for Kafka Consumers to ensure zone-local consumer assignment. This makes revision 2 a hybrid. By delegating responsibility to tiered storage, more of the direct-to-S3 workload must be handled by stateful brokers. It is less able to benefit from the elasticity of disaggregated storage. But it reuses more of existing Kafka. The third revision, which at the time of writing is a loose proposal in the mailing list, ditches the Batch Coordinator (BC) altogether. Most of the complexity of KIP-1150 centers around Batch Coordinator efficiency, failovers, scaling as well as idempotency, transactions and share groups logic. Revision 3 proposes to replace the BCs with “classic” topic partitions. The classic topic partition leader replicas will do the work of sequencing and storing the batch coordinates of their own data. The data itself would live in SLSOs (rev1)/WAL Segments (rev2) and ultimately, as tiered log segments (tiered storage). To make this clear, if as a user you create the topic Orders with one partition, then an actual topic Orders will be created with one partition. However, this will only be used for the sequencing of the Orders data and the storage of its metadata. The benefit of this approach is that all the idempotency and transaction logic can be reused in these “classic-ish” topic partitions. There will be code changes but less than Batch Coordinators. All the existing tooling of moving partitions around, failovers etc works the same way as classic topics. So replication continues to exist, but only for the metadata. Fig KIP-1150-rev3-A. Replicated partitions continue to exist, acting as per-partition sequencers and metadata stores (replacing batch coordinators). One wrinkle this adds is that there is no central place to manage the clean-up of WAL Segments. Therefore a WAL File Manager component would have responsibility for background cleanup of those WAL segment files. It would periodically check the status of tiering to discover when a WAL Segment can get deleted. Fig KIP-1150-rev3-B. The WAL File Manager is responsible for WAL Segment clean up The motivation behind this change to remove Batch Coordinators is to simplify implementation by reusing existing Kafka code paths (for idempotence, transactions, etc.).  However, it also opens up a whole new set of challenges which must be discussed and debated, and it is not clear this third revision solves the complexity. Revision 3 now depends on the existing classic topics, with leader-follower replication. It moves a little further again towards the evolutionary path. It is curious to see Aiven productionizing its Kafka fork “Inkless”, which falls under the “revolutionary” umbrella, while pushing towards a more evolutionary stateful/sticky design in these later revisions of KIP-1150.  Apache Kafka is approaching a decision point with long-term implications for its architecture and identity. 
The ongoing discussions around KIP-1150 revisions 1-3 and KIP-1176 are nominally framed around replication cost reduction, but the underlying issue is broader: how should Kafka evolve in a world increasingly shaped by disaggregated storage and elastic compute?

At its core, the choice comes down to two paths. The evolutionary path seeks to fit S3 topics into Kafka's existing leader-follower framework, reusing current abstractions such as tiered storage to minimize disruption to the codebase. The revolutionary path instead prioritizes the benefits of building directly on object storage. By delegating to shared object storage, Kafka can support an S3 topic serving layer which is stateless, elastic, and disposable, where scaling comes from adding and removing stateless workers rather than rebalancing stateful nodes, while Kafka's existing workloads are maintained with classic leader-follower topics.

While the intentions and goals of the KIPs clearly fall on a continuum from revolutionary to evolutionary, the reality in the mailing list discussions makes everything much less clear. The devil is in the details, and as the discussion advances, the arguments of "simplicity through reuse" start to strain. The reuse strategy is a retrofitting strategy which ironically could actually make the codebase harder to maintain in the long term. Kafka's existing model is deeply rooted in leader-follower replication, with much of its core logic built around that assumption. Retrofitting direct-to-S3 into this model forces some "unnatural" design choices, choices that would not be made otherwise (if designing a cloud-native solution).

My own view aligns with the more revolutionary path in the form of KIP-1150 revision 1. It doesn't simply reduce cross-AZ costs, but fully embraces the architectural benefits of building on object storage. With additional broker roles and groups, Kafka could ultimately achieve a similar elasticity to WarpStream (and Confluent Freight Clusters). The approach demands more upfront engineering effort and may increase long-term maintenance complexity, but avoids tight coupling to the existing leader-follower architecture. Much depends on what kind of refactoring is possible to avoid the duplication of idempotency, transactions and share group logic. I believe the benefits justify the upfront cost and will help keep Kafka relevant in the decade ahead.

In theory, both directions are defensible; ultimately it comes down to the specifics of each KIP. The details really matter. Goals define direction, but it's the engineering details that determine the system's actual properties. We know the revolutionary path involves big changes, but the evolutionary path comes with equally large challenges, where retrofitting may ultimately be more costly while simultaneously delivering less. The committers who maintain Kafka are cautious about large refactorings and code duplication, but are equally wary of hacks and complex code serving two needs. We need to let the discussions play out.

My aim with this post has been to take a step back from "how to implement direct-to-S3 topics in Kafka", and think more about what we want Kafka to be. The KIPs represent the how, the engineering choices. Framed that way, I believe it is easier for the wider community to understand the KIPs, the stakes and the eventual decision of the committers, whichever way they ultimately decide to go.
Revolutionary : Choose a direct-to-S3 topic design that maximizes the benefits of an object-storage architecture, with greater elasticity and lower operational complexity. However, in doing so, we may increase the implementation cost and possibly the long-term code maintenance too by maintaining two very different topic models in the same project (leader-based replication and direct-to-S3). Evolutionary : Shoot for an evolutionary design that makes use of existing components to reduce the need for large refactoring or duplication of logic. However, by coupling to the existing architecture, we forfeit the extra benefits of object storage, focusing primarily on networking cost savings (in AWS and GCP). Through this coupling, we also run the risk of achieving the opposite: harder-to-maintain code, by bending and contorting a second workload into an architecture optimized for something else. The term “Diskless” vs “Direct-to-S3” The Common Parts. Some approaches are shared across multiple implementations and proposals. Revolutionary: KIPs and real-world implementations Evolutionary: Slack’s KIP-1176 The Hybrid: balancing revolution with evolution Deciding Kafka’s future Avoiding the small file problem . Most designs are leaderless for producer traffic, allowing any server to receive writes to any topic. To avoid uploading a multitude of tiny files, servers accumulate batches in a buffer until ready for upload. Before upload, the buffer is sorted by topic id and partition, to make compaction and some reads more efficient by ensuring that data of the same topic and same partition are in contiguous byte ranges. Pricing . The pricing of many (but not all) cloud object storage services penalizes excessive requests, so it can be cheaper to roll up whatever data has been received in the last X milliseconds and upload it with a single request. Leaderless: The same way that producer alignment works, via metadata manipulation, or using KIP-392 (fetch from follower), which can also be used in a leaderless context. Leader-based: Zone-aware consumer group assignment as detailed in KIP-881: Rack-aware Partition Assignment for Kafka Consumers . The idea is to use consumer-to-partition assignment to ensure consumers are only assigned zone-local partitions (where the partition leader is located). KIP-392 (fetch-from-follower) , which is effective for designs that have followers (which isn’t always the case). WarpStream (as a reference, a kind of yardstick to compare against) KIP-1150 revision 1 Aiven Inkless (a Kafka fork) Leaderless, stateless and diskless agents that handle Kafka clients, as well as compaction/cleaning work. Coordination layer : A central metadata store for sequencing, metadata storage and housekeeping coordination. Sequencing . Chooses the total ordering for writes, assigning offsets without gaps or duplicates. Metadata storage . Stores all metadata that maps partition offset ranges to S3 object byte ranges. Serving lookup requests . Serving requests for log offsets. Serving requests for batch coordinates (S3 object metadata). Partition CRUD operations . Serving requests for atomic operations (creating partitions, deleting topics, records, etc.) Data expiration . Managing data expiry and soft deletion. Coordinating physical object deletion (performed by brokers). When a producer starts, it sends a Metadata request to learn which brokers are the leaders of each topic partition it cares about. The Metadata response contains a zone-local broker.
The producer sends all Produce requests to this broker. The receiving broker accumulates batches of all partitions and uploads a shared log segment object (SLSO) to S3. The broker commits the SLSO by sending the metadata of the SLSO (the metadata is known as the batch coordinates) to the Batch Coordinator. The coordinates include the S3 object metadata and the byte ranges of each partition in the object. The Batch Coordinator assigns offsets to the written batches, persists the batch coordinates, and responds to the broker. The broker sends all associated acknowledgements to the producers. The consumer sends a Fetch request to the broker. The broker checks its cache and, on a miss, queries the Batch Coordinator for the relevant batch coordinates. The broker downloads the data from object storage. The broker injects the computed offsets and timestamps into the batches (the offsets being part of the batch coordinates). The broker constructs and sends the Fetch response to the Consumer. logs . Stores the Log Start Offset and High Watermark of each topic partition, with the primary key of topic_id and partition. files . Lists all the objects that host the topic partition data. batches . Maps Kafka batches to byte ranges in files. producer_state . All the producer state needed for idempotency. There are some other tables for housekeeping and for the merge work items. A broker opens a transaction and submits a table as an argument, containing the batch coordinates of the multiple topics uploaded in the combined file. The TVF logic creates a temporary table (logs_tmp) and fills it via a SELECT on the logs table, with a FOR UPDATE clause which obtains a row lock on each topic partition row in the logs table that matches the list of partitions being submitted. This ensures that other brokers that are competing to add batches to the same partition(s) queue up behind this transaction. This is a critical barrier that avoids inconsistency. These locks are held until the transaction commits or aborts. Next, inside a loop, partition by partition, the TVF: Updates the producer state. Updates the high watermark of the partition (a row in the logs table). Inserts the batch coordinates into the batches table (sequencing and storing them). Commits the transaction. Maintain leader-based topic partitions (producers continue to write to leaders), but replace the Kafka replication protocol with a per-broker S3-based write-ahead log (WAL). Try to preserve existing partition replica code for idempotency and transactions. Reuse existing tiered storage for long-term S3 data management. Remove Batch Coordinators from the read path. Avoid separate object compaction logic by delegating long-term storage management to tiered storage (which already exists). Rebuild per-partition log segments from combined objects in order to: Submit them for long-term tiering (which works as a form of object compaction too). Serve consumer fetch requests. Stage 1 – Produce path, synchronous . Uploads multi-topic WAL Segments to S3 and sequences the batches, acknowledging to producers once committed. This is unchanged except SLSOs are now called WAL Segments. Stage 2 – Per-partition log segment file construction, asynchronous . Each broker is assigned a subset of topic partitions. The brokers download the WAL segment byte ranges that host these assigned partitions and append them to on-disk per-partition log segment files. Stage 3 – Tiered storage, asynchronous .
The tiered storage architecture tiers the locally cached topic partition log segment files as normal. Stage 3, segment tiering (tiered storage). The read pathway for tailing consumers The local segments on-disk, populated from stage 2. Tiered log segments (traditional tiered storage read pathway). Produce batch added to buffer. WAL Segment containing that batch written to S3. Batch coordinates submitted to the Batch Coordinator for sequencing. Producer request acknowledged. Tail, untiered (fast path for tailing consumers) -> Replica downloads WAL Segment slice. Replica appends the batch to a local (per topic partition) log segment. Replica serves a consumer fetch request from the local log segment. Tiered (slow path for lagging consumers) -> Remote Log Manager downloads the tiered log segment. Replica serves a consumer fetch request from the downloaded log segment. Pick a single global topic partition order (based on a permutation of the topic partition id). Partition that order into contiguous slices, giving one slice per broker as its assignment (a broker may get multiple topic partitions, but they must be adjacent in the global order). Lay out every WAL segment in that same global order.
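To make the Batch Coordinator commit path described above a little more concrete, the following is a hypothetical, PostgreSQL-flavoured sketch. The table names (logs, batches), column names, and values are illustrative rather than taken from KIP-1150 or Inkless, and the producer_state/idempotency bookkeeping is omitted.

```sql
-- Hypothetical sketch of a Batch Coordinator commit; names and values are illustrative.
BEGIN;

-- The broker stages the batch coordinates of the uploaded WAL segment / SLSO.
CREATE TEMP TABLE submitted (
    topic_id     bigint,
    partition_id int,
    record_count int,
    object_key   text,
    byte_start   bigint,
    byte_end     bigint
) ON COMMIT DROP;
INSERT INTO submitted VALUES
    (101, 0, 500, 's3://wal/0001', 0,     65535),
    (101, 1, 250, 's3://wal/0001', 65536, 131071);

-- Lock the per-partition rows so competing brokers queue up behind this commit.
SELECT l.topic_id, l.partition_id
FROM logs l JOIN submitted s USING (topic_id, partition_id)
FOR UPDATE OF l;

-- Sequence the batches: base offsets are taken from the current high watermark.
INSERT INTO batches (topic_id, partition_id, base_offset, record_count, object_key, byte_start, byte_end)
SELECT l.topic_id, l.partition_id, l.high_watermark, s.record_count, s.object_key, s.byte_start, s.byte_end
FROM logs l JOIN submitted s USING (topic_id, partition_id);

-- Advance each partition's high watermark.
UPDATE logs l
SET high_watermark = l.high_watermark + s.record_count
FROM submitted s
WHERE l.topic_id = s.topic_id AND l.partition_id = s.partition_id;

COMMIT;  -- releases the row locks; the broker can now acknowledge the producers
```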

0 views
Marc Brooker 2 weeks ago

Fixing UUIDv7 (for database use-cases)

How do I even balance a V7? RFC9562 defines UUID Version 7. This has made a lot of people very angry and been widely regarded as a bad move 1 . More seriously, UUIDv7 has received a lot of criticism, despite seemingly achieving what it set out to do. The legitimate criticism seems to be on a few points. V7 UUIDs: Before thinking about how we might fix these issues, let’s understand why folks are drawn to UUIDv7. Most of the use-cases I see are related to increasing database insert performance. To quote the RFC: Time-ordered monotonic UUIDs benefit from greater database-index locality because the new values are near each other in the index. As a result, objects are more easily clustered together for better performance. The real-world differences in this approach of index locality versus random data inserts can be one order of magnitude or more. This effect is very real. Random DB keys like UUIDv4 destroy spatial locality ( as I’ve written about before ), making database caches less effective, almost always reducing insert performance, and reducing query performance where queries have substantial temporal locality. The slight upside to this is that they also avoid hot spotting in distributed or sharded architectures. Can we both have good insert performance and avoid the downsides of UUIDv7? Yes, I believe we can. Let’s keep the overall format from the RFC: Instead of the field being the unmodified , let’s replace it with: , where is a keyed hash function (more on that below), is a parameter that trades off locality and leading entropy, and is an arbitrary identifier for some unit of infrastructure (e.g. a database cluster id, customer id, region ID). The choice of is a trade off between ID spread and database cache locality. For most installations and query patterns, I’d expect to restore full insert performance. The function is a special keyed hash (such as an HMAC) of , keyed with , which is then XORed with the . That means that IDs are stable (and UUIDv7-like) for $2^N$ milliseconds. This means that all the pages they pull into the DB cache stay stable for $2^N$ milliseconds, where they can be hit over and over. After $2^N$ milliseconds, we need a new set of pages (unless they can all fit in cache). The use of an arbitrary along with a cryptographic hash function allows providers to choose the radius that they issue UUIDs over. An empty would produce a single global stream of UUIDs. A fine-grained per-client UUID would produce a per-client stream of UUIDs, at the cost of some locality. Per-server, per-cluster, per-AZ, and other scopes for s add flexibility to trade off between the locality advantages and disadvantages of UUIDs. Depending on how you read the RFC, this may be allowed in the letter of the law. It does say: Implementations MAY alter the actual timestamp. But it does seem to clearly violate the spirit. Still, I think this is a UUID format that avoids a lot of the downsides of UUIDv7, while keeping most of the database performance benefits. As for whether you should use UUID entropy for security, that’s a different topic. Leak information (namely the server timestamp). Are a bad choice for cases where security or operational requirements require UUIDs that are hard to guess, because they have less entropy. Introduce correlated behavior between datacenters, regions, and installations of applications, increasing the probability of triggering bugs across failure boundaries. Are hard to present in UIs, because the format doesn’t work, because of deterministic first digits.
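The exact field construction is elided in the excerpt above, so the following is only one plausible reading of it: the 48-bit leading field becomes a keyed hash of (scope identifier, timestamp right-shifted by N), XORed with the millisecond timestamp, rather than the raw timestamp itself. This is a hypothetical PostgreSQL sketch using pgcrypto; the scope id 'cluster-17', the key, and N = 16 are made-up parameters, not values from the post.

```sql
-- Hypothetical sketch only, not the post's exact formula.
-- Assumes pgcrypto, a scope id of 'cluster-17', a secret key, and N = 16
-- (so the hash input changes every 2^16 ms, roughly 65 seconds).
CREATE EXTENSION IF NOT EXISTS pgcrypto;

WITH params AS (
    SELECT (extract(epoch FROM clock_timestamp()) * 1000)::bigint AS t_ms,
           16                        AS n,
           'cluster-17'::text        AS scope_id,
           'secret-key'::bytea       AS k
),
hashed AS (
    -- First 6 bytes (48 bits) of HMAC(scope_id || floor(t / 2^N), k)
    SELECT t_ms,
           ('x' || encode(substring(
               hmac(convert_to(scope_id || ':' || (t_ms >> n)::text, 'UTF8'), k, 'sha256')
               from 1 for 6), 'hex'))::bit(48)::bigint AS h48
    FROM params
)
-- The leading 48 bits of the UUID: hash XOR timestamp instead of the raw timestamp.
SELECT lpad(to_hex((h48 # t_ms) & ((1::bigint << 48) - 1)), 12, '0') AS leading_48_bits
FROM hashed;
```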
Not really, but I couldn’t resist the Guide reference. You should re-read the Guide.

0 views
Emil Privér 2 weeks ago

We Re-Built Our Integration Service Using Postgres and Go

Our integration service connects our platform to external systems. Earlier this year, we reached a scaling limit at 40 integrations and rebuilt it from the ground up. The service handles three primary responsibilities: sending data to external systems, managing job queues, and prioritizing work based on criticality. The original implementation functioned but had architectural constraints that prevented horizontal scaling. We use microservices because different components have conflicting requirements. The management API handles complex business logic with normalized schemas—separate tables for translations and categories. The public API optimizes for read performance under load, using denormalized data by adding translations directly into category tables and handling filtering in Go. A monolithic architecture would require compromising performance in one area to accommodate the other. The integration service currently processes millions of events daily, with volume increasing as we onboard new customers. This post describes our implementation of a queue system using PostgreSQL and Go, focusing on design decisions and technical trade-offs. The first implementation used GCP Pub/Sub, a topic-to-many-subscription service where messages are replicated across multiple queues. This architecture introduced several scalability issues. The integration service maintained a database for integration configurations but lacked ownership of its operational data. This violated a distributed systems principle: services should own their data rather than depend on other services for it. This dependency forced our management service to serialize complete payloads into the queue. Updating a single attribute on a sub-object required sending the entire parent object with all nested sub-objects, metadata, and relationships. Different external APIs have varying data requirements—some need individual sub-objects while others require complete hierarchies. For clients with records containing 300-500 sub-objects, this resulted in significant message size inflation. GCP charges by message size rather than count, making large messages substantially more expensive than smaller ones. GCP’s WebSocket delivery requires clients to buffer messages internally. With 40 integrations running separate consumers with filters, traffic spikes created memory pressure: This prevented horizontal scaling and limited us to vertical scaling approaches. External APIs enforce varying rate limits. Our in-memory rate limiter tracked requests per integration but prevented horizontal scaling since state couldn’t be shared across instances without risking rate limit violations. By early 2025, these issues had compounded: excessive message sizes increasing costs, memory bloat requiring oversized containers, vertical-only scaling, high operational expenses, rate limiting preventing horizontal scale, and lack of data independence. The system couldn’t accommodate our growth trajectory. A complete rebuild was necessary. The v2 design addressed specific limitations: Additional improvements: The standard approach involves the producer computing payloads and sending them to the queue for consumer processing. We used this in v1 but rejected it for v2. Customers frequently make multiple rapid changes to the same record—updating a title, then a price, then a description. Each change triggers an event. Instead of sending three separate updates, we consolidate changes into a single update. We implemented a in the jobs table. 
Multiple updates to the same record within a short time window are deduplicated into a single job, reducing load on both our system and recipient systems. We chose PostgreSQL as our queue backend for several reasons: Often, we think we need something bigger like Apache Kafka when a relational database like PostgreSQL is sufficient for our requirements. The jobs table structure: Each job tracks: Postgres-backed queues require careful indexing. We use partial indexes (with WHERE clauses) only for actively queried states: , , , and . We don’t index or states. These statuses contain the majority of jobs in the table and aren’t needed in the job processing flow. Indexing them would just pull more data into memory that the processing flow never uses. Jobs are ordered by for FIFO processing, with priority queue overrides when applicable. Jobs follow a defined lifecycle: Timestamp fields serve observability purposes, measuring job duration and identifying bottlenecks. For jobs, retry timing is calculated using exponential backoff. The worker system requirements: We evaluated two approaches: maintaining in-memory queues with multiple goroutines using for and select to fetch jobs, or having goroutines fetch data from the database and iterate over the results. We chose the database iteration approach for its simplicity. pgxpool handles connection pooling, eliminating the need for channel-based in-memory queues. Each worker runs in a separate goroutine, using a ticker to poll for jobs every second. Before processing, workers check for shutdown signals ( or channel). When shutdown is initiated, workers stop accepting new jobs and mark in-flight jobs as . This prevents stalled jobs from blocking integration queues. Checking shutdown signals between jobs ensures clean shutdowns. During shutdown, we create a fresh context with for retrying jobs. This prevents database write failures when the main context is canceled. The query implements fair scheduling to prevent high-volume integrations from monopolizing workers: Query breakdown: Step 1: Identify busy integrations. This CTE identifies integrations with 50+ concurrent processing jobs. Step 2: Select jobs with priority ordering. Jobs are selected from integrations not in the busy list. Priority updates are ordered first, followed by FIFO ordering. locks selected rows to the current transaction, preventing duplicate processing by concurrent workers. Step 3: Update job status. Selected jobs are updated to status with a recorded start time. This ensures fair resource allocation across integrations. Job timeouts are critical for queue health. In the initial release, we reused the global context for job processing. When jobs hung waiting for slow external APIs, they couldn’t be marked completed or failed due to context lifecycle coupling. Jobs accumulated in state indefinitely. The solution: context separation. The global context controls worker lifecycle. Each job receives its own context with a timeout. Timed-out jobs are marked , allowing queue progression. This also enables database writes during shutdown using a fresh context, even when the global context is canceled. Failed jobs require retry logic with appropriate timing. Immediate retries against failing external APIs are counterproductive. We implement exponential backoff: instant first retry, 10 seconds for the second, 30 seconds for the third, up to 30 minutes. The field drives backoff calculation. After 10 attempts, jobs are marked .
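Before moving on to error handling: the fair-scheduling query itself is not reproduced in this excerpt, so here is a hypothetical sketch of the three steps described above. Table, column, and status names are assumed, as are the LIMIT and the SKIP LOCKED clause (the post only says the selected rows are locked).

```sql
-- Hypothetical sketch of the fair-scheduling claim query; names are assumed.
WITH busy_integrations AS (
    -- Step 1: integrations that already have 50+ jobs in flight
    SELECT integration_id
    FROM jobs
    WHERE status = 'processing'
    GROUP BY integration_id
    HAVING count(*) >= 50
),
next_jobs AS (
    -- Step 2: pick jobs from non-busy integrations, priority first, then FIFO
    SELECT id
    FROM jobs
    WHERE status = 'pending'
      AND integration_id NOT IN (SELECT integration_id FROM busy_integrations)
    ORDER BY priority DESC, created_at
    LIMIT 10
    FOR UPDATE SKIP LOCKED      -- lock the claimed rows; concurrent workers skip them
)
-- Step 3: mark the claimed jobs as processing and hand them to the worker
UPDATE jobs
SET status = 'processing', started_at = now()
WHERE id IN (SELECT id FROM next_jobs)
RETURNING *;
```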
Error types guide retry behavior: This allows each integration to decide how to handle errors based on the external API’s response. For example, a 400 Bad Request might be a permanent validation failure (NonRetryableError), while a 503 Service Unavailable is transient and should retry (RetryableError). The integration implementation determines the appropriate error type for each scenario. Jobs occasionally become stuck in state due to worker panics, database connection failures, or unexpected container termination. A cron job runs every minute, identifying jobs in state beyond the expected duration. These jobs are moved to with incremented retry counts, treating them as standard failures. This ensures queue progression despite unexpected failures. Rate limiting across multiple containers was v2’s most complex challenge. V1’s in-memory rate limiter worked for single containers but couldn’t share state across instances. While Redis was an option, we already had PostgreSQL with sufficient performance. The solution: a table tracking request counts per integration per second: Before external API requests, we increment the counter for the integration’s current time window (rounded to the second). PostgreSQL returns the new count. If the count exceeds the limit, we sleep 250ms and retry. If under the limit, we proceed. This works because all containers share the database as the source of truth for rate limiting. Occasionally, jobs are rate-limited during heavy load due to the gap between count checking and request sending. These jobs retry immediately. The occurrence rate is acceptable. Hope you enjoyed this article and learned something new. This system has worked really well so far, and we’ve had only a few minor issues that we fixed quickly. I will update this article over time. Mass updates generate large objects per record Objects are duplicated for each configured integration Copies buffer across 5-10 consumer instances Infrastructure requires 2GB RAM and 2 cores to handle spikes, despite needing only 512MB and 1 core during normal operation Horizontal scaling - Enable scaling across multiple containers Distributed rate limiting - Coordinate rate limits across instances Data ownership - Store operational data within the service Delta updates - Send only changed data rather than complete records Fair scheduling - Prevent single integrations from monopolizing resources Priority queuing - Process critical updates before lower-priority changes Self-service re-sync - Enable customers to re-sync catalogs independently Visibility - Provide APIs for customers to monitor sent data and queue status Performance - PostgreSQL is fast enough for our use case. We don’t need sub-second message delivery. Simplicity - Using a managed PostgreSQL instance on GCP is significantly simpler than introducing new infrastructure. Familiarity - Most developers understand SQL, reducing onboarding time. Existing infrastructure - We already use PostgreSQL for our data, eliminating the need for additional systems. 
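The rate-limit table is not shown in this excerpt either; a minimal sketch of the shared counter described above might look like the following. The table name, column names, and window granularity are assumptions.

```sql
-- Hypothetical sketch of the shared rate-limit counter; names are assumed.
CREATE TABLE IF NOT EXISTS rate_limits (
    integration_id bigint      NOT NULL,
    window_start   timestamptz NOT NULL,   -- the current second (time rounded down)
    request_count  int         NOT NULL DEFAULT 0,
    PRIMARY KEY (integration_id, window_start)
);

-- Atomically bump the counter for this integration's current one-second window and
-- return the new count; the caller sleeps ~250ms and retries if it exceeds the limit.
INSERT INTO rate_limits (integration_id, window_start, request_count)
VALUES (42, date_trunc('second', now()), 1)
ON CONFLICT (integration_id, window_start)
DO UPDATE SET request_count = rate_limits.request_count + 1
RETURNING request_count;
```

Because every container goes through the same table, the database remains the single source of truth for the per-second count, which is what makes the limit hold across horizontally scaled workers.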
- Links logs across services - Specifies the action (e.g., ) - Records failure details - Tracks current workflow state - Counts retry attempts - Schedules next retry , , - Provides metrics for observability - Links to specific integrations - Identifies the platform - Contains job data - Prevents duplicate execution Created → Initial state: Picked up → Transitions to Success → Becomes , records Failed (10 retries) → Becomes , records Failed (retries remaining) → Becomes , increments , calculates Parallel worker execution Horizontal scaling across containers Graceful shutdowns without job loss Distributed rate limit enforcement—we need to respect rate limits no matter how many containers we run - Permanent failures (e.g., validation errors). No retry. - Transient failures (e.g., 500 Internal Server Error). Retry with backoff. - Retry limit reached. Mark failed.
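Pulling the job fields listed above and the partial-index advice together, a hypothetical version of the jobs table could look like this. Column names, status values, and index choices are assumptions, not the original schema.

```sql
-- Hypothetical sketch of the jobs table and partial indexes; names are assumed.
CREATE TABLE IF NOT EXISTS jobs (
    id              bigserial   PRIMARY KEY,
    integration_id  bigint      NOT NULL,
    external_system text        NOT NULL,          -- identifies the platform
    action          text        NOT NULL,          -- the action, e.g. 'update_product'
    payload         jsonb       NOT NULL,          -- job data (the delta, not the full record)
    dedup_key       text        NOT NULL,          -- prevents duplicate execution
    status          text        NOT NULL DEFAULT 'pending',
    priority        int         NOT NULL DEFAULT 0,
    retry_count     int         NOT NULL DEFAULT 0,
    next_retry_at   timestamptz,                   -- schedules the next retry
    error_message   text,                          -- records failure details
    trace_id        text,                          -- links logs across services
    created_at      timestamptz NOT NULL DEFAULT now(),
    started_at      timestamptz,
    completed_at    timestamptz
);

-- Partial indexes only for states the processing flow actually queries;
-- completed/failed rows (the bulk of the table) stay out of these indexes.
CREATE INDEX IF NOT EXISTS jobs_pending_idx
    ON jobs (integration_id, priority DESC, created_at)
    WHERE status = 'pending';
CREATE INDEX IF NOT EXISTS jobs_processing_idx
    ON jobs (integration_id, started_at)
    WHERE status = 'processing';
CREATE INDEX IF NOT EXISTS jobs_retrying_idx
    ON jobs (next_retry_at)
    WHERE status = 'retrying';
```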

0 views
Jack Vanlightly 3 weeks ago

Why I’m not a fan of zero-copy Apache Kafka-Apache Iceberg

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why.  Zero-copy is a bit of a marketing buzzword. I prefer the term shared tiering . The idea behind shared tiering is that Apache Kafka tiers colder data to an Apache Iceberg table instead of tiering closed log segment files. It’s called shared tiering because the tiered data serves both Kafka and data analytics workloads.  The idea has been popularized recently in the Kafka world by Aiven, with a tiered storage plugin for Apache Kafka that adds Iceberg table tiering to the tiered storage abstraction inside Kafka brokers. But before we understand shared tiering we should understand the difference between tiering and materialization: Tiering is about moving data from one storage tier (and possibly storage format) to another, such that both tiers are readable by the source system and data is only durably stored in one tier. While the system may use caches, only one tier is the source of truth for any given data item. Usually it’s about moving data to a cheaper long-term storage tier.  Materialization is about making data of a primary system available to a secondary system , by copying data from the primary storage system (and format) to the secondary storage system, such that both data copies are maintained (albeit with different formats). The second copy is not readable from the source system as its purpose is to feed another data system. Copying data to a lakehouse for access by various analytics engines is a prime example. Fig 1. Tiering vs materialization There are two types of tiering: Internal Tiering is data tiering where only the primary data system can access the various storage tiers. For example, Kafka tiered storage is internal tiering. These internal storage tiers form the primary storage as a whole. Shared Tiering is data tiering where one or more data tiers is shared between multiple systems. The result is a tiering-materialization hybrid, serving both purposes. Tiering to a lakehouse is an example of shared tiering. Fig 2. Shared tiering makes the long-term storage a shared storage layer for two or more systems. There are two broad approaches to populating an Iceberg table from a Kafka topic: Internal Tiering + Materialization Shared Tiering Option 1: Internal Tiering + Materialization where Kafka continues to use traditional tiered storage of log segment files (internal tiering) and also materializes the topic as an Iceberg table (such as via Kafka Connect). Catch-up Kafka consumers will be served from tiered Kafka log segments, whereas compute engines such as Spark will use the Iceberg table. Fig 3. Internal Tiering + Materialization Option 2: Shared Tiering (zero-copy) where Kafka tiers topic data to Iceberg directly. The Iceberg table will serve both Kafka brokers (for catch-up Kafka consumers) and analytics engines such as Spark and Flink. Fig 4. Kafka tiers data directly to Iceberg. Iceberg is shared between Kafka and analytics. On the surface, shared tiering sounds attractive: one copy of the data, lower storage costs, no need to keep two copies of the data in sync. But the reality is more complicated.  
Zero-copy shared tiering appears efficient at first glance as it eliminates duplication between Kafka and the data lake.  However, rather than simply taking one cost away, it shifts that cost from storage to compute. By tiering directly to Iceberg, brokers take on substantial compute overhead as they must both construct columnar Parquet files from log segment files (instead of simply copying them to object storage), and they must download Parquet files and convert them back into log segment files to serve lagging consumers. Richie Artoul of WarpStream blogged about why tiered storage is such a bad place to put Iceberg tiering work: “ First off, generating parquet files is expensive. Like really expensive. Compared to copying a log segment from the local disk to object storage, it uses at least an order of magnitude more CPU cycles and significant amounts of memory. That would be fine if this operation were running on a random stateless compute node, but it’s not, it’s running on one of the incredibly important Kafka brokers that is the leader for some of the topic-partitions in your cluster. This is the worst possible place to perform computationally expensive operations like generating parquet files. ” – Richard Artoul, The Case for an Iceberg-Native Database: Why Spark Jobs and Zero-Copy Kafka Won’t Cut It Richard is pretty sceptical of tiered storage in general, a sentiment which I don’t share so much, but where we agree is that Parquet file writing is far more expensive than log segment uploads. Things can get worse (for Kafka) when we consider optimizing the Iceberg table for analytics queries. I recently wrote about how open table formats optimize query performance. “ Ultimately, in open table formats, layout is king. Here, performance comes from layout (partitioning, sorting, and compaction) which determines how efficiently engines can skip data… Since the OTF table data exists only once, its data layout must reflect its dominant queries. You can’t sort or cluster the table twice without making a copy of it. ” – Beyond Indexes: How Open Table Formats Optimize Query Performance Layout, layout, layout. Performance comes from pruning, pruning efficiency comes from layout. So what layout should we choose for our shared Iceberg tables? To serve Kafka consumers efficiently, the data must be laid out in offset order. Each Parquet file would contain contiguous ranges of offsets of one or more topic partitions. This is ideal for Kafka as it needs only to fetch individual Parquet files or even row groups within files, in order to reconstruct a log segment to serve a lagging consumer. Best case is a 1:1 mapping from log-segment to Parquet file; otherwise read amplification grows quickly. Fig 5. Column statistics and Iceberg partitions clearly tell Kafka where a specific topic partition offset range lives. But these pruning statistics are not useful for analytics queries. However, for the analytics query, there is no useful pruning information in this layout, leading to large table scans. Queries like WHERE EventType = 'click' must read every file, since each one contains a random mixture of event types. Min/max column statistics on columns such as EventType are meaningless for predicate pushdown, so you lose all the performance advantages of Iceberg’s table abstraction. An analytics-optimized layout, by contrast, partitions and sorts data by business dimensions, such as EventType, Region and EventTime. Files are rewritten or merged to tighten column statistics and make pruning effective. 
That makes queries efficient: scans touch only a handful of partitions, and within each partition, min/max stats allow skipping large chunks of data. Fig 6. Column statistics and partitions clearly tell analytics queries that want to slice and dice by EventType and Region how to prune Parquet files. But these pruning statistics are not useful for Kafka at all, which must do a costly table scan (from a Kafka broker) to find a specific offset range to rebuild a log segment. With this analytics-optimized layout, a lagging Kafka consumer is going to cause a Kafka broker to have a really bad day. Offset order no longer maps to file order. To fetch an offset range, the broker may have to fetch dozens of Parquet files (as the offset range is spread across files) due to analytics-workload-driven choices of partition scheme and sort order. The read path becomes highly fragmented, resulting in massive read amplification. All of this downloading, scanning and segment reconstruction is happening in the Kafka broker. That’s a lot more work for Kafka brokers and it all costs a lot of CPU and memory. Normally I would say: given two equally important workloads, optimize for the workload that is less predictable, in this case, the lakehouse. The Kafka workload is sequential and therefore predictable, so we can use tricks such as read-aheads. Analytics workloads are far less predictable, slicing and dicing the data in different ways, so optimize for that. But this is simply not practical for Apache Kafka. We can’t ask it to reconstruct sequential log segments from Parquet files that have been completely reorganized into a different data distribution. The best we could do is some half-way house, firmly leaning towards Kafka’s needs. Basically, partitioning by ingestion time (hour or date) so we preserve the sequential nature of the data and hope that most analytical queries only touch recent data. The alternative is to use the Kafka Iceberg table as a staging table for incrementally populating a cleaner (silver) table, but then haven’t we just made a copy anyway? In practice, the additional compute and I/O overhead can easily offset the storage savings, leading to a less efficient and less predictable performance profile overall. How we optimize the Iceberg table is critical to the efficiency of both the Kafka brokers and the analytics workload. Without separation of concerns, each workload is trapped by the other. The first issue is that the schema of the Kafka topic may not even be suitable for the Iceberg table, for example, a CDC stream with before and after payloads. When we materialize these streams as tables, we don’t materialize them in their raw format; we use the information to materialize how the data ended up. But shared tiering requires that we write everything, every column, every header field and so on. For CDC, we’d then do some kind of MERGE operation from this topic table into a useful table, which reintroduces a copy. But it is evolution that concerns me more. When materializing Kafka data into an Iceberg table for long-term storage, schema evolution becomes a challenge. A Kafka topic’s schema often changes over time as new fields are added and others are renamed or deprecated, but the historical data still exists in the old shape. A Kafka topic partition is an immutable log where events remain in their original form until expired. If you want to retain all records forever in Iceberg, you need a strategy that allows old and new events to coexist in a consistent, queryable way.
We need to avoid cases where either queries fail on missing fields or historical data becomes inaccessible for Kafka. Let’s look at two approaches to schema evolution of tables that are tiered Kafka topics. One common solution is to build an uber-schema , which is essentially the union of all fields that have ever existed in the Kafka topic schema. In this model, the Iceberg table schema grows as new Kafka fields are introduced, with each new field added as a nullable column. Old records simply leave these columns null, while newer events populate them. Deprecated fields remain in the schema but stay null for modern data. This approach preserves history in its original form, keeps ingestion pipelines simple, and leverages Iceberg’s built-in schema evolution capabilities. The trade-off is that the schema can accumulate many rarely used columns, and analysts must know which columns are meaningful for which time ranges. Views can help, performing the necessary coalesces and the like to turn a messy set of columns into something cleaner. But it would require careful maintenance. The alternative is to periodically migrate old data forward into the latest schema. Instead of carrying every historical field forever, old records are rewritten so that they conform to the current schema. Missing values may be backfilled with defaults, derived from other fields, or simply set to null. No-longer-used columns are dropped. This limits the messiness of the table’s schema over time, turning it from a shameful mess of nullable columns into something that looks reasonable. While this strategy produces a tidier schema and simplifies querying, it can be complex to implement. No vendor that I have seen supports this approach, leaving it as a job for their customers, who know their data best. From the lakehouse point of view, we’d prefer to use the migrate-forward approach over the long term, but use the uber-schema in the shorter term. Once we know that no more producers are using an old schema, we can migrate the rows of that schema forward. But that migration can cause a loss of fidelity for the Kafka workload. What gets written to Kafka may not be what gets read back. This could be a good thing or a bad thing. Good if you want to go back and reshape old events to more closely match the newest schema. Really bad for some compliance and auditing purposes; imagine telling your stakeholders/auditors that we can’t guarantee the data that is read from Kafka will be the same as was written to Kafka! The needs of SQL-writing analysts and the needs of Kafka conflict. Kafka wants fidelity (and doesn’t care about column creep) and Iceberg wants clean tables. In some cases we might have a reprieve: if Kafka clients only need to consume 7 or 30 days of data, then we can use the uber-schema method for the period that covers Kafka, and use the migrate-forward method for the rest. But we are still coupling the needs of Kafka clients with lakehouse clients. If Kafka only needs 7 days of data and Iceberg is storing years’ worth, then why do we care about data duplication anyway?
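As a concrete illustration of the uber-schema approach, here is a hypothetical Spark SQL sketch against an Iceberg table. The database, table, and column names are invented for the example, and the view is only one way to paper over the accumulated columns.

```sql
-- Hypothetical Spark SQL on an Iceberg table; names are invented.

-- A new field starts appearing in the Kafka topic: add it as a nullable column.
-- Historical rows simply leave it NULL.
ALTER TABLE lake.orders_events ADD COLUMN discount_code string;

-- A field was renamed from customer_ref to customer_id at some point in the topic's
-- history: keep both columns, and give analysts a cleaner view that bridges them.
CREATE OR REPLACE VIEW lake.orders_clean AS
SELECT order_id,
       coalesce(customer_id, customer_ref) AS customer_id,   -- bridge old and new shapes
       discount_code,
       event_time
FROM lake.orders_events;
```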
Any storage savings are easily offset by the increased compute requirements. On the subject of data duplication, there are reasons to believe that Kafka Iceberg tables may degenerate into staging tables, due to the unoptimized layout and proliferation of columns over time as schemas change. Such usage as staging tables eliminates the duplication argument. But even so, there is already plenty of duplication going on in the lakehouse. We already have copies. We have bronze then silver tables. We have all kinds of staging tables already. How much impact does the duplication avoidance of shared tiering actually make? Also consider: With a good materializer, we can write to silver tables directly rather than using bronze tables as tiered Kafka data (especially in the CDC case). A good materializer can also reduce data duplication. Usually, Kafka topic retention is only a few days, or up to a month, so the scope for duplication is limited to the Kafka retention period. Bidirectional fidelity is a real concern when converting between formats like Avro and formats like Parquet, which have different type systems and evolution rules. Storing the original Kafka bytes as a column in the Iceberg table is a legitimate approach as an insurance against conversion issues (but involves a copy). The deeper problem with zero-copy tiering is its erosion of boundaries and the resulting coupling. When Kafka uses Iceberg tables as its primary storage layer, that boundary disappears. Who is responsible for the Iceberg table? If Kafka doesn’t take on ownership, then Kafka becomes vulnerable to the decisions of whoever owns and maintains the Iceberg table. After all, the Iceberg table will host the vast majority of Kafka’s data! Who’s on call if tiered data can’t be read? Kafka engineers? Lakehouse engineers? Will they cooperate well? The mistake of this zero-copy thing is that it assumes that Kafka/Iceberg unification requires a single physical representation of the data. But unification is better served as logical and semantic unification when the workloads differ so much. A logical unification of Kafka and Iceberg allows for the storage of each workload to remain separate and optimized. Physical unification removes flexibility by adding coupling (the traps). Once Kafka and Iceberg share the same storage, each system constrains the other’s optimization and evolution. The result is a single, entangled system that must manage two storage formats and be tuned for two conflicting workloads, making both less reliable and harder to operate.  This entanglement also leads to Kafka becoming a data lakehouse manager, which I believe is a mistake for the project. Once Kafka stores its data in Iceberg, it must take ownership of that data. I don’t agree with that direction. There are both open-source and SaaS data lakehouse platforms which include table maintenance, as well as access and security controls. Let lakehouses do lakehouse management, and leave Kafka to do what it does best. Materialization isn’t perfect, but it decouples concerns and avoids forcing Kafka to become a lakehouse. The benefits of maintaining storage decoupling are: Lakehouse performance optimization doesn’t penalize the Kafka workload or vice-versa.  Kafka can use a native log-centric tiering mechanism that does not overly burden brokers. Schema evolution of the lakehouse does not impact Kafka topics, providing the lakehouse with more flexibility leading to cleaner tables. 
There is less risk and overhead associated with bidirectional conversion between Kafka and Iceberg and back again. 100% fidelity is a concern to all system builders working in this space. Materialized data can be freely projected and transformed, as it does not form the actual Kafka data. Kafka does not depend on the materialized data at all. Apache Kafka doesn’t need to perform lakehouse management itself. The drawbacks are that we need two data copies, but hopefully I’ve argued why that isn’t the biggest concern here. Duplication is already a fact of life in modern data platforms. Tools such as Kafka Connect and Apache Flink already exist for materializing Kafka data in secondary systems. They move data across boundaries in a controlled, one-way flow, preserving clear ownership and interfaces on each side. Modern lakehouse platforms provide managed tables with ingestion APIs (such as Snowpipe Streaming, Databricks Unity Catalog REST API, and others). Kafka Connect can work really well with these ingestion APIs. Flink also has sinks for Iceberg, Delta, Hudi and Paimon. We don’t need Kafka brokers to do this work and do maintenance on top of that. Unifying operational and analytical systems doesn’t mean merging their physical storage into one copy. It’s about logical and semantic unification, achieved through consistent schemas and reliable data movement, not shared files. It is tempting to put everything into one cluster, but separation of concerns is what keeps complex systems comprehensible, performant and resilient. UPDATE: I’ve been asked a few times about Confluent Tableflow and how it aligns with this discussion. I didn't include Tableflow as it is a proprietary component of Confluent Cloud, whereas this post is focused on Apache Kafka. But for what it’s worth, Tableflow falls under the materialization approach, acting both as a sophisticated Iceberg/Delta materializer and as a table maintenance service. It is operated as a separate component (separate from the Kora brokers). In the future it will support more sophisticated data layout optimization for analytics workloads. It does not perform zero-copy for many of the reasons why zero-copy isn’t the right choice for Apache Kafka. Nor does it execute within brokers, being a separate service that is independently scalable.

0 views

Arrows to Arrows, Categories to Queries

I’ve had a little time off of work as of late, and been spending it in characteristically unwise ways. In particular, I’ve written a little programming language that compiles to SQL . I call it catlang . That’s not to say that I’ve written a new query language. It’s a programming language, whose compiler spits out one giant statement. When you run that query in postgres, you get the output of your program. Why have I done this? Because I needed a funny compilation target to test out the actual features of the language, namely that its intermediary language is a bunch of abstract category theory nonsense. Which I’ll get to. But I’m sure you first want to see this bad boy in action. Behold, the function that returns 100 regardless of what input you give it. But it does it with the equivalent of a while loop: If you’re familiar with arrow notation , you’ll notice the above looks kinda like one big block. This is not a coincidence (because nothing is a coincidence). I figured if I were to go through all of this work, we might as well get a working arrow desugarer out of the mix. But I digress; that’s a story for another time. Anyway, what’s going on here is we have an arrow , which takes a single argument . We then loop, starting from the value of . Inside the loop, we now have a new variable , which we do some voodoo on to compute —the current value of the loop variable. Then we subtract 100 from , and take the absolute value. The function here is a bit odd; it returns if the input was negative, and otherwise. Then we branch on the output of , where and have been renamed and respectively. If was less than zero, we find ourselves in the case, where we add 1 to and wrap the whole thing in —which the loop interprets as “loop again with this new value.” Otherwise, was non-negative, and so we can return directly. Is it roundabout? You bet! The obtuseness here is not directly a feature; I was just looking for conceptually simple things I could do which would be easy to desugar into category-theoretical stuff. Which brings us to the intermediary language. After desugaring the source syntax for above, we’re left with this IL representation: We’ll discuss all of this momentarily, but for now, just let your eyes glaze over the pretty unicode. The underlying idea here is that each of these remaining symbols has very simple and specific algebraic semantics. For example, means “do and pipe the result into .” By giving a transformation from this categorical IL into other domains, it becomes trivial to compile catlang to all sorts of weird compilation targets. Like SQL. You’re probably wondering what the generated SQL looks like. Take a peek if you dare. It’s not pretty, but rather amazingly, running the above query in postgres 17 will in fact return a single row with a single column whose value is 100. And you’d better believe it does it by actually looping its way up to 100. If you don’t believe me, make the following change: which will instead return a row for each step of the iteration. There are some obvious optimizations I could make to the generated SQL, but it didn’t seem worth my time, since that’s not the interesting part of the project. Let’s take some time to discuss the underlying category theory here. I am by no means an expert, but what I have learned after a decade of bashing my head against this stuff is that a little goes a long way. For our intents and purposes, we have types, and arrows (functions) between types.
We always have the identity “do nothing arrow” : and we can compose arrows by lining up one end to another: 1 Unlike Haskell (or really any programming language, for that matter), we DO NOT have the notion of function application. That is, there is no arrow: You can only compose arrows, you can’t apply them. That’s why we call these things “arrows” rather than “functions.” There are a bundle of arrows for working with product types. The two projection functions correspond to and , taking individual components out of pairs: How do we get things into pairs in the first place? We can use the “fork” operation, which takes two arrows computing and , and generates a new arrow which generates a pair of : If you’re coming from a Haskell background, it’s tempting to think of this operation merely as the pair constructor. But you’ll notice from the type of the computation that there can be no data dependency between and , thus we are free to parallelize each side of the pair. In category theory, the distinction between left and right sides of an arrow is rather arbitrary. This gives rise to a notion called duality where we can flip the arrows around, and get cool new behavior. If we dualize all of our product machinery, we get the coproduct machinery, where a coproduct of and is “either or , but definitely not both nor neither.” Swapping the arrow direction of and , and replacing with gives us the following injections: and the following “join” operation for eliminating coproducts: Again, coming from Haskell this is just the standard function. It corresponds to a branch between one of two cases. As you can see, with just these eight operations, we already have a tremendous amount of expressivity. We can express data dependencies via and branching via . With we automatically encode opportunities for parallelism, and gain the ability to build complicated data structures, with and allowing us to get the information back out of the data structures. You’ll notice in the IL that there are no variable names anywhere to be found. The desugaring of the source language builds a stack (via the pattern), and replaces subsequent variable lookups with a series of projections on the stack to find the value again. On one hand, this makes the categorical IL rather hard to read, but it makes it very easy to re-target! Many domains do have a notion of grouping, but don’t have a native notion of naming. For example, in an electronic circuit, I can have a ribbon of 32 wires which represents an . If I have another ribbon of 32 wires, I can trivially route both ribbons into a 64-wire ribbon corresponding to a pair of . By eliminating names before we get to the IL, it means no compiler backend ever needs to deal with names. They can just work on a stack representation, and are free to special-case optimize series of projections if they are able to. Of particular interest to this discussion is how we desugar loops in catlang. The underlying primitive is : which magically turns an arrow on s into an arrow without the eithers. We obviously must run that arrow on eithers. If that function returns , then we’re happy and we can just output that. But if the function returns , we have no choice but to pass it back into the eithered arrow. In Haskell, cochoice is implemented as: which, as you can see, will loop until finally returns a . What’s neat about this formulation of a loop is that we can statically differentiate between our first and subsequent passes through the loop body.
The first time through is , while for all other times it is . We don’t take advantage of it in the original program, but how many times have you written loop code that needs to initialize something its first time through? So that’s the underlying theory behind the IL. How can we compile this to SQL now? As alluded to before, we simply need to give SQL implementations for each of the operations in the intermediary language. As a simple example, compiles to , where is the input of the arrow. The hardest part here was working out a data representation. It seems obvious to encode each element of a product as a new column, but what do we do about coproducts? After much thought, I decided to flatten out the coproducts. So, for example, the type: would be represented as three columns: with the constraint that exactly one of or would be at any given point in time. With this hammered out, almost everything else is pretty trivial. Composition corresponds to a nested query. Forks are s which concatenate the columns of each sub-query. Joins are s, where we add a clause to enforce we’re looking at the correct coproduct constructor. Cochoice is the only really tricky thing, but it corresponds to a recursive CTE . Generating a recursive CTE table for the computation isn’t too hard, but getting the final value out of it was surprisingly tricky. The semantics of SQL tables is that they are multisets and come with an arbitrary greatest element. Which is to say, you need an column structured in a relevant way in order to query the final result. Due to some quirks in what postgres accepts, and in how I structured my queries, it was prohibitively hard to insert a “how many times have I looped” column and order by that. So instead I cheated and added a column which looks at the processor clock and ordered by that. This is clearly a hack, and presumably will cause problems if I ever add some primitives which generate more than one row, but again, this is just for fun and who cares. Send me a pull request if you’re offended by my chicanery! I’ve run out of vacation time to work on this project, so I’m probably not going to get around to the meta-circular stupidity I was planning. The compiler still needs a few string-crunching primitives (which are easy to add), but then it would be simple to write a little brainfuck interpreter in catlang. Which I could then compile to SQL. Now we’ve got a brainfuck interpreter running in postgres. Of course, this has been done by hand before, but to my knowledge, never via compilation. There exist C to brainfuck compilers. And postgres is written in C. So in a move that would make Xzibit proud, we could run postgres in postgres. And of course, it would be fun to run brainfuck in brainfuck. That’d be a cool catlang backend if someone wanted to contribute such a thing. I am not the first person to do anything like this. The source language of catlang is heavily inspired by Haskell’s arrow syntax , which in turn is essentially a desugaring algorithm for Arrows . Arrows are slightly the wrong abstraction because they require an operation —which requires you to be able to embed Haskell functions in your category, something which is almost never possible. Unfortunately, arrow syntax in Haskell desugars down to for almost everything it does, which in turn makes arrow notation effectively useless. In an ideal world, everything I described in this blog post would be a tiny little Haskell library, with arrow notation doing the heavy lifting.
But that is just not the world we live in. Nor am I the first person to notice that there are categorical semantics behind programming languages. I don’t actually know whom to cite on this one, but it is well-established folklore that the lambda calculus corresponds to cartesian-closed categories . The “closed” part of “cartesian-closed” means we have an operation , but everyone and their dog has implemented the lambda calculus, so I thought it would be fun to see how far we can get without it. This is not a limitation on catlang’s Turing completeness (since gives us everything we need). I’ve been thinking about writing a category-first programming language for the better part of a decade, ever since I read Compiling to Categories . That paper takes Haskell and desugars it back down to categories. I stole many of the tricks here from that paper. Anyway. All of the code is available on github if you’re interested in taking a look. The repo isn’t up to my usual coding standards, for which you have my apologies. Of note is the template-haskell backend which can spit out Haskell code, meaning it wouldn’t be very hard to make a quasiquoter to compile catlang into what Haskell’s arrow desugaring ought to be. If there’s enough clamor for such a thing, I’ll see about turning this part into a library. When looking at the types of arrows in this essay, we make the distinction that are arrows that we can write in catlang, while exist in the metatheory. ↩︎
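The generated SQL is omitted above, but the cochoice-as-recursive-CTE idea is easy to picture. This is a hand-written sketch of the "loop up to 100" example in PostgreSQL, not catlang's actual output (which also flattens coproducts into columns and orders by a clock column); the starting input of 0 is assumed.

```sql
-- Hand-written illustration, not catlang's generated SQL.
-- Each row of the CTE is one pass through the loop body; is_right = true means
-- the body returned Right (stop), false means Left (loop again).
WITH RECURSIVE steps (iteration, x, is_right) AS (
    SELECT 0, 0, false                  -- enter the loop with an assumed input of 0
    UNION ALL
    SELECT iteration + 1,
           x + 1,                       -- the loop body: count up towards 100
           (x + 1) >= 100               -- reached 100: the body returns Right
    FROM steps
    WHERE NOT is_right                  -- only Left results are fed back in
)
SELECT x                                -- the final answer: 100
FROM steps
WHERE is_right
ORDER BY iteration DESC                 -- stand-in for the post's order-by-the-clock hack
LIMIT 1;
```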


Optimizing Datalog for the GPU

Optimizing Datalog for the GPU
Yihao Sun, Ahmedur Rahman Shovon, Thomas Gilray, Sidharth Kumar, and Kristopher Micinski. ASPLOS'25.

Datalog source code comprises a set of relations and a set of rules. A relation can be explicitly defined with a set of tuples. A running example in the paper is to define a graph with a relation named :

A relation can also be implicitly defined with a set of rules. The paper uses the relation as an example:

Rule 1 states that two vertices ( and ) are part of the same generation if they both share a common ancestor ( ) and they are not actually the same vertex ( ). Rule 2 states that two vertices ( and ) are part of the same generation if they have ancestors ( and ) from the same generation.

“Running a Datalog program” entails evaluating all rules until a fixed point is reached (no more tuples are added). One key idea to internalize is that evaluating a Datalog rule is equivalent to performing a SQL join. For example, rule 1 is equivalent to joining the relation with itself, using as the join key and as a filter (a SQL rendering of this appears at the end of this post).

Semi-naïve Evaluation is an algorithm for performing these joins until convergence, while not wasting too much effort on redundant work. The tuples in a relation are put into three buckets:

- : holds tuples that were discovered on the current iteration
- : holds tuples which were added in the previous iteration
- : holds all tuples that have been found in any iteration

For a join involving two relations ( and ), is computed as the union of the result of 3 joins:

- joined with
- joined with
- joined with

The key fact for performance is that is never joined with . More details on Semi-naïve Evaluation can be found in these notes.

This paper introduces the hash-indexed sorted array for storing relations while executing Semi-naïve Evaluation on a GPU. It seems to me like this data structure would work well on other chips too. Fig. 2 illustrates the data structure (join keys are in red).

Source: https://dl.acm.org/doi/10.1145/3669940.3707274

The data array holds the actual tuple data. It is densely packed in row-major order. The sorted index array holds pointers into the data array (one pointer per tuple). These pointers are lexicographically sorted (join keys take higher priority in the sort). The hash table is an open-addressed hash table which maps a hash of the join keys to the first element in the sorted index array that contains those join keys.

A join of relations A and can be implemented with the following pseudo-code:

Memory accesses when probing through the sorted index array are coherent. Memory accesses when accessing the data array are coherent up to the number of elements in a tuple.

Table 3 compares the results from this paper (GPULog) against a state-of-the-art CPU implementation (Soufflé). HIP represents GPULog ported to AMD’s HIP runtime and then run on the same Nvidia GPU.

Source: https://dl.acm.org/doi/10.1145/3669940.3707274

Dangling Pointers

The data structure and algorithms described by this paper seem generic; it would be interesting to see them run on other chips (FPGA, DPU, CPU, HPC cluster). I would guess most of GPULog is bound by memory bandwidth, not compute. I wonder if there are Datalog-specific algorithms to reduce the bandwidth-to-compute ratio.
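To make the “a Datalog rule is a SQL join” observation concrete, here is a rough SQL sketch of rule 1 as a self-join, plus a semi-naïve step for rule 2. The relation and table names (edge(parent, child) for the graph, sg for same-generation, and sg_delta/sg_full/sg_new for the semi-naïve buckets) are my own illustrative guesses, not the paper’s actual identifiers:

```sql
-- Rule 1 as a SQL self-join (illustrative schema, not the paper's):
-- x and y are in the same generation if they share a common ancestor
-- and are not the same vertex.
SELECT a.child AS x, b.child AS y
FROM edge AS a
JOIN edge AS b ON a.parent = b.parent   -- the shared ancestor is the join key
WHERE a.child <> b.child;               -- the "not the same vertex" filter

-- A semi-naïve step for rule 2: because edge never changes, each
-- iteration only needs to join edge against the previous iteration's
-- delta of sg, so the full relation is never joined with itself.
INSERT INTO sg_new (x, y)
SELECT e1.child, e2.child
FROM sg_delta AS d                      -- (d.x, d.y) already known to be same-generation
JOIN edge AS e1 ON e1.parent = d.x      -- d.x is an ancestor of the new x (e1.child)
JOIN edge AS e2 ON e2.parent = d.y      -- d.y is an ancestor of the new y (e2.child)
EXCEPT
SELECT x, y FROM sg_full;               -- discard tuples we already know
```

Repeating the second statement until sg_new comes back empty is, roughly, the fixed-point loop the evaluation is driving toward.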

Jeremy Daly 3 weeks ago

Announcing Data API Client v2

A complete TypeScript rewrite with drop-in ORM support, full mysql2/pg compatibility layers, and smarter parsing for Aurora Serverless v2's Data API.
