Posts in SQL (20 found)

Optimizing Datalog for the GPU

Optimizing Datalog for the GPU. Yihao Sun, Ahmedur Rahman Shovon, Thomas Gilray, Sidharth Kumar, and Kristopher Micinski. ASPLOS'25.

Datalog source code comprises a set of relations and a set of rules. A relation can be explicitly defined with a set of tuples; the paper's running example defines a graph this way. A relation can also be implicitly defined with a set of rules; the paper uses a "same generation" relation as an example. Rule 1 states that two vertices are part of the same generation if they share a common ancestor and are not actually the same vertex. Rule 2 states that two vertices are part of the same generation if they have ancestors that are themselves from the same generation.

"Running a Datalog program" entails evaluating all rules until a fixed point is reached (no more tuples are added). One key idea to internalize is that evaluating a Datalog rule is equivalent to performing a SQL join. For example, rule 1 is equivalent to joining the underlying ancestor relation with itself, using the shared ancestor as the join key and the requirement that the two vertices differ as a filter.

Semi-naïve evaluation is an algorithm for performing these joins until convergence without wasting too much effort on redundant work. The tuples in a relation are put into three buckets:
- new: holds tuples that were discovered on the current iteration
- delta: holds tuples which were added in the previous iteration
- full: holds all tuples that have been found in any iteration

For a join involving two relations A and B, new is computed as the union of the results of three joins: delta of A joined with full of B, full of A joined with delta of B, and delta of A joined with delta of B. The key fact for performance is that full of A is never joined with full of B. More details on semi-naïve evaluation can be found in these notes.

This paper introduces the hash-indexed sorted array for storing relations while executing semi-naïve evaluation on a GPU. It seems to me like this data structure would work well on other chips too. Fig. 2 illustrates the data structure (join keys are in red): Source: https://dl.acm.org/doi/10.1145/3669940.3707274

The data array holds the actual tuple data, densely packed in row-major order. The sorted index array holds pointers into the data array (one pointer per tuple). These pointers are lexicographically sorted (join keys take higher priority in the sort). The hash table is an open-addressed hash table which maps a hash of the join keys to the first element in the sorted index array that contains those join keys. A join of relations A and B can be implemented with pseudo-code along the lines of the sketch at the end of this post. Memory accesses when probing through the sorted index array are coherent. Memory accesses when accessing the data array are coherent up to the number of elements in a tuple.

Table 3 compares the results from this paper (GPULog) against a state-of-the-art CPU implementation (Soufflé). HIP represents GPULog ported to AMD's HIP runtime and then run on the same Nvidia GPU. Source: https://dl.acm.org/doi/10.1145/3669940.3707274

Dangling Pointers

The data structure and algorithms described by this paper seem generic; it would be interesting to see them run on other chips (FPGA, DPU, CPU, HPC cluster). I would guess most of GPULog is bound by memory bandwidth, not compute. I wonder if there are Datalog-specific algorithms to reduce the bandwidth/compute ratio.
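To make the data structure and the join concrete, here is a minimal CPU-side sketch in Python. It is not the paper's code: a Python dict stands in for the open-addressed hash table, plain tuples stand in for the packed data array, and GPU concerns such as coalesced memory access and parallel probes are ignored.

class HashIndexedSortedArray:
    def __init__(self, tuples, num_join_cols):
        self.num_join_cols = num_join_cols
        self.data = list(tuples)                     # data array, row-major
        # sorted index array: indices into data; join columns come first in
        # each tuple, so plain lexicographic sort gives them priority
        self.sorted_index = sorted(range(len(self.data)),
                                   key=lambda i: self.data[i])
        # "hash table": join key -> first position in sorted_index with that key
        self.index = {}
        for pos, i in enumerate(self.sorted_index):
            key = self.data[i][:num_join_cols]
            self.index.setdefault(key, pos)

    def probe(self, key):
        """Yield every tuple whose join columns equal key."""
        pos = self.index.get(key)
        if pos is None:
            return
        while pos < len(self.sorted_index):
            row = self.data[self.sorted_index[pos]]
            if row[:self.num_join_cols] != key:
                break                                # sorted, so we can stop early
            yield row
            pos += 1

def join(outer_tuples, inner):
    """Join on the inner relation's join columns; yields (outer, inner) pairs.
    In semi-naïve evaluation, outer_tuples would be the delta of one relation
    and inner the full (or delta) version of the other."""
    k = inner.num_join_cols
    for a in outer_tuples:
        for b in inner.probe(a[:k]):
            yield a, b

# Example: probe a small relation on its first column.
edges = HashIndexedSortedArray([(1, 2), (1, 3), (2, 4)], num_join_cols=1)
print(list(join([(1, 9), (2, 7)], edges)))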

2 views
Simon Willison 2 weeks ago

Designing agentic loops

Coding agents like Anthropic's Claude Code and OpenAI's Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly exercise the code they are writing, correct errors, dig through existing implementation details, and even run experiments to find effective code solutions to problems. As is so often the case with modern AI, there is a great deal of depth involved in unlocking the full potential of these new tools. A critical new skill to develop is designing agentic loops.

One way to think about coding agents is that they are brute force tools for finding solutions to coding problems. If you can reduce your problem to a clear goal and a set of tools that can iterate towards that goal, a coding agent can often brute force its way to an effective solution. My preferred definition of an LLM agent is something that runs tools in a loop to achieve a goal. The art of using them well is to carefully design the tools and the loop for them to use.

Agents are inherently dangerous - they can make poor decisions or fall victim to malicious prompt injection attacks, either of which can result in harmful results from tool calls. Since the most powerful coding agent tool is "run this command in the shell", a rogue agent can do anything that you could do by running a command yourself. To quote Solomon Hykes: An AI agent is an LLM wrecking its environment in a loop.

Coding agents like Claude Code counter this by defaulting to asking you for approval of almost every command that they run. This is kind of tedious, but more importantly, it dramatically reduces their effectiveness at solving problems through brute force. Each of these tools provides its own version of what I like to call YOLO mode, where everything gets approved by default. This is so dangerous, but it's also key to getting the most productive results! Here are three key risks to consider from unattended YOLO mode:
- Bad shell commands deleting or mangling things you care about.
- Exfiltration attacks where something steals files or data visible to the agent - source code or secrets held in environment variables are particularly vulnerable here.
- Attacks that use your machine as a proxy to attack another target - for DDoS or to disguise the source of other hacking attacks.

If you want to run YOLO mode anyway, you have a few options:
1. Run your agent in a secure sandbox that restricts the files and secrets it can access and the network connections it can make.
2. Use someone else's computer. That way if your agent goes rogue, there's only so much damage it can do, including wasting someone else's CPU cycles.
3. Take a risk! Try to avoid exposing it to potential sources of malicious instructions and hope you catch any mistakes before they cause any damage.

Most people choose option 3. Despite the existence of container escapes, I think option 1 using Docker or the new Apple container tool is a reasonable risk to accept for most people. Option 2 is my favorite. I like to use GitHub Codespaces for this - it provides a full container environment on-demand that's accessible through your browser and has a generous free tier too. If anything goes wrong it's a Microsoft Azure machine somewhere that's burning CPU, and the worst that can happen is code you checked out into the environment might be exfiltrated by an attacker, or bad code might be pushed to the attached GitHub repository. There are plenty of other agent-like tools that run code on other people's computers. Code Interpreter mode in both ChatGPT and Claude can go a surprisingly long way here. I've also had a lot of success (ab)using OpenAI's Codex Cloud. Coding agents themselves implement various levels of sandboxing, but so far I've not seen convincing enough documentation of these to trust them.

Update: It turns out Anthropic have their own documentation on Safe YOLO mode for Claude Code, which says: Letting Claude run arbitrary commands is risky and can result in data loss, system corruption, or even data exfiltration (e.g., via prompt injection attacks). To minimize these risks, use --dangerously-skip-permissions in a container without internet access. You can follow this reference implementation using Docker Dev Containers.
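As a rough illustration of option 1, here is a hypothetical Python wrapper that launches an agent CLI inside a throwaway Docker container with no network access. The image name and the agent command are assumptions for illustration, not something from the post.

# Hypothetical sketch: run a coding agent inside a disposable container so a
# rogue command can only touch the mounted project directory. The image
# "agent-sandbox:latest" is assumed to have the agent CLI preinstalled.
import subprocess
from pathlib import Path

def run_agent_sandboxed(project_dir: str, agent_cmd: list[str]) -> int:
    cmd = [
        "docker", "run", "--rm", "-it",
        "--network", "none",                       # no outbound network: blocks exfiltration
        "-v", f"{Path(project_dir).resolve()}:/workspace",
        "-w", "/workspace",
        "agent-sandbox:latest",
        *agent_cmd,                                # e.g. ["claude", "--dangerously-skip-permissions"]
    ]
    return subprocess.run(cmd).returncode

Allow-listing specific hosts instead of using --network none is a middle ground between full access and no access.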
Locking internet access down to a list of trusted hosts is a great way to prevent exfiltration attacks from stealing your private source code. Now that we've found a safe (enough) way to run in YOLO mode, the next step is to decide which tools we need to make available to the coding agent. You can bring MCP into the mix at this point, but I find it's usually more productive to think in terms of shell commands instead. Coding agents are really good at running shell commands! If your environment allows them the necessary network access, they can also pull down additional packages from NPM and PyPI and similar. Ensuring your agent runs in an environment where random package installs don't break things on your main computer is an important consideration as well!

Rather than leaning on MCP, I like to create an AGENTS.md (or equivalent) file with details of packages I think they may need to use. For a project that involved taking screenshots of various websites I installed my own shot-scraper CLI tool and dropped an example invocation into AGENTS.md. Just that one example is enough for the agent to guess how to swap out the URL and filename for other screenshots. Good LLMs already know how to use a bewildering array of existing tools. If you say "use playwright python" or "use ffmpeg", most models will use those effectively - and since they're running in a loop they can usually recover from mistakes they make at first and figure out the right incantations without extra guidance.

In addition to exposing the right commands, we also need to consider what credentials we should expose to those commands. Ideally we wouldn't need any credentials at all - plenty of work can be done without signing into anything or providing an API key - but certain problems will require authenticated access. This is a deep topic in itself, but I have two key recommendations here:
- Try to provide credentials to test or staging environments where any damage can be well contained.
- If a credential can spend money, set a tight budget limit.

I'll use an example to illustrate. A while ago I was investigating slow cold start times for a scale-to-zero application I was running on Fly.io. I realized I could work a lot faster if I gave Claude Code the ability to directly edit Dockerfiles, deploy them to a Fly account and measure how long they took to launch. Fly allows you to create organizations, and you can set a budget limit for those organizations and issue a Fly API key that can only create or modify apps within that organization... So I created a dedicated organization for just this one investigation, set a $5 budget, issued an API key and set Claude Code loose on it! In that particular case the results weren't useful enough to describe in more detail, but this was the project where I first realized that "designing an agentic loop" was an important skill to develop.

Not every problem responds well to this pattern of working. The thing to look out for here is problems with clear success criteria where finding a good solution is likely to involve (potentially slightly tedious) trial and error. Any time you find yourself thinking "ugh, I'm going to have to try a lot of variations here" is a strong signal that an agentic loop might be worth trying! A few examples:
- Debugging: a test is failing and you need to investigate the root cause. Coding agents that can already run your tests can likely do this without any extra setup.
- Performance optimization: this SQL query is too slow, would adding an index help? Have your agent benchmark the query and then add and drop indexes (in an isolated development environment!) to measure their impact. (A tiny sketch of this kind of loop appears at the end of this post.)
- Upgrading dependencies: you've fallen behind on a bunch of dependency upgrades? If your test suite is solid, an agentic loop can upgrade them all for you and make any minor updates needed to reflect breaking changes. Make sure a copy of the relevant release notes is available, or that the agent knows where to find them itself.
- Optimizing container sizes: Docker container feeling uncomfortably large? Have your agent try different base images and iterate on the Dockerfile to try to shrink it, while keeping the tests passing.

A common theme in all of these is automated tests. The value you can get from coding agents and other LLM coding tools is massively amplified by a good, cleanly passing test suite. Thankfully LLMs are great for accelerating the process of putting one of those together, if you don't have one yet. Designing agentic loops is a very new skill - Claude Code was first released in just February 2025!
I'm hoping that giving it a clear name can help us have productive conversations about it. There's so much more to figure out about how to use these tools as effectively as possible. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . The joy of YOLO mode Picking the right tools for the loop Issuing tightly scoped credentials When to design an agentic loop This is still a very fresh area Bad shell commands deleting or mangling things you care about. Exfiltration attacks where something steals files or data visible to the agent - source code or secrets held in environment variables are particularly vulnerable here. Attacks that use your machine as a proxy to attack another target - for DDoS or to disguise the source of other hacking attacks. Run your agent in a secure sandbox that restricts the files and secrets it can access and the network connections it can make. Use someone else's computer. That way if your agent goes rogue, there's only so much damage they can do, including wasting someone else's CPU cycles. Take a risk! Try to avoid exposing it to potential sources of malicious instructions and hope you catch any mistakes before they cause any damage. Try to provide credentials to test or staging environments where any damage can be well contained. If a credential can spend money, set a tight budget limit. Debugging : a test is failing and you need to investigate the root cause. Coding agents that can already run your tests can likely do this without any extra setup. Performance optimization : this SQL query is too slow, would adding an index help? Have your agent benchmark the query and then add and drop indexes (in an isolated development environment!) to measure their impact. Upgrading dependencies : you've fallen behind on a bunch of dependency upgrades? If your test suite is solid an agentic loop can upgrade them all for you and make any minor updates needed to reflect breaking changes. Make sure a copy of the relevant release notes is available, or that the agent knows where to find them itself. Optimizing container sizes : Docker container feeling uncomfortably large? Have your agent try different base images and iterate on the Dockerfile to try to shrink it, while keeping the tests passing.

3 views
Armin Ronacher 2 weeks ago

90%

“I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code” — Dario Amodei

Three months ago I said that AI changes everything. I came to that after plenty of skepticism. There are still good reasons to doubt that AI will write all code, but my current reality is close. For the infrastructure component I started at my new company, I'm probably north of 90% AI-written code. I don't want to convince you — just share what I learned, in part because I approached this project differently from my first experiments with AI-assisted coding.

The service is written in Go with few dependencies and an OpenAPI-compatible REST API. At its core, it sends and receives emails. I also generated SDKs for Python and TypeScript with a custom SDK generator. In total: about 40,000 lines, including Go, YAML, Pulumi, and some custom SDK glue. I set a high bar, especially that I can operate it reliably. I've run similar systems before and knew what I wanted.

Some startups are already near 100% AI-generated. I know, because many build in the open and you can see their code. Whether that works long-term remains to be seen. I still treat every line as my responsibility, judged as if I wrote it myself. AI doesn't change that. There are no weird files that shouldn't belong there, no duplicate implementations, and no emojis all over the place. The comments still follow the style I want and, crucially, often aren't there. I pay close attention to the fundamentals of system architecture, code layout, and database interaction. I'm incredibly opinionated. As a result, there are certain things I don't let the AI do, because I know it won't reach the point where I could sign off on a commit. That's why it's not 100%. As contrast: another quick prototype we built is a mess of unclear database tables, markdown file clutter in the repo, and boatloads of unwanted emojis. It served its purpose — validate an idea — but wasn't built to last, and we had no expectation that it would.

I began in the traditional way: system design, schema, architecture. At this stage I don't let the AI write code, but I loop it in as a kind of rubber duck. The back-and-forth helps me see mistakes, even if I don't need or trust the answers. I did get the foundation wrong once: I initially argued myself into a more complex setup than I wanted. That's a part I later had the LLM redo and clean up early on.

For AI-generated or AI-supported code, I now end up with a stack that looks like something I often wanted but found too hard to do by hand:

Raw SQL: This is probably the biggest change to how I used to write code. I really like using an ORM, but I don't like some of its effects. In particular, once you approach the ORM's limits, you're forced to switch to handwritten SQL. That mapping is often tedious because you lose some of the powers the ORM gives you. Another consequence is that it's very hard to find the underlying queries, which makes debugging harder. Seeing the actual SQL in your code and in the database log is powerful; you always lose that with an ORM. The fact that I no longer have to write SQL because the AI does it for me is a game changer. I also use raw SQL for migrations now.

OpenAPI first: I tried various approaches here. There are many frameworks you can use. I ended up first generating the OpenAPI specification and then using code generation from there to the interface layer.
This approach works better with AI-generated code. The OpenAPI specification is now the canonical source that both the clients and the server shim are based on.

Today I use Claude Code and Codex. Each has strengths, but the constant is Codex for code review after PRs; it's very good at that. Claude is still indispensable when debugging and needing a lot of tool access (e.g. why do I have a deadlock, why is there corrupted data in the database, etc.). The two working together is where it's most magical. Claude might find the data, Codex might understand it better.

I cannot stress enough how bad the code from these agents can be if you're not careful. While they understand system architecture and how to build something, they can't keep the whole picture in scope. They will recreate things that already exist. They create abstractions that are completely inappropriate for the scale of the problem. You constantly need to learn how to bring the right information into the context. For me, this means pointing the AI to existing implementations and giving it very specific instructions on how to follow along.

I generally create PR-sized chunks that I can review. There are two paths to this:

Agent loop with finishing touches: Prompt until the result is close, then clean up.

Lockstep loop: Earlier I went edit by edit. Now I lean on the first method most of the time, keeping a todo list for cleanups before merge.

It requires intuition to know when each approach is more likely to lead to the right results. Familiarity with the agent also helps with understanding when a task will not go anywhere, avoiding wasted cycles.

The most important piece of working with an agent is the same as regular software engineering: you need to understand your state machines, how the system behaves at any point in time, and your database. It is easy to create systems that appear to behave correctly but have unclear runtime behavior when relying on agents. For instance, the AI doesn't fully comprehend threading or goroutines. If you don't keep the bad decisions at bay early, you won't be able to operate the system in a stable manner later. Here's an example: I asked it to build a rate limiter. It "worked" but lacked jitter and used poor storage decisions. Easy to fix if you know rate limiters, dangerous if you don't. (A rough sketch of what jitter adds appears at the end of this post.)

Agents also operate on conventional wisdom from the internet and in turn do things I would never do myself. They love to use dependencies (particularly outdated ones). They love to swallow errors and take away all tracebacks. I'd rather uphold strong invariants and let code crash loudly when they fail than hide problems. If you don't fight this, you end up with opaque, unobservable systems.

For me, this has reached the point where I can't imagine working any other way. Yes, I could probably have done it without AI. But I would have built a different system in parts, because I would have made different trade-offs. This way of working unlocks paths I'd normally skip or defer. Here are some of the things I enjoyed a lot on this project:

Research + code, instead of research and code later: Some things that would have taken me a day or two to figure out now take 10 to 15 minutes. It allows me to directly play with one or two implementations of a problem. It moves me from abstract contemplation to hands-on evaluation.

Trying out things: I tried three different OpenAPI implementations and approaches in a day.

Constant refactoring: The code looks more organized than it would otherwise have been because the cost of refactoring is quite low.
You need to know what you're doing, but if set up well, refactoring becomes easy.

Infrastructure: Claude got me through AWS and Pulumi. Work I generally dislike became a few days instead of weeks. It also debugged the setup issues as it was going through them. I barely had to read the docs.

Adopting new patterns: While they suck at writing tests, they turned out great at setting up test infrastructure I didn't know I needed. I got a recommendation on Twitter to use testcontainers for testing against Postgres. The approach runs migrations once and then creates database clones per test. That turns out to be super useful. It would have been quite an involved project to migrate to; Claude did it in an hour for all tests. (A sketch of the pattern appears at the end of this post.)

SQL quality: It writes solid SQL that I could never remember how to write myself. I just need to review it, which I can do. To this day I'm bad at recalling SQL syntax when writing it by hand.

Is 90% of code going to be written by AI? I don't know. What I do know is that for me, on this project, the answer is already yes. I'm part of that growing subset of developers who are building real systems this way. At the same time, for me, AI doesn't own the code. I still review every line, shape the architecture, and carry the responsibility for how it runs in production. But the sheer volume of what I now let an agent generate would have been unthinkable even six months ago. That's why I'm convinced this isn't some far-off prediction. It's already here — just unevenly distributed — and the number of developers working like this is only going to grow.

That said, none of this removes the need to actually be a good engineer. If you let the AI take over without judgment, you'll end up with brittle systems and painful surprises (data loss, security holes, unscalable software). The tools are powerful, but they don't absolve you of responsibility.
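Here is the promised rough sketch of what "a rate limiter with jitter" means. It is not the project's code (that service is written in Go); this is a generic in-memory Python illustration of a token bucket whose retry advice is jittered so throttled clients don't all retry at the same instant.

# Generic illustration: a token-bucket rate limiter whose "retry after"
# advice includes random jitter. All names and numbers are illustrative.
import random
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_second)
        self.updated = now

    def try_acquire(self) -> tuple[bool, float]:
        """Returns (allowed, retry_after_seconds)."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Time until one token is available, plus up to 25% random jitter.
        wait = (1 - self.tokens) / self.refill_per_second
        return False, wait * (1 + random.uniform(0, 0.25))

bucket = TokenBucket(capacity=5, refill_per_second=1.0)
for i in range(7):
    allowed, retry_after = bucket.try_acquire()
    print(f"request {i}: allowed={allowed} retry_after={retry_after:.2f}s")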
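And the testcontainers-plus-clones pattern mentioned above, sketched in Python rather than Go: run migrations once into a template database, then create a cheap per-test database from that template. The DSN and the migration step are placeholders; the Postgres instance is assumed to be disposable (for example, one started by testcontainers).

# Sketch of the per-test database-clone pattern. Assumes a throwaway Postgres
# is already running; the DSN and migration step are placeholders.
import uuid
import psycopg2

ADMIN_DSN = "postgresql://test:test@localhost:5432/postgres"  # placeholder DSN

def prepare_template(template_db: str = "app_template") -> None:
    conn = psycopg2.connect(ADMIN_DSN)
    conn.autocommit = True            # CREATE DATABASE cannot run inside a transaction
    with conn.cursor() as cur:
        cur.execute(f"CREATE DATABASE {template_db}")
    conn.close()
    # ... run migrations once against template_db here (placeholder) ...

def clone_for_test(template_db: str = "app_template") -> str:
    """Create a fresh database from the migrated template; returns its name."""
    test_db = f"test_{uuid.uuid4().hex[:8]}"
    conn = psycopg2.connect(ADMIN_DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(f"CREATE DATABASE {test_db} TEMPLATE {template_db}")
    conn.close()
    return test_db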

0 views
Marc Brooker 3 weeks ago

Seven Years of Firecracker

Time flies like an arrow. Fruit flies like a banana. Back at re:Invent 2018, we shared Firecracker with the world. Firecracker is open source software that makes it easy to create and manage small virtual machines. At the time, we talked about Firecracker as one of the key technologies behind AWS Lambda, including how it'd allowed us to make Lambda faster, more efficient, and more secure. A couple of years later, we published Firecracker: Lightweight Virtualization for Serverless Applications (at NSDI'20). Here's me talking through the paper back then:

The paper went into more detail on how we're using Firecracker in Lambda, how we think about the economics of multitenancy (more about that here), and how we chose virtualization over kernel-level isolation (containers) or language-level isolation for Lambda. Despite these challenges, virtualization provides many compelling benefits. From an isolation perspective, the most compelling benefit is that it moves the security-critical interface from the OS boundary to a boundary supported in hardware and comparatively simpler software. It removes the need to trade off between kernel features and security: the guest kernel can supply its full feature set with no change to the threat model. VMMs are much smaller than general-purpose OS kernels, exposing a small number of well-understood abstractions without compromising on software compatibility or requiring software to be modified.

Firecracker has really taken off, in all three ways we hoped it would. First, we use it in many more places inside AWS, backing the infrastructure we offer to customers across multiple services. Second, folks use the open source version directly, building their own cool products and businesses on it. Third, it was the motivation for a wave of innovation in the VM space. In this post, I wanted to write a bit about two of the ways we're using Firecracker at AWS that weren't covered in the paper.

Bedrock AgentCore

Back in July, we announced the preview of Amazon Bedrock AgentCore. AgentCore is built to run AI agents. If you're not steeped in the world of AI right now, you might be confused by the many definitions of the word agent. I like Simon Willison's take: an LLM agent runs tools in a loop to achieve a goal. Most production agents today are programs, mostly Python, which use a framework that makes it easy to interact with tools and the underlying AI model. My favorite one of those frameworks is Strands, which does a great job of combining traditional imperative code with prompt-driven model-based interactions. I build a lot of little agents with Strands, most being less than 30 lines of Python (check out the strands samples for some ideas).

So where does Firecracker come in? AgentCore Runtime is the compute component of AgentCore. It's the place in the cloud where the agent code you've written runs. When we looked at the agent isolation problem, we realized that Lambda's per-function model isn't rich enough for agents, specifically because agents do lots of different kinds of work on behalf of different customers. So we built AgentCore Runtime with session isolation. Each session with the agent is given its own MicroVM, and that MicroVM is terminated when the session is over. Over the course of a session (up to 8 hours), there can be multiple interactions with the user, and many tool and LLM calls. But when it's over, the MicroVM is destroyed and all the session context is securely forgotten. This makes interactions between agent sessions explicit (e.g.
via AgentCore Memory or stateful tools), with no interactions at the code level, making it easier to reason about security. Firecracker is great here, because agent sessions vary from milliseconds (single-turn, single-shot agent interactions with small models) to hours (multi-turn interactions, with thousands of tool calls and LLM interactions). Context varies from zero to gigabytes. The flexibility of Firecracker, including the ability to grow and shrink the CPU and memory use of VMs in place, was a key part of being able to build this economically.

Aurora DSQL

We announced Aurora DSQL, our serverless relational database with PostgreSQL compatibility, in December 2024. I've written about DSQL's architecture before, but here I wanted to highlight the role of Firecracker. Each active SQL transaction in DSQL runs inside its own Query Processor (QP), including its own copy of PostgreSQL. These QPs are used multiple times (for the same DSQL database), but only handle one transaction at a time. I've written before about why this is interesting from a database perspective. Instead of repeating that, let's dive down to the page level and take a look from the virtualization level.

Let's say I'm creating a new DSQL QP in a new Firecracker VM for an incoming database connection. One way I could do that is to start Firecracker, boot Linux, start PostgreSQL, start the management and observability agents, load all the metadata, and get going. That's not going to take too long. A couple hundred milliseconds, probably. But we can do much better. With clones.

Firecracker supports snapshot and restore, where it writes down all the VM memory, registers, and device state into a file, and then can create a new VM from that file. Cloning is the simple idea that once you have a snapshot you can restore it as many times as you like. So we boot up, start the database, do some customization, and then take a snapshot. When we need a new QP for a given database, we restore the snapshot. That's orders of magnitude faster. This significantly reduces creation time, saving the CPU used for all that booting and starting. Awesome. But it does something else too: it allows the cloned microVMs to share unchanged (clean) memory pages with each other, significantly reducing memory demand (with fine-grained control over what is shared). This is a big saving, because a lot of the memory used by Linux, PostgreSQL, and the other processes on the box isn't modified again after start-up. VMs get their own copies of pages they write to (we're not talking about sharing writable memory here), ensuring that memory is still strongly isolated between each MicroVM. Another knock-on effect is that the shared pages can also appear only once in some levels of the CPU cache hierarchy, further improving performance. There's a bit more plumbing that's needed to make some things like random numbers work correctly in the cloned VMs.

Last year, I wrote about our paper Resource management in Aurora Serverless. To understand these systems more deeply, let's compare their approaches to one common challenge: Linux's approach to memory management. At a high level, in stock Linux's mind, an empty memory page is a wasted memory page. So it takes basically every opportunity it can to fill all the available physical memory up with caches, buffers, page caches, and whatever else it may think it'll want later. This is a great general idea.
But in DSQL and Aurora Serverless, where the marginal cost of a guest VM holding onto a page is non-zero, it's the wrong one for the overall system. As we say in the Aurora Serverless paper, Aurora Serverless fixes this with careful tracking of page access frequency:

Cold page identification: A kernel process called DARC [8] continuously monitors pages and identifies cold pages. It marks cold file-based pages as free and swaps out cold anonymous pages.

This works well, but is heavier than what we needed for DSQL. In DSQL, we take a much simpler approach: we terminate VMs after a fixed period of time. This naturally cleans up all that built-up cruft without the need for extra accounting. DSQL can do this because connection handling, caching, and concurrency control are handled outside the QP VM. In a lot of ways this is similar to the approach we took with MVCC garbage collection in DSQL. Instead of PostgreSQL's vacuum, which needs to carefully keep track of references to old versions from the set of running transactions, we instead bound the set of running transactions with a simple rule (no transaction can run longer than 5 minutes). This allows DSQL to simply discard versions older than that deadline, safe in the knowledge that they are no longer referenced. Simplicity, as always, is a system property.
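A conceptual sketch of the boot-once, snapshot, then restore-per-request pattern described above, in Python pseudocode. FirecrackerVM and VMFactory are hypothetical interfaces, not the real Firecracker control API (which is an HTTP API over a Unix socket); the point is only the shape of the warm-pool logic.

# Conceptual sketch only; interfaces below are placeholders.
from typing import Protocol

class FirecrackerVM(Protocol):
    def boot_and_prepare(self) -> None: ...        # boot Linux, start PostgreSQL, warm caches
    def snapshot(self, path: str) -> None: ...     # write memory + device state to files
    def terminate(self) -> None: ...

class VMFactory(Protocol):
    def new_vm(self) -> FirecrackerVM: ...
    def restore(self, snapshot_path: str) -> FirecrackerVM: ...   # clone from the snapshot

class QueryProcessorPool:
    """Hands out query processors restored from a shared snapshot."""
    def __init__(self, factory: VMFactory, snapshot_path: str):
        self.factory = factory
        self.snapshot_path = snapshot_path
        template = factory.new_vm()
        template.boot_and_prepare()                # pay the slow cold-boot path exactly once
        template.snapshot(snapshot_path)
        template.terminate()

    def checkout(self) -> FirecrackerVM:
        # Orders of magnitude faster than a cold boot; clones share clean pages.
        return self.factory.restore(self.snapshot_path)

    def release(self, vm: FirecrackerVM) -> None:
        vm.terminate()                             # bounded lifetime: no cruft builds up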

0 views
Grumpy Gamer 1 months ago

Comments are back

“But? Wait?” I can hear you saying, “Isn't grumpygamer.com a static site built by Hugo? What dark magic did you use to get comments on the static site?” No dark magic. But it does involve a small PHP script. You can embed PHP in a Hugo page, and since grumpygamer.com is hosted on my server and it's running PHP, it wasn't that hard. No tricky JavaScript, and since it's all hosted by me, no privacy issues. All your comments stay on my server and don't feed Big Comment. Comments are stored in flat files, so no pesky SQL databases. It only took me about a day, so all in all not bad. I may regret this. I'm only turning on comments for future posts. P.S. I will post the code and a small guide in a few days, so you too can invite the masses to critique and criticize your every word. Good times.

0 views
Nicky Reinert 2 months ago

How to simulate proof of work in SQL and BigQuery

(or: How does blockchain mining work?) A miner has to solve a mathematical problem to validate a block for the blockchain. The so-called Proof of Work requires a lot of computing power, aka energy, and is therefore quite expensive. Google's BigQuery has a price model that charges you for the …
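The article goes on to simulate this in SQL/BigQuery (the teaser is cut off here). As background, and not the article's code, here is the standard proof-of-work idea in a few lines of Python: keep incrementing a nonce until the block's hash starts with enough zeros.

# Toy proof of work: find a nonce such that SHA-256(block_data + nonce)
# starts with `difficulty` zero hex digits. Each extra zero multiplies the
# expected work by 16; real blockchains use far harder targets.
import hashlib
import itertools

def mine(block_data: str, difficulty: int = 4) -> tuple[int, str]:
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest

nonce, digest = mine("block #42: alice pays bob 10")
print(f"nonce={nonce} hash={digest}")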

0 views
Marc Brooker 2 months ago

Dynamo, DynamoDB, and Aurora DSQL

Programming Model

Dynamo is a simple key-value store that doesn't offer transactions of any kind: Dynamo does not provide any isolation guarantees and permits only single key updates. DynamoDB offers single-shot serializable ACID transactions, with a single transaction consisting of multiple reads and writes. DSQL has the richest programming model, offering interactive transactions, full SQL support, and a rich type system.

Availability and Latency

The Dynamo paper makes a number of claims about the trade-offs between consistency, availability, and latency that have not stood the test of time. I'm not trying to call out the paper authors (several are personal friends of mine, and many are long-time colleagues), but to point out that we've learned a lot about building distributed databases in 20 years. Cloud infrastructure has also advanced considerably.

Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability.

This was true in the mid-2000s, but many ACID systems offer excellent availability today. That includes DynamoDB, DSQL, and others like Aurora Postgres. DynamoDB and DSQL can tolerate the failure of hosts, or an entire availability zone, without losing consistency, durability, or availability.

From the very early replicated database works, it is well known that when dealing with the possibility of network failures, strong consistency and high data availability cannot be achieved simultaneously.

Here, the Dynamo paper is citing Bernstein and Goodman (from 1984) and Lindsay et al. (from 1979) to highlight the inherent trade-offs between availability and consistency. These results aren't in any way wrong, but (as I've argued before) they aren't as practically important as the Dynamo paper implies they are. Strongly consistent systems offer excellent availability in the face of failures of many kinds (including entire region failures). Dynamo also allows applications to pick different trade-offs for performance, losing durability, consistency, or availability in the process.

The main advantage of Dynamo is that its client applications can tune the values of N, R and W to achieve their desired levels of performance, availability and durability.

This made complete sense in the mid-2000s. But better ways of thinking about replication and failure correlation, vastly improved system performance (thanks SSDs!), and much better datacenter networks have made this kind of tunability uninteresting. It's notable that both DynamoDB and DSQL offer significantly lower latencies than Dynamo while making none of the associated trade-offs discussed in the paper.

The Amazon Dynamo paper is a classic. You should read it if you haven't. But time has marched on, we've learned a ton, we've got better hardware and better ideas, and much of what the Dynamo paper says doesn't make sense in the real world anymore. That's a good thing!
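For readers who haven't seen the N/R/W knobs before, a quick illustration (the values are made up, not from the paper): with N replicas, a read quorum of R and a write quorum of W, reads are only guaranteed to observe the latest acknowledged write when R + W > N.

# Toy check of Dynamo-style quorum tuning: read and write quorums are
# guaranteed to overlap only when R + W > N. Example values are illustrative.
def quorums_overlap(n: int, r: int, w: int) -> bool:
    return r + w > n

for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(f"N={n} R={r} W={w} -> overlapping quorums: {quorums_overlap(n, r, w)}")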

0 views
Robin Moffatt 2 months ago

Connecting Apache Flink SQL to Confluent Cloud Kafka broker

This is a quick blog post to remind me how to connect Apache Flink to a Kafka topic on Confluent Cloud. You may wonder why you'd want to do this, given that Confluent Cloud for Apache Flink is a much easier way to run Flink SQL. But, for whatever reason, you're here and you want to understand the necessary incantations to get this connectivity to work. There are two versions of this connectivity - with and without using the Schema Registry for Avro. First off, you need to get two things:
- The address of your Confluent Cloud broker
- An API key pair with authorisation to access the topic that you want to read/write
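As a rough sketch of the no-Schema-Registry variant (not the post's exact incantations), here is a PyFlink version. The broker address, topic, columns, and API key/secret are placeholders; the WITH options are the standard Flink Kafka connector ones and should be checked against the connector docs for your Flink version.

# Hypothetical sketch: a Flink SQL table over a Confluent Cloud topic (JSON,
# no Schema Registry). Requires the flink-sql-connector-kafka jar on the
# classpath; all connection values below are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'pkc-xxxxx.eu-west-1.aws.confluent.cloud:9092',
        'properties.security.protocol' = 'SASL_SSL',
        'properties.sasl.mechanism' = 'PLAIN',
        'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("SELECT * FROM orders").print()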

0 views
Anton Zhiyanov 3 months ago

Redka: Redis re-implemented with SQL

I'm a big fan of Redis. It's such an amazing idea to go beyond the get-set paradigm and provide a convenient API for more complex data structures: maps, sets, lists, streams, bloom filters, etc. I'm also a big fan of relational databases and their universal language, SQL. They've really stood the test of time and have proven to solve a wide range of problems from the 1970s to today. So, naturally, one day I decided to combine the two and reimplement Redis using a relational backend — first SQLite, then Postgres. That's how Redka was born.

About Redka • Use cases • Usage example • Performance • Final thoughts

Redka is written in Go and comes in two flavors: a standalone Redis-compatible server, and a Go package you can embed in your application. Redka currently supports five core Redis data types (strings, lists, sets, hashes, and sorted sets) and can use either SQLite or PostgreSQL as its backend. It stores data in a database with a simple schema and provides views for better introspection.

Here are some situations where Redka might be helpful:

Embedded cache for Go applications. If your Go app already uses SQLite or just needs a built-in key-value store, Redka is a natural fit. It gives you Redis-like features without the hassle of running a separate server. You're not limited to just get/set with expiration, of course — more advanced structures like lists, maps, and sets are also available.

Lightweight testing environment. Your app uses Redis in production, but setting up a Redis server for local development or integration tests can be a hassle. Redka with an in-memory database offers a fast alternative to test containers, providing full isolation for each test run.

Postgres-first data structures. If you prefer to use PostgreSQL for everything but need Redis-like data structures, Redka can use your existing database as the backend. This way, you can manage both relational data and specialized data structures with the same tools and transactional guarantees.

You can run the Redka server the same way you run Redis, and then talk to it with any Redis client for your programming language. You can also use Redka as a Go package without the server. All data is stored in the database, so you can access it using SQL views.

Redka is not about raw performance. You can't beat a specialized data store like Redis with a general-purpose relational backend like SQLite. However, Redka can still handle tens of thousands of operations per second, which should be more than enough for many apps. I ran redis-benchmark with 1,000,000 GET/SET operations on 10,000 randomized keys against both the SQLite and PostgreSQL backends.

Redka for SQLite has been around for over a year, and I recently released a new version that also supports Postgres. If you like the idea of Redis with an SQL backend — feel free to try Redka in testing or (non-critical) production scenarios. See the nalgeon/redka repo for more details.
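Since Redka speaks the Redis protocol, the usual clients work. A minimal sketch with redis-py, assuming a Redka server is listening locally on the default Redis port:

# Talk to a locally running Redka server exactly as if it were Redis.
# Assumes the server is listening on localhost:6379.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("name", "alice", ex=60)                                 # key/value with a 60s TTL
r.hset("user:1", mapping={"name": "alice", "plan": "pro"})    # hash
r.rpush("queue:emails", "welcome", "digest")                  # list

print(r.get("name"))                    # -> alice
print(r.hgetall("user:1"))              # -> {'name': 'alice', 'plan': 'pro'}
print(r.lrange("queue:emails", 0, -1))  # -> ['welcome', 'digest']

Because the backing store is SQL, the same data is also visible through the views Redka creates in SQLite or Postgres.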

0 views
Fakeman Show 3 months ago

Lessons learned by building my own cookiecutter for REST APIs

During my university days, I avoided building CRUD apps, not because I couldn't, but because they felt boring. To me, it was just a client (web or mobile) talking to a server that ran some SQL queries. I dodged them in every project I could. Fast forward three years: after graduating and working at Oracle, I've realized that almost everything in software is just a fancy CRUD app. And you know what?

0 views
Robin Moffatt 3 months ago

Writing to Apache Iceberg on S3 using Kafka Connect with Glue catalog

Without wanting to mix my temperature metaphors, Iceberg is the new hawtness, and getting data into it from other places is a common task. I wrote previously about using Flink SQL to do this, and today I'm going to look at doing the same using Kafka Connect. Kafka Connect can send data to Iceberg from any Kafka topic. The source Kafka topic(s) can be populated by a Kafka Connect source connector (such as Debezium), or by a regular application producing directly to it. I'm going to use AWS's Glue Data Catalog, but the sink also works with other Iceberg catalogs. Kafka Connect is a framework for data integration, and is part of Apache Kafka. There is a rich ecosystem of connectors for getting data in and out of Kafka, and Kafka Connect itself provides a set of features that you'd expect for a resilient data integration platform, including scaling, schema handling, restarts, serialisation, and more. The Apache Iceberg connector for Kafka Connect was originally created by folks at Tabular and has subsequently been contributed to the Apache Iceberg project (via a brief stint on a Databricks repo following the Tabular acquisition).

2 views
baby steps 4 months ago

Using Rust to build Aurora DSQL

Just yesterday, AWS announced General Availability for a cool new service called Aurora DSQL – from the outside, it looks like a SQL database, but it is fully serverless, meaning that you never have to think about managing database instances, you pay for what you use, and it scales automatically and seamlessly. That's cool, but what's even cooler? It's written 100% in Rust – and how it got to be that way turns out to be a pretty interesting story. If you'd like to read more about that, Marc Bowes and I have a guest post on Werner Vogels' All Things Distributed blog. Besides telling a cool story of Rust adoption, I have an ulterior motive with this blog post. And it's not advertising for AWS, even if they are my employer. Rather, what I've found at conferences is that people have no idea how much Rust is in use at AWS. People seem to have the impression that Rust is just used for a few utilities, or something. When I tell them that Rust is at the heart of many of the services AWS customers use every day (S3, EC2, Lambda, etc.), I can tell that they are re-estimating how practical it would be to use Rust themselves. So when I heard about Aurora DSQL and how it was developed, I knew this was a story I wanted to make public. Go take a look!

0 views
Jefferson Heard 5 months ago

Modeling data well

I talk often about how modeling data should be done later in the process. If you start a new project by writing schemas and model classes first, then you're putting the cart before the horse, and you're going to model data badly. But okay, you've accepted that, or at least that it's my opinion. In fact, my steps are: model commands, model queries, model pubs, subs, and schedules, model the environment and outside connections... and then model data. But when it finally does become time to model data, how do you model it?

You've gotten a start modeling everything else, so you've probably already spent some time thinking about this. That in fact is the point of putting data modeling this far back in the process. You need to decide on databases, whether you're adopting a streaming or other async strategy for processing mutations, whether the same tables (or databases) can be used for mutation and query, what indexes to use, and so forth. You can't do that without the context in which your data lives. When sitting down to model data, you want to know things like:
- How many users, agents, and other autonomous processes are there going to be hitting your data through the commands and queries you've laid out?
- How long can a query take before a user notices the lag?
- How important is transactional consistency?
- Are there special requirements for queries like text search that suggest alternative databases to support them?
- Are there special reporting or auditing requirements for various models?
- Should changes on certain data be reversible?
- Do any of your commands and queries revolve around collaborative editing?
- Are there row-level (or document-level or whatever) permissions involved?

You have finite time to spend modeling data. Spend it where there's going to be traffic and tough-to-meet requirements. When you have to steal time, steal it from infrequently used cases in your app, service, or microservice. Which queries are the ones that are most visible to the user? Which mutations are the ones that touch the most and are the most sensitive to consistency or timing? Focus on those and branch out. And yes, you need to model schema, attributes, and document fragments, but all of that is at your fingertips if you've already modeled commands, queries, subs, pubs, and schedules.

There's a temptation to over-optimize for scale at the data modeling stage. Sure, you want 25M daily-active-users and 100 million queries an hour, but is that where you're really starting? (I'll grant that in some situations it is, but I implore you to ask yourself whether you're the exception.) And yes, you want all your queries to return as quickly as possible, but there's a point of diminishing returns, and there are queries that are only executed 100 times a day.

Put in a data migration system first thing, always. You can hand-control it or automate it, but you have to have one. Except in extreme cases, it needs to effect migrations while the system is online. And while not every migration needs to be reversible, the migration system ought to have the concept. I like Alembic because of its flexibility, but having one is more important than having a specific one. This is also a great place to use Architecture Decision Records (ADRs) to talk about what kinds of scale you hope for or expect, what strategies you'd employ when you're approaching that scale, and what the process to migrate up a level ought to be. You don't have to get too detailed yet; "concepts of a plan" suffice at this stage. The point is that you've thought about the possibility that at some point, for example, you might have to go from a Postgres-based queue to RabbitMQ to Kafka, and you don't do anything now that prevents you from making that transition smoothly when the time has come.

I'll note that these data modeling strategies are valid for SaaS companies and service architectures. There are other modes of data modeling out there with different concerns (in particular there are times when my position about foreign keys is not valid). But in more than a decade of SaaS work, these have served me well.

I don't use sequential integer keys. I use uuid4 or uuid7 keys for the most part. The latter is sortable, with the sort key working out to a timestamp equivalent.
There's a hard-won security lesson that predicated this: if your keys are obviously sequential, then technically-minded users interested in automating something involving your API will assume they can index through the list. The amount of CPU cycles or storage you save by fitting your keys in a 32-bit word is basically never worth the tradeoff. Go ahead and fight that holy war with me, and I'll point out the dollar and change per day you're saving on a really absurdly high-traffic table vs. the cost of mitigating just one easily avoidable data breach.

I rarely use foreign key constraints. Github famously does not use foreign keys ever, anywhere. I don't entirely avoid them, but they've made me bring down a service and introduce downtime we didn't otherwise need just to run a migration. They're fine most of the time, but there are relatively common cases where you have to alter the schema or update records in ways that cause the constraint to lock things that you don't want locked. Instead I typically opt to put resilience in my business logic around cascading data and handling broken relationships.

I use ARRAY and JSONB columns a lot, as well as GIN and GIST indexes. Your code to access and maintain data will never be less complex than the underlying data model. Because of this I avoid over-normalizing database models, period. If something's not going to be accessed as a top-level object, and it doesn't push the constraints of either of those types, I will pick column storage, knowing that if scale changes the calculus I can run a database migration to get to where I need to be.

I never use a relational database table as a queue. This is one of those "never do anything that prevents you from scaling later" things. Having your queue be part of your database lets you do things you can't do in any other queuing system. You can access potentially all old queue records. You can join to other tables. And in the rush of crunch time and the brain fog of Friday peer reviews, letting a junior coder get away with abusing the queue because it's "just a table" is easy to do and hard to fix. I just use Redis or SQS to start and expand to something more robust if I get to the point that it's necessary.

I add checkpointing and auditability in relatively early. They aren't typically the first things I add, but checkpointing and event-level (e.g. mutation or API-call level) auditing let me replay mutations on top of a checkpoint for forensic purposes and let me fix inconsistencies.

I'm not concerned with the redundancy of writing the same data to two or more places. That is to say that if I'm accessing OpenSearch to provide a query, I don't have a problem writing the whole record as a document in OpenSearch and returning that without first checking Postgres to make sure that it's still there. Obviously there are exceptions where consistency matters, but it's worth thinking about whether you have one of those situations. It's often worth creating tables or data stores that serve up data that's more compatible with the query at hand than the "base" data structure the truth is stored in. The benefits that you get with maintaining redundant data are:
- Avoiding stovepipes made up of constraints and single-use indexes that slow down your transactions in the ground truth tables.
- The ability to populate read-only tables in an async or lazy fashion.
- The ability to use strategy-specific data stores like OpenSearch for accelerating queries while using battle-tested transactional consistency from a Postgres or similar.

I do my best to abide by the principle "Choose Boring Technology". For plain old data, I use Postgres with SQLAlchemy. They both give me tons of scaling options. Once you move to RDS or Aurora you can scale up pretty much as far as you'd ever need to. And while SQLAlchemy contains an ORM, it is not an ORM.
If I want composability or absolutely crazy queries, I get that without having to drop to SQL strings or templates.

For document-structured data, I stay in SQL, but I use a JSONB column and various indexing strategies instead of over-normalized relational hierarchies. I typically validate document-structured data with a JSONSchema or something like Pydantic that produces one.

For providing full-text indexed search, I use OpenSearch. I should say that I have most of a PhD in text search and I still choose OpenSearch 99.9% of the time for this. It simply follows so many best practices, and you can do so much to change how indexes are built, that the number of cases where OpenSearch or ElasticSearch aren't the right tool is vanishingly small. I can safely say that unless you already have a ton of tooling around Postgres full-text search, you'll create a better user experience faster and with fewer resources with OS/ES.

For online migrations, I use Alembic. It will autogenerate changes off your SQLAlchemy ORM models, but you can also just tell it to create a bare migration, and in that you can do anything you can do in Python, like moving the contents of a Redis pubsub to a new RabbitMQ instance or loading all your profiles into AWS Cognito. And then you can keep using the autogenerated migrations after that.

For regular gridded data-science, ML, or AI training data, I use Parquet. Yes, there are other formats, but Parquet is the most compatible by far. I can use DuckDB, Polars, Pandas, or R. I can move it and all the resultant processes into Snowflake or Databricks when I get to that point of scale. I can hand it to people who work in other languages or systems and they can pretty universally do something with it. It partitions well. It's ridiculously fast.

For queueing, caching, cross-process data storage, and other set-membership or hash-lookup cases, I use Redis. Again, it's the most flexible and broadly compatible thing out there. I almost suggested RabbitMQ for queuing, but Redis does an appreciable job until you get to the scale where you want a streaming system like Kafka.

For permissions I use AuthZed's open source SpiceDB. This is the most "non-boring technology" choice on this list, but after using it in a couple of projects I can't imagine going back. It's an open implementation of Google Zanzibar, which controls access to files and folders in Google Drive among other things. In particular:
- It creates clear and complete permission models.
- It can handle ridiculous query frequencies before you need to scale up.
- Migrating away from it or up to a cloud-based AuthZed solution in the event that you need to is pretty straightforward.
- Zanzibar is as battle-tested as it gets in the commercial space (classified govt systems are a separate class).

While I recognize that I don't provide much in the way of concrete examples in this post, I still think there's a lot of general advice that's worth giving around data modeling, and I'd love to answer specific questions from people about some of my advice or what technology to use for a certain use case (use the contact form on my website). TL;DR, though, I leave you with three things:
- Your code to access and maintain data will never be less complex than the underlying data model.
- Behavior (both machine and human) begets schema.
- Optimize for the ability to evolve to scale, rather than to scale you don't have.
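To make a couple of these preferences concrete (UUID primary keys, JSONB for document fragments, a GIN index), here is a small SQLAlchemy sketch. The model and column names are made up for illustration and are not from the post.

# Sketch: UUID keys, a JSONB document column, and a GIN index on it.
import uuid
from sqlalchemy import Column, DateTime, Index, String, func
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Form(Base):
    __tablename__ = "forms"
    # uuid4 here; the post also likes uuid7 (time-sortable), which needs a
    # third-party generator on current Python versions.
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String, nullable=False)
    payload = Column(JSONB, nullable=False, default=dict)   # document fragment, not over-normalized
    created_at = Column(DateTime(timezone=True), server_default=func.now())

    __table_args__ = (
        Index("ix_forms_payload", "payload", postgresql_using="gin"),
    )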

0 views
Jefferson Heard 5 months ago

What I talk about when I talk about Technical Debt.

Communicating technical debt to people other than engineers is essential to getting work on that debt prioritized and valued alongside bugs and product roadmap work, and it's not easy at all. One key quality distinguishing a good engineering leader from a great one is the ability to bring engineers and non-engineers to agreement about technical debt and its priority. In this article, I'll talk about how we've done that at Teamworks.

For a long time we struggled with differing definitions of technical debt by different parts of the company and a lack of ability to communicate the urgency of tackling it. We eventually arrived at a shared definition. It doesn't attempt to provide a taxonomy of technical debt. It doesn't establish a framework to determine priority (I'll talk about how we do that a bit later). But it does establish a hard line between what is and what is not technical debt, and it gets everyone on the same page.

Ward Cunningham first used the debt metaphor to talk about code problems that weren't exactly bugs but rather things that made the code harder to understand and modify. It's a good metaphor. It describes these problems as having a cost associated with them. It provides for the idea that there's a principal and an interest rate, even if it doesn't define how you arrive at those things.

Venture-backed startup companies generally share the characteristic that they spend money faster than they can make it to expand into new markets. This deficit spending is a conscious choice and makes long-term sense. Most importantly, it tends to pervade every decision made about how to allocate capital in a company. That includes how that company allocates technical capital. A venture-backed software company has to build software faster than it can refine it. Getting into new markets and getting to market fit faster than competitors require lean experimentation alongside a codebase that's also serving a well-developed base of paying customers who count on an agreed-upon service. This leads to conscious adoption of technical debt in the service of growing the company. In an investment-backed company, you need to strategically embrace technical debt. Just remember to understand it, document it, and budget for paying it down before it buries you.

To get their renewal, someone promised an important customer who was already on the fence about renewing that tracking wearable data would be available by April 21. To make that happen, you had to cut some corners. A new section of config has to be done for each user for the feature to work. Without a detailed update to the profile screen that walks someone through connecting their watch to the app, one of your engineers has to do it in SQL and by making API calls with the provider. The cost of the customer doing it themselves would be $0 and throughput would be basically instantaneous, but it'd require those screens to be built – including validation and failsafes and OAuth handshakes. The cost of someone in customer support doing it with a quick-and-dirty screen is, oh, let's say $15/user for their time. But the time that it takes for CS to do a rollout for a customer also can't be allowed to be a drag on CSAT, so there may be additional opportunity costs if they're updating profiles en masse and letting other work pile up. You also have to consider the throughput time.
Depending on their queuing strategy and time guarantees, the time from when a user realizes they need the new feature until they can use it goes from minutes to a day or so. The ongoing cost of an engineer doing this in SQL and API calls is: So now the user opens a support ticket to get on the new feature. Support sees it and forwards it along. Engineering uses their SDLC process to accomplish it or their 2nd-tier technical support process if you have a 2nd-tier support team. Counting all the handoffs, the cost is now in the hundreds of dollars per ticket if not pushing four figures. The throughput is now a day or more depending on how disciplined your engineering team is about customer-facing problems and it has impacts to your team's ability to push new features. The tickets interrupt people. Engineers can't be as adept about making context switches as other functions in your organization, so you'll lose more than the few minutes of active work it takes people to service the ticket. This is technical debt, and these are its associated costs. Again though, this is not a matter of bad vs. good. The above impact scenario has its place. I've done it, but when I did I knew full well what I was doing. The point is that when you take on tech debt, you're aware of its scope, you document the impacts, and you communicate the need to clean it up. Sometimes the most debtful scenario is fine long term because in actuality it amounts to a few tickets per quarter and it's not worth diverting the team to write a fully hardened, well designed screen that puts the setup in the user's hands. In his article on Technical Debt , Martin Fowler characterizes technical debt this way: This definition is a good one for software development in a vacuum, but it’s not all that useful in a company setting. The challenge with tech debt in a company is getting it on the docket when there are features to develop and territory to capture. In the enterprise, technical debt has impact well beyond engineering concerns. It includes: The impact of technical debt is the sum of these costs that are themselves the result of solvable technical issues, shortcuts to market (like mechanical turking), and costly adaptations (hacks). Too much debt can drag on a company’s KPIs across the board. A pragmatic approach to planning and accounting for technical debt on the other hand allows you to achieve things in a timeframe you couldn’t otherwise. All of these costs, importantly, are quantifiable. You can calculate the increased cost of customer support. You can calculate the cost of churn due to low customer satisfaction. You can calculate the cost of your R&D department having to re-engineer and bootstrap a project to work around the problems from poorly maintained production software. And since you can quantify that cost, you can communicate it to the CEO, COO, and CFO. The important part of communicating to the non-technical parts of the organization is quantification. If you can make a spreadsheet or a graph of it and relate it to ARR, you can relay debt in terms that are meaningful to everyone who isn't an engineer. I say cost and not other metrics because cost is always meaningful. There's no way to make it a vanity metric . No-one cares that you can improved garbage collection times by 50%. Everyone cares that you can eliminate $125,000 a quarter from cloud costs with one month of work. In the above scenario, the cost of missing your engineering deadline is losing the renewal to churn. To the company that's, say $850,000 in ARR. 
It's also the cost of missing quarterly earnings numbers, dropping Gross Retention, and so on. So as long as the cost to engineering and CS, etc is less than that number, taking on the technical debt and maintaining it without fixing it is worthwhile. When you tell engineers why they're not just delaying the feature by another sprint or two, this is what you tell them. When you tell folks handling renewals why this feature is being prioritized over others, making their negotiations harder, this is the calculus you give them. There's a number. The math works out. Yes, there are impacts to this, like engineers slowing down on feature development to handle support tickets, and like CS being forced to be really precise and check their work on a new screen that makes them handle the rollout of hundreds of users person by person. But in a controlled timeline, it works. And when you tell everyone that you're delaying something else by a couple of sprints to get the screen in, even after 95% of existing users have been migrated onto it, you point out the ongoing cost to everyone of running the above processes to add all the users every time sales signs a new customer. And thus your technical debt gets cleaned up. In my experience, the  principal  of the tech debt is the cost of what it will take to provide a solution that eliminates the negative impacts. The  interest  is the toil and drag across the organization involved in using and maintaining the debt-financed solution and the growth of that toil over time as other code and company process has to work around or incorporate the tech-debt in order to get its job done. Take for example a function that lets you create and update a form, but there’s nothing self-service to delete it; imagine that deleting and cascading was more complicated, so you needed longer to consider how to do it right. Not having the deletion is the principal. The interest on that principal consists of the time and cost of every ticket your DBA had to take care of manually in SQL in order to delete a customer’s form. It's the number of times you had to restore a backup or otherwise fix the database when the DBA made a mistake. It’s also the cost to the company’s reputation of all the times that it took longer than its customers felt was reasonable to delete that form, or they were impacted by mistakes. When prioritizing tech-debt in the backlog, it helps to describe consequences to be mitigated instead of the solutions you plan to use to mitigate. Imagine a system where you’re going from a single-instance cache to a highly-available, scalable cache. Your description of the work could be “Switch web caching from our managed redis to a managed ElastiCache,” That communicates nothing about the why, and in a backlog of 1000 tickets the title tells the product owner nothing about how to prioritize that vs. everything else. A better ticket would be titled, “Cache misses are causing users to complain about slow performance at peak times.” There is an important distinction between doing work that  anticipates  change vs. work in  reaction  to it. Work that is done as a reaction to change is often paying down principal or interest on technical debt. Work that’s done in anticipation of change expands our overall technical capital. To illustrate the difference, consider a mobile/web shared code project. At some point in the past we began using React on web as well as React Native to build our application. 
In the beginning, there was very little shared code, but we  anticipated  that much of the code between web and mobile would be shared in the future. If we had taken that foreknowledge and applied it then to solving the problem of “how to share code between web and mobile”, prioritized, and scheduled that work, that work would not have been “technical debt.” Why? We realized our code repo was inadequate to future needs. Why is that not Technical Debt? The answer to that is also the answer to the question “Was there  realized  impact or did we get ahead of it?” It’s the difference between being forced to react vs. having the advantage of the situation. If we had done it then, we could have done it without also being impacted by the negative consequences that came with waiting too long to address it. Instead we didn’t plan the work ahead and we had to build a shared-code solution while also experiencing development drag from engineers manually keeping shared code in sync. Prioritize proactive work by thinking about the technical debt you take on if it’s ignored. This bucket of work is for experimentation and work that has the potential for positive disruption. It’s a bucket for work where the engineering department can be the force for innovation. Think “labs.” If when you crafted your story, you thought “Things are pretty good, but I think they could be way better.” then you have an engineering priority. Prerequisite and requisite work is the work that should be done before building or revising a feature, or should be done in order to make the feature complete. This is often the work needed to make new development conform to engineering standards of quality, testability, and performance. Examples of this include: Sometimes prerequisite work can be skipped and a feature can still be shipped, but it will be more costly to maintain and modify than a fully complete project. This causes technical debt to incur and thus the priority of the work can be based on that impact. The difference between a bug and an item of technical debt is obvious most of the time. The distinction is blurry when the bug doesn’t affect the correctness of output, only some aspect of importance to engineering or operations. In some cases, the distinction may be down to urgency or whether treating it as technical debt can bring a single item into the context of a wider cleanup push. The key concepts are Urgency and Impact. Another key activity for grooming technical debt, however, is contextualization. This is the planning activity of organizing technical debt into well-scoped refactoring plans, epics, and the collapsing of closely related stories. This makes it so that we can tackle more technical debt than we could grabbing a few stories off the stack. Teams should groom technical debt carefully and where possible create proactive solutions like refactors vs. playing “whack-a-mole” with issues that haven’t changed since they were initially reported. Too often, "technical debt" is a meaningless phrase in a company setting. Making it meaningful is about showing the wider impact technical debt has to the organization. Everyone is impacted by technical debt, and so everyone has to collaborate on fixing it, whether that's writing code, adjusting timelines, smoothing over bumps with customers, or yielding budget dollars to help with the paydown. To achieve that kind of collaboration, though, you have to become a great communicator of technical debt to technical people and non-technical people alike. 
By characterizing the debt in terms of cost, the pay-down in terms of impact, and making conscientious choices about tech debt to take on, you will manage your company's technical debt balance effectively and not let it compound until it stalls growth. CS writes the support ticket and puts it on the engineering queue to be prioritized. An engineer stops work on features (possibly even the screen that cures this tech debt) and modifies the SQL template for a specific ticket, costing them a few minutes to an hour (cheap) and a context switch (expensive). Another engineer spends their peer review time making sure the SQL is correct. Someone with permissions to execute SQL against the production database and API calls against the production vendor account runs it. The cost of providing adequate customer support. Cost of providing performant and reliable software. Cost of continued scaling of the customer base. Cost of ensuring regulatory compliance and security. Throughput of individual support requests and their impact on customer relationships and retention. Cost of hiring engineers who can make system modifications reliably in bounded time. Ability to execute on high- to medium-priority items in a product roadmap in a sane amount of time. The impact of customer frustration from user “toil” and confusion necessitated by engineering around existing behavior, i.e. “You have to have this permission and go to that strangely named screen, and then do your task in 8 click-and-waits because that's the only way we could build it.” Providing self-service admin functionality. Refactoring code somewhere else in the stack that is common to the new / revised feature, so that it can be shared between the old and new. Bulk uploads. Settings screens.
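As a concrete picture of the “interest” on the form-deletion example above, here is a hypothetical sketch of the manual clean-up a DBA might run for each ticket; the table names are illustrative and not from the article:

```sql
-- One-off clean-up a DBA runs by hand for each deletion ticket
-- (table names are hypothetical). Every run costs engineer time,
-- a peer review, and the risk of a mistyped WHERE clause.
BEGIN;
DELETE FROM form_responses WHERE form_id = 1234;  -- cascading done by hand
DELETE FROM form_fields    WHERE form_id = 1234;
DELETE FROM forms          WHERE id = 1234;
COMMIT;
```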

0 views
Marc Brooker 8 months ago

What Fekete's Anomaly Can Teach Us About Isolation

Is it just fancy write skew? In the first draft of yesterday’s post , the example I used was one that showed Fekete’s anomaly. After drafting, I realized the example distracted too much from the story. But there’s still something I want to say about the anomaly, and so now we’re here. What is Fekete’s anomaly? It’s an example of a snapshot isolation behavior first described in Fekete, O’Neil, and O’Neil’s paper A Read-Only Transaction Anomaly Under Snapshot Isolation . The first time I read about it, I found it spooky. As in five stages of grief spooky. But the more I’ve thought about it, the more I think it’s just a great teaching example. To understand the anomaly, let’s talk about two people. We’ll call them Pat and Betty. Pat and Betty share a pair of bank accounts - a savings account and a current account. They bank at Alan’s bank, which charges a $1 overdraft fee any time a withdrawal will reduce the total value of their accounts below $0. One day, Pat and Betty are running separate errands. Pat goes to the ATM, checks the savings balance and sees $0, then deposits $20. After completing his transaction, Pat comes back to the ATM, checks their balance, and sees $20 in savings, and $0 in current. Around the same time, Betty goes to the ATM and withdraws $10 from the current account. Checking their account later, they notice a balance of -$11 in the current account, and $20 in savings. But that’s impossible! Pat saw $0 and $20, so Alan’s bank shouldn’t have charged them that $1. Did they get ripped off? Let’s tell the same story again, in SQL. Starting with some setup: Then we get to the anomaly itself, showing Pat’s two transactions ( and ), and Betty’s one ( ): Under snapshot isolation (SI), and some other implementations of weaker-than-serializable isolation levels, this SQL can run as-is. Under serializable isolation, at least one of these transactions would be rejected (standard Postgres would reject the commit of ). What’s interesting about this anomaly is that, if it wasn’t for , we could simply say that happens before and Alan’s Bank’s behavior was justified. But, by doing a read-only transaction, caught them in a weak isolation anomaly. But why, and what can we learn from this? In a Picture Before we answer that question, let’s draw a picture of the transactions and the database state: The interesting part, fittingly, is the bit marked the interesting part is here : the decision whether to commit . Why does commit under snapshot isolation? We can answer that in three ways: The OCC view: is allowed to commit under SI, because it has no write-write conflicts with the transactions that committed between its start and end. ’s write set is , ’s is , and ’s is . No conflict there. would not be allowed to commit under serializability because of read-write conflict with : ’s read set is which intersects with ’s write set . The 2PL MVCC view: Under SI, and read from different MVCC snapshots, and the write lock taken by on row doesn’t conflict with the write lock taken by on row . The theory view: To quote from my favorite transaction theory paper, Crooks et al’s Seeing is Believing : Like serializability, SI prevents transactions T from seeing the effects of concurrently running transactions. The commit test enforces this requirement by having all operations in \(T\) read from the same state \(s\), produced by a transaction that precedes \(T\) … You can see that in the diagram: reads the results from our setup transaction , which directly precedes it in the history.
However, SI no longer insists on that state \(s\) being \(T\)’s parent state \(s_p\): other transactions, whose operations T will not observe, may commit in between \(s\) and \(s_p\). Here, \(s_p\) is the state that applies its changes to, which isn’t the same state as the one it reads. The commit test only forbids \(T\) from modifying any of the keys that changed value as the system’s state progressed from \(s\) to \(s_p\). Which it hasn’t: only row has changed, and only needs to change . But What Does It Mean? Fekete’s anomaly sure is weird: by introducing a third read-only transaction, we get the database to show an anomalous behavior that would otherwise appear serializable. On the other hand, it also seems like a relatively straightforward case of constraint violation caused by write skew. To quote A Critique of ANSI SQL Isolation Levels Transactions must preserve the constraint predicate to maintain consistency: if the database is consistent when the transaction starts, the database will be consistent when the transaction commits. If a transaction reads a database state that violates the constraint predicate, then the transaction suffers from a constraint violation concurrency anomaly. From the perspective of the database system builder it’s a direct consequence of what we wrote about yesterday : SI allows a database (single-system or distributed) to avoid the coordination (inter-machine, inter-process, or inter-thread) necessary to detect read-write conflicts. Because that coordination scales with reads, and reads tend to be more common than writes in many DB workloads, this can be a big win. What about the perspective of the application builder? The Application Builder’s View The application builder needs to answer two questions about this anomaly (and write skew and constraint violation ) more generally: As much as it makes DB-heads uncomfortable, evidence shows that many (if not most) application builders have concluded that the answer to question 1 is no . This isn’t an unreasonable answer. But let’s assume they do care, what do they do? One option is to choose a serializable database and use its serializable isolation level. This can work for some workloads, but certainly not all. Serializability comes with significant performance and concurrency implications. But there are some more surgical fixes. For example, in Aurora DSQL’s snapshot mode you can use 1 . This works because the theoretical model of isolation levels doesn’t map to special SQL features like in a super clean way, but roughly they are ways to increase isolation for some transactions. The second way is to force the write-write conflict. Overall, Fekete’s update isn’t something to be spooked about, but is an interesting example of the trade-off between serializability and weaker isolation levels. The anomaly is a direct result of the reduced coordination needed for SI: the very same reduced coordination that brings significant performance and scalability.
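Here is a minimal sketch of the setup and the three transactions, assuming Postgres’s REPEATABLE READ as the snapshot isolation level, with the transactions labeled T1 (Pat’s deposit), T2 (Betty’s withdrawal), and T3 (Pat’s read-only check); the table and column names are assumptions, not from the original:

```sql
-- Setup (assumed schema). Run the three transactions below from three
-- separate connections; the interleaving is what produces the anomaly.
CREATE TABLE accounts (name text PRIMARY KEY, balance int NOT NULL);
INSERT INTO accounts VALUES ('savings', 0), ('current', 0);

-- T2 (Betty) starts first and takes its snapshot: total visible balance is $0.
BEGIN ISOLATION LEVEL REPEATABLE READ;                               -- T2
SELECT balance FROM accounts WHERE name IN ('savings', 'current');   -- sees 0, 0

-- T1 (Pat) checks savings, deposits $20, and commits.
BEGIN ISOLATION LEVEL REPEATABLE READ;                               -- T1
SELECT balance FROM accounts WHERE name = 'savings';                 -- sees 0
UPDATE accounts SET balance = balance + 20 WHERE name = 'savings';
COMMIT;                                                              -- T1 commits

-- T3 (Pat, read-only) runs after T1 commits but before T2 commits.
BEGIN ISOLATION LEVEL REPEATABLE READ;                               -- T3
SELECT name, balance FROM accounts;                                  -- savings 20, current 0
COMMIT;

-- T2 finishes: on its snapshot the withdrawal overdraws, so the $1 fee applies.
UPDATE accounts SET balance = balance - 11 WHERE name = 'current';   -- T2
COMMIT;  -- allowed under SI: T2's write set {current} doesn't touch T1's {savings}
```

And a sketch of the two surgical fixes mentioned above, rewriting Betty’s transaction so its dependence on the savings row is visible to the engine (again using the hypothetical schema; exact behavior varies by engine):

```sql
-- Fix 1: promote the dependent read so it shows up as a lock / write intent.
-- (Semantics of SELECT ... FOR UPDATE under snapshot-style isolation vary by engine.)
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE name = 'savings' FOR UPDATE;
UPDATE accounts SET balance = balance - 11 WHERE name = 'current';
COMMIT;

-- Fix 2: force a write-write conflict by also touching the row the fee
-- decision depended on, so SI's write-set check catches the race.
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE accounts SET balance = balance WHERE name = 'savings';   -- no-op write
UPDATE accounts SET balance = balance - 11 WHERE name = 'current';
COMMIT;
```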

0 views
Marc Brooker 8 months ago

Versioning versus Coordination

Spoiler: Versioning Wins. Today, we’re going to build a little database system. For availability, latency, and scalability, we’re going to divide our data into multiple shards, have multiple replicas of each shard, and allow multiple concurrent queries. As a block diagram, it’s going to look something like this: Next, borrowing heavily from Hermitage , we’re going to run some SQL. So far so good. We’ve inserted three rows into our database. Next, we’re going to run two concurrent transactions (from two different connections, call them and ), like so: There’s only one valid serializable 1 ordering of these transactions: at line , has seen the world before commits, and so must see that same pre- world until it commits. Therefore must happen before in the serial order. How might we implement this requirement in our distributed architecture? We could use locking: takes a shared lock on at , blocks on it when trying to get an exclusive lock to update the row, and can complete. There are two practical problems with this approach. First, we’re blocking a writer on a reader, which reduces concurrency and throughput. Second, specific to our distributed architecture, needs to take its lock in a single place where needs to look for it. With multiple replicas, where this single place is is not obvious. That can be solved by choosing a primary replica, implementing a single lock manager, or by locking on all replicas 2 . In all three cases, read scalability is lost and read coordination is required. Enter David P. Reed’s 1979 work on versions . Instead of making its desired changes in-place, it creates a new version of the rows, that only becomes visible to transactions that start after commits. , which started earlier, does not see these new versions. The storage layer needs to provide a way to request its reads as of a particular version, which it does by storing multiple copies of the data. The effect this has on the coordination in our database is significant: never has to block on . In fact, doesn’t even need to know that exists at all. could be off in the corner, doing its reads happily against one of a million data replicas, and is none the wiser. This helps scalability ( avoiding coordination is key to scalability ), but also helps throughput (writers never have to wait for readers, readers never have to wait for writers, readers never have to wait for readers), and performance consistency (no waiting means other transactions can’t slow you down). Since the early 1980s, multi-versioning has been a core technique in the implementation of database systems, but it’s role in avoiding coordination in distributed systems is less well-known. The reason multi-versioning is so powerful is because it allows the system to have an extra piece of information (when was this data created?) about the data that it doesn’t need to discover from coordination patterns. As Reed wrote in 1979: Since [versions] are objects that are used by programs, they give a tool for programming that allows explicit recognition of consistent states within the program. In contrast, traditional synchronization mechanisms, such as semaphores, locking, monitors, send-receive, etc. do not give a tool for representing or naming consistent states – one can deduce the states assumed by the system by timing relationships among the execution of steps of programs. Versions are the difference between knowing consistent states and having to deduce consistent states! That’s a powerful idea. 
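Here is a minimal sketch of the setup and the two concurrent transactions, assuming a Hermitage-style test table and labeling the two connections T1 and T2; the exact statements are assumptions:

```sql
-- Setup: three rows, Hermitage-style (assumed; not the original snippet).
CREATE TABLE test (id int PRIMARY KEY, value int);
INSERT INTO test (id, value) VALUES (1, 10), (2, 20), (3, 30);

-- Connection T1
BEGIN;
SELECT * FROM test;                        -- T1 sees (1,10), (2,20), (3,30)

-- Connection T2, concurrently
BEGIN;
UPDATE test SET value = 12 WHERE id = 1;   -- T2 writes a row T1 has read
COMMIT;

-- Back on T1
SELECT * FROM test;                        -- must still see (1,10), (2,20), (3,30)
COMMIT;
```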
Picking A Version, Serving A Version Above, I mentioned that requests its reads as-of a particular version. This raises two questions: how to pick the version, and how the storage engine keeps track of all the versions. How to pick a version depends a lot on the properties you want. Serializability, in one common definition, would allow read-only transactions to pick almost any version (including the beginning of time , returning empty results for all reads). This definition is silly. Let’s go back to SQL to think about the results we want: Here, lines and are doing the same thing as in our first snippet, but we’ve introduced a third transaction . At line , we’re showing what most programmers would expect: a new transaction that starts after commits sees the results of ’s writes. This goal is, informally, called read-after-write consistency (generally considered a type of strong consistency ) 3 . There are many ways to achieve this goal. One would be to have a version authority that hands out transaction version numbers in a strict order - but this re-introduces the exact coordination we were trying to avoid! In Aurora DSQL , we pick this time using a physical clock (EC2’s microsecond-accurate time-sync service ). This allows us to avoid all coordination between readers, including reads inside read-write transactions (e.g. notice for ’s has to be a read-modify-write to find the new for each row). The fundamental idea of using physical time this way dates back to the late 1970s, although mostly with folks acknowledging the difficulty of the synchronization problem. Somewhat amusingly, Reed says: Synchronizing the system clocks whenever they come up by using the operator’s watch will usually get the system time accurate within a few minutes before going on to note that Lamport clocks allow the system to do better. The 1970s consensus seems to be that adequate physical clock synchronization wasn’t possible - today it’s easily available in the cloud. Keeping Track of All Those Versions! The next question is how to keep track of all those versions. This is a deep question of its own, with tons of interesting trade-offs and different approaches. I won’t dive into those here, but instead take a different tack. Let’s isolate from our last example: which we can then rewrite as: Similarly, would be re-written as an at a new version number. I don’t know of any database system that’s implemented this way, but it’s a good illustration which brings us to two invariants we need to maintain around versions: In other words, we must retain the last version (or we lose durability), and we must retain at least until is done. The former property is a local one, that can be implemented by each replica with no coordination. The latter is a distributed one, which brings us back to our theme of coordination. Again, we could clearly solve this problem with coordination: register each running transaction in a list, unregister it on commit, keep track of the low-water-mark timestamp. That’s possible to build (and even scale arbitrarily), but it’s nice to avoid that coordination. In Aurora DSQL we avoid that coordination in a simple way: transactions are limited in time (five minutes in the current preview release), and versions are tied to physical time. This turns invariant 2 into a local property too, once again avoiding coordination 4 . In distributed database systems, versioning and physical clocks allow coordination to be avoided in nearly all read cases.
This is an extremely powerful tool, because avoiding coordination improves throughput and scalability, reduces latency and cost, helps availability, and simplifies the design of systems.
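To make the rewrite-as-a-new-version idea, and the time-bounded retention that turns invariant 2 into a local property, concrete, here is a sketch; the versioned-table layout and the timestamps are purely illustrative, not how any particular engine stores data:

```sql
-- Illustrative versioned layout (not how a real engine stores data).
CREATE TABLE test_versions (
    id      int NOT NULL,
    version timestamptz NOT NULL,   -- version number = physical commit time
    value   int NOT NULL,
    PRIMARY KEY (id, version)
);

-- T2's in-place update (UPDATE test SET value = 12 WHERE id = 1)
-- rewritten as inserting a new version of the row:
INSERT INTO test_versions (id, version, value) VALUES (1, now(), 12);

-- A reader pinned to its start-time snapshot picks, per row, the newest
-- version at or below that snapshot, so it never sees later writes:
SELECT id, value
FROM test_versions tv
WHERE version = (SELECT max(version)
                 FROM test_versions
                 WHERE id = tv.id
                   AND version <= '2025-01-01 12:00:00+00');  -- snapshot time

-- With versions tied to physical time and transactions bounded to five
-- minutes, retention is a local rule: keep each row's newest version,
-- plus anything newer than the window.
DELETE FROM test_versions tv
WHERE tv.version < now() - interval '5 minutes'
  AND EXISTS (SELECT 1 FROM test_versions newer
              WHERE newer.id = tv.id AND newer.version > tv.version);
```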

0 views
Emil Privér 9 months ago

It's Time to Move on From Nosql

There are a few things and directions we developers have taken that I think were probably not good things, such as NoSQL or thinking that edge runtime will make our websites faster. Back in 2019 and probably even earlier, many developers looked into Lambda functions because they solved the problems of unexpected traffic spikes for our system - and don’t get me wrong, they probably did. But they didn’t solve the problem of our wallets not printing money on demand, but that’s a different story. When we started to use Lambda functions even more, we realized that if we use Lambda functions which we don’t have a lot of control over, we also need to have a database which can scale up automatically when our traffic scales up too. At the time, the traditional way to use a database was to spin up a MariaDB, MySQL, or Postgres database and set up a connection between the application and the database. But the problem when you use Lambda and a normal relational database is that it’s hard to control how many connections you spin up. It’s possible for the Lambda scheduler to spin up 100 functions which all need connections to the database, and we might not have control over when these connections spin up, so the database gets too much work and probably dies. When we realized this, we probably also realized that we need something which can handle unexpected connections, which became NoSQL. NoSQL worked as an alternative due to its design. NoSQL is essentially an object-storage system that comes with an interface so we can more easily query the data, but it has no rules or structure for the data. It simply stores it and lets you do basic operations on the data such as where, sum, and count. But there is one trade-off - due to its design with files in object storage, you are now forced to do many queries instead of one to fetch all the data you need, and you pay per request you make, and there is no way to control which data you get back; you simply fetch all of the data. The idea of spinning up a database without requirements on our data to use as our main database is, for me, an interesting idea and a weird move we do. Scale in software engineering refers to a system’s ability to handle growing demands while maintaining performance and reliability. If you look at NoSQL databases, they indeed handle the increased amount of traffic for services very well; however, there is more to scale than traffic. I also often see the ability to grow over time in terms of database size and how easy it is to maintain that data as part of scaling, and this is where NoSQL doesn’t scale. I generally believe that in programming, working more strictly makes it easier to maintain software over time, such as using a strictly typed language like Rust. A strictly designed relational database can grow more easily with you over time because you design a schema which your data follows. This is not the case with NoSQL because there is no such thing as a schema in a NoSQL database; sure, the application might have a schema, but not the database. If your product manager comes to you and tells you that the requirements have changed for the application, it can be really hard to achieve a good change with a NoSQL database because you might have a big migration job in front of you. If you simply change a type in the application and you don’t change the data for all documents, it could mean that when you’re requesting a type of string and you get an int, and this can create issues. 
If you decide to make a big migration job, you will probably need to pull a lot of data and then write it again, which can be pricey, compared to a SQL database where you can change the schema and tell the database how to make a migration if it’s needed. And this is my biggest problem with NoSQL: the amount of work you need to do if you want to maintain a NoSQL database over a period of time and keep it in a good state, especially when the amount of data you store increases. A NoSQL database might solve your needs, but the problem isn’t when things are working fine - it’s when things are not working fine, as it’s much easier to break a good state of a system where there are no rules to follow. Another thing with NoSQL is that we can’t treat a NoSQL database as a relational database, which is obvious. But what I mean is that it will be more expensive if you treat your NoSQL database as a relational database by splitting up the collections (tables) into many and then making many queries to get all the data you need. This is why you need to think of how you build your data model so it’s easier to fetch data. For instance, if you want to build an API which shows products and want the category information to be on a single product’s model, it is probably better, instead of updating a single category in the category collection, to update all products with that category as well, because it reduces the number of documents we need to read in order to get the data we need. However, it’s a massive job to update all these products if we want to update the value for products which have that category. And this is one of the biggest differences I realized when I was working with NoSQL: with NoSQL, you need to prepare the data before you make the request to keep it in a good state and prevent unnecessary costs for your database. Now this question is probably tied a lot to preference in how we work, but I think that from the beginning we shouldn’t have used NoSQL just for the sake of handling traffic; I think the problem has always been with the tech stack. But I definitely also think we shouldn’t use NoSQL as our main DB. Either we should have used it as a persistent cache to our relational DB for the service which we expect to have heavy load, or we should have treated it as an alternative database to our relational database when we can’t describe how to store the data so we simply just dump it somewhere, and then a data engineer or someone could use that “dumped” data to do their job. But I also think we should have looked into other options which could be good candidates in the serverless lambda functions world, such as LibSQL. LibSQL is a SQLite fork by the company named Turso. It gives us an HTTP interface we can use to query the data from a lambda function. Something I didn’t mention earlier is that when SSDs improved even more, NoSQL databases also improved in terms of speed, as did LibSQL. LibSQL allows us to achieve the same type of speed and scale as NoSQL but it also comes with requirements on our data.
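For contrast, here is a sketch of how the product/category example looks relationally; the schema is hypothetical:

```sql
-- Hypothetical relational layout for the product/category example.
CREATE TABLE categories (id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE products (
    id          int PRIMARY KEY,
    name        text NOT NULL,
    category_id int NOT NULL REFERENCES categories (id)
);

-- The API query joins instead of duplicating category data onto every product:
SELECT p.id, p.name, c.name AS category
FROM products p
JOIN categories c ON c.id = p.category_id;

-- Renaming a category touches one row, not every product document:
UPDATE categories SET name = 'Outdoor' WHERE id = 7;

-- And when requirements change, the schema migrates in place:
ALTER TABLE products ADD COLUMN sku text;
```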

0 views
Marc Brooker 10 months ago

Snapshot Isolation vs Serializability

Under serializability, once is committed, can only commit if $R_2 \cap W_1 = \emptyset$ (or, more generally, the set of all writes accepted between and does not intersect with 1 ). Under snapshot isolation, once is committed, can only commit if $W_2 \cap W_1 = \emptyset$ (or, more generally, the set of all writes accepted between and does not intersect with ). Notice how similar those two statements are: one about $R_2 \cap W_1$ and one about $W_2 \cap W_1$. That’s the only real difference in the rules. But it’s a crucial difference. It’s a crucial difference because of one of the cool and powerful things that SQL databases make easy: s. You can grow a transaction’s write set with and and friends, but most OLTP applications don’t tend to. You can grow a transaction’s read set with any , and many applications do that. If you don’t believe me, go look at the ratio between predicate (i.e. not exact PK equality) s in your code base versus predicate s and s. If the ratios are even close, you’re a little unusual. Concurrency versus Isolation This is where we enter a world of trade-offs 3 : avoiding SI’s write skew requires the database to abort (or, sometimes, just block) transactions based on what they read . That’s true for OCC, for PCC, or for nearly any scheme you can devise. To get good performance (and scalability) in the face of concurrency, applications using serializable isolation need to be extremely careful about reads . The trouble is that isolation primarily exists to simplify the lives of application programmers, and make it so they don’t have to deal with concurrency. SQL-like isolation models do that quite effectively, and are (in my opinion) one of the best ideas in the history of computing. But as we move up the isolation levels to serializability, we start pushing more complexity onto the application programmer by forcing them to worry more about concurrency from a performance and throughput perspective. This is the cause of my belief that snapshot isolation, combined with strong consistency, is the right default for most applications and most teams of application programmers: it provides a useful minimum in the sum of worries about anomalies and performance. Fundamentally, it does that by observing that write sets are smaller than read sets, for the majority of OLTP applications (often MUCH smaller). How Low Can We Go? How Low Should We Go? That raises the question of whether it’s worth going even lower in the isolation spectrum. I don’t have as crisp a general answer, but in Aurora DSQL’s case the answer is, mostly, no. Reducing the isolation level from SI to does not save any distributed coordination, because of the use of physical clocks and MVCC to provide a consistent read snapshot to transactions without coordination. It does save some local coordination on each storage replica, and the implementation of multiversioning in storage, but our measurements indicate those are minimal. If you’d like to learn more about that architecture, here’s me talking about it at re:Invent 2024: Optimistic versus Pessimistic Concurrency Control What I said above about SI is, from my perspective, equally true in both OCC and PCC designs, when combined with MVCC to form read snapshots and timestamps to choose a read snapshot. In both cases, the only coordination strictly required for SI is to look for write-write conflicts ($W_2 \cap W_1$) that occurred between and , and to choose a commit timestamp based on that detection.
The big advantage OCC has in avoiding coordination has to do with how those write-write conflicts are detected. In backward validation OCC, the only state that is needed to decide whether to commit transaction $T$ is from already committed transactions . This means that all state in the commit protocol is transient, and can be reconstructed from the log of committed transactions, without causing any false aborts. This is a significant benefit! The other benefit of OCC, as I covered in my first post on DSQL is that we can avoid all coordination during the process of executing a transaction, and only coordinate at time. This is a huge advantage when coordination latency is considerable, like in the multi-region setting. As with SI versus serializability, the story of trade-offs between optimistic and pessimistic approaches is a long one. But, for similar reasons, I think OCC (when combined with MVCC) is the right choice for most transactional workloads.
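To see the $R_2 \cap W_1$ versus $W_2 \cap W_1$ difference in action, here is the textbook write-skew shape (the on-call example is the standard illustration, not taken from this post), in Postgres-flavored SQL:

```sql
-- Invariant the application wants: at least one doctor stays on call.
CREATE TABLE doctors (name text PRIMARY KEY, on_call boolean NOT NULL);
INSERT INTO doctors VALUES ('alice', true), ('bob', true);

-- Connection 1 (T1)
BEGIN ISOLATION LEVEL REPEATABLE READ;        -- snapshot isolation in Postgres
SELECT count(*) FROM doctors WHERE on_call;   -- reads 2
UPDATE doctors SET on_call = false WHERE name = 'alice';

-- Connection 2 (T2), concurrently
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call;   -- also reads 2 on its own snapshot
UPDATE doctors SET on_call = false WHERE name = 'bob';

-- Both now commit. Under SI that's allowed: W2 = {bob} and W1 = {alice}
-- don't intersect, and the invariant is silently broken. Under
-- serializability, R2 (the predicate read over on-call doctors) intersects
-- W1 = {alice}, so one of the two transactions must abort -- exactly the
-- read-set coordination discussed above.
COMMIT;   -- T1
COMMIT;   -- T2
```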

0 views
Marc Brooker 10 months ago

DSQL Vignette: Reads and Compute

Another key benefit of this coordination-free approach is that we can send reads to the nearest read replica (in the same region, and generally AZ) to reduce cost and latency. Reads never have to go to a leader or a primary to be sequenced or have their lock state maintained, simply because they don’t have any lock state. This is true in read-only transactions, read-write transactions, and even for the reads triggered by writes (e.g. is a read-modify-write). Avoiding Caching and Coherence Aurora DSQL uses a logical interface to storage. The QP doesn’t ask for pages, it asks for rows. Knowing the logical structure of the data it holds allows DSQL’s storage to offer quite a high-level interface to the QP: the QP can ask storage to do work like filtering , aggregation , projection , and other common tasks on its behalf. Unlike SQL designs that build on K/V stores, this allows DSQL to do much of the heavy lifting of filtering and finding data right next to the data itself, on the storage replicas, without sacrificing scalability of storage or compute. This, in turn, allows us to avoid the scalability bottleneck of having to have a large, coherent cache 8 on-box with SQL execution. In-AZ (or closer) networking, combined with carefully-designed protocols and the ability to push chatty work down, keeps storage fast without the need to cache. We still cache some low-write-rate information (like the list of tables and their definitions). You can see this in action with : Here, the index-only scan on the primary key index on this table (Aurora DSQL tables are index organized) is pushed down to storage, along with the projection of the selected columns. This significantly reduces the number of round-trips between the QP and storage system, with a great impact on performance. Pushing operations down to storage is a good bet for another reason: Latency Lags Bandwidth . Networks have gotten a lot faster over the last couple decades, but the rate of change of latency has been much slower than the rate of change of bandwidth (partially, this just has to do with speed-of-light limitations). This has been true over multiple decades, and looks set to continue for decades more. That trend means that pushdown, which moves operations close to the storage devices themselves and removes a lot of round-trips, is a good bet for the long-term. The Big Picture The overall approach here is disaggregation : we’ve taken each of the critical components of an OLTP database and made it a dedicated service. Each of those services is independently horizontally scalable, most of them are shared-nothing, and each can make the design choices that are most optimal in its domain. This approach is enabled by the extremely fast and reliable networking available in modern data centers, and by designing each component as part of the overall architecture. Tomorrow we’ll go into the write path, which will reveal how the whole picture comes together.

0 views
Marc Brooker 10 months ago

DSQL Vignette: Aurora DSQL, and A Personal Story

It's happening. In this morning’s re:Invent keynote, Matt Garman announced Aurora DSQL. We’re all excited, and some extremely excited, to have this preview release in customers’ hands. Over the next few days, I’m going to be writing a few posts about what DSQL is, how it works, and how to make the best use of it. This post is going to look at the product itself, and a little bit of a personal story. The official AWS documentation for Aurora DSQL is a great place to start to understand what DSQL is and how to use it. What is Aurora DSQL? Aurora DSQL is a new serverless SQL database, optimized for transaction processing, and designed for the cloud. DSQL is designed to scale up and down to serve workloads of nearly any size, from your hobby project to your largest enterprise application. All the SQL stuff you expect is there: transactions, schemas, indexes, joins, and so on, all with strong consistency and isolation 5 . DSQL offers active-active multi-writer capabilities in multiple availability zones (AZs) in a single region, or across multiple regions. Reads and writes, even in read-write transactions, are fast and local, requiring no cross-region communication (or cross-AZ communication in single region setups). Transaction commit goes across regions (for multi-region setups) or AZs (for single-regions setups), ensuring that your transactions are durable, isolated, and atomic. DSQL is PostgreSQL compatible, offering a subset of PostgreSQL’s (huge) SQL feature set. You can connect with your favorite PostgreSQL client (even the cli), use your favorite ORMs and frameworks, etc. We’ll be adding more PostgreSQL-compatible features over time, making it easy to bring your existing code to DSQL. DSQL is serverless. Here, we mean that you create a cluster in the AWS console (or API or CLI), and that cluster will include an endpoint. You connect your PostgreSQL client to that endpoint. That’s all you have to do: management, scalability, patching, fault tolerance, durability, etc are all built right in. You never have to worry about infrastructure. As we launch Aurora DSQL, we’re talking a lot about multi-region active-active, but that’s not the only thing its for. We built DSQL to be a great choice for single-region applications of all sizes - from a few requests per day to thousands a second and beyond. A Personal Story In 2020 I was working on serverless compute at AWS, spending most of my time with the great AWS Lambda team 1 . As always, I spent a lot of time talking to customers, and realized that I was hearing two consistent things from serverless and container customers: Existing relational database offerings weren’t a great fit for the fast-moving scalable world of serverless and containers. These customers loved relational databases and SQL, for all the reasons folks have loved relational for forty years, but felt a lot of friction between the needs of serverless compute and existing relational products. Amazon RDS Proxy helped with some of this friction, but it wasn’t going away. Large, highly-regulated, AWS customers with global businesses were building applications across multiple AWS regions, but running into a tricky architectural trade-off. They could pick multi-region active-active (with DynamoDB Global Tables, for example), but lose out on SQL, ACID, and strong cross-region consistency. 
Or they could choose active-standby (with Aurora Global Database, for example), but lose the peace of mind of having their application actively running in multiple places, and the ability to serve strongly consistent data to customers from their closest region. These customers wanted both things. At the same time, a few pieces of technology were coming together. One was a set of new virtualization capabilities, including Caspian (which can dynamically and securely scale the resources allocated to a virtual machine up and down), Firecracker 3 (a lightweight VMM for fast-scaling applications), and the VM snapshotting technology we were using to build Lambda Snapstart . We used Caspian to build Aurora Serverless V2 2 , bringing a vertical auto scaling to Aurora’s full feature set. The second was EC2 time sync , which brings microsecond-accurate time to EC2 instances around the globe. High-quality physical time is hugely useful for all kinds of distributed system problems . Most interestingly, it unlocks ways to avoid coordination within distributed systems, offering better scalability and better performance. The new horizontal sharding capability for Aurora Postgres, Aurora Limitless Database , uses these clocks to make cross-shard transactions more efficient. The third was Journal, the distributed transaction log we’d used to build critical parts of multiple AWS services (such as MemoryDB , the Valkey compatible durable in-memory database 4 ). Having a reliable, proven, primitive that offers atomicity, durability, and replication between both availability zones and regions simplifies a lot of things about building a database system (after all, A tomicity and D urability are half of ACID). The fourth was AWS’s strong formal methods and automated reasoning tool set . Formal methods allow us to explore the space of design and implementation choices quickly, and also helps us build reliable and dependable distributed system implementations 6 . Distributed databases, and especially fast distributed transactions, are a famously hard design problem, with tons of interesting trade-offs, lots of subtle traps, and a need for a strong correctness argument. Formal methods allowed us to move faster and think bigger about what we wanted to build. Finally, AWS has been building big cloud systems for a long time ( S3 is turning 19 next year! , can you believe it?), and we have a ton of experience. Along with that experience is an incredible pool of talented engineers, scientists, and leaders who know how to build and operate things. If there’s one thing that’s AWS’s real secret sauce, it’s that our engineers and leaders are close to the day-to-day operation of our services 7 , bringing a constant flow of real-world lessons of how to improve our existing services and build better new ones. The combination of all of these things made it the right time to think big about building a new distributed relational database. We knew we wanted to solve some really hard problems on behalf of our customers, and we were starting to see how to solve them. So, in 2021 I started spending a lot more time with the databases teams at AWS, including the incredibly talented teams behind Aurora and QLDB. We built a team to go do something audacious: build a new distributed database system, with SQL and ACID, global active-active, scalability both up and down (with independent scaling of compute, reads, writes, and storage), PostgreSQL compatibility, and a serverless operational model. 
I’m proud of the incredibly talented group of people that built this, and can’t wait to see how our customers use it. One Big Thing There are a lot of interesting benefits to the approach we’ve taken with DSQL, but there’s one I’m particularly excited about (the same one Matt highlighted in the keynote): the way that latency scales with the number of statements in a transaction. For cross-region active-active, latency is all about round-trip times. Even if you’re 20ms away from the quorum of regions, making a round trip (such as to a lock server) on every statement really hurts latency. In DSQL local in-region reads are as fast as 1.2ms, so 20ms on top of that would really hurt. From the beginning, we took avoiding this as a key design goal for our transaction protocol, and have achieved our goals. In Aurora DSQL, you only incur additional cross-region latency on , not for each individual , , or in your transaction (from any of the endpoints in an active-active setup). That’s important, because even in the relatively simple world of OLTP, having 10s or even 100s of statements in a transaction is common. It’s only when you (and then only when you a read-write transaction) that you incur cross-region latency. Read-only transactions, and read-only autocommit s are always in-region and fast (and strongly consistent and isolated). In designing DSQL, we wanted to make sure that developers can take advantage of the full power of transactions, and the full power of SQL. Later this week I’ll share more about how that works under the covers. The goal was to simplify the work of developers and architects, and make it easier to build reliable, scalable, systems in the cloud. A Few Other Things In Aurora DSQL, we’ve chosen to offer strong consistency and snapshot isolation . Having observed teams at Amazon build systems for over twenty years, we’ve found that application programmers find dealing with eventual consistency difficult, and exposing eventual consistency by default leads to application bugs. Eventual consistency absolutely does have its place in distributed systems 8 , but strong consistency is a good default. We’ve designed DSQL for strongly consistent in-region (and in-AZ) reads, giving many applications strong consistency with few trade-offs. We’ve also picked snapshot isolation by default. We believe that snapshot isolation 9 is, in distributed databases, a sweet spot that offers both a high level of isolation and few performance surprises. Again, our goal here is to simplify the lives of operators and application programmers. Higher isolation levels push a lot of performance tuning complexity onto the application programmer, and lower levels tend to be hard to reason about. As we talk more about DSQL’s architecture, we’ll get into how we built for snapshot isolation from the ground up. Picking a serverless operational model, and PostgreSQL compatibility, was also based on our goal of simplifying the work of operators and builders. Tons of folks know (and love) Postgres already, and we didn’t want them to have to learn something new. For many applications, moving to Aurora DSQL is as simple as changing a few connection-time lines. Other applications may need larger changes, but we’ll be working to reduce and simplify that work over time.

0 views