Latest Posts (20 found)

Mutable atomic deletes with Parquet-backed columnar tables on S3

In the previous post, I explored a Parquet-on-S3 design with tombstones for constant-time deletes and a CAS-updated manifest for snapshot isolation. This post extends that design. The focus is on in-file delete operations, where we replace a Parquet row group and publish a new footer using S3 Multipart Upload (MPU) and UploadPartCopy, without downloading and rebuilding the unchanged bytes. We preserve the same system model, with immutable data files and a manifest pointer that serializes visibility.
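
A minimal boto3 sketch of that publish path, with hypothetical bucket, key, and byte-range values (a real implementation would derive the ranges from the source file's Parquet footer); this illustrates the MPU + UploadPartCopy idea, not the post's exact code.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-table-bucket"           # hypothetical names for illustration
SRC_KEY = "data/file-000.parquet"    # existing immutable data file
DST_KEY = "data/file-001.parquet"    # new file with one row group replaced

# Placeholder byte range; in practice it comes from the source file's footer.
UNCHANGED_RANGE = "bytes=0-104857599"
new_tail = b"..."                    # placeholder: rewritten row group + new footer

mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=DST_KEY)
upload_id = mpu["UploadId"]

# Part 1: server-side copy of the unchanged bytes (no download/re-upload).
# Note: every part except the last must be at least 5 MiB.
part1 = s3.upload_part_copy(
    Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id, PartNumber=1,
    CopySource={"Bucket": BUCKET, "Key": SRC_KEY},
    CopySourceRange=UNCHANGED_RANGE,
)

# Part 2: only the rewritten row group and the new footer are uploaded.
part2 = s3.upload_part(
    Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id, PartNumber=2, Body=new_tail,
)

s3.complete_multipart_upload(
    Bucket=BUCKET, Key=DST_KEY, UploadId=upload_id,
    MultipartUpload={"Parts": [
        {"PartNumber": 1, "ETag": part1["CopyPartResult"]["ETag"]},
        {"PartNumber": 2, "ETag": part2["ETag"]},
    ]},
)
# A CAS swap of the manifest pointer would then make DST_KEY visible.
```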

Shayon Mukherjee 1 week ago

An MVCC-like columnar table on S3 with constant-time deletes

Parquet is excellent for analytical workloads: columnar layout, aggressive compression, predicate pushdown. But deletes require rewriting entire files. Systems like Apache Iceberg and Delta Lake solve this by adding metadata layers that track delete files separately from data files. But what if, for fun, we built something (arguably) simpler? S3 now has conditional writes (If-Match, If-None-Match) that enable atomic operations without external coordination. Let’s explore how we might build a columnar table format on S3 that gets most of Parquet’s benefits while supporting constant-time deletes.
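
A hedged sketch of how those conditional writes can serialize updates to a manifest object, assuming a recent boto3/botocore release that exposes the If-None-Match and If-Match parameters on PutObject; the bucket, key, and manifest contents are hypothetical.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET, MANIFEST = "my-table-bucket", "manifest.json"   # hypothetical names

# Create-if-absent: only succeeds when no manifest exists yet.
try:
    s3.put_object(Bucket=BUCKET, Key=MANIFEST, Body=b"{}", IfNoneMatch="*")
except ClientError as e:
    if e.response["Error"]["Code"] == "PreconditionFailed":
        print("someone else created the manifest first")

# Compare-and-swap: publish a new snapshot only if nobody replaced the
# manifest since we read it.
current = s3.get_object(Bucket=BUCKET, Key=MANIFEST)
etag = current["ETag"]
new_manifest = b"{}"   # placeholder: new snapshot listing data/tombstone files
try:
    s3.put_object(Bucket=BUCKET, Key=MANIFEST, Body=new_manifest, IfMatch=etag)
except ClientError as e:
    if e.response["Error"]["Code"] == "PreconditionFailed":
        print("lost the race; re-read the manifest and retry")
```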

Shayon Mukherjee 1 week ago

Exploring PostgreSQL to Parquet archival for JSON data with S3 range reads

PostgreSQL handles large JSON payloads reasonably well until you start updating or deleting them frequently. Once payloads cross the 8 KB TOAST threshold and churn becomes high, autovacuum can dominate your I/O budget and cause other issues. I have been exploring the idea of moving older JSON data (read: cold data) to Parquet on S3 while keeping recent data hot in PostgreSQL daily partitions, then simply dropping those partitions instead of running expensive DELETE operations and subsequent vacuum cycles.
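
A small sketch of the partition-dropping side of that idea, using psycopg2 and a hypothetical daily-partitioned events table; the archival-to-Parquet step is omitted.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True
cur = conn.cursor()

# Instead of DELETE FROM events WHERE created_at < '2024-01-02' (dead tuples,
# follow-up vacuum work), retire the whole day once it is archived to S3.
cur.execute("ALTER TABLE events DETACH PARTITION events_2024_01_01")
cur.execute("DROP TABLE events_2024_01_01")
```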

Shayon Mukherjee 2 months ago

Bypass PostgreSQL catalog overhead with direct partition hash calculations

PostgreSQL’s hash partitioning distributes rows across partitions using deterministic hash functions. When you query through the parent table, PostgreSQL must perform catalog lookups to route each query to the correct partition. This results in measurable overhead for high-throughput applications, especially if you decide to use multi-level partitioning schemes where PostgreSQL must traverse deeper catalog structures to identify the target partition. Let’s take a look at some findings on speeding up the case where you already know the partition key values.
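
One hedged way to precompute the target partition, using PostgreSQL's satisfies_hash_partition() helper to find the matching remainder for a key and then querying that child directly; the table names, modulus, and key type are hypothetical, and the post may take a different approach.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

# Hypothetical setup: events is hash-partitioned by tenant_id (bigint) into 8
# partitions named events_p0 .. events_p7.
MODULUS = 8
tenant_id = 42

# satisfies_hash_partition() applies the same hashing PostgreSQL uses to route
# rows, so the target partition can be resolved once per key. The ::bigint cast
# must match the partition key column's type.
target = None
for remainder in range(MODULUS):
    cur.execute(
        "SELECT satisfies_hash_partition('events'::regclass, %s, %s, %s::bigint)",
        (MODULUS, remainder, tenant_id),
    )
    if cur.fetchone()[0]:
        target = f"events_p{remainder}"
        break

# Query the child table directly, skipping the parent-level routing work.
cur.execute(f"SELECT * FROM {target} WHERE tenant_id = %s", (tenant_id,))
print(cur.fetchall())
```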

Shayon Mukherjee 3 months ago

Is AGI paradoxical?

A developer types “implement user authentication” and watches as Cursor generates 50 lines of secure, production-ready code in seconds. It’s remarkable: the AI understands context, follows best practices, even adds appropriate error handling. But here’s what’s fascinating: every pattern it used was learned from millions of human-written codebases. The AI didn’t invent authentication; it synthesized decades of human security knowledge at unprecedented speed. This same pattern plays out across today’s most impressive AI systems as we pursue Artificial General Intelligence.

Shayon Mukherjee 4 months ago

Pitfalls of premature closure with LLM-assisted coding

A 51-year-old man walked into the emergency room with chest pain. The symptoms seemed clear enough: elevated blood pressure, chest discomfort, some cardiac irregularities. The emergency physician, attending doctor, and cardiologist all converged on the same diagnosis—acute coronary syndrome or accelerated hypertension. The classic signs of anything more serious simply weren’t there. But one hospitalist wasn’t satisfied. Despite multiple colleagues ruling out aortic dissection as unlikely, something felt incomplete. The pieces fit the common diagnosis, but not perfectly.

Shayon Mukherjee 5 months ago

Another look into PostgreSQL CTE materialization and non-idempotent subqueries

A few days ago, I wrote about a surprising planner behavior with CTEs, DELETE, and LIMIT in PostgreSQL, a piece I hastily put together on a bus ride. That post clearly only scratched the surface of a deeper issue that I’ve since spent way too many hours exploring. So here are some more fully formed thoughts and findings. The core issue, revisited: when using a query like the following, the planner might execute your query in ways you don’t expect.
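
The excerpt cuts off before the example, but the general shape (hypothetical names, not the post's exact query) is the queue-pull pattern from the related post below; the MATERIALIZED keyword (PostgreSQL 12+) is shown here as one way to make the single evaluation of the CTE explicit.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

# Hypothetical queue table. MATERIALIZED forces the CTE to be evaluated
# exactly once instead of being inlined into the outer DELETE.
cur.execute(
    """
    WITH next_task AS MATERIALIZED (
        SELECT id
        FROM task_queue
        WHERE queue_group_id = %s
        LIMIT 1
    )
    DELETE FROM task_queue
    WHERE id IN (SELECT id FROM next_task)
    RETURNING *;
    """,
    (42,),
)
task = cur.fetchone()
conn.commit()
```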

Shayon Mukherjee 5 months ago

A PostgreSQL planner gotcha with CTEs, DELETE, and LIMIT

I recently discovered an unexpected behavior in PostgreSQL involving a pattern of using a Common Table Expression (CTE) with DELETE ... RETURNING and LIMIT to process a batch of items from a queue-like table. What seemed straightforward turned out to have a surprising interaction with the query planner. The scenario: let’s say you have a task_queue table and want to pull exactly one task for a specific queue_group_id. A common approach is using a CTE:
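
A sketch of that common approach (column names are hypothetical, not the post's exact query): a CTE picks one candidate task and the outer DELETE removes and returns it. The post digs into how the planner can treat this in unexpected ways.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

cur.execute(
    """
    WITH next_task AS (
        SELECT id
        FROM task_queue
        WHERE queue_group_id = %s
        LIMIT 1
    )
    DELETE FROM task_queue
    WHERE id IN (SELECT id FROM next_task)
    RETURNING *;
    """,
    (42,),
)
task = cur.fetchone()  # the row that was (expected to be) dequeued
conn.commit()
```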

Shayon Mukherjee 7 months ago

Selective asynchronous commits in PostgreSQL - balancing durability and performance

I was recently looking into some workloads that generate a lot of I/O and CPU contention on some very high-write code paths and came across synchronous_commit (https://www.postgresql.org/docs/current/wal-async-commit.html). It can be very tempting to turn this off globally because the performance gains in terms of I/O, CPU, and TPS (transactions per second) are very hard to overlook. I noticed the I/O contention was completely gone, CPU was down 20% (at peak), and TPS was up 30%.
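
A hedged sketch of the "selective" version of this idea with psycopg2: keep synchronous_commit on globally and relax it only inside transactions whose writes you can afford to lose in a small crash window (the table name is hypothetical).

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()

# psycopg2 opens a transaction on the first execute; SET LOCAL scopes the
# setting to this transaction only, so everything else stays fully durable.
cur.execute("SET LOCAL synchronous_commit TO off")
cur.execute("INSERT INTO activity_log (payload) VALUES (%s)", ("page_view",))
conn.commit()  # COMMIT returns without waiting for the WAL flush for this tx
```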

Shayon Mukherjee 7 months ago

Challenging AI-generated code from first principles

Prototyping new features and fixing bugs has become so much faster now that we have coding copilots like Cursor and LLMs (Large Language Models). We can generate boilerplate code in no time, saving time that would have gone into repetitive tasks. AI suggestions often feel like magic—type a quick prompt, accept a snippet, and plug it in. However, this speed comes with a downside: it’s too easy to bypass deep thinking.

Shayon Mukherjee 8 months ago

Scaling with PostgreSQL without boiling the ocean

“Postgres was great when we started, but now that our service is being used heavily we are running into a lot of ‘weird’ issues.” This sentiment is frequently echoed by CTOs and senior engineers at high-growth startups when I speak with them. Scaling PostgreSQL successfully doesn’t always require a full team of DBAs and experts. The beauty of PostgreSQL is that solutions often lie within the database itself - by rethinking your data access patterns from first principles, you can solve many business problems at scale.

Shayon Mukherjee 9 months ago

Database mocks are just not worth it

It’s tempting to rely on mocks for database calls. Mocking is faster and often feels more straightforward. However, testing against a real database uncovers hidden pitfalls that can appear as the application matures. Issues like unique constraint violations, default value handling, or even performance bottlenecks may only surface when the code is exercised against actual data. To see the importance of real database testing, consider a simple example where you create a user in your application.
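
A small illustration of the kind of failure a mock silently hides, assuming a hypothetical users table with a unique index on email; only a real database raises the constraint error the application code must actually handle.

```python
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=app_test")
cur = conn.cursor()

cur.execute("INSERT INTO users (email) VALUES (%s)", ("a@example.com",))
try:
    # A mocked repository would happily "insert" this duplicate row.
    cur.execute("INSERT INTO users (email) VALUES (%s)", ("a@example.com",))
except errors.UniqueViolation:
    conn.rollback()  # the real behavior the test can now assert on
```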

Shayon Mukherjee 11 months ago

Using CTID-Based Pagination for Data Cleanups in PostgreSQL

When dealing with very large PostgreSQL tables (we’re talking 15TB+), sometimes routine maintenance like archiving very old data can become surprisingly challenging, despite having good indexes. I recently faced this issue when trying to clean up very old data on a very large and legacy table. The problem: the initial approach used standard ID-based pagination. Imagine a query like this:

DELETE FROM large_table
WHERE id IN (
  SELECT id FROM large_table
  WHERE created_at < '2023-04-01'
  ORDER BY id
  LIMIT 10000
);

While totally harmless on the surface, this kept timing out (10 min statement_timeout 😬).
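
For contrast, a hedged sketch of CTID-range batching (assuming PostgreSQL 14+, which can use TID range scans; table and column names mirror the hypothetical query above): each DELETE touches a bounded slice of heap blocks instead of re-walking an index.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True  # commit each batch independently
cur = conn.cursor()

# Total heap blocks in the table, derived from its on-disk size.
cur.execute(
    "SELECT pg_relation_size('large_table') / current_setting('block_size')::int"
)
total_blocks = cur.fetchone()[0]

STEP = 100_000  # heap blocks per batch; tune to stay under statement_timeout
for start in range(0, total_blocks + 1, STEP):
    cur.execute(
        """
        DELETE FROM large_table
        WHERE ctid >= %s::tid AND ctid < %s::tid
          AND created_at < '2023-04-01'
        """,
        (f"({start},0)", f"({start + STEP},0)"),
    )
```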

Shayon Mukherjee 1 year ago

pg_easy_replicate Supports Schema Change Tracking During Logical Replication

I have been meaning to support common DDL (Data Definition Language) statements in pg_easy_replicate for quite some time now, and I am super stoked that this support is finally out. This new capability addresses one of the limitations of PostgreSQL’s native logical replication, bringing more flexibility to database migrations and replication through pg_easy_replicate. What is pg_easy_replicate? For those new to the project, pg_easy_replicate is a CLI orchestrator tool that simplifies the process of setting up and managing logical replication between PostgreSQL databases.

Shayon Mukherjee 1 year ago

Stop Relying on IF NOT EXISTS for Concurrent Index Creation in PostgreSQL

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use IF NOT EXISTS as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments. Let’s understand how PostgreSQL handles concurrent index creation. When we initiate CREATE INDEX CONCURRENTLY, PostgreSQL first creates an entry for the index in the system catalogs (specifically in pg_index) and marks it as invalid.
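
A hedged sketch of the alternative to IF NOT EXISTS (index and table names are hypothetical): check pg_index for an invalid leftover from a failed build, drop it, and retry the concurrent build.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
conn.autocommit = True  # CREATE/DROP INDEX CONCURRENTLY can't run in a transaction block
cur = conn.cursor()

INDEX = "idx_orders_on_user_id"

# Does the index exist, and is it marked invalid from a failed concurrent build?
cur.execute(
    """
    SELECT NOT i.indisvalid
    FROM pg_class c
    JOIN pg_index i ON i.indexrelid = c.oid
    WHERE c.relname = %s
    """,
    (INDEX,),
)
row = cur.fetchone()
if row and row[0]:
    cur.execute(f"DROP INDEX CONCURRENTLY {INDEX}")

cur.execute(f"CREATE INDEX CONCURRENTLY {INDEX} ON orders (user_id)")
```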

Shayon Mukherjee 1 year ago

The Tech Industry's Moral Vacuum

The New York Times recently reported on how a group of tech elites helped J.D. Vance leap into power. This story isn’t just about one candidate; it’s a symptom of a broader shift in the tech industry’s ethos. These influential figures are pouring millions into supporting candidates and a party that, quite frankly, hold regressive values that clash violently with the image the tech world once projected. Let’s be clear: the Republican party these billionaires are backing has consistently opposed women’s reproductive rights.

Shayon Mukherjee 1 year ago

Use pg_easy_replicate for setting up Logical Replication and Switchover in PostgreSQL

Logical replication is a powerful feature in PostgreSQL that allows for real-time data replication between databases. It can be used for performing major version upgrades using a blue/green setup where you have two databases, allowing you to test and switch over to a new version with minimal downtime. Logical replication can also be used to facilitate database migrations between different environments, using the same technique and tooling. In this post, I will describe the process of setting up simple replication and switchover between two databases using pg_easy_replicate.

Shayon Mukherjee 1 year ago

Fast, Simple and Metered Concurrency in Ruby with Concurrent::Semaphore

Let’s say you need to fetch a lot of data from an upstream API, then you want to manipulate that data, maybe even enrich it, and then send it downstream to a database or another API. You aim for high concurrency in fetching data (since it’s allowed), but you need to be cautious when sending the enriched data to the downstream API due to system limits. You can’t send events downstream at the same rate as you fetch them upstream.
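
The post's examples use Ruby's Concurrent::Semaphore; as a rough, language-swapped sketch of the same shape, here is the pattern with Python's threading.Semaphore and hypothetical fetch_page / send_downstream stand-ins: fetch with wide concurrency, but cap how many downstream sends can be in flight.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

downstream_slots = threading.Semaphore(5)    # at most 5 in-flight downstream sends

def fetch_page(page):
    return [{"page": page, "event": "signup"}]   # placeholder for the upstream API call

def send_downstream(event):
    pass                                          # placeholder for the rate-limited API call

def process(page):
    for event in fetch_page(page):
        with downstream_slots:                    # blocks while 5 sends are in flight
            send_downstream(event)

# Aggressive concurrency on the fetch side; the semaphore meters the send side.
with ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(process, range(200)))
```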

Shayon Mukherjee 1 year ago

The value of sitting on an idea

Have you ever had a brilliant idea that you wanted to act on immediately? I think it’s safe to say we’ve all been there, and it’s tempting to jump right in. But what if I told you there’s immense value in simply sitting on an idea? Let’s explore this concept. Sometimes, the excitement of a new idea can cloud our judgment. It feels like the idea is burning a hole in our pocket, urging us to act.

Shayon Mukherjee 1 year ago

Incidents and the requirement of slowing down

Speed is key, except when it’s not. This adage, while seemingly prudent, overlooks a fundamental truth of complex systems: the paradox that slowing down can often be the fastest route to resolution. This contemplative approach to incident management is not merely a tactical choice but a philosophical stance on navigating the intricacies of system failures. It’s an acknowledgment that incidents have a tendency to breed more incidents. Imagine a scenario in which a production cluster experiences a sudden blip, leading to saturated connections and dropped requests for critical workflows.
