Posts in PostgreSQL (6 found)
Andrew Atkinson 7 months ago

Short alphanumeric pseudo random identifiers in Postgres

In this post, we’ll cover a way to generate short, alphanumeric, pseudo-random identifiers using native Postgres tactics. These identifiers can be used for things like transactions or reservations, where users need to read and share them easily. This approach is an alternative to long, randomly generated values like UUIDs, which have downsides for usability and performance. We’ll store the identifier in its own dedicated column.

In database design, we can use natural or surrogate keys to identify rows. We won’t cover the differences here, as that’s out of scope. Our identifier is generated from a conventional surrogate primary key; we aren’t using natural keys. The generated identifier is intended for use outside the database, while the primary key is used inside the database, referenced by foreign key columns on other tables.

While the identifier is short, which minimizes space and speeds up access, the main reason for it is usability. With that said, the target for total space consumption was to be fewer bytes than a 16-byte UUID. This was achieved with an integer primary key plus this additional 5-character generated value, targeting a smaller database where this provides plenty of unique values now and into the future.

Let’s get into the design details. Three functions are used: a main entry-point function that obfuscates the integer value using exclusive-or (XOR) obfuscation; a function that converts the obfuscated value into the alphanumeric value, used within the entry point. This is “base 62”, with the 26 uppercase and 26 lowercase letters plus the 10 digits (0-9); and a function that reverses the identifier back into the original integer. For a length of 5 with this system, we can create up to around ~1 billion unique values. This was sufficiently large for the original use case.
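The original function bodies aren’t reproduced here, but the three functions described above can be sketched as follows. This is a hedged sketch, not the author’s code: the function names, the XOR key, and the 29-bit range cap (which keeps every obfuscated value under 62^5 so the encoding always fits in 5 characters; the original gets closer to ~1 billion values) are all my assumptions.

```sql
-- Base62 alphabet: digits, then uppercase, then lowercase letters.
CREATE OR REPLACE FUNCTION base62_encode(n bigint)
RETURNS text LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
  alphabet constant text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
  result text := '';
  val bigint := n;
BEGIN
  WHILE val > 0 LOOP
    result := substr(alphabet, (val % 62)::int + 1, 1) || result;
    val := val / 62;
  END LOOP;
  RETURN lpad(result, 5, '0'); -- fixed width: left-pad to 5 characters
END;
$$;

CREATE OR REPLACE FUNCTION base62_decode(s text)
RETURNS bigint LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
  alphabet constant text := '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
  val bigint := 0;
  i int;
BEGIN
  FOR i IN 1..length(s) LOOP
    val := val * 62 + strpos(alphabet, substr(s, i, 1)) - 1;
  END LOOP;
  RETURN val;
END;
$$;

-- XOR obfuscation. Capping inputs and the key below 2^29 keeps every
-- obfuscated value under 62^5, so it always encodes to at most 5 characters.
CREATE OR REPLACE FUNCTION obfuscate(n bigint)
RETURNS bigint LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
  key constant bigint := 456789123; -- hypothetical key; any value < 2^29 works
BEGIN
  IF n < 0 OR n >= 536870912 THEN -- 2^29
    RAISE EXCEPTION 'input % outside supported range', n;
  END IF;
  RETURN n # key; -- "#" is bitwise XOR in Postgres; XOR is its own inverse
END;
$$;

-- Entry points: id -> short identifier, and back again.
CREATE OR REPLACE FUNCTION gen_short_id(n bigint) RETURNS text
LANGUAGE sql IMMUTABLE AS $$ SELECT base62_encode(obfuscate(n)) $$;

CREATE OR REPLACE FUNCTION decode_short_id(s text) RETURNS bigint
LANGUAGE sql IMMUTABLE AS $$ SELECT obfuscate(base62_decode(s)) $$;
```

Because XOR is its own inverse, decoding is just base62-decoding followed by the same XOR, which recovers the original integer.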
For use cases requiring more values, storing 6 characters would allow up to ~56 billion values to be generated, based on a bigint primary key.

Let’s create a sample table with an integer primary key, using a generated identity column. Besides its use for the identity column, we’ll again use the GENERATED keyword to create a stored generated column for the identifier. The generated column takes the primary key column as input, obfuscates it, and encodes it to base 62, producing a 5-character value.

How do we guarantee the identifier conforms to our expected data properties? Constraints! For an existing system, we could first add a unique index, then add the unique constraint using that unique index, along with a constraint validating the length. With data inserted into the table, we can query it and verify that the identifier reverses properly (using the decoding function) back into the original integer.

Let’s compare the time spent inserting 1 million rows against an equivalent table without the generated identifier column. Without it, inserts took an average of 2037.906 milliseconds, or around 2 seconds on my machine. Inserting 1 million rows with the generated identifier took an average of 6954.070 milliseconds, or around 7 seconds: about 3.41x slower. Note that these times were with the indexes and constraints in place on the table in the second example, but not the first, meaning their presence contributed to the total time.

Summary: creating this identifier made write operations about 3.4x slower for me locally, which was an acceptable amount of overhead for the intended use case. Compared with fully random values, the pseudo-random identifier remains orderable, which means that lookups for individual rows or ranges of rows can use indexes, running fast and reliably even as row counts grow. With a unique index on the column, we can verify that individual lookups and range scans use the index by inspecting query execution plans for this table.

The full example is available here: https://github.com/andyatkinson/pg_scripts/pull/15

Feedback on this approach is welcomed! Please use my contact form to provide feedback or leave comments on the PR.
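As a sketch of the table design described above (table, column, and constraint names here are my own, and it assumes an IMMUTABLE function, called gen_short_id(bigint) in this sketch, that XOR-obfuscates the id and base62-encodes it into a fixed 5-character value):

```sql
-- Hypothetical names throughout; gen_short_id is an assumed IMMUTABLE
-- function producing the 5-character identifier from the primary key.
CREATE TABLE reservations (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  short_id text GENERATED ALWAYS AS (gen_short_id(id)) STORED NOT NULL,
  CONSTRAINT short_id_length CHECK (length(short_id) = 5)
);

-- For an existing system: build the unique index without blocking writes,
-- then attach it to the table as its unique constraint.
CREATE UNIQUE INDEX CONCURRENTLY reservations_short_id_idx
  ON reservations (short_id);
ALTER TABLE reservations
  ADD CONSTRAINT reservations_short_id_unique
  UNIQUE USING INDEX reservations_short_id_idx;

-- short_id is computed automatically; only non-generated columns are
-- inserted (here there are none besides the identity, so DEFAULT VALUES).
INSERT INTO reservations DEFAULT VALUES;
SELECT id, short_id FROM reservations;
```

Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, which is why the index and the ALTER TABLE are separate statements.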
The desired design properties:

- A fixed size, 5 characters in length, regardless of the size of the input integer (within the range of the data type)
- Fewer bytes of space than a 16-byte UUID
- An obfuscated, pseudo-random value, not easily guessable. While not easily guessable, this is not meant to be “secure”
- Reversibility back into the original integer
- Only native Postgres capabilities, no extensions; the client web app language can be anything, since it’s all within Postgres
- A non-math-heavy implementation

Additional details:

- The identifier is stored using text, not varchar or char, following best-practice recommendations
- PL/pgSQL functions and native Postgres data types and constraints are used, like UNIQUE, NOT NULL, and CHECK, plus a stored generated column
- Converts integers to bits, using the exclusive-or (XOR) bitwise operation and modulo operations
- Did not set out to support case insensitivity now; a possible future enhancement
- Did not try to exclude similar-looking characters (see Base32 Crockford below); a possible future enhancement
- Uses a hexadecimal key (make this any key you want)
- Sets a max value for the data type range, just under 1 billion possible values. This was enough for this system and into the future, but a bigger system would want to use a larger type
- The identifier column gets a unique constraint and backing unique index, so we know we have a unique value
- A constraint is added to validate the length

Alternatives considered:

- Base32 Crockford - an emphasis on ease of use for humans: removing similar-looking characters, case insensitivity
- ULID - also 128 bits/16 bytes like UUIDs, so I had ruled these out for space consumption, and they’re slightly less “usable”
- NanoIDs at PlanetScale - I like aspects of NanoID, but this is random generation, like UUID, vs. encoding a unique integer

Future enhancements could include unit tests using pgTAP for the functions, packaging them into an extension, or supporting more features like case insensitivity or a modified input alphabet. Thanks for reading!

Shayon Mukherjee 8 months ago

Another look into PostgreSQL CTE materialization and non-idempotent subqueries

A few days ago, I wrote about a surprising planner behavior with CTEs, DELETE, and LIMIT in PostgreSQL, a piece I hastily put together on a bus ride. That post clearly only scratched the surface of a deeper issue that I’ve since spent way too many hours exploring. So here are some more fully formed thoughts and findings.

The core issue revisited

Let’s quickly recap: when combining a CTE with DELETE and LIMIT in a single statement, the planner might execute your query in ways you don’t expect.
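The original query isn’t reproduced in this excerpt, but based on the description (a CTE combined with DELETE and LIMIT), its shape is roughly this hypothetical example, with made-up table and column names:

```sql
-- Select a limited batch of rows in a CTE, then delete them by id.
WITH batch AS (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
  LIMIT 100
)
DELETE FROM jobs
WHERE id IN (SELECT id FROM batch);
```

Since PostgreSQL 12, a side-effect-free CTE like this may be inlined into the outer query rather than materialized, giving the planner latitude in how and when the inner SELECT is evaluated; writing WITH batch AS MATERIALIZED (...) forces it to be computed once.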

Shayon Mukherjee 1 year ago

Stop Relying on IF NOT EXISTS for Concurrent Index Creation in PostgreSQL

As a developer, you might have encountered situations where creating an index in PostgreSQL fails due to lock timeouts. In such scenarios, it’s tempting to use the IF NOT EXISTS clause as a quick fix and move on. However, this approach can lead to subtle and hard-to-debug issues in production environments. Let’s understand how PostgreSQL handles concurrent index creation. When we initiate CREATE INDEX CONCURRENTLY, PostgreSQL first creates an entry for the index in the system catalogs (specifically in pg_index) and marks it as invalid.
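To make the failure mode concrete, here is a hedged sketch (the index and table names are made up):

```sql
-- If this fails partway (e.g. due to lock_timeout), the catalog entry can be
-- left behind marked INVALID -- the index exists but is unusable:
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users (email);

-- A later retry with IF NOT EXISTS sees the existing catalog entry and
-- silently does nothing, leaving the invalid index in place.
-- Invalid indexes can be found via pg_index:
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- The fix: drop the invalid index and recreate it, rather than relying
-- on IF NOT EXISTS.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
```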

hyPiRion 1 year ago

Implementing System-Versioned Tables in Postgres

There's no official support for system-versioned tables in Postgres yet, and the temporal extensions aren't supported on Azure/AWS/GCP. Let's implement it ourselves with 3 triggers and an index.
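The post builds this with three triggers; as a much-condensed, hypothetical sketch of the core idea (a single trigger covering UPDATE and DELETE, with made-up names):

```sql
-- Each row carries its validity period; old versions go to a history table.
CREATE TABLE accounts (
  id integer PRIMARY KEY,
  balance numeric NOT NULL,
  sys_period tstzrange NOT NULL DEFAULT tstzrange(now(), NULL)
);

CREATE TABLE accounts_history (LIKE accounts);

-- Close out the old row's period and copy it into the history table.
CREATE OR REPLACE FUNCTION accounts_versioning() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  OLD.sys_period := tstzrange(lower(OLD.sys_period), now());
  INSERT INTO accounts_history VALUES (OLD.*);
  IF TG_OP = 'UPDATE' THEN
    NEW.sys_period := tstzrange(now(), NULL);
    RETURN NEW;
  END IF;
  RETURN OLD; -- DELETE proceeds after archiving
END;
$$;

CREATE TRIGGER accounts_history_trg
  BEFORE UPDATE OR DELETE ON accounts
  FOR EACH ROW EXECUTE FUNCTION accounts_versioning();

-- An index on the history table keeps "as of" lookups fast:
CREATE INDEX accounts_history_id_period_idx
  ON accounts_history (id, sys_period);
```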

Pinaraf's website 12 years ago

Review – “Instant PostgreSQL Starter”

Thanks to Shaun M. Thomas, I had been offered a digital copy of the “Instant PostgreSQL Backup” book from Packt Publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current work situation, doing a lot of PostgreSQL advocacy and basic teaching, I was interested in reviewing this one.

As the Instant collection tagline says, it’s short and fast. I somewhat disagree with the “focused” part for this one, but it’s perfectly fine considering the aim of the book. Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It taught the basics: how to install, basic SQL queries, some rudimentary PHP integration. This book looks a bit like its PostgreSQL-based counterpart. It’s a quick trip through installation, basic manipulation, and the (controversial) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.

So, what’s inside? I’d say what you need to kick-start with PostgreSQL. The installation part is straightforward: download, click, done. Now you can launch pgAdmin, create a user and a database, and you’re done. Next time someone tells you PostgreSQL isn’t easy to install, show them this book. The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple: Create, Read, Update, Delete. You won’t learn about indexes, functions, or advanced queries here. For someone discovering SQL, it’s what needs to be known to just start. The last part, “Top 9 features you need to know about”, is a bit harder to describe.

PostgreSQL is an RDBMS with batteries included; choosing just 9 features must have been really hard for the author, and nobody can be blamed for not choosing this or that feature you like: there’s too much to choose from. The author spends some time on pgcrypto, the RETURNING clause with serial, hstore, XML, even recursive queries. This is, from my point of view, the troublesome part of the book: mentioning all these features means introducing complicated SQL queries. I would never teach someone recursive queries before teaching them joins; it’s like going from elementary school to university in forty pages. But on the positive side, an open-minded and curious reader will get a great teaser and nice tracks to follow to increase their knowledge of PostgreSQL. Mentioning hstore is really cool; that’s one of the PostgreSQL features one has to know.

To sum up my point of view about this book: it’s a nice book for beginners, especially considering the current NoSQL movement and people forgetting about SQL and databases. It’s a bit sad we don’t have more books like this one about PostgreSQL. I really hope Packt Publishing will try to have a complete collection, from introduction (this book) to really advanced needs (PostgreSQL High Performance comes to mind), through advanced SQL queries, administration tips, and so on. They have a book about PostgreSQL Server Programming planned for next month; I’m really looking forward to that one.
