Posts in Backend (20 found)
Max Bernstein 1 week ago

Value numbering

Welcome back to compiler land. Today we’re going to talk about value numbering, which is like SSA, but more. Static single assignment (SSA) gives names to values: every expression has a name, and each name corresponds to exactly one expression. It transforms programs where a variable is assigned more than once in the program text into programs where each assignment to that variable has been replaced with an assignment to a new, fresh name.

It’s great because it makes the differences between the two expressions clear. Though they textually look similar, they compute different values: the first computes 1 and the second computes 2. In this example, it is not possible to re-use the value of the first addition for the second, because their operands are different.

But what if we see two “textually” identical instructions in SSA? That sounds much more promising than non-SSA, because the transformation into SSA form has removed (much of) the statefulness of it all. When can we re-use the result? Identifying instructions that are known at compile time to always produce the same value at run time is called value numbering.

To understand value numbering, let’s extend the above IR snippet with two more instructions, v3 and v4. In this new snippet, v3 looks the same as v1: adding v0 and 1. Assuming our addition operation is some ideal mathematical addition, we can absolutely re-use v1; there is no need to compute the addition again. We can rewrite the IR so that v3 simply refers to v1.

This is kind of similar to the destructive union-find representation that JavaScriptCore and a couple of other compilers use, where the optimizer doesn’t eagerly rewrite all uses but instead leaves a little breadcrumb instruction [1]. We could then run our copy propagation pass (“union-find cleanup”?) so that every use of v3 refers to v1 directly.

Great. But how does this happen? How does an optimizer identify reusable instruction candidates that are “textually identical”? Generally, there is no actual text in the IR.
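For straight-line code, the renaming that SSA construction performs can be sketched in a few lines of Python. This is a minimal sketch, assuming a made-up IR of `(target, expr)` pairs and a hypothetical `to_ssa` helper; real SSA construction also has to place phi nodes at control-flow joins, which this skips because there are no branches.

```python
from itertools import count


def to_ssa(program):
    """Rename straight-line code so every assignment gets a fresh name.

    `program` is a list of (target, expr) pairs. An expr is an int constant
    or a tuple (op, lhs, rhs) whose operands are variable names or ints.
    With no branches there are no joins, so no phi nodes are needed.
    """
    fresh = count()
    latest = {}  # source variable -> SSA name of its most recent assignment
    out = []

    def rename(operand):
        # Reads of a variable refer to its most recent SSA name.
        return latest[operand] if isinstance(operand, str) else operand

    for target, expr in program:
        if isinstance(expr, tuple):
            op, lhs, rhs = expr
            expr = (op, rename(lhs), rename(rhs))  # rename operands first...
        name = f"v{next(fresh)}"
        out.append((name, expr))
        latest[target] = name  # ...then the assignment defines a new name
    return out


if __name__ == "__main__":
    # x = 0; x = x + 1; x = x + 1
    prog = [("x", 0), ("x", ("+", "x", 1)), ("x", ("+", "x", 1))]
    print(to_ssa(prog))
    # [('v0', 0), ('v1', ('+', 'v0', 1)), ('v2', ('+', 'v1', 1))]
```

Running it on `x = 0; x = x + 1; x = x + 1` gives two additions that look alike but read different operands (`v0` and `v1`), which is why the second cannot re-use the first.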
One popular solution is to compute a hash of each instruction. Then any instructions with the same hash (that also compare equal, in case of collisions) are considered equivalent. This is called hash-consing.

When trying to figure all this out, I read through a couple of different implementations. I particularly like the Maxine VM implementation. For example, Maxine defines a hashing function and an equality function for most binary operations. The rest of the value numbering implementation assumes that if the hashing function returns 0, the instruction does not wish to be considered for value numbering.

Why might an instruction opt out of value numbering? An instruction might opt out if it is not “pure”. Purity is in the eye of the beholder, but in general it means that an instruction does not interact with the state of the outside world, except for trivial computation on its operands. (What would it even mean to de-duplicate, cache, or re-use an instruction that does interact with the outside world?)

A load from an array object is also not a pure operation [2]. The load operation implicitly relies on the state of the memory. Also, even if the array were known-constant, in some runtime systems the load might raise an exception. Changing the source location where an exception is raised is generally frowned upon; languages such as Java often have requirements about where exceptions are raised codified in their specifications. We’ll work only on pure operations for now, but we’ll come back to this later. We do often want to optimize impure operations as well!

We’ll start off with the simplest form of value numbering, which operates only on linear sequences of instructions, like basic blocks or traces. Let’s build a small implementation of local value numbering (LVN). We’ll start with straight-line code: no branches or anything tricky. Most compiler optimizations on control-flow graphs (CFGs) iterate over the instructions “top to bottom” [3], and it seems like we can do the same thing here too.
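Here is a rough Python analogue of the hash-and-equality pair described above, including the convention that a hash of 0 means “do not value-number me”. The `Instr` class, `PURE_OPS`, and the method names are hypothetical, not Maxine’s API; the point is only that pure instructions hash their opcode and operand identities, while impure ones opt out.

```python
from dataclasses import dataclass

# Hypothetical opcodes we consider pure; everything else opts out.
PURE_OPS = {"const", "add", "mul"}


@dataclass(eq=False)  # identity equality: each object is one SSA value
class Instr:
    op: str
    args: tuple  # operands: other Instr objects or plain int immediates

    def value_hash(self) -> int:
        """Hash opcode + operands, or return 0 to opt out of value numbering."""
        if self.op not in PURE_OPS:
            return 0
        # SSA operands are identified by object identity, immediates by value.
        key = tuple(("ssa", id(a)) if isinstance(a, Instr) else ("imm", a)
                    for a in self.args)
        return hash((self.op, key)) or 1  # keep 0 reserved for "opt out"

    def value_equal(self, other: "Instr") -> bool:
        """Exact comparison, used to rule out hash collisions."""
        if self.op != other.op or len(self.args) != len(other.args):
            return False
        for a, b in zip(self.args, other.args):
            if isinstance(a, Instr) or isinstance(b, Instr):
                if a is not b:
                    return False
            elif a != b:
                return False
        return True


if __name__ == "__main__":
    v0 = Instr("const", (0,))
    v1 = Instr("add", (v0, 1))
    v3 = Instr("add", (v0, 1))      # textually identical to v1
    effect = Instr("call", (v1,))  # impure: opts out
    print(v1.value_hash() == v3.value_hash() and v1.value_equal(v3))  # True
    print(effect.value_hash())  # 0
```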
From what we’ve seen so far optimizing our made-up IR snippet, we can sketch an algorithm: a single top-to-bottom pass over the instructions with a hash map from value numbers to instructions. The replacement step, remember, is not a literal find-and-replace over the rest of the program, but a union-find operation (if you have been following along with the toy optimizer series). This several-line function (as long as you already have a hash map and a union-find available to you) is enough to build local value numbering!

And real compilers are built this way, too. If you don’t believe me, take a look at Maxine’s value numbering implementation. It has all of the components we just talked about: iterating over instructions, map lookup, and some substitution.

This alone will get you pretty far. Code generators of all shapes tend to leave messy repeated computations all over their generated code, and this will make short work of them. Sometimes, though, your computations are spread across control flow, over multiple basic blocks. What do you do then?

Computing value numbers for an entire function is called global value numbering (GVN), and it requires dealing with control flow (ifs, loops, etc.). I don’t just mean running local value numbering block-by-block over an entire function. Global value numbering implies that expressions can be de-duplicated and shared across blocks.

Let’s tackle control flow case by case. First is the simple case from above: one block. In this case, we can go top to bottom with our value numbering and do alright. The second case is also reasonable to handle: one block flowing into another. In this case, we can still go top to bottom; we just have to find a way to iterate over the blocks. If we’re not going to share value maps between blocks, the order doesn’t matter. But since the point of global value numbering is to share values, we have to iterate over them in topological order (reverse post-order, RPO). This ensures that predecessors get visited before successors: if one block flows into another, we have to visit the first block before the second.
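A minimal sketch of local value numbering over one basic block, assuming a made-up tuple IR (`name, op, args`) and a tiny union-find. A Python `dict` keyed on `(op, operands)` already does the hash-then-compare-for-equality dance, so it plays the role of the hash-consing table. The opcode set and the instructions `v4` and `v5` are illustrative assumptions, not part of the original example.

```python
# Hypothetical set of opcodes that are safe to de-duplicate.
PURE_OPS = {"const", "add", "mul"}


class UnionFind:
    """Forwarding pointers from replaced SSA names to their representatives."""

    def __init__(self):
        self.parent = {}

    def find(self, name):
        # Representatives never appear as keys, so follow links until we stop.
        while name in self.parent:
            name = self.parent[name]
        return name

    def union(self, dup, keep):
        a, b = self.find(dup), self.find(keep)
        if a != b:
            self.parent[a] = b


def local_value_numbering(block):
    """block: list of (name, op, args); args are SSA names (str) or int immediates."""
    uf = UnionFind()
    table = {}  # (op, canonical args) -> name of the first instruction computing it
    out = []
    for name, op, args in block:
        # "Replace all uses" lazily: read operands through the union-find.
        args = tuple(uf.find(a) if isinstance(a, str) else a for a in args)
        if op in PURE_OPS:
            key = (op, args)  # dict does hash-then-equality for us
            if key in table:
                uf.union(name, table[key])  # forward this name to the earlier one
                continue  # drop the duplicate instruction
            table[key] = name
        out.append((name, op, args))
    return out


if __name__ == "__main__":
    block = [
        ("v0", "const", (0,)),
        ("v1", "add", ("v0", 1)),
        ("v2", "add", ("v1", 1)),
        ("v3", "add", ("v0", 1)),   # duplicate of v1
        ("v4", "add", ("v3", 1)),   # becomes add(v1, 1): duplicate of v2
        ("v5", "print", ("v4",)),   # impure, kept, now reads v2
    ]
    for instr in local_value_numbering(block):
        print(instr)
```

Note how removing `v3` cascades: once `v3` is forwarded to `v1`, the hypothetical `v4` reads as `add(v1, 1)` and is itself recognized as a duplicate of `v2`.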
Because of how SSA works and how CFGs work, the second block can “look up” into the first block and use the values from it. To get global value numbering working, we have to copy the first block’s value map before we start processing the second, so we can re-use its instructions. Then the expressions can accrue across blocks: the second block can re-use an expression already computed in the first, because it is still in the map.

…but this breaks as soon as you have control-flow splits. Consider a graph where A branches to either B or C. We’re going to iterate over that graph in one of two orders: A B C or A C B. In either case, we’re going to be adding stuff into the value map from one block (say, B) that is not actually available to its sibling block (say, C). When I say “not available”, I mean “would not have been computed before”. This is because we execute either A then B or A then C. There’s no world in which we execute B then C.

But alright, look at a third case: a control-flow join. In this diamond-shaped graph, A branches to B and C, and each of B and C flows into D. B always flows into D, and C always flows into D. So the iteration order is fine, right? Well, still no. We have the same sibling problem as before: B and C still can’t share value maps. We also have a weird question when we enter D: where did we come from? If we came from B, we can re-use expressions from B. If we came from C, we can re-use expressions from C. But we cannot in general know which predecessor block we came from.

The only block we know for sure that we executed before D is A. This means we can re-use A’s value map in D, because we can guarantee that all execution paths that enter D have previously gone through A. This relationship is called a dominator relationship, and it is the key to the style of global value numbering that we’re going to talk about in this post. A block can always use the value map from any other block that dominates it.
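To make the iteration order concrete, here is a small reverse post-order computation over the diamond (A branches to B and C, both flow into D). It is a plain recursive DFS sketch with hypothetical names; a production compiler would typically use an iterative version to avoid deep recursion on large functions.

```python
def reverse_post_order(succs, entry):
    """Order blocks so each one comes before its successors (ignoring back edges).

    succs: dict mapping a block name to a list of successor block names.
    """
    visited, post = set(), []

    def dfs(block):
        visited.add(block)
        for s in succs.get(block, ()):
            if s not in visited:
                dfs(s)
        post.append(block)  # post-order: after every successor is finished

    dfs(entry)
    return post[::-1]


if __name__ == "__main__":
    # The diamond: A branches to B and C, which both flow into D.
    diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
    print(reverse_post_order(diamond, "A"))  # ['A', 'C', 'B', 'D']
```

On the diamond this yields A, C, B, D: A comes first and D comes last, but B and C are siblings, so whichever is visited first must not leak its value map into the other.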
For completeness’ sake, in the diamond diagram, A dominates each of B and C, too. We can compute dominators a couple of ways [4], but that’s a little bit out of scope for this blog post. If we assume that we have dominator information available in our CFG, we can use that for global value numbering. And that’s just what, you guessed it, Maxine VM does. It iterates over all blocks in reverse post-order, doing local value numbering and threading through value maps from dominator blocks. In this case, their method gets the immediate dominator: the “closest” dominator block of all the blocks that dominate the current one.

And that’s it! That’s the core of Maxine’s GVN implementation. I love how short it is. For not very much code, you can remove a lot of duplicate pure SSA instructions.

This does still work with loops, but with some caveats. From p7 of Briggs GVN:

“The φ-functions require special treatment. Before the compiler can analyze the φ-functions in a block, it must previously have assigned value numbers to all of the inputs. This is not possible in all cases; specifically, any φ-function input whose value flows along a back edge (with respect to the dominator tree) cannot have a value number. If any of the parameters of a φ-function have not been assigned a value number, then the compiler cannot analyze the φ-function, and it must assign a unique, new value number to the result.”

It also talks about eliminating useless phis, which is optional but would strengthen the global value numbering pass: it makes more information transparent.

But what if we want to handle impure instructions? Languages such as Java allow reading fields from the receiver object within methods as if the field were a variable name. This makes code that reads the same fields several times common. Each of these references is an implicit reference to a field of the receiver, which is semantically a field load off an object.
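Putting the pieces together, here is a dominator-based GVN sketch in the spirit described above: walk blocks in reverse post-order and seed each block’s value map with a copy of its immediate dominator’s map. The immediate dominators are hard-coded (computing them is out of scope, as noted), the IR is the same made-up tuple format as in the earlier sketches, and any opcode outside `pure_ops` (phis included) is simply never matched. That gives every phi a unique value number, which is even more conservative than the Briggs treatment quoted above.

```python
def dominator_gvn(rpo, idom, blocks, pure_ops=frozenset({"const", "add"})):
    """Dominator-based GVN sketch.

    rpo:    block names in reverse post-order
    idom:   block name -> immediate dominator (None for the entry block)
    blocks: block name -> list of (name, op, args) in the made-up tuple IR
    """
    maps = {}     # block -> value map as it stands at the end of that block
    replace = {}  # removed SSA name -> surviving SSA name
    out = {}
    for b in rpo:
        # Copy (never share) the immediate dominator's map: everything in it
        # has executed on every path that reaches b. RPO guarantees the
        # dominator was processed first.
        table = dict(maps[idom[b]]) if idom[b] is not None else {}
        kept = []
        for name, op, args in blocks[b]:
            args = tuple(replace.get(a, a) if isinstance(a, str) else a for a in args)
            if op in pure_ops:  # anything else, including phis, stays unique
                key = (op, args)
                if key in table:
                    replace[name] = table[key]
                    continue
                table[key] = name
            kept.append((name, op, args))
        maps[b] = table
        out[b] = kept
    return out


if __name__ == "__main__":
    blocks = {
        "A": [("x", "param", ()), ("a", "add", ("x", 1))],
        "B": [("b", "add", ("x", 1)), ("b2", "add", ("x", 2))],
        "C": [("c", "add", ("x", 2))],
        "D": [("d", "add", ("x", 1)), ("d2", "add", ("x", 2))],
    }
    idom = {"A": None, "B": "A", "C": "A", "D": "A"}
    result = dominator_gvn(["A", "C", "B", "D"], idom, blocks)
    for block_name, instrs in result.items():
        print(block_name, instrs)
```

`b` and `d` are folded into `a` because A dominates both. `d2` survives even though x + 2 is computed on both paths into D: neither B nor C dominates D, so the map copied from A does not contain it. That is the price of only trusting dominators.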
You can see the field loads in the bytecode (thanks, Matt Godbolt). When straightforwardly building an SSA IR from the JVM bytecode for this method, you will end up with IR that is pretty much the same as the bytecode. Even though no code in the middle could modify the field (which would require a re-load), we still have a duplicate load. Bummer.

I don’t want to re-hash this too much, but it’s possible to fold load and store forwarding into your GVN implementation, either locally within each block or by keeping track of effects across blocks. See, there’s nothing fundamentally stopping you from tracking the state of your heap at compile time across blocks. You just have to do a little more bookkeeping. In our dominator-based GVN implementation, for example, you can track each block’s heap write effects and use them to remove killed entries from the value map inherited from the immediate dominator. Not so bad.

Maxine doesn’t do global memory tracking, but it does a limited form of load-store forwarding while building its HIR from bytecode: see GraphBuilder, which uses the MemoryMap to help track this stuff. At least it would not have the same duplicate instructions in the example above!

We’ve now looked at one kind of value numbering and one implementation of it. What else is out there?

Apparently, you can get better results by having a unified hash table (p9 of Briggs GVN) of expressions, not limiting the value map to dominator-available expressions. I’m not 100% sure how this works yet. They note:

“Using a unified hash-table has one important algorithmic consequence. Replacements cannot be performed on-line because the table no longer reflects availability.”

This is the first time it occurred to me that hash-based value numbering with dominators is an approximation of available expression analysis.

There’s also a totally different kind of value numbering called value partitioning (p12 of Briggs GVN). See also a nice blog post about this by Allen Wang from the Cornell compiler course. I think this mostly replaces the hashing bit, and you still need something else for the available expressions bit.
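As a rough illustration of the block-local approach (load-store forwarding as part of local value numbering, with memory information cleared at the end of each block), here is a Python sketch. The tuple IR, the `load`/`store`/`call` opcodes, and the aliasing rule are assumptions: it treats distinct field names as never aliasing (true for Java fields) but assumes any two object references might alias, and it treats a call as clobbering all of memory.

```python
def forward_loads(block):
    """Block-local load elimination and store-to-load forwarding.

    block: list of (name, op, args) in a made-up tuple IR:
      ("load",  (obj, field))         reads obj.field
      ("store", (obj, field, value))  writes obj.field = value
      ("call",  (...))                unknown effects on memory
    Field names are assumed never to collide with SSA names.
    Returns (kept instructions, {removed load: replacement name}).
    """
    memory = {}   # (obj, field) -> SSA name known to hold that field's value
    replace = {}
    kept = []
    for name, op, args in block:
        args = tuple(replace.get(a, a) for a in args)
        if op == "load":
            if args in memory:
                replace[name] = memory[args]  # re-use the known value
                continue
            memory[args] = name
        elif op == "store":
            obj, field, value = args
            # Any other object reference might alias obj, so forget every
            # known value of this field before recording the new one.
            for key in [k for k in memory if k[1] == field]:
                del memory[key]
            memory[(obj, field)] = value
        elif op == "call":
            memory.clear()  # the callee could write to any field
        kept.append((name, op, args))
    return kept, replace


if __name__ == "__main__":
    block = [
        ("t0", "load", ("this", "x")),
        ("t1", "load", ("this", "x")),        # same field, nothing in between
        ("t2", "add", ("t0", "t1")),
        ("s", "store", ("this", "y", "t2")),
        ("t3", "load", ("this", "y")),        # forwarded from the store
        ("c", "call", ("f",)),
        ("t4", "load", ("this", "x")),        # must stay: the call may write x
    ]
    kept, replace = forward_loads(block)
    print(replace)  # {'t1': 't0', 't3': 't2'}
```

In the usage example, the second load of `this.x` re-uses `t0`, the load of `this.y` is forwarded from the stored value `t2`, and the load after the call has to stay.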
Ben Titzer and Seth Goldstein have some good slides from CMU, where they talk about the worklist dataflow approach. Apparently this is slower but gets you more available expressions than just looking at dominator blocks. I wonder how much it differs from dominator+unified hash table.

While Maxine uses hash table cloning to copy value maps from dominator blocks, there are also compilers such as Cranelift that use scoped hash maps to track this information more efficiently. (Though Amanieu notes that you may not need a scoped hash map and can instead tag values in your value map with the block they came from, ignoring non-dominating values with a quick check. The dominance check makes sense, but I haven’t internalized how this affects the set of available expressions yet.)

You may be wondering if this kind of algorithm even helps at all in a dynamic language JIT context. Surely everything is too dynamic, right? Actually, no! The JIT hopes to eliminate a lot of method calls and dynamic behaviors, replacing them with guards, assumptions, and simpler operations. These strength reductions often leave behind a lot of repeated instructions. Just the other day, Kokubun filed a value-numbering-like PR to clean up some of the waste.

ART has a recent blog post about speeding up GVN.

Go forth and give your values more numbers.

There’s been an ongoing discussion with Phil Zucker on SSI, GVN, acyclic egraphs, and scoped union-find. TODO: summarize.

- Commutativity; canonicalization
- Seeding alternative representations into the GVN
- Aegraphs and union-find during GVN
- https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-egraph.md
- https://github.com/bytecodealliance/wasmtime/issues/9049
- https://github.com/bytecodealliance/wasmtime/issues/4371

While writing this post, I realized that the whole time I had been wondering why Cinder did not use union-find for rewriting, it actually did!
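The scoped-hash-map idea can be sketched like this: walk the dominator tree depth-first, push a scope on entry to each block and pop it on exit, so at any point the map holds exactly the entries from the current block and its dominators, with no per-block copying. This is a toy in Python using the made-up tuple IR from the other sketches, not Cranelift’s implementation; the dominator tree for the diamond is hard-coded.

```python
_MISSING = object()  # sentinel: "key had no entry before this scope"


class ScopedMap:
    """A dict whose insertions are undone when the enclosing scope is popped."""

    def __init__(self):
        self.table = {}
        self.undo = []  # one list of (key, previous value) per open scope

    def push(self):
        self.undo.append([])

    def pop(self):
        for key, old in reversed(self.undo.pop()):
            if old is _MISSING:
                del self.table[key]
            else:
                self.table[key] = old

    def insert(self, key, value):
        self.undo[-1].append((key, self.table.get(key, _MISSING)))
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)


def gvn_dom_tree(children, blocks, root, pure_ops=frozenset({"add"})):
    """Walk the dominator tree; returns {removed name: surviving name}."""
    vmap, replace = ScopedMap(), {}

    def visit(b):
        vmap.push()  # entries added in b are visible only in b's dominator subtree
        for name, op, args in blocks[b]:
            args = tuple(replace.get(a, a) for a in args)
            if op in pure_ops:
                hit = vmap.get((op, args))
                if hit is not None:
                    replace[name] = hit
                    continue
                vmap.insert((op, args), name)
        for child in children.get(b, ()):
            visit(child)
        vmap.pop()  # leaving b: forget its entries before visiting siblings

    visit(root)
    return replace


if __name__ == "__main__":
    blocks = {
        "A": [("x", "param", ()), ("a", "add", ("x", 1))],
        "B": [("b", "add", ("x", 1)), ("b2", "add", ("x", 2))],
        "C": [("c", "add", ("x", 2))],
        "D": [("d", "add", ("x", 1)), ("d2", "add", ("x", 2))],
    }
    dom_children = {"A": ["B", "C", "D"]}  # A immediately dominates B, C, and D
    print(gvn_dom_tree(dom_children, blocks, "A"))  # {'b': 'a', 'd': 'a'}
```

Popping B’s scope before visiting C is what keeps B’s `x + 2` from leaking into its sibling, replacing the explicit map copy with an undo log.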
[1] Optimizing an instruction by replacing it with an instruction that forwards to another value, followed by copy propagation, is equivalent to union-find.
[2] In some forms of SSA, like heap-array SSA or sea of nodes, it’s possible to more easily de-duplicate loads because the memory representation has been folded into (modeled in) the IR.
[3] The order is a little more complicated than that: reverse post-order (RPO). There’s also a paper called “A Simple Algorithm for Global Data Flow Analysis Problems”, which I don’t yet have a PDF of, that claims RPO is optimal for solving dataflow problems.
[4] There’s the iterative dataflow way (described in the Cooper paper (PDF)), Lengauer-Tarjan (PDF), the Engineered Algorithm (PDF), the hybrid/Semi-NCA approach (PDF), …

Local value numbering:
- initialize a map from instruction numbers to instruction pointers
- for each instruction:
  - if it wants to participate in value numbering:
    - if its value number is already in the map, replace all pointers to it in the rest of the program with the corresponding value from the map
    - otherwise, add it to the map

Folding load and store forwarding into GVN, either by:
- doing load-store forwarding as part of local value numbering and clearing memory information from the value map at the end of each block, or
- keeping track of effects across blocks

Memory tracking in dominator-based GVN:
- track heap write effects for each block
- at the start of each block B, union all of the “kill” sets for every block back to its immediate dominator
- finally, remove the stuff that got killed from the dominator’s value map

V8 Hydrogen

Evan Schwartz 1 week ago

Scour - March Update

Hi friends,

In March, Scour scoured 813,588 posts from 24,029 feeds (7,131 were newly added) and 488 new users signed up. Welcome!

Here's what's new in the product:

Scour now does a better job of ensuring that your feed draws from a mix of sources and that no single interest or group of interests dominates. I had made a number of changes along these lines in the past, but they were fiddly and the diversification mechanism wasn't working that well. Under the hood, Scour now does a first pass to score how similar articles are to your interests and then has a separate step for selecting posts for your feed while keeping it diverse on a number of different dimensions.

Content from websites and groups of interests you tend to like and/or click on more is now given slightly more room in your feed. Conversely, websites and groups of interests you tend to dislike or not click on will be given a bit less space. For Scour, I'm always trying to think of how to show you more content you'll find interesting, without trapping you in a small filter bubble (you can read about my ranking philosophy in the docs). After a number of iterations, I landed on a design that I'm happy with. I hope this strikes a good balance between making sure you see articles from your favorite sources and leaving room for the serendipity of finding a great new source that you didn't know existed.

After you click an article, Scour now explicitly asks you for your reaction. These reactions help tune your feed slightly, and they help me improve the ranking algorithm over time. Before, the reaction buttons were below every post, but that made them a bit hard to hit intentionally and easy to touch accidentally. If you want to react to an article without reading it first, you can also find them in the More Options menu. Thanks to Shane Sveller for pointing out that the reaction buttons were too small on mobile!
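The two-stage design (score relevance first, then select for diversity) can be illustrated with a toy second stage. This is purely a sketch under assumptions, not Scour’s implementation: the post doesn’t describe its actual dimensions or weights, so `select_diverse`, `max_per_source`, and the tuple format are all hypothetical, and a per-source cap stands in for just one possible diversity dimension.

```python
from collections import Counter


def select_diverse(scored, k, max_per_source):
    """Greedy diversity pass over already-scored candidates.

    scored: list of (relevance score, source, item id) from a first ranking pass.
    Picks the highest-scoring items, skipping any source that already has
    `max_per_source` picks, until `k` items are chosen.
    """
    picked, per_source = [], Counter()
    for score, source, item in sorted(scored, key=lambda t: t[0], reverse=True):
        if per_source[source] >= max_per_source:
            continue  # this source already has its share of the feed
        picked.append(item)
        per_source[source] += 1
        if len(picked) == k:
            break
    return picked


if __name__ == "__main__":
    scored = [
        (0.90, "blog-a", "a1"), (0.85, "blog-a", "a2"), (0.80, "blog-a", "a3"),
        (0.70, "blog-b", "b1"), (0.60, "blog-c", "c1"),
    ]
    # Pure relevance would pick a1, a2, a3; a cap of one per source spreads it out.
    print(select_diverse(scored, k=3, max_per_source=1))  # ['a1', 'b1', 'c1']
```

Keeping scoring and selection separate means the relevance model can change without touching the diversity rules, and vice versa.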
Scour now supports exact keyword matching, in addition to using vector embeddings for semantic similarity. Articles that are similar to one of your interests but don't use the exact words or phrases from your interest definition will be ranked lower. Right now this applies to interests marked as "Specific" or "Normal" (this is also automatically determined when interests are created). This should cut down on the number of articles you see that are mis-categorized or clearly off-topic. Thanks to Alex Miller and an anonymous user for prompting this, and thanks to Alex, JackJackson, mhsid, snuggles, and anders_no for all the Off-Topic reports!

Sometimes, I see an article on Hacker News or elsewhere and wonder why it didn't show up in my Scour feed. You can now paste links into the Why didn't I see this? page, and it will give you a bit of an explanation. You can also report the article so I can look into it more and continue to improve the ranking algorithm over time.

Here were some of my favorite posts that I found on Scour in March:

- For anyone building products, this is a good reminder to make sure you're trying out and experiencing the bad parts of your product: Bored of eating your own dogfood? Try smelling your own farts!
- This was a brief, interesting history and technical overview of document formats and why Markdown "won": Markdown Ate The World.
- A reminder that any user-generated input, including repo branch names, can be malicious: OpenAI Codex: How a Branch Name Stole GitHub Tokens.
- This is a very detailed and informative visual essay explaining how quantization (compression) for large language models works: Quantization from the ground up.
- I'm not currently using Turso (the Rust rewrite of SQLite), but I think what they're doing is interesting, including this experimental version that speaks the Postgres SQL dialect: pgmicro.
- And because I like making (and eating) sour sourdough: How To Make Sourdough Bread More (Or Less) Sour.

Happy Scouring!

P.S. If you use a coding agent like Claude Code, I also wrote up A Rave Review of Superpowers, a plugin that makes me much more productive.

W. Jason Gilmore 3 weeks ago

Troubleshooting Your Claude MCP Configuration

These days I add MCP support for pretty much every software product I build, including most recently IterOps and SecurityBot.dev. Creating the MCP server is very easy because I build all of my SaaS products using Laravel, and Laravel offers native MCP support. What's less clear is how to configure the MCP client to talk to the MCP server. This is because many MCP server configurations rely on npx to call the MCP server URL. That is easy enough; however, if you're running NVM to help handle Node version discrepancies across multiple projects, you might need to explicitly define the npx path inside the client's configuration file.

If you're using Laravel Herd and the MCP client crashes once Claude loads, it might be because you're using Herd's locally generated SSL certificates. The mcp-remote package doesn't like this and will complain about the certificate not being signed. You can tell mcp-remote to ignore this by adding an environment variable to the server's configuration.
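If you need an absolute npx path, one way to avoid hand-editing is to generate the server entry. This is a hypothetical Python helper, assuming the `mcpServers` / `command` / `args` layout used by Claude Desktop’s JSON configuration file and an `mcp-remote` invocation of the form `npx mcp-remote <url>`; the server name, URL, and NVM path below are placeholders. When no path is given, `shutil.which` returns the npx that comes first on the current shell’s PATH, which under NVM belongs to the currently active Node version.

```python
import json
import shutil
from typing import Optional


def mcp_remote_entry(server_url: str, npx_path: Optional[str] = None) -> dict:
    """Build one Claude Desktop-style MCP server entry that runs mcp-remote.

    Using an absolute npx path means the client does not depend on whatever
    PATH it inherits, which matters when NVM manages several Node versions.
    """
    npx = npx_path or shutil.which("npx")
    if npx is None:
        raise FileNotFoundError("npx not found on PATH; pass npx_path explicitly")
    return {"command": npx, "args": ["mcp-remote", server_url]}


def build_config(servers: dict) -> dict:
    """servers: server name -> entry from mcp_remote_entry()."""
    return {"mcpServers": servers}


if __name__ == "__main__":
    config = build_config({
        # Placeholder name, URL, and NVM path: substitute your own.
        "my-app": mcp_remote_entry(
            "https://my-app.test/mcp",
            npx_path="/Users/me/.nvm/versions/node/v22.0.0/bin/npx",
        ),
    })
    print(json.dumps(config, indent=2))
```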

Simon Willison 3 weeks ago

Experimenting with Starlette 1.0 with Claude skills

Starlette 1.0 is out! This is a really big deal. I think Starlette may be the Python framework with the most usage compared to its relatively low brand recognition, because Starlette is the foundation of FastAPI, which has attracted a huge amount of buzz that seems to have overshadowed Starlette itself.

Kim Christie started working on Starlette in 2018 and it quickly became my favorite out of the new breed of Python ASGI frameworks. The only reason I didn't use it as the basis for my own Datasette project was that it didn't yet promise stability, and I was determined to provide a stable API for Datasette's own plugins... although I still haven't been brave enough to ship my own 1.0 release (after 26 alphas and counting)!

Then in September 2025 Marcelo Trylesinski announced that Starlette and Uvicorn were transferring to their GitHub account, in recognition of their many years of contributions and to make it easier for them to receive sponsorship against those projects.

The 1.0 version has a few breaking changes compared to the 0.x series, described in the release notes for 1.0.0rc1 that came out in February. The most notable of these is a change to how code runs on startup and shutdown. Previously that was handled by dedicated startup and shutdown parameters, but the new system uses a neat lifespan mechanism based around an async context manager.

If you haven't tried Starlette before, it feels to me like an asyncio-native cross between Flask and Django, unsurprising since creator Kim Christie is also responsible for Django REST Framework. Crucially, this means you can write most apps as a single Python file, Flask style. This makes it really easy for LLMs to spit out a working Starlette app from a single prompt.

There's just one problem there: if 1.0 breaks compatibility with the Starlette code that the models have been trained on, how can we have them generate code that works with 1.0? I decided to see if I could get this working with a Skill.
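For readers who haven’t used it, the lifespan pattern is an async context manager: code before `yield` runs at startup, and code after it runs at shutdown. The sketch below defines one with the standard library and drives it by hand, so it runs without Starlette installed; with Starlette you would pass it to the application’s `lifespan=` parameter. The `events` list is a placeholder for real setup and teardown, such as opening and closing a database pool.

```python
import asyncio
import contextlib

events = []  # stands in for real resources, e.g. a database connection pool


@contextlib.asynccontextmanager
async def lifespan(app):
    events.append("startup")   # runs once, before the app serves requests
    yield                      # the app serves requests while suspended here
    events.append("shutdown")  # runs once, when the server shuts down


# With Starlette 1.0 installed, this would be wired up as:
#     from starlette.applications import Starlette
#     app = Starlette(routes=[...], lifespan=lifespan)
# Here we drive the context manager by hand, the way a server would.
async def main():
    async with lifespan(app=None):
        events.append("serving")
    return list(events)


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['startup', 'serving', 'shutdown']
```

Because startup and shutdown live in one function, resources created before `yield` are naturally in scope for the cleanup after it.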
Regular Claude Chat on claude.ai has skills, and one of those default skills is the skill-creator skill. This means Claude knows how to build its own skills. So I started a chat session and told it:

Clone Starlette from GitHub - it just had its 1.0 release. Build a skill markdown document for this release which includes code examples of every feature.

I didn't even tell it where to find the repo; Starlette is widely enough known that I expected it could find it on its own. The clone command it ran actually used the old repository name, but GitHub handles redirects automatically, so this worked just fine.

The resulting skill document looked very thorough to me... and then I noticed a new button at the top I hadn't seen before, labelled "Copy to your skills". So I clicked it, and now my regular Claude chat has access to that skill! I started a new conversation and prompted:

Build a task management app with Starlette, it should have projects and tasks and comments and labels

And Claude did exactly that, producing a simple GitHub Issues clone using Starlette 1.0, a SQLite database (via aiosqlite) and a Jinja2 template. Claude even tested the app manually.

For all of the buzz about Claude Code, it's easy to overlook that Claude itself counts as a coding agent now, fully able to both write and then test the code that it is writing.

Here's what the resulting app looked like. The code is here in my research repository.

devansh 1 month ago

Four Vulnerabilities in Parse Server

Parse Server is one of those projects that sits quietly beneath a lot of production infrastructure. It powers the backend of a meaningful number of mobile and web applications, particularly those that started on Parse's original hosted platform before it shut down in 2017 and needed somewhere to migrate. The project currently has over 21,000 stars on GitHub.

I recently spent some time auditing its codebase and found four security vulnerabilities. Three of them share a common root: a fundamental gap between what the readOnlyMasterKey is documented to do and what the server actually enforces. The fourth is an independent issue in the social authentication adapters that is arguably more severe: a JWT validation bypass that allows an attacker to authenticate as any user on a target server using a token issued for an entirely different application. The Parse Server team was responsive throughout and coordinated fixes promptly. All four issues have been patched.

Parse Server is an open-source Node.js backend framework that provides a complete application backend out of the box: a database abstraction layer (typically over MongoDB or PostgreSQL), a REST and GraphQL API, user authentication, file storage, push notifications, Cloud Code for serverless functions, and a real-time event system. It is primarily used as the backend for mobile applications and is the open-source successor to Parse's original hosted backend-as-a-service platform.

Parse Server authenticates API requests using one of several key types. The master key grants full administrative access to all data, bypassing all object-level and class-level permission checks. It is intended for trusted server-side operations only. Parse Server also exposes a readOnlyMasterKey option. Per its documentation, this key grants master-level read access (it can query any data, bypass ACLs for reading, and perform administrative reads) but is explicitly intended to deny all write operations.
It is the kind of credential you might hand to an analytics service, a monitoring agent, or a read-only admin dashboard: enough power to see everything, but no ability to change anything. That contract is what three of these four vulnerabilities break.

The implementation checks whether a request carries master-level credentials by testing a single master flag on the auth object. The problem is that readOnlyMasterKey authentication sets both that master flag and a separate read-only flag, and a large number of route handlers check only the former. The read-only flag is set but never consulted, which means the read-only restriction exists in concept but not in enforcement.

Cloud Hooks are server-side webhooks that fire when specific Parse Server events occur: object creation, deletion, user signup, and so on. Cloud Jobs are scheduled or manually triggered background tasks that can execute arbitrary Cloud Code functions. Both are powerful primitives: Cloud Hooks can exfiltrate any data passing through the server's event stream, and Cloud Jobs can execute arbitrary logic on demand.

The routes that manage Cloud Hooks and Cloud Jobs (creating new hooks, modifying existing ones, deleting them, and triggering job execution) are all guarded by master key access checks. Those checks verify only that the requesting credential has master-level access. Because the readOnlyMasterKey satisfies that condition, a caller holding only the read-only credential can fully manage the Cloud Hook lifecycle and trigger Cloud Jobs at will.

The practical impact is data exfiltration via Cloud Hook. An attacker who knows the readOnlyMasterKey can register a new Cloud Hook pointing to an external endpoint they control, then watch as every matching Parse Server event (user signups, object writes, session creation) is delivered to them in real time. The read-only key, intended to allow passive observation, can be turned into an active wiretap on the entire application's event stream. The fix adds explicit read-only rejection checks to the Cloud Hook and Cloud Job handlers.
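The root cause boils down to a guard that checks one flag when it should check two. Parse Server itself is JavaScript, and this is not its code: the `Auth` class, the flag names `is_master` and `is_read_only`, and both guard functions are hypothetical Python stand-ins that mirror the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Auth:
    # Hypothetical stand-in: a read-only master key sets BOTH flags.
    is_master: bool = False
    is_read_only: bool = False


class Forbidden(Exception):
    pass


def require_master_vulnerable(auth: Auth) -> None:
    # The pattern described above: only the master flag is consulted,
    # so a read-only master key passes.
    if not auth.is_master:
        raise Forbidden("master key required")


def require_master_for_write(auth: Auth) -> None:
    # The pattern the fixes add: state-changing routes also reject read-only keys.
    if not auth.is_master:
        raise Forbidden("master key required")
    if auth.is_read_only:
        raise Forbidden("read-only master key cannot perform this operation")


def allowed(guard, auth: Auth) -> bool:
    """Return True if `guard` lets `auth` through."""
    try:
        guard(auth)
    except Forbidden:
        return False
    return True


if __name__ == "__main__":
    read_only = Auth(is_master=True, is_read_only=True)
    print(allowed(require_master_vulnerable, read_only))  # True  (the bug)
    print(allowed(require_master_for_write, read_only))   # False (the fix)
```

The same two-flag check applies to every route in the three readOnlyMasterKey issues: Cloud Hooks and Jobs, file upload and delete, and session minting.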
Parse Server's Files API exposes endpoints for uploading and deleting files. Both routes are guarded by a middleware that checks whether the incoming request has master-level credentials. Like the Cloud Hooks routes, this check only tests the master flag and never consults the read-only flag. The root cause traces through three locations in the codebase. At lines 267–278 of the first file, the read-only auth object is constructed with the master flag set. At lines 107–113 of the second, the delete route applies that middleware as its only guard. At lines 586–602 of the same file, the delete handler calls through to the file deletion logic without any additional read-only check in the call chain.

The consequence is that a caller with only the readOnlyMasterKey can upload arbitrary files to the server's storage backend or permanently delete any existing file by name. The upload vector is primarily an integrity concern: poisoning stored assets. The deletion vector is an availability concern: an attacker can destroy application data (user avatars, documents, media) that may not have backups, and depending on how the application is structured, deleting certain files could cause cascading application failures. The fix adds read-only rejection to both the file upload and file delete handlers.

The /loginAs issue is the most impactful of the three readOnlyMasterKey vulnerabilities. The /loginAs endpoint is a privileged administrative route intended for master-key workflows: it accepts a user ID parameter and returns a valid, usable session token for that user. The design intent is to allow administrators to impersonate users for debugging or support purposes. It is the digital equivalent of a master key that can open any door. The route's handler is located at lines 339–345 of its source file and is mounted at lines 706–708. The guard condition rejects requests where the master flag is false. Because the readOnlyMasterKey produces an auth object whose master flag is true, and because there is no read-only check anywhere in the handler or its middleware chain, the read-only credential passes the gate and the endpoint returns a fully usable session token for any user ID provided. That session token is not a read-only token.
It is a normal user session token, indistinguishable from one obtained by logging in with a password. It grants full read and write access to everything that user's ACL and role memberships permit. An attacker with the readOnlyMasterKey and knowledge of any user's object ID can silently mint a session as that user and then act as them with complete write access: modifying their data, making purchases, changing their email address, deleting their account, or doing anything else the application allows its users to do. There is no workaround other than removing the readOnlyMasterKey from the deployment or upgrading. The fix is a single guard added to the handler that rejects the request when the read-only flag is true.

The fourth vulnerability is independent of the readOnlyMasterKey theme and is the most severe of the four. It sits in Parse Server's social authentication layer, specifically in the adapters that validate identity tokens for Sign in with Google, Sign in with Apple, and Facebook Login. When a user authenticates via one of these providers, the client receives a JSON Web Token signed by the provider. Parse Server's authentication adapters are supposed to verify this token: they check the signature, the expiry, and, critically, the audience claim, the field that specifies which application the token was issued for.

Audience validation is what prevents a token issued for one application from being used to authenticate against a different application. Without it, a validly signed token from any Google, Apple, or Facebook application in the world can be used to authenticate against any Parse Server that trusts the same provider.

The vulnerability arises from how the adapters handle missing configuration. For the Google and Apple adapters, the audience is passed to JWT verification via a configuration option. When that option is not set, the adapters do not reject the configuration as incomplete; they silently skip audience validation entirely. The JWT is verified for signature and expiry only, and any valid Google or Apple token from any app will be accepted.
For Facebook Limited Login, the situation is worse: the vulnerability exists regardless of configuration. The Facebook adapter validates as the expected audience for the Standard Login (Graph API) flow. However, the Limited Login path, which uses JWTs rather than Graph API tokens, never passes to JWT verification at all. The code path simply does not include the audience parameter in the verification call, meaning no configuration value, however correct, can prevent the bypass on the Limited Login path. The attack is straightforward. An attacker creates or uses any existing Google, Apple, or Facebook application they control, signs in to obtain a legitimately signed JWT, and then presents that token to a vulnerable Parse Server's authentication endpoint. Because audience validation is skipped, the token passes verification. Combined with the ability to specify which Parse Server user account to associate the token with, this becomes full pre-authentication account takeover for any user on the server, with no credentials, no brute force, and no interaction from the victim. The fix enforces (Google/Apple) and (Facebook) as mandatory configuration and passes them correctly to JWT verification for both the Standard Login and Limited Login paths on all three adapters.
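Parse Server itself is written in JavaScript, so the following is only an illustration in Python of the fail-closed audience check that the fix describes, not the project's code. `check_audience` is a hypothetical helper, and it assumes the token's signature and expiry have already been verified elsewhere. The key design choice is that a missing audience configuration is treated as an error rather than as a reason to skip the check:

```python
from typing import Iterable, Mapping, Optional


def check_audience(claims: Mapping[str, object], expected: Optional[Iterable[str]]) -> None:
    """Reject an already signature- and expiry-checked JWT unless it was issued for this app.

    Fails closed: an empty or missing audience configuration raises instead of
    silently disabling the check.
    """
    allowed = set(expected or ())
    if not allowed:
        raise ValueError("no audience configured; refusing to accept identity tokens")
    aud = claims.get("aud")
    # RFC 7519 allows "aud" to be either a single string or an array of strings.
    token_audiences = {aud} if isinstance(aud, str) else set(aud or ())
    if not token_audiences & allowed:
        raise ValueError(f"token audience {sorted(token_audiences)} is not this application")
```

With this shape, a token minted for an attacker's own Google, Apple, or Facebook app fails the intersection test, and a deployment that forgot to configure its audience fails loudly at the first login instead of accepting every token.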
Disclosure Timeline

- CVE-2026-29182 (Cloud Hooks and Cloud Jobs bypass readOnlyMasterKey): GHSA-vc89-5g3r-cmhh, fixed in 8.6.4 and 9.4.1-alpha.3
- CVE-2026-30228 (File Creation and Deletion bypass readOnlyMasterKey): GHSA-xfh7-phr7-gr2x, fixed in 8.6.5 and 9.5.0-alpha.3
- CVE-2026-30229 (/loginAs allows readOnlyMasterKey to gain full access as any user): GHSA-79wj-8rqv-jvp5, fixed in 8.6.6 and 9.5.0-alpha.4
- CVE-2026-30863 (JWT Audience Validation Bypass in Google, Apple, and Facebook Adapters): GHSA-x6fw-778m-wr9v, fixed in 8.6.10 and 9.5.0-alpha.11

Parse Server repository: github.com/parse-community/parse-server


Binding port 0 to avoid port collisions

It's common to spin up a server in a test so that you can make full end-to-end requests against it. It's a very important sort of test, to make sure everything works together. Most of the work I do is in complex web backends, and there's so much risk of not having all the request processing and middleware and setup exactly the same in a mock test... you must do at least some end-to-end tests or you're making a gamble that's going to bite you. And this is great, but you quickly run into a problem: port collisions! This can happen when you run multiple tests at once and all of them start a separate server, and whoops, two have picked the same port. Or it can happen if something else running on your development machine happens to be running on the port you chose. It's annoying when it happens, too, because it's often hard to reproduce. So... how do we fix that? You read the title [1], so you know where we're going, but let's go there together. There are a few potential solutions to this. Perhaps the most obvious is binding to a port you choose randomly. This will work a lot of the time, but it's going to be flaky. You can drive down the probability of collision, but it's going to happen sometimes. Side note: I think the only thing worse than a test that fails 10% of the time is one that fails 1% of the time. It's not flaky enough to drive urgency for anyone to fix it, but it's flaky enough that in a team context, you will run into this on a daily basis. Ask me how I know. How often you get a collision depends on a lot of factors. How many times do you bind a port in the range? How many other services might bind something in that range? How likely are two things to run concurrently? As a simple example, let's say we pick a random port in the range 9000-9999, and you have 4 concurrent tests that will overlap.
If you sample uniformly from this range, then you will have a 1/1000 chance of a collision from the second test, a 2/1000 chance from the third, and a 3/1000 chance from the fourth. Our probability of having no collision is $\frac{999}{1000} \cdot \frac{998}{1000} \cdot \frac{997}{1000} \approx 0.994$. That means that we have roughly a 0.6% chance of a collision. This isn't horrible, but it's not great! We could also have each test increment the port it picks by 1. I've done this before, and it avoids one set of problems from collisions, but it makes a new problem. Now you're sweeping across the entire range starting from the first port. If you have anything else running on your system that binds in that range, you'll run into a collision! And if you run your entire test suite in parallel, you're much more likely to have a problem now, since they all start at the same port. The problem we've had all along is that we don't have full information. If we know the system state and all the currently open ports, then binding to one that's not in use is an easy problem. And you know who knows all that info? The kernel does. And it turns out, this is something we can ask the kernel for. We can just say "please give me a nice unused port" and it will! There's a range of ports that the kernel uses for this. It varies by system, but it's not usually very relevant what the particular range is. On my system, I can find the range by checking . My ephemeral port range is from 32768 to 60999. I'm curious why the range stops there instead of going all the way up, so that's a future investigation. To get an ephemeral port on Linux systems, you bind or listen on port 0. Then the kernel will hand you back a port in the ephemeral range. And you know that it's available, since the kernel is keeping track. It's possible to have an issue here if the full range of ports has been exhausted but, you know what, if you hit that limit, you probably have other problems [2]. The only thing is that if you've bound to an unknown port, how do you send requests to it?
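A minimal Python sketch of the port-0 trick, using only the standard `socket` module; `bind_ephemeral` is a hypothetical helper name. It binds a TCP listener to port 0 and then asks the socket which port the kernel actually assigned:

```python
import socket


def bind_ephemeral(host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP listener to port 0 so the kernel picks an unused ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "kernel, please choose a free port for me"
    sock.listen()
    return sock


if __name__ == "__main__":
    listener = bind_ephemeral()
    host, port = listener.getsockname()  # ask which address/port we actually got
    print(f"listening on {host}:{port}")
    listener.close()
```

Because the kernel hands out the port while the socket is being bound, there is no window in which another process can grab it between "pick a port" and "bind the port", which is exactly the race the random-port approach suffers from.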
We can get the port we've bound to by another syscall, . This lets us find out what address a socket is bound to, and then we can do something with that information. For tests, that means that you'll need to find a way to communicate this port from the listener to the requester. If they're in the same process, I like to do this by either injecting in the listener or returning the address. If you're doing something like Postgres or Redis on an ephemeral port, then you'd probably have to find the port from its output, which is tedious but doable. Here's an example from a web app I'm working on. This is how a simple test looks. We launch the web server, binding to port 0, and get the address back. Then we can send requests to that address! And inside , the relevant two lines are: ...where in our case. That's all we have to do, and we'll get a much more reliable test setup.

[1] I think suspenseful titles can be fun, improve storytelling, and drive attention. But sometimes you really need a clear, honest, spoiler of a title. Giving away the answer is great when you're giving information that people might want to quickly internalize.

[2] If you do run into this, I'm very curious to hear about the circumstances. It's the kind of problem that I'd love to look at and work on. It's kind of messy, and you know that there's something very interesting that led to it being this way.
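The test pattern described in this post (launch the server on port 0, get the address back, send requests to it) can be sketched with Python's standard library alone. This is an illustration, not the author's web app: `HelloHandler`, `start_server`, and `test_hello` are hypothetical names. `http.server.HTTPServer` stores the kernel-assigned address in `server_address` after binding, so the test never has to guess a port:

```python
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def start_server() -> tuple[http.server.HTTPServer, str]:
    # Port 0: the kernel chooses; server_address is updated from getsockname().
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://{host}:{port}"


def test_hello():
    server, base_url = start_server()
    # Bypass any HTTP proxy settings so the request really hits the loopback server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + "/") as resp:
            assert resp.status == 200
            assert resp.read() == b"ok"
    finally:
        server.shutdown()
        server.server_close()
```

Returning the base URL from `start_server` is one way to do the "returning the address" approach: every test gets its own server on its own port, so running the suite in parallel does not cause collisions.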

Jeremy Daly 1 month ago

Context Engineering for Commercial Agent Systems

Memory, Isolation, Hardening, and Multi-Tenant Context Infrastructure

iDiallo 2 months ago

Last year, all my non-programmer friends built apps

Last year, all my non-programmer friends were building apps. Yet today, those apps are nowhere to be found. Everyone followed the ads. They signed up for Lovable and all the fancy app-building services that exist. My LinkedIn feed was filled with PMs who had discovered new powers. Some posted bullet-point lists of "things to do to be successful with AI." "Don't work hard, work smart," they said, as if it were a deep insight. I must admit, I was a bit jealous. With a full-time job, I don't get to work on my cool side project, which has collected enough dust to turn into a dune. There's probably a little mouse living inside. I'll call him Muad'Dib. What was I talking about? Right. The apps. Today, my friends are silent. I still see the occasional post on LinkedIn, but they don't garner the engagement they used to. The app-building AI services still exist, but their customers have paused their subscriptions. Here's a conversation I had recently. A friend had "vibe-coded" an Android app. A platform for building communities around common interests. Biking enthusiasts could start a biking community. Cooking fans could gather around recipes. It was a neat idea. While using the app on his phone, swiping through different pages and watching the slick animations, I felt a bit jealous. Then I asked: "So where is the data stored?" "It's stored on the app," he replied. "I mean, all the user data," I pressed. "Do you use a database on AWS, or any service like that?" We went back and forth while I tried to clarify my question. His vibe-knowing started to show its limits. I felt some relief: my job was safe, for now. Joking aside, we talked about servers, app architecture, and even GDPR compliance. These weren't things the AI builder had prepared him for. This conversation happens often now when I check in on friends who vibe-coded their way into developing an app or website. They felt on top of the world when they were getting started. But then they got stuck.
An error message they couldn't debug. The service generating gibberish. Requests the AI couldn't understand. How do you build the backend of an app when you don't know what a backend is? And when the tool asks you to sign up for Google Cloud and start paying monthly fees, what are you supposed to do? Another friend wanted to build a newsletter. ChatGPT told him to set up WordPress and learn about SMTP. These are all good things to learn, but the "S" in SMTP is a lie. It's not that simple. I've been trying to explain to him why the email he is sending from the command line is not reaching his Gmail inbox. The AI services that promise to build applications are great at making a storefront you don't want to modify. The moment you start customizing, you run into problems. That's why all Lovable websites look exactly the same. These services continue to exist. The marketing is still effective. But few people end up with a product that actually solves their problems. My friends spent money on these services. They were excited to see a polished brochure. The problem is, they didn't know what it takes to actually run an app. The AI tools are amazing at generating the visible 20% of an app. But the remaining invisible 80% is where the actual work is: the infrastructure, the security, maintenance, scaling issues, and then the actual cost. The free tier on AWS doesn't last forever. And neither does your enthusiasm when you start paying $200/month for a hobby project. My friends' experiments weren't failures. They learned something valuable. Some now understand why developers get paid what they do. Some even signed up for a programming bootcamp. But the rest have moved on. Their apps sit dormant in abandoned GitHub repos. Their domains will probably expire this year. They're back to their day jobs, a little wiser about the difference between a demo and a product. Their LinkedIn profiles are quieter now; they have stopped posting about "working smart, not hard."
As for me, I should probably check on Muad'Dib. That side project isn't going to build itself. AI or no AI.

Justin Duke 2 months ago

Brief notes on migrating to Postgres-backed jobs

It seems premature to talk about a migration that is only halfway done, even if it's the hard half that's done — but I think there's something useful in documenting the why and how of a transition while you're still in the thick of it, before the revisionist history of completion sets in. Early last year, we built out a system for running background jobs directly against Postgres within Django. This very quickly got abstracted out into a generic task runner — shout out to Brandur and many other people who have been beating this drum for a while. And as far as I can tell, this concept of shifting away from Redis and other less-durable caches for job infrastructure is regaining steam on the Rails side of the ecosystem, too. The reason we did it was mostly for ergonomics around graceful batch processing. It is significantly easier to write a poller in Django for stuff backed by the ORM than it is to try and extend RQ or any of the other task runner options that are Redis-friendly. Django gives you migrations, querysets, admin visibility, transactional guarantees — all for free, all without another moving part. And as we started using it and it proved stable, we slowly moved more and more things over to it. At the time of this writing, around half of our jobs by quantity — which represent around two-thirds by overall volume — have been migrated over from RQ onto this system. This is slightly ironic given that we also last year released django-rq-cron , a library that, if I have my druthers, we will no longer need. Fewer moving parts is the watchword. We're removing spindles from the system and getting closer and closer to a simple, portable, and legible stack of infrastructure.

Preah's Website 2 months ago

BlogLog January 30 2026

Updated the Feeds page to format the feeds list as a table instead of a bulleted list, for a cleaner appearance. Updated the conversion script to use this change.

devansh 3 months ago

HonoJS JWT/JWKS Algorithm Confusion

After spending some time looking for security issues in JS/TS frameworks, I moved on to Hono: fast, clean, and popular enough that small auth footguns can become "big internet problems". This post is about two issues I found in Hono's JWT/JWKS verification path:

- a default algorithm footgun in the JWT middleware that can lead to forged tokens if an app is misconfigured
- a JWK/JWKS algorithm selection bug where verification could fall back to an untrusted value

Both were fixed in hono 4.11.4, and GitHub Security Advisories were published on January 13, 2026.

JWT / JWK / JWKS Primer

If you already have experience with JWT stuff, you can skip this:

- JWT is . The header includes (the signing algorithm).
- JWK is a JSON representation of a key (e.g. an RSA public key).
- JWKS is a set of JWKs, usually hosted at something like .

The key point here is that algorithm choice must not be attacker-controlled.

Vulnerabilities

[CVE-2026-22817] - JWT middleware "unsafe default" (HS256)

Hono's JWT helper documents that is optional, and defaults to HS256. That sounds harmless until you combine it with a very common real-world setup:

- The app expects RS256 (asymmetric)
- The developer passes an RSA public key string
- But they don't explicitly set

In that case, the verification path defaults to HS256, treating that public key string as an HMAC secret, and that becomes forgeable because public keys are, well… public.

Why this becomes an auth bypass

If an attacker can generate a token that passes verification, they can mint whatever claims the application trusts ( , , , etc.) and walk straight into protected routes. This is the "algorithm confusion" class of bugs: you think you're doing asymmetric verification, but you're actually doing symmetric verification with a key the attacker knows.

Who is affected?

This is configuration-dependent. The dangerous case is: you use the JWT middleware with an asymmetric public key and you don't pin . The core issue is that Hono defaults to , so a public key string can accidentally be used as an HMAC secret, allowing forged tokens and auth bypass.

Advisory / severity

Advisory: GHSA-f67f-6cw9-8mq4. This was classified as High (CVSS 8.2) and mapped to CWE-347 (Improper Verification of Cryptographic Signature). Affected versions: Patched version: 4.11.4

[CVE-2026-22817] - JWK/JWKS middleware fallback

In the JWK/JWKS verification middleware, Hono could pick the verification algorithm like this:

- Use if present
- Otherwise, fall back to from the JWT (unverified input)

GitHub's advisory spells it out: when the selected JWK doesn't explicitly define an algorithm, the middleware falls back to using the from the unverified JWT header, and since in JWK is optional and commonly omitted, this becomes a real-world issue. If the matching JWKS key lacks , falls back to token-controlled , enabling algorithm confusion / downgrade attacks.

Why it matters

"Trusting " is basically letting the attacker influence how you verify the signature. Depending on surrounding constraints (allowed algorithms, how keys are selected, and how the app uses claims), this can lead to forged tokens being accepted and authz/authn bypass.

Advisory / severity

Advisory: GHSA-3vhc-576x-3qv4. This was classified as High (CVSS 8.2), also CWE-347, with affected versions and patched in 4.11.4.

The Fix

Both advisories took the same philosophical stance: make explicit. Don't infer it from attacker-controlled input.

Fix for #1 (JWT middleware)

The JWT middleware now requires an explicit option, a breaking change that forces callers to pin the algorithm instead of relying on defaults. Before (vulnerable): After (patched): (Example configuration shown in the advisory.)

Fix for #2 (JWK/JWKS middleware)

The JWK/JWKS middleware now requires an explicit allowlist of asymmetric algorithms, and it no longer derives the algorithm from untrusted JWT header values. It also explicitly rejects symmetric HS* algorithms in this context. Before (vulnerable): After (patched): (Example configuration shown in the advisory.)

Disclosure Timeline

- Discovery: 09th Dec, 2025
- First Response: 09th Dec, 2025
- Patched in: hono 4.11.4
- Advisories published: 13 Jan, 2026
- Advisory: GHSA-f67f-6cw9-8mq4
- Advisory: GHSA-3vhc-576x-3qv4
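The HS256-default bug from this post is easy to reproduce in miniature. The Python sketch below is not Hono's code: `forge_hs256`, `verify_with_default`, and `require_pinned_alg` are hypothetical, `PUBLIC_KEY_PEM` is a truncated placeholder rather than a real key, and asymmetric verification itself is omitted. It shows that a verifier which silently defaults to HS256 accepts a token HMAC-signed with the public key string, while a check against an explicit allowlist of asymmetric algorithms rejects it before any key is used:

```python
import base64
import hashlib
import hmac
import json

# Placeholder for a published RSA public key (truncated, not a real key).
PUBLIC_KEY_PEM = "-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...\n-----END PUBLIC KEY-----\n"


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def forge_hs256(claims: dict, secret: str) -> str:
    """Attacker side: HMAC-sign arbitrary claims using a string anyone can read."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_with_default(token: str, key: str, alg: str = "HS256") -> dict:
    """Vulnerable pattern: alg silently defaults to HS256, so the public key
    string ends up being used as an HMAC secret."""
    header, payload, signature = token.split(".")
    if alg != "HS256":
        raise NotImplementedError("asymmetric verification is omitted in this sketch")
    expected = hmac.new(key.encode(), f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload))


def require_pinned_alg(token: str, allowed: frozenset) -> str:
    """Safer pattern: the caller supplies an explicit allowlist of asymmetric
    algorithms, HS* is refused outright, and the token header may only name
    an algorithm that is already on the list."""
    if any(name.startswith("HS") for name in allowed):
        raise ValueError("symmetric algorithms are not allowed with public keys")
    header = json.loads(b64url_decode(token.split(".")[0]))
    if header.get("alg") not in allowed:
        raise ValueError(f"token alg {header.get('alg')!r} is not in the allowlist")
    return header["alg"]


if __name__ == "__main__":
    forged = forge_hs256({"sub": "admin"}, PUBLIC_KEY_PEM)
    print(verify_with_default(forged, PUBLIC_KEY_PEM))  # forged claims are accepted
    try:
        require_pinned_alg(forged, frozenset({"RS256"}))
    except ValueError as err:
        print("rejected:", err)
```

The allowlist check runs before any cryptography, which mirrors the stance in both advisories: the algorithm comes from configuration, and the token header can at most select among values the application already trusts.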

Grumpy Gamer 3 months ago

Hugo comments

I’ve been cleaning up my comments script for Hugo and am about ready to upload it to GitHub. I added an option to use flat files or SQLite, and it can notify Discord (and probably other services) when a comment is added. It’s all one PHP file. The reason I’m telling you this is to force myself to actually do it. Otherwise there would be “one more thing” and I’d never do it. I was talking to a game dev today about how to motivate yourself to get things done on your game. We both agreed publicly making promises is a good way.

Farid Zakaria 3 months ago

Huge binaries: papercuts and limits

In a previous post, I synthetically built a program that demonstrated a relocation overflow for a instruction. However, the demo required I add to disable some additional data that might cause other overflows for the purpose of this demonstration. What’s going on? 🤔 This is a good example of how only a select few face the size pressure of massive binaries. Even with which already is beginning to articulate to the compiler & linker: “Hey, I expect my binary to be pretty big.”; there are surprising gaps where the linker overflows. On Linux, an ELF binary includes many other sections beyond text and data necessary for code execution. Notably there are sections included for debugging (DWARF) and language-specific sections such as which is used by C++ to help unwind the stack on exceptions. Turns out that even with you might still run into overflow errors! 🤦🏻‍♂️ Note: Funny enough, there is a very recently opened LLVM issue for this (#172777); perfect timing! For instance, assumes 32-bit values regardless of the code model. There are similar 32-bit assumptions in the data-structure of as well. I also mentioned earlier a pattern of using multiple GOTs (Global Offset Tables) to avoid the 31-bit (±2GiB) relative offset limitation. Is there even a need for the large code-model? How far can that take us before we are forced to use the large code-model? Let’s think about it: First, let’s think about any limit due to overflow accessing the multiple GOTs. Let’s say we decide to space out our duplicative GOT every 1.5GiB. That means each GOT can grow at most 0.5GiB (512MiB) before there could exist a instruction from the code section that would result in an overflow. Each GOT entry is 8 bytes, a 64bit pointer. That means we have roughly 67 million possible entries. A typical GOT relocation looks like the following and it requires 9 bytes: 7 bytes for the and 2 bytes for . That means we have 1.5GiB / 9 = ~179 million possible unique relocations.
So theoretically, we can require more unique symbols in our code section than we can fit in the nearest GOT, and therefore cause a relocation overflow. 💥 The same problem exists for thunks, since the thunk is larger than the relative call in bytes. At some point, there is no avoiding the large code-model; however, with multiple GOTs, thunks, and other linker optimizations (e.g. LTO, relaxation), we have a lot of headroom before it’s necessary. 🕺🏻
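The GOT-capacity arithmetic in this post can be checked mechanically. The Python sketch below uses the numbers from the text (a duplicate GOT every 1.5GiB, 8-byte entries, 9 bytes of code per GOT-referencing sequence) and assumes a ±2GiB reach for a signed 32-bit PC-relative displacement; `got_headroom` is a hypothetical helper name:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

REACH = 2 * GiB             # reach of a signed 32-bit PC-relative displacement
GOT_SPACING = 3 * GiB // 2  # a duplicate GOT every 1.5GiB of code (from the text)
GOT_ENTRY_SIZE = 8          # one 64-bit pointer per GOT entry
RELOC_SITE_SIZE = 9         # bytes of code per GOT-referencing sequence (from the text)


def got_headroom():
    """Return (GOT byte budget, entries that fit, relocation sites in one span)."""
    got_budget = REACH - GOT_SPACING  # room left for the GOT before far code overflows
    max_entries = got_budget // GOT_ENTRY_SIZE
    max_reloc_sites = GOT_SPACING // RELOC_SITE_SIZE
    return got_budget, max_entries, max_reloc_sites


if __name__ == "__main__":
    budget, entries, sites = got_headroom()
    print(f"GOT budget: {budget // MiB} MiB")
    print(f"entries that fit: {entries:,}")
    print(f"relocation sites in 1.5GiB of code: {sites:,}")
    print("overflow possible:", sites > entries)
```

It reports a 512 MiB budget, 67,108,864 entries, and 178,956,970 relocation sites, so the code span can indeed reference more unique symbols than its nearest GOT can hold.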

matklad 3 months ago

The Second Great Error Model Convergence

I feel like this has been said before, more than once, but I want to take a moment to note that most modern languages converged on the error management approach described in Joe Duffy’s The Error Model, which is a generational shift from the previous consensus on exception handling. C++, JavaScript, Python, Java, C# all have roughly equivalent , , constructs with roughly similar runtime semantics and typing rules. Even functional languages like Haskell, OCaml, and Scala feature exceptions prominently in their grammar, even if their usage is frowned upon by parts of the community. But the same can be said about Go, Rust, Swift, and Zig! Their error-handling approaches are similar to one another, and quite distinct from the previous bunch, with Kotlin and Dart being notable, ahem, exceptions. Here are some commonalities of modern error handling: First, and most notably, functions that can fail are annotated at the call site. While the old way looked like this: the new way is There’s a syntactic marker alerting the reader that a particular operation is fallible, though the verbosity of the marker varies. For the writer, the marker ensures that changing the function contract from infallible to fallible (or vice versa) requires changing not only the function definition itself, but the entire call chain. On the other hand, adding a new error condition to a set of possible errors of a fallible function generally doesn’t require reconsidering rethrowing call-sites. Second, there’s a separate, distinct mechanism that is invoked in case of a detectable bug. In Java, index out of bounds or null pointer dereference (examples of programming errors) use the same language machinery as operational errors. Rust, Go, Swift, and Zig use a separate panic path. In Go and Rust, panics unwind the stack, and they are recoverable via a library function. In Swift and Zig, panic aborts the entire process.
An operational error of a lower layer can be classified as a programming error by the layer above, so there’s generally a mechanism to escalate an erroneous result value to a panic. But the opposite is more important: a function which does only “ordinary” computations can be buggy, and can fail, but such failures are considered catastrophic and are invisible in the type system, and sufficiently transparent at runtime. Third, results of fallible computation are first-class values, as in Rust’s . There’s generally little type system machinery dedicated exclusively to errors and expressions are just a little more than syntax sugar for that little Go spell. This isn’t true for Swift, which does treat errors specially. For example, the generic function has to explicitly care about errors, and hard-codes the decision to bail early: Swift does provide a first-class type for errors. Should you want to handle an exception, rather than propagate it, the handling is localized to a single throwing expression to deal with a single specific error, rather than with any error from a block of statements: Swift again sticks to more traditional try catch, but, interestingly, Kotlin does have expressions. The largest remaining variance is in what the error value looks like. This still feels like a research area. This is a hard problem due to a fundamental tension. On the one hand, at lower levels you want to exhaustively enumerate errors to make sure that:

- internal error handling logic is complete and doesn’t miss a case,
- public API doesn’t leak any extra surprise error conditions.

On the other hand, at higher levels, you want to string together widely different functionality from many separate subsystems without worrying about specific errors, other than:

- separating fallible functions from infallible,
- ensuring that there is some top-level handler to show a 500 error or an equivalent.

The two extremes are well understood. For exhaustiveness, nothing beats sum types ( s in Rust). This, I think, is one of the key pieces which explains why the pendulum seemingly swung back on checked exceptions. In Java, a method can throw one of the several exceptions: Critically, you can’t abstract over this pair. The call chain has to either repeat the two cases, or type-erase them into a superclass, losing information. The former has a nasty side-effect that the entire chain needs updating if a third variant is added. Java-style checked exceptions are sensitive to “N to N + 1” transitions. Modern value-oriented error management is only sensitive to “0 to 1” transitions. Still, if I am back to writing Java at any point, I’d be very tempted to standardize on a coarse-grained signature for all throwing methods. This is exactly the second well understood extreme: there’s a type-erased universal error type, and the “throwableness” of a function contains one bit of information. We only care if the function can throw, and the error itself can be whatever. You can still downcast a dynamic error value to handle specific conditions, but the downcasting is not checked by the compiler. That is, downcasting is “safe” and nothing will panic in the error handling mechanism itself, but you’ll never be sure if the errors you are handling can actually arise, and whether some errors should be handled, but aren’t. Go and Swift provide first-class universal errors, like Midori. Starting with Swift 6, you can also narrow the type down. Rust doesn’t really have super strong conventions about the errors, but it started with mostly enums, and then and shone a spotlight on the universal error type. But overall, it feels like “midpoint” error handling is poorly served by either extreme. In larger applications, you sorta care about error kinds, and there are usually a few places where it is pretty important to be exhaustive in your handling, but threading necessary types to those few places infects the rest of the codebase, and ultimately leads to “a bag of everything” error types with many “dead” variants. Zig makes an interesting choice of assuming a mostly closed-world compilation model, and relying on cross-function inference to learn who can throw what. What I find the most fascinating about the story is the generational aspect. There really was a strong consensus about exceptions, and then an agreement that checked exceptions are a failure, and now, suddenly, we are back to “checked exceptions” with a twist, in the form of “errors are values” philosophy. What happened between the lull of the naughts and the past decade's industrial PLT renaissance?


Does the Internet know what time it is?

Time is one of those things that is significantly harder to deal with than you’d naively expect. It's common in computing to assume that computers know the current time. After all, there are protocols like NTP for synchronizing computer clocks, and they presumably work well and are widely used. Practically speaking, what kinds of hazards lie hidden here? I’ll start this post with some questions: Some quick definitions: I just checked the system time of my laptop against time.gov, which reports a -0.073s offset. So for an N=1 sample size, I’m cautiously optimistic. There are research papers, like Spanner, TrueTime & The CAP Theorem, that describe custom systems that rely on atomic clocks and GPS to provide clock services with very low, bounded error. While these are amazing feats of engineering, they remain out of reach for most applications. What if we needed to build a system that spanned countless computers across the Internet and required each to have a fairly accurate clock? I wasn’t able to find a study that measured clock offset in this way. There are, however, a number of studies that measure clock skew (especially for fingerprinting). Many of these studies are dated, so it seems like now is a good time for a new measurement. This post is my attempt to measure clock offsets, Internet-wide. When processing HTTP requests, servers fill in the HTTP Date header. This header should indicate “the date and time at which the message originated”. Lots of web servers generate responses on-the-fly, so the Date header reveals the server’s clock in seconds. Looks pretty good. I’ll use this as the basis for the measurements. Unfortunately, there are a bunch of challenges we’ll need to deal with. First, resources may get cached in a CDN for some time and the Date header would reflect when the resource was generated instead of the server’s current time reference. Requesting a randomized path will bypass the CDN, typically generating a 404 error.
Unfortunately, I found some servers will set the Date header to the last modified time of the 404 page template. I considered performing multiple lookups to see how the Date header advances between requests, but some websites are distributed, so we’d be measuring a different system’s clock with each request. The safest way to avoid this hazard is to only consider Date headers that are offset to the future, which is the approach we’ll use. HTTP responses take some time to generate, sometimes a couple of seconds. We can’t be sure when the Date header was filled in, but we know it was before we got the response. Since we only want to measure timestamps that are from the future, we can subtract the time we received the response from the timestamp in the Date header. This gives a lower bound for the underlying clock offset. When performing broad Internet scans you’ll find many servers have invalid or expired TLS certificates. For the sake of collecting more data I’ve disabled certificate validation while scanning. Finally, our own system clock has skew. To minimize the effect of local clock skew I made sure I had a synchronization service running (systemd-timesyncd on Debian) and double-checked my offset on time.gov. All offset measurements are given in whole seconds, rounding towards zero, to account for this challenge. The measurement tool is mostly a wrapper around this Golang snippet: For performance reasons, the code performs an HTTP HEAD request instead of the heavier GET request. Starting in late November I scanned all domain names on the Tranco top 1,000,000 domains list (NNYYW). I scanned slowly to avoid any undesired load on third-party systems, with the scan lasting 25 days. Of the million domain names, 241,570 systems could not be measured due to connection errors such as timeout, DNS lookup failure, connection refusal, or similar challenges.
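The measurement logic described above can be sketched in Python. This is not the author's Go tool; `offset_seconds` and `measure` are hypothetical names. The sketch sends a HEAD request to a randomized path with certificate validation disabled, takes the Date header (even from a 404), subtracts the local receive time, and truncates toward zero. A positive result is a lower bound on how far the server's clock runs ahead:

```python
import email.utils
import math
import secrets
import ssl
import time
import urllib.error
import urllib.request
from datetime import timezone
from typing import Optional


def offset_seconds(date_header: str, received_at: float) -> Optional[int]:
    """Date header minus local receive time, truncated toward zero.

    The header was filled in before we received the response, so a positive
    result is a lower bound on the server's clock offset. Returns None if the
    header cannot be parsed.
    """
    try:
        server_dt = email.utils.parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None
    if server_dt.tzinfo is None:  # HTTP dates are GMT; treat naive results as UTC
        server_dt = server_dt.replace(tzinfo=timezone.utc)
    return math.trunc(server_dt.timestamp() - received_at)


def measure(domain: str, timeout: float = 10.0) -> Optional[int]:
    """HEAD a random path (to dodge CDN caches) with certificate checks disabled."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url = f"https://{domain}/{secrets.token_hex(8)}"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=ctx) as resp:
            headers = resp.headers
    except urllib.error.HTTPError as err:  # a 404 still carries a Date header
        headers = err.headers
    received_at = time.time()
    date = headers.get("Date")
    return None if date is None else offset_seconds(date, received_at)
```

Connection problems (timeouts, DNS failures, refusals) surface from `urlopen` as an `OSError` subclass such as `urllib.error.URLError`, which a scanner would count as unmeasurable. Negative results correspond to the past-dated headers that the analysis ignores.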
Not all the domains on the Tranco list have Internet-accessible HTTPS servers running at the apex on the standard port, so these errors are expected. Further issues included HTTP responses that lacked a Date header (13,098) or had an unparsable Date header (102). In all, 745,230 domain names were successfully measured. The vast majority of the measured domains had an offset of zero (710,189; 95.3%). Date headers set to the future impacted 12,717 domains (1.7%). Date headers set to the past will be otherwise ignored, but impacted 22,324 domains (3.0%). The largest positive offset was 39,867,698 seconds, landing us 461 days in the future (March 2027 at scan time). If we graph this we’ll see that the vast majority of our non-negative offsets are very near zero. We also observe that very large offsets are possible but quite rare. I can’t make out many useful trends from this graph. The large amount of data points near zero seconds skews the vertical scale and the huge offsets skew the horizontal scale. Adjusting the graph to focus on 10 seconds to 86,400 seconds (one day) and switching offsets to a log scale provides this graph: This curve is much closer to my expectations. I can see that small offsets of less than a minute have many observances. One thing I didn’t expect were spikes at intervals of whole hours, but it makes a lot of sense in hindsight. This next graph shows the first day, emphasizing data points that exactly align to whole hour offsets. The largest spikes occur at one, three, and nine hours with no clear trend. Thankfully, geography seems to explain these spikes quite well. Here are the top-level domains (TLDs) of domains seen with exactly one hour offset: Germany (.DE), Czech Republic (.CZ), Sweden (.SE), Norway (.NO), Italy (.IT), and Belgium (.BE) are all currently using Central European Time, which uses offset UTC+1. 
TLDs of domains seen with exactly three hour offset: The country-code top-level domain (ccTLD) for Russia is .RU and Moscow Standard Time is UTC+3. TLDs of domains with exactly nine hour offset: South Korea (.KR) follows UTC+9. The Cocos (Keeling) Islands (.CC) actually use UTC+6:30, but .CC domains are widely registered by operators outside the islands. So I strongly suspect these whole-hour offset spikes are driven by local time zones. These systems seem to have set their UTC time to the local time, perhaps because an administrator set the clock manually to local time instead of using UTC and configuring a timezone. While this type of error is quite rare, impacting only 49 of the measured domain names (0.007%), the large offsets could be problematic. Another anomalous data point at 113 seconds caught my attention. Almost all of the data points at the 113 second offset are for domain names hosted by the same internet service provider using the same IP block. A single server can handle traffic for many domain names, all of which will have the same clock offset. We’ll see more examples of this pattern later. Knowing that we have some anomalous spikes due to shared hosting and spikes at whole-hour intervals due to timezone issues, I smoothed out the data to perform modeling. Here’s a graph from zero to fifty-nine minutes, aggregating ten-second periods using the median. I added a power-law trend line, which matches the data quite well (R² = 0.92). I expected to see a power-law distribution, as these are common when modeling randomized errors, so my intuition feels confirmed. The average clock offset, among those with a non-negative offset, was 6544.8 seconds (about 109 minutes). The median clock offset was zero. As with other power-law distributions, the average doesn’t feel like a useful measure because the long tail skews it. The HTTP Date header measurement has proven useful for assessing offsets of modern clocks, but I’m also interested in historical trends.
I expect that computers are getting better at keeping clocks synchronized as we get better at building hardware, but can we measure it? I know of some bizarre issues that have popped up over time, like this Windows STS bug, so it’s even possible we’ve regressed. Historical measurements require us to ask “when was this timestamp generated?” and measure the error. This is obviously tricky as the point of the timestamp is to record the time, but we suspect the timestamp has error. Somehow, we’ve got to find a more accurate time to compare each timestamp against. It took me a while to think of a useful dataset, but I think git commits provide a viable way to measure historical clock offsets. We’ve got to analyze git commit timestamps carefully, as there are lots of ways timestamps can be out of order even when clocks are fully synchronized. Let’s first understand how “author time” and “commit time” work. When you write some code and it, you’ve “authored” the code. The git history at this point will show both an “author time” and “commit time” of the same moment. Later you may merge that code into a “main” branch, which updates the “commit time” to the time of the merge. When you’re working on a team you may see code merged in an order that’s opposite the order it was written, meaning the “author times” can be out of chronological order. The “commit times”, however, should be in order. The Linux kernel source tree is a good candidate for analysis. Linux was one of the first adopters of git, as git was written to help Linux switch source control systems. My local git clone of Linux shows 1,397,347 commits starting from 2005. It may be the largest substantive project using git, and provides ample data for us to detect timestamp-based anomalies. I extracted the timing and other metadata from the git history. Here’s a graph of the “commit time”, aggregating blocks of 1,000 commits using various percentiles, showing that commit times are mostly increasing.
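The extraction command itself isn’t shown. Here is a minimal sketch in Go, assuming git log with the pretty-format placeholders %H (hash), %at and %ct (author and committer time as Unix seconds), and %ae and %ce (author and committer email). %x09 emits a tab between fields. The Commit type and function names are illustrative, not taken from the post.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// gitFormat asks git log for one tab-separated line per commit: hash, author
// time, committer time (both Unix seconds), author email, committer email.
const gitFormat = "--format=%H%x09%at%x09%ct%x09%ae%x09%ce"

// Commit is a hypothetical record type for the fields used in this analysis.
type Commit struct {
	Hash           string
	AuthorTime     time.Time
	CommitTime     time.Time
	AuthorEmail    string
	CommitterEmail string
}

// parseLine decodes one line produced with gitFormat.
func parseLine(line string) (Commit, error) {
	f := strings.Split(line, "\t")
	if len(f) != 5 {
		return Commit{}, fmt.Errorf("want 5 fields, got %d: %q", len(f), line)
	}
	at, err := strconv.ParseInt(f[1], 10, 64)
	if err != nil {
		return Commit{}, err
	}
	ct, err := strconv.ParseInt(f[2], 10, 64)
	if err != nil {
		return Commit{}, err
	}
	return Commit{
		Hash:           f[0],
		AuthorTime:     time.Unix(at, 0).UTC(),
		CommitTime:     time.Unix(ct, 0).UTC(),
		AuthorEmail:    f[3],
		CommitterEmail: f[4],
	}, nil
}

// readCommits runs git log in repoDir and parses every line of its output.
func readCommits(repoDir string) ([]Commit, error) {
	cmd := exec.Command("git", "log", gitFormat)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	var commits []Commit
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		c, err := parseLine(sc.Text())
		if err != nil {
			return nil, err
		}
		commits = append(commits, c)
	}
	return commits, sc.Err()
}

func main() {
	commits, err := readCommits(".") // run inside a clone, e.g. of Linux
	if err != nil {
		fmt.Println("git log failed:", err)
		return
	}
	fmt.Println(len(commits), "commits read")
}
```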
While the commit-time graph shows some evidence of anomalous commit timestamps, there are too few for us to find meaningful trends. Let’s keep looking. Here’s a graph of the “author time” showing much more variation: We should expect to see author times vary, as it takes differing amounts of time for code to be accepted and merged. But there are also large anomalies here, including author times that are decidedly in the future and author times that pre-date both git and Linux. We can get more detail in the graph by zooming into the years Linux has been developed thus far: This graph tells a story about commits usually getting merged quickly, but some taking a long time to be accepted. Some code naturally takes longer to review, so the descending blue data points are expected. There are many different measurements we could perform here, but I think the most useful will be “author time” minus “commit time”. Typically, we expect that code is developed, committed, reviewed, approved, and finally merged. This provides an author time that is less than the commit time, as review and approval steps take time. A positive value of author time minus commit time would indicate that the code was authored in the future, relative to the commit timestamp. We can’t be sure whether the author time or the commit time was incorrect (or both), but collectively they record a timestamp error. These commits are anomalous as the code was seemingly written, committed, then traveled back in time to be merged. We’ll refer to these commits as time travelling commits, although timestamp errors are very likely the correct interpretation. Looking at the Linux git repo, I see 1,397,347 commits, of which 1,773 are time travelling commits. This is 0.127% of all commits, a somewhat rare occurrence. Here’s a graph of these timestamp errors: There are some fascinating patterns here!
Ignoring the marked regions of the timestamp-error graph for a moment, I notice that offsets below 100 seconds are rare; this is quite unlike the pattern seen in the HTTP Date header analysis. I suspect the challenge is that there is usually a delay between when a commit is authored and when it is merged. Code often needs testing and review before it can be merged; those tasks absorb any small timestamp errors. This will make modeling historical clock offset trends much more difficult. The region marked “A” shows many errors below 100 seconds, especially along linear spikes. There appear to be two committers in this region, both using “de.ibm.com” in their email address. The majority of authors in region A have “ibm.com” in their email address. So these anomalies appear to be largely due to a single company. These commits appear to have the author timestamp rewritten to a (mostly) sequential pattern. Here are the commits for two of the days: The author dates here are perfectly sequential, with one second between each commit. The commit dates also increase, but more slowly, such that the difference between author date and commit date increases with later commits. I suspect these timestamps were set via some sort of automation software when processing a batch of commits. The software may have initially set both author and commit timestamps to the current time, but then incremented the author timestamp by one second with each subsequent commit while continuing to use the current time for the commit timestamp. If the software processed commits faster than one per second, we’d see this pattern. I don’t think these timestamps are evidence of mis-set clocks, but rather an automated system with poor timestamp handling code. The region marked “B” shows many errors near a 15.5 hour offset (with several exactly on the half-hour mark). Looking at the email addresses I see several “com.au” domains, suggesting some participants were located in Australia (.AU).
Australia uses several time zones, including UTC+8, UTC+8:45, UTC+9:30, UTC+10, UTC+10:30, and UTC+11… but nothing near 15.5 hours. The GitHub profile of one of the committers shows a current timezone of UTC-5. This suggests that an author in Australia and a committer in the Americas both mis-set their clocks, perhaps combining UTC+10:30 and UTC-5 to reach the 15.5 hour offset (10.5 + 5 = 15.5 hours). We saw examples of timezone related clock errors when looking at the HTTP Date header; this appears to be an example of two timezone errors combining. The region marked “C” shows many errors around 30 to 260 days, which are unusually large errors. Every one of these commits has the same committer email address, using the “kernel.org” domain name. If we render the author and committer timestamps we’ll see this pattern: I notice that the day in the author timestamp usually matches the month in the committer timestamp, and when it doesn’t it’s one smaller. When the author day and the committer month match, the author month is less than or the same as the committer day. The days in the author timestamp vary between one and nine, while the days in the commit timestamp vary between eight and twenty-one. This suggests that the author timestamp was set incorrectly, swapping the day and month. Looking at these commits relative to the surrounding commits, the commit timestamps appear accurate. If I fix the author timestamps by swapping the day and month, then the data points are much more reasonable. The author timestamps are no longer after the commit timestamps, with differences varying between zero and thirty-six days, and an average of nine days. So it seems these author timestamps were generated incorrectly, swapping month and day, causing them to appear to travel back in time. Git has had code for mitigating these sorts of issues since 2006, like this code that limits timestamps to ten days in the future. I’m not sure why the commits in region “C” weren’t flagged as erroneous.
Perhaps the region “C” commits took a different code path? Region “C” doesn’t appear to be related to a mis-set system clock, but instead a date parsing error that swapped day and month. This type of error is common when working between different locales, as the ordering of month and day in a date varies by country. Finally, the region marked “D” shows a relatively sparse collection of errors. This may suggest that git timestamp related errors are becoming less common. But there’s an analytical hazard here: we’re measuring timestamps that are known to time travel. It’s possible that this region will experience more errors in the future! I suspect regions “A” and “C” are due to software bugs, not mis-set clocks. Region “B” may be due to two clocks, both mis-set due to timezone handling errors. It seems unwise to assume that I’ve caught all the anomalies and can attribute the rest of the data points to mis-set clocks. Let’s continue with that assumption anyway, knowing that we’re not on solid ground. The Linux kernel source tree is an interesting code base, but we should look at more projects. This next graph counts positive values of “author time” minus “commit time” for Linux, Ruby, Kubernetes, Git, and OpenSSL. The number of erroneous timestamps is measured per project against the total commits in each year. It’s difficult to see a trend here. Linux saw the most time travelling commits from 2008 through 2011, each year above 0.4%, and has been below 0.1% since 2015. Git has had zero time travelling commits since 2014, with a prior rate below 0.1%. Digging into the raw data I notice that many time travelling commits were generated by the same pair of accounts. For Kubernetes, 78% were authored by [email protected] and merged by [email protected], although these were only one second in the future. These appear to be due to the “Kubernetes Submit Queue”, where the k8s-merge-robot authors a commit on one system and the merge happens within GitHub.
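Both the per-year rate and the “same pair of accounts” observation can be computed from per-commit records. In this Go sketch the Record type, the year bucketing and the tie-breaking rule are assumptions, and the email addresses are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is one commit reduced to what the per-year analysis needs.
type Record struct {
	Year           int
	Author         string // author email
	Committer      string // committer email
	TimeTravelling bool   // author time after commit time
}

// ratesByYear returns, for each year, the share of that year's commits that
// are time travelling.
func ratesByYear(recs []Record) map[int]float64 {
	total, bad := map[int]int{}, map[int]int{}
	for _, r := range recs {
		total[r.Year]++
		if r.TimeTravelling {
			bad[r.Year]++
		}
	}
	rates := map[int]float64{}
	for y, n := range total {
		rates[y] = float64(bad[y]) / float64(n)
	}
	return rates
}

// topPair returns the (author, committer) pair behind the most time
// travelling commits and its share of all of them; ties go to the
// lexicographically smaller pair so the result is deterministic.
func topPair(recs []Record) (pair [2]string, share float64) {
	counts := map[[2]string]int{}
	total := 0
	for _, r := range recs {
		if r.TimeTravelling {
			counts[[2]string{r.Author, r.Committer}]++
			total++
		}
	}
	best := 0
	for p, n := range counts {
		if n > best || (n == best && (p[0] < pair[0] || (p[0] == pair[0] && p[1] < pair[1]))) {
			pair, best = p, n
		}
	}
	if total == 0 {
		return pair, 0
	}
	return pair, float64(best) / float64(total)
}

func main() {
	recs := []Record{
		{2009, "dev@example.com", "robot@example.com", true},
		{2009, "dev@example.com", "robot@example.com", true},
		{2009, "a@example.com", "b@example.com", false},
		{2010, "c@example.com", "b@example.com", true},
		{2010, "a@example.com", "b@example.com", false},
	}
	rates := ratesByYear(recs)
	years := make([]int, 0, len(rates))
	for y := range rates {
		years = append(years, y)
	}
	sort.Ints(years)
	for _, y := range years {
		fmt.Printf("%d: %.1f%%\n", y, 100*rates[y])
	}
	pair, share := topPair(recs)
	fmt.Printf("%s -> %s: %.0f%% of time travellers\n", pair[0], pair[1], 100*share)
}
```

A single dominant pair, like the merge robot above, shows up immediately as a large share. Removing it is what leaves the remaining data too sparse for trend analysis.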
The Ruby data shows the same effect: 89% of time travelling commits were authored by the same user and committed by [email protected] with an offset near 30 seconds. I attempted to correct for these biases by deduplicating commit-author pairs, but the remaining data points were too sparse to perform meaningful analysis. Time travelling usually reaches its peak two to four years after a project adopts source control, ramping up before, and generally falling after. This hints at a project management related cause for these spikes. I’ll speculate that this is due to developers initially using Git cautiously as it is new to them; then, as they get comfortable with Git, they begin to build custom automation systems. These new automation systems have bugs or lack well-synchronized clocks, but these issues are addressed over time. I don’t think I can draw any conclusion from this data about system clocks being better managed over time. This data doesn’t support my expectation that erroneous timestamps would reduce over time, and I’ll call this a “negative result”. There are too many challenges in this data set. This analysis explored timestamps impacted by suspected mis-set clocks. HTTP scanning found that 1.7% of domain names had a Date header mis-set to the future. Web server offsets strongly matched a power-law distribution such that small offsets were by far the most common. Git commit analysis found up to 0.65% of commits (Linux, 2009) had author timestamps in the future, relative to the commit timestamp. No clear historical trend was discovered. Timestamps with huge offsets were detected. The largest Linux commit timestamp was in the year 2085 and the largest HTTP Date header was in the year 2027. This shows that while small offsets were most common, large errors will occur. Many underlying causes were proposed while analyzing the data, including timezone handling errors, date format parsing errors, and timestamps being overwritten by automated systems.
Many data points were produced by the same source, like IP address blocks used by many domains or Git users (or robots) interacting with multiple commits. Deduplicating these effects left too few data points to perform trend analysis. Synchronizing computer clocks and working with timestamps remains a challenge for the industry. I’m sure there are other data sets that support this kind of measurement. If you’ve got any, I’d love to hear what trends you can discover!

How often are computer clocks set to the wrong time? How large do these offsets grow? Can we model clock offsets, and make predictions about them? Are out-of-sync clocks a historical concern that we’ve largely solved, or is this still a concern?

Clock skew: the rate at which a clock deviates from a one-second-per-second standard, often measured in parts per million.
Clock offset: the difference between the displayed time and Coordinated Universal Time (UTC), often measured in seconds.

Filippo Valsorda 3 months ago

Building a Transparent Keyserver

Today, we are going to build a keyserver to look up age public keys. That part is boring. What’s interesting is that we’ll apply the same transparency log technology as the Go Checksum Database to keep the keyserver operator honest and unable to surreptitiously inject malicious keys, while still protecting user privacy and delivering a smooth UX. You can see the final result at keyserver.geomys.org. We’ll build it step-by-step, using modern tooling from the tlog ecosystem, integrating transparency in less than 500 lines. I am extremely excited to write this post: it demonstrates how to use a technology that I strongly believe is key in protecting users and holding centralized services accountable, and it’s the result of years of effort by me, the TrustFabric team at Google, the Sigsum team at Glasklar, and many others. This article is being cross-posted on the Transparency.dev Community Blog. Let’s start by defining the goal: we want a secure and convenient way to fetch age public keys for other people and services. 1 The easiest and most usable way to achieve that is to build a centralized keyserver: a web service where you log in with your email address to set your public key, and other people can look up public keys by email address. Trusting the third party that operates the keyserver lets you solve identity, authentication, and spam by just delegating the responsibilities of checking email ownership and implementing rate limiting. The keyserver can send a link to the email address, and whoever receives it is authorized to manage the public key(s) bound to that address. I had Claude Code build the base service, because it’s simple and not the interesting part of what we are doing today. There’s nothing special in the implementation: just a Go server, an SQLite database, 2 a lookup API, a set API protected by a CAPTCHA that sends an email authentication link, 3 and a Go CLI that calls the lookup API.
A lot of problems are shaped like this and are much more solvable with a trusted third party: PKIs, package registries, voting systems… Sometimes the trusted third party is encapsulated behind a level of indirection, and we talk about Certificate Authorities, but it’s the same concept. Centralization is so appealing that even the OpenPGP ecosystem embraced it: after the SKS pool was killed by spam, a new OpenPGP keyserver was built which is just a centralized, email-authenticated database of public keys. Its FAQ claims they don’t wish to be a CA, but also explains they don’t support the (dubiously effective) Web-of-Trust at all, so effectively they can only act as a trusted third party. The obvious downside of a trusted third party is, well, trust. You need to trust the operator, but also whoever will control the operator in the future, and also the operator’s security practices. That’s asking a lot, especially these days, and a malicious or compromised keyserver could provide fake public keys to targeted victims with little-to-no chance of detection. Transparency logs are a technology for applying cryptographic accountability to centralized systems with no UX sacrifices. A transparency log or tlog is an append-only, globally consistent list of entries, with efficient cryptographic proofs of inclusion and consistency. The log operator appends entries to the log, which can be tuples like (package, version, hash) or (email, public key). The clients verify an inclusion proof before accepting an entry, guaranteeing that the log operator will have to stand by that entry in perpetuity and to the whole world, with no way to hide it or disown it. As long as someone who can check the authenticity of the entry will eventually check (or “monitor”) the log, the client can trust that malfeasance will be caught. Effectively, a tlog lets the log operator stake their reputation to borrow time for collective, potentially manual verification of the log’s entries.
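To make “verify an inclusion proof” concrete, here is a self-contained Go sketch of RFC 6962-style Merkle hashing. The leaf hash is SHA-256(0x00 ‖ entry) and the node hash is SHA-256(0x01 ‖ left ‖ right). It pairs that hashing with the inclusion-proof verification algorithm from RFC 9162, section 2.1.3.2. As far as I know this is the tree construction c2sp.org/tlog-tiles logs use. This is not Tessera’s or Torchwood’s API, though, and real code should rely on a vetted implementation such as those or golang.org/x/mod/sumdb/tlog. The log side (inclusionProof) is included only so the example runs end to end, and the entries are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash is the RFC 6962 leaf hash: SHA-256(0x00 || entry).
func leafHash(entry []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, entry...))
	return h[:]
}

// nodeHash is the RFC 6962 interior node hash: SHA-256(0x01 || left || right).
func nodeHash(left, right []byte) []byte {
	h := sha256.Sum256(append(append([]byte{0x01}, left...), right...))
	return h[:]
}

// split returns the largest power of two strictly smaller than n (n >= 2).
func split(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// rootHash computes the Merkle tree hash of a non-empty list of leaf hashes.
func rootHash(leaves [][]byte) []byte {
	if len(leaves) == 1 {
		return leaves[0]
	}
	k := split(len(leaves))
	return nodeHash(rootHash(leaves[:k]), rootHash(leaves[k:]))
}

// inclusionProof is the log's side (RFC 6962 PATH): the sibling hashes for
// leaf m, ordered from the leaf up to the root.
func inclusionProof(m int, leaves [][]byte) [][]byte {
	if len(leaves) <= 1 {
		return nil
	}
	k := split(len(leaves))
	if m < k {
		return append(inclusionProof(m, leaves[:k]), rootHash(leaves[k:]))
	}
	return append(inclusionProof(m-k, leaves[k:]), rootHash(leaves[:k]))
}

// verifyInclusion is the client's side (RFC 9162, section 2.1.3.2): it
// recomputes the root from the leaf hash and the proof and compares it with
// the root of a checkpoint whose signature was already verified.
func verifyInclusion(index, size uint64, leaf []byte, proof [][]byte, root []byte) bool {
	if index >= size {
		return false
	}
	fn, sn := index, size-1
	r := leaf
	for _, p := range proof {
		if sn == 0 {
			return false
		}
		if fn&1 == 1 || fn == sn {
			r = nodeHash(p, r)
			for fn != 0 && fn&1 == 0 {
				fn, sn = fn>>1, sn>>1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn, sn = fn>>1, sn>>1
	}
	return sn == 0 && bytes.Equal(r, root)
}

func main() {
	// Hypothetical entries; the keyserver's real entry encoding differs.
	entries := []string{"alice@example.com key-A", "bob@example.com key-B", "carol@example.com key-C"}
	var leaves [][]byte
	for _, e := range entries {
		leaves = append(leaves, leafHash([]byte(e)))
	}
	root, size := rootHash(leaves), uint64(len(leaves))
	proof := inclusionProof(1, leaves)
	fmt.Println(verifyInclusion(1, size, leaves[1], proof, root))                               // true
	fmt.Println(verifyInclusion(1, size, leafHash([]byte("bob@example.com key-X")), proof, root)) // false
}
```

The client needs only the leaf, O(log N) sibling hashes and a signed root, which is what lets it check inclusion without talking to the log again.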
A tlog is a middle ground between impractical local verification mechanisms like the Web of Trust, and fully trusted mechanisms like centralized X.509 PKIs. If you’d like a longer introduction, my Real World Crypto 2024 talk presents both the technical functioning and abstraction of modern transparency logs. There is a whole ecosystem of interoperable tlog tools and publicly available infrastructure built around C2SP specifications. That’s what we are going to use today to add a tlog to our keyserver. If you want to catch up with the tlog ecosystem, my 2025 Transparency.dev Summit Keynote maps out the tools, applications, and specifications. If you are familiar with Certificate Transparency, tlogs are derived from CT, but with a few major differences. Most importantly, there is no separate entry producer (in CT, the CAs) and log operator; moreover, clients check actual inclusion proofs instead of SCTs; finally, there are stronger split-view protections, as we will see below. The Static CT API and Sunlight CT log implementation were a first successful step in moving CT towards the tlog ecosystem, and a proposed design called Merkle Tree Certificates redesigns the WebPKI to have tlog-like and tlog-interoperable transparency. In my experience, it’s best not to think about CT when learning about tlogs. A better production example of a tlog is the Go Checksum Database, where Google logs the module name, version, and hash for every module version observed by the Go Modules Proxy. The module fetches happen over regular HTTPS, so there is no publicly-verifiable proof of their authenticity. Instead, the central party appends every observation to the tlog, so that any misbehavior can be caught. The command verifies inclusion proofs for every module it downloads, protecting 100% of the ecosystem, without requiring module authors to manage keys. Katie Hockman gave a great talk on the Go Checksum Database at GopherCon 2019. You might also have heard of Key Transparency.
KT is an overlapping technology that was deployed by Apple, WhatsApp, and Signal amongst others. It has similar goals, but picks different tradeoffs that involve significantly more complexity, in exchange for better privacy and scalability in some settings. Ok, so how do we apply a tlog to our email-based keyserver? It’s pretty simple, and we can do it with a 250-line diff using Tessera and Torchwood. Tessera is a general-purpose tlog implementation library, which can be backed by object storage or a POSIX filesystem. For our keyserver, we’ll use the latter backend, which stores the whole tlog in a directory according to the c2sp.org/tlog-tiles specification. Every time a user sets their key, we append an encoded (email, public key) entry to the tlog, and we store the tlog entry index in the database. The lookup API produces a proof from the index and provides it to the client. The proof follows the c2sp.org/tlog-proof specification. It combines a checkpoint (a signed snapshot of the log at a certain size), the index of the entry in the log, and a proof of inclusion of the entry in the checkpoint. The client CLI receives the proof from the lookup API, checks the signature on the checkpoint from the built-in log public key, hashes the expected entry, and checks the inclusion proof for that hash and checkpoint. It can do all this without interacting further with the log. If you squint, you can see that the proof is really a “fat signature” for the entry, which you verify with the log’s public key, just like you’d verify an Ed25519 or RSA signature for a message. I like to call them spicy signatures to stress how tlogs can be deployed anywhere you can deploy regular digital signatures. What’s the point of all this though? The point is that anyone can look through the log to make sure the keyserver is not serving unauthorized keys for their email address!
Just like backups are useless without restores and signatures are useless without verification, tlogs are useless without monitoring. That means we need to build tooling to monitor the log. On the server side, it takes two lines of code to expose the Tessera POSIX log directory. On the client side, we add a flag to the CLI that reads all matching entries in the log. To enable effective monitoring, we also normalize email addresses by trimming spaces and lowercasing them, since users are unlikely to monitor all the variations. We do it before sending the login link, so normalization can’t lead to impersonation. A complete monitoring story would involve 3rd party services that monitor the log for you and email you if new keys are added, like gopherwatch and Source Spotter do for the Go Checksum Database, but the flag is a start. The full change involves 5 files changed, 251 insertions(+), 6 deletions(-), plus tests, and includes a new keygen helper binary, the required database schema and help text and API changes, and web UI changes to show the proof. Edit: the original patch series is missing freshness checks in monitor mode, to ensure the log is not hiding entries from monitors by serving them an old checkpoint. The easiest solution is checking the timestamp on witness cosignatures (+15 lines). You will learn about witness cosignatures below. We created a problem by implementing this tlog, though: now all the email addresses of our users are public! While this is ok for module names in the Go Checksum Database, allowing email address enumeration in our keyserver is a non-starter for privacy and spam reasons. We could hash the email addresses, but that would still allow offline brute-force attacks. The right tool for the job is a Verifiable Random Function.
You can think of a VRF as a hash with a private and public key: only you can produce a hash value, using the private key, but anyone can check that it’s the correct (and unique) hash value, using the public key. Overall, implementing VRFs takes less than 130 lines using the c2sp.org/vrf-r255 instantiation based on ristretto255, implemented by filippo.io/mostly-harmless/vrf-r255 (pending a more permanent location). Instead of the email address, we include the VRF hash in the log entry, and we save the VRF proof in the database. The tlog proof format has space for application-specific opaque extra data, so we can store the VRF proof there, to keep the tlog proof self-contained. In the client CLI, we extract the VRF hash from the tlog proof’s extra data and verify it’s the correct hash for the email address. How do we do monitoring now, though? We need to add a new API that provides the VRF hash (and proof) for an email address. On the client side, we use that API to obtain the VRF proof, we verify it, and we look for the VRF hash in the log instead of looking for the email address. Attackers can still enumerate email addresses by hitting the public lookup or monitor API, but they’ve always been able to do that: serving such a public API is the point of the keyserver! With VRFs, we restored the original status quo: enumeration requires brute-forcing the online, rate-limited API, instead of having a full list of email addresses in the tlog (or hashes that can be brute-forced offline). VRFs have a further benefit: if a user requests to be deleted from the service, we can’t remove their entries from the tlog, but we can stop serving the VRF for their email address 4 from the lookup and monitor APIs. This makes it impossible to obtain the key history for that user, or even to check if they ever used the keyserver, but doesn’t impact monitoring for other users. The full change adding VRFs involves 3 files changed, 125 insertions(+), 13 deletions(-), plus tests.
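The reason plain hashes fall short is easy to demonstrate. Anyone can compute SHA-256 of a guessed address and compare it against every hash in a public log, offline and at no cost to the server. The Go sketch below uses hypothetical addresses. With a VRF the same loop is impossible, because computing the hash requires the server’s private key. Each guess therefore has to go through the rate-limited API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashEmail is the naive alternative to a VRF: anyone can compute it.
func hashEmail(email string) [32]byte {
	return sha256.Sum256([]byte(email))
}

// bruteForce reports which candidate addresses appear in a log of plain
// email hashes. It needs only a list of likely addresses and never talks to
// the server, so rate limiting can't slow it down.
func bruteForce(logged [][32]byte, candidates []string) []string {
	inLog := map[[32]byte]bool{}
	for _, h := range logged {
		inLog[h] = true
	}
	var found []string
	for _, c := range candidates {
		if inLog[hashEmail(c)] {
			found = append(found, c)
		}
	}
	return found
}

func main() {
	logged := [][32]byte{hashEmail("alice@example.com"), hashEmail("carol@example.org")}
	guesses := []string{"alice@example.com", "bob@example.com", "carol@example.org"}
	fmt.Println(bruteForce(logged, guesses)) // [alice@example.com carol@example.org]
}
```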
We have one last marginal risk to mitigate: since we can’t ever remove entries from the tlog, what if someone inserts some unsavory message in the log by smuggling it in as a public key, like ? Protecting against this risk is called anti-poisoning. The risk to our log is relatively small: public keys have to be Bech32-encoded and short, so an attacker can’t usefully embed images or malware. Still, it’s easy enough to neutralize it: instead of the public keys, we put their hashes in the tlog entry, keeping the original public keys in a new table in the database, and serving them as part of the monitor API. It’s very important that we persist the original key in the database before adding the entry to the tlog. Losing the original key would be indistinguishable from refusing to provide a malicious key to monitors. On the client side, to do a lookup we just hash the public key when verifying the inclusion proof. To monitor, we match the hashes against the list of original public keys provided by the server through the monitor API. Our final log entry format is . Designing the tlog entry is the most important part of deploying a tlog: it needs to include enough information to let monitors isolate all the entries relevant to them, but not enough information to pose privacy or poisoning threats. The full change providing anti-poisoning involves 2 files changed, 93 insertions(+), 19 deletions(-), plus tests. We’re almost done! There’s still one thing to fix, and it used to be the hardest part. To get the delayed, collective verification we need, all clients and monitors must see consistent views of the same log, where the log maintains its append-only property. This is called non-equivocation, or split-view protection. In other words, how do we stop the log operator from showing an inclusion proof for log A to a client, and then a different log B to the monitors?
Just like logging without a monitoring story is like signing without verification, logging without a non-equivocation story is just a complicated signature algorithm with no strong transparency properties. This is the hard part because in the general case you can’t do it alone. Instead, the tlog ecosystem has the concept of witness cosigners: third-party operated services which cosign a checkpoint to attest that it is consistent with all the other checkpoints the witness observed for that log. Clients check these witness cosignatures to get assurance that, unless a quorum of witnesses is colluding with the log, they are not being presented a split-view of the log. These witnesses are extremely efficient to operate: the log provides the O(log N) consistency proof when requesting a cosignature, and the witness only needs to store the O(1) latest checkpoint it observed. All the potentially intensive verification is deferred and delegated to monitors, which can be sure to have the same view as all clients thanks to the witness cosignatures. This efficiency makes it possible to operate witnesses for free as public benefit infrastructure. The Witness Network collects public witnesses and maintains an open list of tlogs that the witnesses automatically configure. For the Geomys instance of the keyserver, I generated a tlog key and then I sent a PR to the Witness Network to add the following lines to the testing log list. This got my log configured in a handful of witnesses, from which I picked three to build the default keyserver witness policy. The policy format is based on Sigsum’s policies, and it encodes the log’s public key and the witnesses’ public keys (for the clients) and submission URLs (for the log). Tessera supports these policies directly. When minting a new checkpoint, it will reach out in parallel to all the witnesses, and return the checkpoint once it satisfies the policy. Configuration is trivial, and the added latency is minimal (less than one second).
On the client side, we can use Torchwood to parse the policy and use it directly with VerifyProof in place of the policy we were manually constructing from the log’s public key. Again, if you squint you can see that just like tlog proofs are spicy signatures, the policy is a spicy public key. Verification is a deterministic, offline function that takes a policy/public key and a proof/signature, just like digital signature verification! The policies are a DAG that can get complex to match even the strictest uptime requirements. For example, you can require 3 out of 10 witness operators to cosign a checkpoint, where each operator can use any 1 out of N witness instances to do so. Note however that in that case you will need to periodically provide to monitors all cosignatures from at least 8 out of 10 operators, to prevent split-views: since 10 - 3 + 1 = 8, any 3 operators that cosign a checkpoint must then include at least one whose cosignatures the monitors also see. The full change implementing witnessing involves 5 files changed, 43 insertions(+), 11 deletions(-), plus tests. We started with a simple centralized email-authenticated 5 keyserver, and we turned it into a transparent, privacy-preserving, anti-poisoning, and witness-cosigned service. We did that in four small steps using Tessera, Torchwood, and various C2SP specifications. Overall, it took less than 500 lines (7 files changed, 472 insertions(+), 9 deletions(-)). The UX is completely unchanged: there are no keys for users to manage, and the web UI and CLI work exactly like they did before. The only difference is the new functionality of the CLI, which allows holding the log operator accountable for all the public keys it could ever have presented for an email address. The result is deployed live at keyserver.geomys.org. This tlog system still has two limitations: To monitor the log, the monitor needs to download it all. This is probably fine for our little keyserver, and even for the Go Checksum Database, but it’s a scaling problem for the Certificate Transparency / Merkle Tree Certificates ecosystem.
The inclusion proof guarantees that the public key is in the log, not that it’s the latest entry in the log for that email address. Similarly, the Go Checksum Database can’t efficiently prove the Go Modules Proxy response is complete. We are working on a design called Verifiable Indexes which plugs on top of a tlog to provide verifiable indexes or even map-reduce operations over the log entries. We expect VI to be production-ready before the end of 2026, while everything above is ready today. Even without VI, the tlog provides strong accountability for our keyserver, enabling a secure UX that would have simply not been possible without transparency. I hope this step-by-step demo will help you apply tlogs to your own systems. If you need help, you can join the Transparency.dev Slack . You might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected] . Growing up, I used to drive my motorcycle around the hills near my hometown, trying to reach churches I could spot from hilltops. This was one of my favorite spots. Geomys , my Go open source maintenance organization, is funded by Smallstep , Ava Labs , Teleport , Tailscale , and Sentry . Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement .) Here are a few words from some of them! Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews. 
Ava Labs: We at Ava Labs, maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team. age is not really meant to encrypt messages to strangers, nor does it encourage long-term keys. Instead, keys are simple strings that can be exchanged easily through any semi-trusted (i.e., safe against active attackers) channel. Still, a keyserver could be useful in some cases, and it will serve as a decent example for what we are doing today.  ↩ I like to use the SQLite built-in JSON support as a simple document database, to avoid tedious table migrations when adding columns.  ↩ OK, one thing is special, but it doesn’t have anything to do with transparency. I strongly prefer email magic links that authenticate your original tab, where you have your browsing session history, instead of making you continue in the new tab you open from the email. However, intermediating that flow via a server introduces a phishing risk: if you click the link, you risk authenticating the attacker’s session. This implementation uses the JavaScript Broadcast Channel API to pass the auth token locally to the original tab, if it’s open in the same browser, and otherwise authenticates the new tab. Another advantage of this approach is that there are no authentication cookies.  ↩ Someone who stored the VRF for that email address could continue to match the tlog entries, but since we won’t be adding any new entries to the tlog for that email address, they can’t learn anything they didn’t already know.  ↩ Something cool about tlogs is that they are often agnostic to the mechanism by which entries are added to the log.
For example, instead of email identities and verification we could have used OIDC identities, with our centralized server checking OIDC bearer tokens, held accountable by the tlog. Everything would have worked exactly the same.  ↩

DHH 4 months ago

The O'Saasy License

One of my favorite parts of the early web was how easy it was to see how the front-end was built. Before View Source was ruined by minification, transpiling, and bundling, you really could just right-click on any web page and learn how it was all done. It was glorious. But even back then, this only ever applied to the front-end. At least with commercial applications, the back-end was always kept proprietary. So learning how to write great web applications still meant piecing together lessons from books, tutorials, and hello-world-style code examples, not from production-grade commercial software. The O'Saasy License seeks to remedy that. It's basically the do-whatever-you-want MIT license, but with the commercial rights to run the software as a service (SaaS) reserved for the copyright holder, thus encouraging more code to be open source while allowing the original creators to see a return on their investment. We need more production-grade code to teach juniors and LLMs alike. A View Source that extends to the back-end, along with the open source invitation to fix bugs, propose features, and run the system yourself for free (if your data requirements or interests make that a sensible choice over SaaS). This is what we're doing with Fizzy, but now we've also given the O'Saasy License a home to call its own at osaasy.dev. The license is yours to download and apply to any project where it makes sense. I hope to read a lot more production-grade SaaS code as a result!
