Posts in Backend (20 found)

Binding port 0 to avoid port collisions

It's common to spin up a server in a test so that you can do full end-to-end requests against it. This is a very important sort of test, to make sure things work all together. Most of the work I do is in complex web backends, and there's so much risk of not having all the request processing and middleware and setup exactly the same in a mock test. You must do at least some end-to-end tests, or you're making a gamble that's going to bite you.

And this is great, but you quickly run into a problem: port collisions! This can happen when you run multiple tests at once and all of them start a separate server, and whoops, two have picked the same port. Or it can happen if something else on your development machine happens to be running on the port you chose. It's annoying when it happens, too, because it's often hard to reproduce. So... how do we fix that? You read the title[1], so you know where we're going, but let's go there together.

There are a few potential solutions to this. Perhaps the most obvious is binding to a port you choose randomly. This will work a lot of the time, but it's going to be flaky. You can drive down the probability of collision, but it's going to happen sometimes. Side note: I think the only thing worse than a test that fails 10% of the time is one that fails 1% of the time. It's not flaky enough to drive urgency for anyone to fix it, but it's flaky enough that in a team context, you will run into it on a daily basis. Ask me how I know.

How often you get a collision depends on a lot of factors. How many times do you bind a port in the range? How many other services might bind something in that range? How likely are two things to run concurrently? As a simple example, let's say we pick a random port in the range 9000-9999, and you have 4 concurrent tests that will overlap.
If you uniformly sample from this range, then you have a 1/1000 chance of a collision from the second test, a 2/1000 chance from the third, and a 3/1000 chance from the fourth. Our probability of having no collision is (999/1000) × (998/1000) × (997/1000) ≈ 0.994. That means that we have a roughly 0.6% chance of a collision. This isn't horrible, but it's not great!

We could also have each test increment the port it picks by 1. I've done this before, and it avoids one set of problems from collisions, but it creates a new problem. Now you're sweeping across the entire range starting from the first port. If you have anything else running on your system that binds in that range, you'll run into a collision! And if you run your entire test suite in parallel, you're much more likely to have a problem now, since all the runs start at the same port.

The problem we've had all along is that we don't have full information. If we knew the system state and all the currently open ports, then binding to one that's not in use would be an easy problem. And you know who knows all that info? The kernel does. And it turns out, this is something we can ask the kernel for. We can just say "please give me a nice unused port" and it will!

There's a range of ports that the kernel uses for this. It varies by system, but it's not usually very relevant what the particular range is. On my system, I can find the range by checking /proc/sys/net/ipv4/ip_local_port_range. My ephemeral port range is from 32768 to 60999. I'm curious why the range stops there instead of going all the way up, so that's a future investigation.

To get an ephemeral port on Linux systems, you bind or listen on port 0. Then the kernel will hand you back a port in the ephemeral range. And you know that it's available, since the kernel is keeping track. It's possible to have an issue here if the full range of ports has been exhausted but, you know what, if you hit that limit, you probably have other problems[2]. The only thing is: if you've bound to an unknown port, how do you send requests to it?
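In code, the whole trick is only a few lines. Here's a minimal, self-contained sketch in Python (an illustration of the technique, not the post's own app):

```python
# Minimal sketch: bind to port 0 so the kernel assigns a free ephemeral
# port, then recover the actual port with getsockname().
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 means "kernel, pick one for me"
listener.listen()

host, port = listener.getsockname()
print(f"listening on {host}:{port}")  # port lands in the ephemeral range

# A test client can now be pointed at the reported address:
client = socket.create_connection((host, port))
conn, _ = listener.accept()
conn.close()
client.close()
listener.close()
```

In a test, you'd hand that (host, port) pair to whatever issues the requests.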
We can get the port we've bound to by another syscall, getsockname(2). This lets us find out what address a socket is bound to, and then we can do something with that information. For tests, that means you'll need to find a way to communicate this port from the listener to the requester. If they're in the same process, I like to do this by either injecting it into the listener or returning the address. If you're doing something like Postgres or Redis on an ephemeral port, then you'd probably have to find the port from its output, which is tedious but doable.

Here's an example from a web app I'm working on. This is how a simple test looks: we launch the web server, binding to port 0, and get the address back. Then we can send requests to that address! That's all we have to do, and we'll get a much more reliable test setup.

[1] I think suspenseful titles can be fun, improve storytelling, and drive attention. But sometimes you really need a clear, honest, spoiler of a title. Giving away the answer is great when you're giving information that people might want to quickly internalize. ↩

[2] If you do run into this, I'm very curious to hear about the circumstances. It's the kind of problem that I'd love to look at and work on. It's kind of messy, and you know that there's something very interesting that led to it being this way. ↩

0 views
Jeremy Daly 1 week ago

Context Engineering for Commercial Agent Systems

Memory, Isolation, Hardening, and Multi-Tenant Context Infrastructure

0 views
iDiallo 2 weeks ago

Last year, all my non-programmer friends built apps

Last year, all my non-programmer friends were building apps. Yet today, those apps are nowhere to be found.

Everyone followed the ads. They signed up for Lovable and all the fancy app-building services that exist. My LinkedIn feed was filled with PMs who had discovered new powers. Some posted bullet-point lists of "things to do to be successful with AI." "Don't work hard, work smart," they said, as if it were a deep insight.

I must admit, I was a bit jealous. With a full-time job, I don't get to work on my cool side project, which has collected enough dust to turn into a dune. There's probably a little mouse living inside. I'll call him Muad'Dib. What was I talking about? Right. The apps.

Today, my friends are silent. I still see the occasional post on LinkedIn, but they don't garner the engagement they used to. The app-building AI services still exist, but their customers have paused their subscriptions.

Here's a conversation I had recently. A friend had "vibe-coded" an Android app. A platform for building communities around common interests. Biking enthusiasts could start a biking community. Cooking fans could gather around recipes. It was a neat idea. While using the app on his phone, swiping through different pages and watching the slick animations, I felt a bit jealous. Then I asked: "So where is the data stored?"

"It's stored on the app," he replied.

"I mean, all the user data," I pressed. "Do you use a database on AWS, or any service like that?"

We went back and forth while I tried to clarify my question. His vibe-knowing started to show its limits. I felt some relief; my job was safe for now. Joking aside, we talked about servers, app architecture, and even GDPR compliance. These weren't things the AI builder had prepared him for.

This conversation happens often now when I check in on friends who vibe-coded their way into developing an app or website. They felt on top of the world when they were getting started. But then they got stuck.
An error message they couldn't debug. The service generating gibberish. Requests the AI couldn't understand. How do you build the backend of an app when you don't know what a backend is? And when the tool asks you to sign up for Google Cloud and start paying monthly fees, what are you supposed to do?

Another friend wanted to build a newsletter. ChatGPT told him to set up WordPress and learn about SMTP. These are all good things to learn, but the "S" in SMTP is a lie. It's not that simple. I've been trying to explain to him why the email he is sending from the command line is not reaching his Gmail.

The AI services that promise to build applications are great at making a storefront you don't want to modify. The moment you start customizing, you run into problems. That's why all Lovable websites look exactly the same. These services continue to exist. The marketing is still effective. But few people end up with a product that actually solves their problems.

My friends spent money on these services. They were excited to see a polished brochure. The problem is, they didn't know what it takes to actually run an app. The AI tools are amazing at generating the visible 20% of an app. But the remaining invisible 80% is where the actual work is: the infrastructure, the security, maintenance, scaling issues, and then the actual cost. The free tier on AWS doesn't last forever. And neither does your enthusiasm when you start paying $200/month for a hobby project.

My friends' experiments weren't failures. They learned something valuable. Some now understand why developers get paid what they do. Some even started programming bootcamps. But the rest have moved on. Their apps sit dormant in abandoned GitHub repos. Their domains will probably expire this year. They're back to their day jobs, a little wiser about the difference between a demo and a product. Their LinkedIn profiles are quieter now; they have stopped posting about "working smart, not hard."
As for me, I should probably check on Muad'Dib. That side project isn't going to build itself. AI or no AI.

1 view
Justin Duke 4 weeks ago

Brief notes on migrating to Postgres-backed jobs

It seems premature to talk about a migration that is only halfway done, even if it's the hard half that's done — but I think there's something useful in documenting the why and how of a transition while you're still in the thick of it, before the revisionist history of completion sets in.

Early last year, we built out a system for running background jobs directly against Postgres within Django. This very quickly got abstracted out into a generic task runner — shout out to Brandur and the many other people who have been beating this drum for a while. And as far as I can tell, this concept of shifting away from Redis and other less-durable caches for job infrastructure is regaining steam on the Rails side of the ecosystem, too.

The reason we did it was mostly ergonomics around graceful batch processing. It is significantly easier to write a poller in Django for stuff backed by the ORM than it is to try to extend RQ or any of the other Redis-friendly task runner options. Django gives you migrations, querysets, admin visibility, transactional guarantees — all for free, all without another moving part. And as we started using it and it proved stable, we slowly moved more and more things over to it.

At the time of this writing, around half of our jobs by quantity — which represent around two-thirds by overall volume — have been migrated over from RQ onto this system. This is slightly ironic given that we also released django-rq-cron last year, a library that, if I had my druthers, we will no longer need. Fewer moving parts is the watchword. We're removing spindles from the system and getting closer and closer to a simple, portable, and legible stack of infrastructure.
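The ORM-backed poller idea is easy to sketch. Below is a toy, stdlib-only illustration of the pattern (jobs as rows, claimed inside a transaction). It uses sqlite3 purely as a stand-in; the system described runs against Postgres through the Django ORM, which is not reproduced here.

```python
# Toy sketch of a database-backed job queue: jobs are rows, and a poller
# claims the next pending row atomically. Illustrative only; a Postgres
# system would typically use SELECT ... FOR UPDATE SKIP LOCKED instead.
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT, "
           "status TEXT DEFAULT 'pending')")

def enqueue(kind: str) -> None:
    db.execute("INSERT INTO jobs (kind) VALUES (?)", (kind,))

def claim_next():
    """Claim the oldest pending job, marking it running atomically."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock up front
    row = db.execute("SELECT id, kind FROM jobs WHERE status = 'pending' "
                     "ORDER BY id LIMIT 1").fetchone()
    if row is None:
        db.execute("COMMIT")
        return None
    db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    db.execute("COMMIT")
    return row

enqueue("send_email")
enqueue("rebuild_index")
job = claim_next()
print(job)  # → (1, 'send_email')
```

The appeal described in the post is that the jobs table is just another model: migrations, querysets, and admin visibility apply to it like anything else.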

1 view
Preah's Website 1 month ago

BlogLog January 30 2026

Updated the Feeds page to format the feeds list as a table instead of a bulleted list for a cleaner appearance. Updated the conversion script to use this change.

0 views
devansh 1 month ago

HonoJS JWT/JWKS Algorithm Confusion

After spending some time looking for security issues in JS/TS frameworks, I moved on to Hono - fast, clean, and popular enough that small auth footguns can become "big internet problems". This post is about two issues I found in Hono's JWT/JWKS verification path:

- a default algorithm footgun in the JWT middleware that can lead to forged tokens if an app is misconfigured
- a JWK/JWKS algorithm selection bug where verification could fall back to an untrusted value

Both were fixed in hono 4.11.4, and GitHub Security Advisories were published on January 13, 2026.

JWT / JWK / JWKS Primer

If you already have experience with JWT stuff, you can skip this. A JWT is a signed token whose header includes alg (the signing algorithm). A JWK is a JSON representation of a key (e.g. an RSA public key). A JWKS is a set of JWKs, usually hosted at a well-known URL. The key point here is that the algorithm choice must not be attacker-controlled.

#1: JWT middleware "unsafe default" (HS256)

Hono's JWT helper documents that alg is optional - and defaults to HS256. That sounds harmless until you combine it with a very common real-world setup:

- the app expects RS256 (asymmetric)
- the developer passes an RSA public key string
- but they don't explicitly set alg

In that case, the verification path defaults to HS256, treating that public key string as an HMAC secret, and that becomes forgeable because public keys are, well… public. If an attacker can generate a token that passes verification, they can mint whatever claims the application trusts and walk straight into protected routes. This is the "algorithm confusion" class of bugs, where you think you're doing asymmetric verification, but you're actually doing symmetric verification with a key the attacker knows.

This is configuration-dependent. The dangerous case is: you use the JWT middleware with an asymmetric public key and you don't pin alg. The core issue is that Hono defaults to HS256, so a public key string can accidentally be used as an HMAC secret, allowing forged tokens and auth bypass.

Advisory: GHSA-f67f-6cw9-8mq4. This was classified as High (CVSS 8.2) and maps to CWE-347 (Improper Verification of Cryptographic Signature). Patched version: 4.11.4.

#2: JWK/JWKS middleware fallback

In the JWK/JWKS verification middleware, Hono could pick the verification algorithm like this: use the JWK's alg if present; otherwise, fall back to the alg from the JWT header (unverified input). GitHub's advisory spells it out: when the selected JWK doesn't explicitly define an algorithm, the middleware falls back to using the alg from the unverified JWT header - and since alg in a JWK is optional and commonly omitted, this becomes a real-world issue.

If the matching JWKS key lacks alg, verification falls back to the token-controlled alg, enabling algorithm confusion / downgrade attacks. Trusting the token's alg is basically letting the attacker influence how you verify the signature. Depending on surrounding constraints (allowed algorithms, how keys are selected, and how the app uses claims), this can lead to forged tokens being accepted and authz/authn bypass.

Advisory: GHSA-3vhc-576x-3qv4. This was also classified as High (CVSS 8.2), also CWE-347, and patched in 4.11.4.

The Fix

Both advisories took the same philosophical stance: make the algorithm explicit. Don't infer it from attacker-controlled input.

Fix for #1 (JWT middleware): the middleware now requires an explicit alg option — a breaking change that forces callers to pin the algorithm instead of relying on defaults. (Example configurations are shown in the advisory.)

Fix for #2 (JWK/JWKS middleware): the middleware now requires an explicit allowlist of asymmetric algorithms, and it no longer derives the algorithm from untrusted JWT header values. It also explicitly rejects symmetric HS* algorithms in this context. (Example configurations are shown in the advisory.)

Disclosure Timeline

- Discovery: 9 Dec 2025
- First response: 9 Dec 2025
- Patched in: hono 4.11.4
- Advisories published: 13 Jan 2026
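To make the algorithm-confusion class concrete, here is a self-contained sketch (not Hono's code; all names and the PEM string are made up) showing how a verifier that trusts the token's own alg header, while configured with a public key, accepts a forged HS256 token:

```python
# Sketch of JWT "algorithm confusion": if a verifier trusts the token's
# own header alg and was configured with an RSA *public* key, an attacker
# who knows that public key can forge an HS256 token by using the public
# key string as the HMAC secret. Illustrative only; the key is fake.
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def forge_hs256(payload: dict, hmac_key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(hmac_key, f"{header}.{body}".encode(),
                          hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def naive_verify(token: str, key: bytes) -> dict:
    """A vulnerable verifier: picks the algorithm from the untrusted header."""
    header_b64, body_b64, sig_b64 = token.split(".")
    pad = lambda s: s + "=" * (-len(s) % 4)
    header = json.loads(base64.urlsafe_b64decode(pad(header_b64)))
    if header["alg"] == "HS256":  # the attacker chose this branch!
        expected = b64url(hmac.new(key, f"{header_b64}.{body_b64}".encode(),
                                   hashlib.sha256).digest())
        if not hmac.compare_digest(expected, sig_b64):
            raise ValueError("bad signature")
        return json.loads(base64.urlsafe_b64decode(pad(body_b64)))
    raise NotImplementedError("RS256 verification elided in this sketch")

# The server was configured with a *public* RSA key (PEM text): public info.
public_pem = b"-----BEGIN PUBLIC KEY-----\nMFkw...IDAQAB\n-----END PUBLIC KEY-----\n"
token = forge_hs256({"sub": "admin"}, public_pem)   # attacker side
claims = naive_verify(token, public_pem)            # server side: accepted!
print(claims)  # → {'sub': 'admin'}
```

Pinning the algorithm (the fix both advisories land on) kills this: a verifier that only accepts RS256 never reaches the HMAC branch.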

0 views
Grumpy Gamer 1 month ago

Hugo comments

I’ve been cleaning up my comments script for Hugo and am about ready to upload it to GitHub. I added an option to use flat files or SQLite, and it can notify Discord (and probably other services) when a comment is added. It’s all one PHP file. The reason I’m telling you this is to force myself to actually do it. Otherwise there would be “one more thing” and I’d never do it. I was talking to a game dev today about how to motivate yourself to get things done on your game. We both agreed that publicly making promises is a good way.

0 views
Farid Zakaria 2 months ago

Huge binaries: papercuts and limits

In a previous post, I synthetically built a program that demonstrated a relocation overflow for a single instruction. However, the demo required adding a flag to disable some additional data that might cause other overflows, for the purpose of the demonstration. What’s going on? 🤔

This is a good example that only a select few are facing the size-pressure of massive binaries. Even with a code model that already articulates to the compiler & linker “Hey, I expect my binary to be pretty big”, there are surprising gaps where the linker overflows. On Linux, an ELF binary includes many other sections beyond the text and data necessary for code execution. Notably there are sections for debugging (DWARF) and language-specific sections, such as the one C++ uses to help unwind the stack on exceptions. Turns out that even then you might still run into overflow errors! 🤦🏻‍♂️

Note: funny enough, there is a very recently opened issue for this with LLVM #172777; perfect timing! For instance, the unwind data assumes 32-bit values regardless of the code model, and there are similar 32-bit assumptions in related data structures as well.

I also mentioned earlier a pattern of using multiple GOTs (Global Offset Tables) to avoid the 31-bit (±2GiB) relative offset limitation. Is there even a need for the large code model, then? How far can multiple GOTs take us before we are forced to use it? Let’s think about it.

First, consider any limit due to overflow when accessing the multiple GOTs. Let’s say we decide to space out our duplicative GOTs every 1.5GiB. That means each GOT can grow at most ~500MiB before there could exist an instruction in the code section whose reference would result in an overflow. Each GOT entry is 8 bytes, a 64-bit pointer, so we have roughly ~65 million possible entries. A typical GOT relocation site is a 9-byte sequence: a 7-byte instruction plus 2 more bytes. That means we have 1.5GiB / 9 ≈ ~178 million possible unique relocations.
So theoretically, we can require more unique symbols in our code section than we can fit in the nearest GOT, and therefore cause a relocation overflow. 💥 The same problem exists for thunks, since the thunk is larger than the relative call in bytes. At some point, there is no avoiding the large code-model, however with multiple GOTs, thunks and other linker optimizations (i.e. LTO, relaxation), we have a lot of headroom before it’s necessary. 🕺🏻
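The back-of-envelope numbers above are easy to check, assuming the post's figures of 1.5 GiB GOT spacing, 8-byte entries, and 9-byte relocation sites:

```python
# Back-of-envelope check of the GOT arithmetic: GOTs spaced every 1.5 GiB
# (leaving ~500 MiB of growth before a +/-2 GiB relative reference would
# overflow), 8-byte GOT entries, 9 bytes of code per relocation site.
MiB, GiB = 1 << 20, 1 << 30

got_entries = (500 * MiB) // 8        # pointers one GOT can hold
reloc_sites = int(1.5 * GiB) // 9     # relocation sites that fit between GOTs

print(got_entries)  # → 65536000 (~65 million)
print(reloc_sites)  # → 178956970 (~178 million)
```

Since ~178 million possible relocation sites exceeds the ~65 million entries a single GOT can hold, the overflow scenario in the text is arithmetically possible.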

0 views
matklad 2 months ago

The Second Great Error Model Convergence

I feel like this has been said before, more than once, but I want to take a moment to note that most modern languages have converged to the error management approach described in Joe Duffy’s The Error Model, which is a generational shift from the previous consensus on exception handling. C++, JavaScript, Python, Java, and C# all have roughly equivalent try/catch-style constructs with roughly similar runtime semantics and typing rules. Even functional languages like Haskell, OCaml, and Scala feature exceptions prominently in their grammar, even if their usage is frowned upon by parts of the community. But the same can be said about Go, Rust, Swift, and Zig! Their error handling is similar to each other, and quite distinct from the previous bunch, with Kotlin and Dart being notable, ahem, exceptions. Here are some commonalities of modern error handling.

First, and most notably, functions that can fail are annotated at the call side. Where the old way was a bare call, the new way carries a syntactic marker (a ? in Rust, a try in Swift or Zig) alerting the reader that a particular operation is fallible, though the verbosity of the marker varies. For the writer, the marker ensures that changing the function contract from infallible to fallible (or vice versa) requires changing not only the function definition itself, but the entire call chain. On the other hand, adding a new error condition to the set of possible errors of a fallible function generally doesn’t require reconsidering rethrowing call-sites.

Second, there’s a separate, distinct mechanism that is invoked in case of a detectable bug. In Java, index out of bounds or null pointer dereference (examples of programming errors) use the same language machinery as operational errors. Rust, Go, Swift, and Zig instead use a separate panic path. In Go and Rust, panics unwind the stack, and they are recoverable via a library function. In Swift and Zig, a panic aborts the entire process.
Operational error of a lower layer can be classified as a programming error by the layer above, so there’s generally a mechanism to escalate an erroneous result value to a panic. But the opposite is more important: a function which does only “ordinary” computations can be buggy, and can fail, but such failures are considered catastrophic, are invisible in the type system, and are sufficiently transparent at runtime.

Third, results of fallible computation are first-class values, as in Rust’s Result. There’s generally little type system machinery dedicated exclusively to errors, and a ? expression is just a little more than syntax sugar for the little “if err != nil” Go spell. This isn’t true for Swift, which does treat errors specially: a generic higher-order function has to explicitly care about errors, and hard-codes the decision to bail early. Swift does, however, provide a first-class type for errors. Should you want to handle an exception, rather than propagate it, the handling is localized to a single throwing expression dealing with a single specific error, rather than with any error from a block of statements. Swift again sticks to the more traditional try/catch, but, interestingly, Kotlin does have try expressions.

The largest remaining variance is in what the error value looks like. This still feels like a research area, and it is a hard problem due to a fundamental tension. On the one hand, at lower levels you want to exhaustively enumerate errors, to make sure that internal error handling logic is complete and doesn’t miss a case, and that the public API doesn’t leak any extra surprise error conditions. On the other hand, at higher levels, you want to string together widely different functionality from many separate subsystems without worrying about specific errors, beyond separating fallible functions from infallible ones and ensuring that there is some top-level handler to show a 500 error or an equivalent.

The two extremes are well understood. For exhaustiveness, nothing beats sum types (enums in Rust). This, I think, is one of the key pieces which explains why the pendulum seemingly swung back on checked exceptions. In Java, a method can declare that it throws one of several exceptions, say two of them. Critically, you can’t abstract over this pair. The call chain has to either repeat the two cases, or type-erase them into a superclass, losing information. The former has a nasty side-effect that the entire chain needs updating if a third variant is added. Java-style checked exceptions are sensitive to “N to N + 1” transitions; modern value-oriented error management is only sensitive to the “0 to 1” transition. Still, if I am back to writing Java at any point, I’d be very tempted to standardize on a coarse-grained “throws Exception” signature for all throwing methods.

This is exactly the second well-understood extreme: there’s a type-erased universal error type, and the “throwableness” of a function contains one bit of information. We only care whether the function can throw; the error itself can be whatever. You still can downcast the dynamic error value to handle specific conditions, but the downcasting is not checked by the compiler. That is, downcasting is “safe” and nothing will panic in the error handling mechanism itself, but you’ll never be sure whether the errors you are handling can actually arise, and whether some errors should be handled, but aren’t. Go and Swift provide first-class universal errors, like Midori. Starting with Swift 4, you can also narrow the type down. Rust doesn’t really have super strong conventions about errors: it started with mostly enums, and then crates like anyhow shone a spotlight on the universal error type.

But overall, it feels like “midpoint” error handling is poorly served by either extreme. In larger applications, you sort of care about error kinds, and there are usually a few places where it is pretty important to be exhaustive in your handling, but threading the necessary types to those few places infects the rest of the codebase, and ultimately leads to “a bag of everything” error types with many “dead” variants. Zig makes an interesting choice of assuming a mostly closed-world compilation model, and relying on cross-function inference to learn who can throw what.

What I find the most fascinating about the story is the generational aspect. There really was a strong consensus about exceptions, and then an agreement that checked exceptions are a failure, and now, suddenly, we are back to “checked exceptions” with a twist, in the form of the “errors are values” philosophy. What happened between the lull of the aughts and the past decade’s industrial PLT renaissance?

0 views

Does the Internet know what time it is?

Time is one of those things that is significantly harder to deal with than you’d naively expect. It’s common in computing to assume that computers know the current time. After all, there are protocols like NTP for synchronizing computer clocks, and they presumably work well and are widely used. Practically speaking, what kinds of hazards lie hidden here?

I just checked the system time of my laptop against time.gov, which reports a -0.073s offset. So for an N=1 sample size, I’m cautiously optimistic. There are research papers, like Spanner, TrueTime & The CAP Theorem, that describe custom systems relying on atomic clocks and GPS to provide clock services with very low, bounded error. While these are amazing feats of engineering, they remain out of reach for most applications. What if we needed to build a system that spanned countless computers across the Internet and required each to have a fairly accurate clock?

I wasn’t able to find a study that measured clock offset in this way. There are, however, a number of studies that measure clock skew (especially for fingerprinting). Many of these studies are dated, so it seems like now is a good time for a new measurement. This post is my attempt to measure clock offsets, Internet-wide.

When processing HTTP requests, servers fill in the HTTP Date header. This header should indicate “the date and time at which the message originated”. Lots of web servers generate responses on-the-fly, so the Date header reveals the server’s clock in seconds. Looks pretty good; I’ll use this as the basis for the measurements. Unfortunately, there are a bunch of challenges we’ll need to deal with.

First, resources may get cached in a CDN for some time, and the Date header would then reflect when the resource was generated instead of the server’s current time reference. Requesting a randomized path will bypass the CDN, typically generating a 404 error.
Unfortunately, I found some servers will set the Date header to the last modified time of the 404 page template. I considered performing multiple lookups to see how the Date header advances between requests, but some websites are distributed, so we’d be measuring a different system’s clock with each request. The safest way to avoid this hazard is to only consider Date headers that are offset to the future, which is the approach we’ll use.

HTTP responses take some time to generate, sometimes spanning a couple of seconds. We can’t be sure when the Date header was filled, but we know it was before we got the response. Since we only want to measure timestamps that are from the future, we can subtract the time we received the response from the timestamp in the Date header. This gives a lower bound for the underlying clock offset.

When performing broad Internet scans you’ll find many servers have invalid or expired TLS certificates. For the sake of collecting more data, I’ve disabled certificate validation while scanning. Finally, our own system clock has skew. To minimize the effect of local clock skew, I made sure I had a synchronization service running (systemd-timesyncd on Debian) and double-checked my offset on time.gov. All offset measurements are given in whole seconds, rounding towards zero, to account for this challenge.

The measurement tool is mostly a wrapper around a small Go snippet. For performance reasons, the code performs an HTTP HEAD request instead of the heavier GET request.

Starting in late November, I scanned all domain names on the Tranco top 1,000,000 domains list (NNYYW). I scanned slowly to avoid any undesired load on third-party systems, with the scan lasting 25 days. Of the million domain names, 241,570 systems could not be measured due to connection errors such as timeout, DNS lookup failure, connection refusal, or similar challenges.
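The author's tool wraps a Go snippet; as a hypothetical stand-in, this Python sketch shows the core computation (the offset of the Date header relative to the receipt time, truncated toward zero, using fixed example timestamps instead of a live request):

```python
# Hypothetical sketch of the measurement core: compute a server's clock
# offset from its HTTP Date header, in whole seconds rounded toward zero.
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def clock_offset(date_header: str, received_at: datetime) -> int:
    """Offset of the server's Date header relative to our receipt time."""
    server_time = parsedate_to_datetime(date_header)
    delta = (server_time - received_at).total_seconds()
    return math.trunc(delta)  # round toward zero, per the methodology

# Fixed example timestamps in place of a live HTTP request:
received = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(clock_offset("Thu, 01 Jan 2026 12:00:59 GMT", received))  # → 59
print(clock_offset("Thu, 01 Jan 2026 11:59:30 GMT", received))  # → -30
```

In the study only the non-negative results are kept, since past-dated headers can be explained by caching rather than clock error.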
Not all the domains on the Tranco list have Internet-accessible HTTPS servers running at the apex on the standard port, so these errors are expected. Further issues included HTTP responses that lacked a Date header (13,098) or had an unparsable Date header (102). In all, 745,230 domain names were successfully measured.

The vast majority of the measured domains had an offset of zero (710,189; 95.3%). Date headers set to the future impacted 12,717 domains (1.7%). Date headers set to the past will be otherwise ignored, but impacted 22,324 domains (3.0%). The largest positive offset was 39,867,698 seconds, landing us 461 days in the future (March 2027 at scan time).

If we graph this, we’ll see that the vast majority of our non-negative offsets are very near zero. We also observe that very large offsets are possible but quite rare. I can’t make out many useful trends from this graph: the large number of data points near zero seconds skews the vertical scale, and the huge offsets skew the horizontal scale. Adjusting the graph to focus on 10 seconds to 86,400 seconds (one day) and switching offsets to a log scale provides a curve much closer to my expectations. I can see that small offsets of less than a minute have many observances.

One thing I didn’t expect were spikes at intervals of whole hours, but it makes a lot of sense in hindsight. The next graph shows the first day, emphasizing data points that exactly align to whole-hour offsets. The largest spikes occur at one, three, and nine hours with no clear trend. Thankfully, geography seems to explain these spikes quite well. Here are the top-level domains (TLDs) of domains seen with exactly a one-hour offset: Germany (.DE), Czech Republic (.CZ), Sweden (.SE), Norway (.NO), Italy (.IT), and Belgium (.BE) are all currently using Central European Time, which uses offset UTC+1.
TLDs of domains seen with exactly a three-hour offset: the country-code top-level domain (ccTLD) for Russia is .RU, and Moscow Standard Time is UTC+3. TLDs of domains with exactly a nine-hour offset: South Korea (.KR) and the Cocos (Keeling) Islands (.CC) follow UTC+9. So I strongly suspect these whole-hour offset spikes are driven by local time zones. These systems seem to have set their UTC time to the local time, perhaps due to an administrator who set the time manually to local time, instead of using UTC and setting their timezone. While this type of error is quite rare, impacting only 49 of the measured domain names (0.007%), the large offsets could be problematic.

Another anomalous data point at 113 seconds caught my attention. Almost all of the data points at the 113-second offset are for domain names hosted by the same internet service provider using the same IP block. A single server can handle traffic for many domain names, all of which will have the same clock offset. We’ll see more examples of this pattern later.

Knowing that we have some anomalous spikes due to shared hosting and spikes at whole-hour intervals due to timezone issues, I smoothed out the data to perform modeling. Here’s a graph from zero to fifty-nine minutes, aggregating ten-second periods using the median. I added a power-law trend line, which matches the data quite well (R² = 0.92). I expected to see a power-law distribution, as these are common when modeling randomized errors, so my intuition feels confirmed. The average clock offset, among those with a non-negative offset, was 6544.8 seconds (about 109 minutes). The median clock offset was zero. As with other power-law distributions, the average doesn’t feel like a useful measure due to the skew of the long tail.

The HTTP Date header measurement has proven useful for assessing offsets of modern clocks, but I’m also interested in historical trends.
I expect that computers are getting better at keeping clocks synchronized as we get better at building hardware, but can we measure it? I know of some bizarre issues that have popped up over time, like this Windows STS bug, so it’s even possible we’ve regressed. Historical measurements require us to ask “when was this timestamp generated?” and measure the error. This is obviously tricky, as the point of the timestamp is to record the time, yet we suspect the timestamp has error. Somehow, we’ve got to find a more accurate time to compare each timestamp against.

It took me a while to think of a useful dataset, but I think git commits provide a viable way to measure historical clock offsets. We’ve got to analyze git commit timestamps carefully, as there are lots of ways timestamps can be out of order even when clocks are fully synchronized. Let’s first understand how “author time” and “commit time” work. When you write some code and commit it, you’ve “authored” the code. The git history at this point will show both an “author time” and “commit time” of the same moment. Later you may merge that code into a “main” branch, which updates the “commit time” to the time of the merge. When you’re working on a team you may see code merged in an order that’s opposite the order it was written, meaning the “author times” can be out of chronological order. The “commit times”, however, should be in order.

The Linux kernel source tree is a good candidate for analysis. Linux was one of the first adopters of git, as git was written to help Linux switch source control systems. My local git clone of Linux shows 1,397,347 commits starting from 2005. It may be the largest substantive project using git, and it provides ample data for us to detect timestamp-based anomalies. I extracted the timing and other metadata from the git history using: Here’s a graph of the “commit time”, aggregating 1000-commit blocks using various percentiles, showing that commit times are mostly increasing.
While there’s evidence of anomalous commit timestamps here, there are too few for us to find meaningful trends. Let’s keep looking. Here’s a graph of the “author time” showing much more variation: We should expect to see author times vary, as it takes differing amounts of time for code to be accepted and merged. But there are also large anomalies here, including author times that are decidedly in the future and author times that pre-date both git and Linux. We can get more detail in the graph by zooming into the years Linux has been developed thus far: This graph tells a story about commits usually getting merged quickly, but some taking a long time to be accepted. Certain code taking longer to review is expected, so the descending blue data points are expected.

There are many different measurements we could perform here, but I think the most useful will be “author time” minus “commit time”. Typically, we expect that code is developed, committed, reviewed, approved, and finally merged. This produces an author time that is less than the commit time, as the review and approval steps take time. A positive value of author time minus commit time would indicate that the code was authored in the future, relative to the commit timestamp. We can’t be sure whether the author time or the commit time was incorrect (or both), but collectively they record a timestamp error. These commits are anomalous, as the code was seemingly written, committed, then traveled back in time to be merged. We’ll refer to these commits as time travelling commits, although timestamp errors are very likely the correct interpretation.

Looking at the Linux git repo, I see 1,397,347 commits, of which 1,773 are time travelling commits. This is 0.127% of all commits, a somewhat rare occurrence. Here’s a graph of these timestamp errors: There are some fascinating patterns here!
Ignoring the marked regions for a moment, I notice that offsets below 100 seconds are rare; this is quite unlike the pattern seen in the HTTP Date header analysis. I suspect the reason is that there is usually a delay between when a commit is authored and when it is merged. Code often needs testing and review before it can be merged; those tasks absorb any small timestamp errors. This will make modeling historical clock offset trends much more difficult.

The region marked “A” shows many errors below 100 seconds, especially along linear spikes. There appear to be two committers in this region, both using “de.ibm.com” in their email addresses. The majority of authors in region A have “ibm.com” in their email address. So these anomalies appear to be largely due to a single company. These commits appear to have the author timestamp rewritten to a (mostly) sequential pattern. Here are the commits for two of the days: The author dates here are perfectly sequential, with one second between each commit. The commit dates also increase, but more slowly, such that the difference between author date and commit date increases with later commits. I suspect these timestamps were set by some sort of automation software while processing a batch of commits. The software may have initially set both author and commit timestamps to the current time, but then incremented the author timestamp by one with each subsequent commit while continuing to use the current time for the commit timestamp. If the software processed commits faster than one per second, we’d see this pattern. I don’t think these timestamps are evidence of mis-set clocks, but rather of an automated system with poor timestamp handling code.

The region marked “B” shows many errors near a 15.5-hour offset (with several exactly on the half-hour mark). Looking at the email addresses, I see several “com.au” domains, suggesting some participants were located in Australia (.AU).
Australia uses several time zones, including UTC+8, UTC+8:45, UTC+9:30, UTC+10, UTC+10:30, and UTC+11… but nothing near 15.5 hours. The GitHub profile for one of the committers shows a current timezone of UTC-5. This suggests that an author in Australia and a committer in the Americas both mis-set their clocks, perhaps combining UTC+10:30 and UTC-5 to reach the 15.5-hour offset. We saw examples of timezone-related clock errors when looking at the HTTP Date header; this appears to be an example of two timezone errors combining.

The region marked “C” shows many errors of around 30 to 260 days, which are unusually large. The committer for each of these is the same email address, using the “kernel.org” domain name. If we render the author and committer timestamps we’ll see this pattern: I notice that the day in the author timestamp usually matches the month in the committer timestamp, and when it doesn’t it’s one smaller. When the author day and the committer month match, the author month is less than or the same as the committer day. The days in the author timestamps vary between one and nine, while the days in the commit timestamps vary between eight and twenty-one. This suggests that the author timestamp was set incorrectly, swapping the day and month. Looking at these commits relative to the surrounding commits, the commit timestamps appear accurate. If I fix the author timestamps by swapping the day and month, then the data points are much more reasonable. The author timestamps are no longer after the commit timestamps, with differences varying between zero and thirty-six days, and an average of nine days. So it seems these author timestamps were generated incorrectly, swapping month and day, causing them to appear to travel back in time.

Git has had code for mitigating these sorts of issues since 2006, like this code that limits timestamps to ten days in the future. I’m not sure why the commits in region “C” weren’t flagged as erroneous.
Perhaps a different code path was used? Region “C” doesn’t appear to be related to a mis-set system clock, but instead to a date parsing error that swapped day and month. This type of error is common when working between different locales, as the ordering of month and day in a date varies by country.

Finally, the region marked “D” shows a relatively sparse collection of errors. This may suggest that git timestamp-related errors are becoming less common. But there’s an analytical hazard here: we’re measuring timestamps that are known to time travel. It’s possible that this region will experience more errors in the future! I suspect regions “A” and “C” are due to software bugs, not mis-set clocks. Region “B” may be due to two clocks, both mis-set due to timezone handling errors. It seems unwise to assume that I’ve caught all the anomalies and can attribute the rest of the data points to mis-set clocks. Let’s continue with that assumption anyway, knowing that we’re not on solid ground.

The Linux kernel source tree is an interesting code base, but we should look at more projects. This next graph counts positive values of “author time” minus “commit time” for Linux, Ruby, Kubernetes, Git, and OpenSSL. The number of erroneous timestamps is measured per-project against the total commits in each year. It’s difficult to see a trend here. Linux saw the most time travelling commits from 2008 through 2011, each year above 0.4%, and has been below 0.1% since 2015. Git has had zero time travelling commits since 2014, with a prior rate below 0.1%. Digging into the raw data I notice that many time travelling commits were generated by the same pair of accounts. For Kubernetes, 78% were authored by [email protected] and merged by [email protected], although these were only one second in the future. These appear to be due to the “Kubernetes Submit Queue”, where the k8s-merge-robot authors a commit on one system and the merge happens within GitHub.
For Ruby, 89% were authored by the same user and committed by [email protected] with an offset near 30 seconds. I attempted to correct for these biases by deduplicating commit-author pairs, but the remaining data points were too sparse to perform meaningful analysis. Time travelling commits usually reach their peak two to four years after a project adopts source control, ramping up before and generally falling after. This hints at a project-management-related cause for these spikes. I’ll speculate that developers initially use Git cautiously while it is new to them, then as they get comfortable they begin to build custom automation systems. These new automation systems have bugs or lack well-synchronized clocks, and those issues are addressed over time. I don’t think I can make any conclusion from this data about system clocks being better managed over time. This data doesn’t support my expectation that erroneous timestamps would reduce over time, so I’ll call this a “negative result”. There are too many challenges in this data set.

This analysis explored timestamps impacted by suspected mis-set clocks. HTTP scanning found that 1.7% of domain names had a Date header mis-set to the future. Web server offsets strongly matched a power-law distribution, such that small offsets were by far the most common. Git commit analysis found up to 0.65% of commits (Linux, 2009) had author timestamps in the future, relative to the commit timestamp. No clear historical trend was discovered. Timestamps with huge offsets were detected: the largest Linux commit timestamp was in the year 2085 and the largest HTTP Date header was in the year 2027. This shows that while small offsets were most common, large errors will occur. Many underlying causes were proposed while analyzing the data, including timezone handling errors, date format parsing errors, and timestamps being overwritten by automated systems.
Many data points were caused by the same group, like IP address blocks used by many domains or Git users (or robots) interacting with multiple commits. Deduplicating these effects left too few data points to perform trend analysis. Synchronizing computer clocks and working with timestamps remains a challenge for the industry. I’m sure there are other data sets that support this kind of measurement. If you’ve got any, I’d love to hear what trends you can discover!

How often are computer clocks set to the wrong time? How large do these offsets grow? Can we model clock offsets, and make predictions about them? Are out-of-sync clocks a historical concern that we’ve largely solved, or are they still a concern?

Clock skew: the rate at which a clock deviates from a one-second-per-second standard, often measured in parts per million.
Clock offset: the difference between the displayed time and Coordinated Universal Time (UTC), often measured in seconds.

Filippo Valsorda 2 months ago

Building a Transparent Keyserver

Today, we are going to build a keyserver to look up age public keys. That part is boring. What’s interesting is that we’ll apply the same transparency log technology as the Go Checksum Database to keep the keyserver operator honest and unable to surreptitiously inject malicious keys, while still protecting user privacy and delivering a smooth UX. You can see the final result at keyserver.geomys.org. We’ll build it step-by-step, using modern tooling from the tlog ecosystem, integrating transparency in less than 500 lines. I am extremely excited to write this post: it demonstrates how to use a technology that I strongly believe is key in protecting users and holding centralized services accountable, and it’s the result of years of effort by me, the TrustFabric team at Google, the Sigsum team at Glasklar, and many others. This article is being cross-posted on the Transparency.dev Community Blog.

Let’s start by defining the goal: we want a secure and convenient way to fetch age public keys for other people and services. 1 The easiest and most usable way to achieve that is to build a centralized keyserver: a web service where you log in with your email address to set your public key, and other people can look up public keys by email address. Trusting the third party that operates the keyserver lets you solve identity, authentication, and spam by just delegating the responsibilities of checking email ownership and implementing rate limiting. The keyserver can send a link to the email address, and whoever receives it is authorized to manage the public key(s) bound to that address. I had Claude Code build the base service, because it’s simple and not the interesting part of what we are doing today. There’s nothing special in the implementation: just a Go server, an SQLite database, 2 a lookup API, a set API protected by a CAPTCHA that sends an email authentication link, 3 and a Go CLI that calls the lookup API.
A lot of problems are shaped like this and are much more solvable with a trusted third party: PKIs, package registries, voting systems… Sometimes the trusted third party is encapsulated behind a level of indirection, and we talk about Certificate Authorities, but it’s the same concept. Centralization is so appealing that even the OpenPGP ecosystem embraced it: after the SKS pool was killed by spam , a new OpenPGP keyserver was built which is just a centralized, email-authenticated database of public keys. Its FAQ claims they don’t wish to be a CA, but also explains they don’t support the (dubiously effective) Web-of-Trust at all, so effectively they can only act as a trusted third party. The obvious downside of a trusted third party is, well, trust. You need to trust the operator, but also whoever will control the operator in the future, and also the operator’s security practices. That’s asking a lot, especially these days, and a malicious or compromised keyserver could provide fake public keys to targeted victims with little-to-no chance of detection. Transparency logs are a technology for applying cryptographic accountability to centralized systems with no UX sacrifices. A transparency log or tlog is an append-only, globally consistent list of entries, with efficient cryptographic proofs of inclusion and consistency. The log operator appends entries to the log, which can be tuples like (package, version, hash) or (email, public key) . The clients verify an inclusion proof before accepting an entry, guaranteeing that the log operator will have to stand by that entry in perpetuity and to the whole world, with no way to hide it or disown it. As long as someone who can check the authenticity of the entry will eventually check (or “monitor”) the log, the client can trust that malfeasance will be caught. Effectively, a tlog lets the log operator stake their reputation to borrow time for collective, potentially manual verification of the log’s entries. 
This is a middle-ground between impractical local verification mechanisms like the Web of Trust , and fully trusted mechanisms like centralized X.509 PKIs. If you’d like a longer introduction, my Real World Crypto 2024 talk presents both the technical functioning and abstraction of modern transparency logs. There is a whole ecosystem of interoperable tlog tools and publicly available infrastructure built around C2SP specifications. That’s what we are going to use today to add a tlog to our keyserver. If you want to catch up with the tlog ecosystem, my 2025 Transparency.dev Summit Keynote maps out the tools, applications, and specifications. If you are familiar with Certificate Transparency, tlogs are derived from CT, but with a few major differences. Most importantly, there is no separate entry producer (in CT, the CAs) and log operator; moreover, clients check actual inclusion proofs instead of SCTs; finally, there are stronger split-view protections, as we will see below. The Static CT API and Sunlight CT log implementation were a first successful step in moving CT towards the tlog ecosystem, and a proposed design called Merkle Tree Certificates redesigns the WebPKI to have tlog-like and tlog-interoperable transparency. In my experience, it’s best not to think about CT when learning about tlogs. A better production example of a tlog is the Go Checksum Database , where Google logs the module name, version, and hash for every module version observed by the Go Modules Proxy. The module fetches happen over regular HTTPS, so there is no publicly-verifiable proof of their authenticity. Instead, the central party appends every observation to the tlog, so that any misbehavior can be caught. The command verifies inclusion proofs for every module it downloads, protecting 100% of the ecosystem, without requiring module authors to manage keys. Katie Hockman gave a great talk on the Go Checksum Database at GopherCon 2019. You might also have heard of Key Transparency . 
KT is an overlapping technology that was deployed by Apple, WhatsApp, and Signal amongst others. It has similar goals, but picks different tradeoffs that involve significantly more complexity, in exchange for better privacy and scalability in some settings.

Ok, so how do we apply a tlog to our email-based keyserver? It’s pretty simple, and we can do it with a 250-line diff using Tessera and Torchwood. Tessera is a general-purpose tlog implementation library, which can be backed by object storage or a POSIX filesystem. For our keyserver, we’ll use the latter backend, which stores the whole tlog in a directory according to the c2sp.org/tlog-tiles specification. Every time a user sets their key, we append an encoded (email, public key) entry to the tlog, and we store the tlog entry index in the database. The lookup API produces a proof from the index and provides it to the client. The proof follows the c2sp.org/tlog-proof specification. It combines a checkpoint (a signed snapshot of the log at a certain size), the index of the entry in the log, and a proof of inclusion of the entry in the checkpoint.

The client CLI receives the proof from the lookup API, checks the signature on the checkpoint against the built-in log public key, hashes the expected entry, and checks the inclusion proof for that hash and checkpoint. It can do all this without interacting further with the log. If you squint, you can see that the proof is really a “fat signature” for the entry, which you verify with the log’s public key, just like you’d verify an Ed25519 or RSA signature for a message. I like to call them spicy signatures to stress how tlogs can be deployed anywhere you can deploy regular digital signatures.

What’s the point of all this though? The point is that anyone can look through the log to make sure the keyserver is not serving unauthorized keys for their email address!
Indeed, just like backups are useless without restores and signatures are useless without verification, tlogs are useless without monitoring. That means we need to build tooling to monitor the log. On the server side, it takes two lines of code to expose the Tessera POSIX log directory. On the client side, we add a flag to the CLI that reads all matching entries in the log. To enable effective monitoring, we also normalize email addresses by trimming spaces and lowercasing them, since users are unlikely to monitor all the variations. We do it before sending the login link, so normalization can’t lead to impersonation. A complete monitoring story would involve third-party services that monitor the log for you and email you if new keys are added, like gopherwatch and Source Spotter do for the Go Checksum Database, but the flag is a start. The full change involves 5 files changed, 251 insertions(+), 6 deletions(-), plus tests, and includes a new keygen helper binary, the required database schema and help text and API changes, and web UI changes to show the proof.

Edit: the original patch series is missing freshness checks in monitor mode, to ensure the log is not hiding entries from monitors by serving them an old checkpoint. The easiest solution is checking the timestamp on witness cosignatures (+15 lines). You will learn about witness cosignatures below.

We created a problem by implementing this tlog, though: now all the email addresses of our users are public! While this is ok for module names in the Go Checksum Database, allowing email address enumeration in our keyserver is a non-starter for privacy and spam reasons. We could hash the email addresses, but that would still allow offline brute-force attacks. The right tool for the job is a Verifiable Random Function.
You can think of a VRF as a hash with a private and public key: only you can produce a hash value, using the private key, but anyone can check that it’s the correct (and unique) hash value, using the public key. Overall, implementing VRFs takes less than 130 lines using the c2sp.org/vrf-r255 instantiation based on ristretto255 , implemented by filippo.io/mostly-harmless/vrf-r255 (pending a more permanent location). Instead of the email address, we include the VRF hash in the log entry, and we save the VRF proof in the database. The tlog proof format has space for application-specific opaque extra data, so we can store the VRF proof there, to keep the tlog proof self-contained. In the client CLI, we extract the VRF hash from the tlog proof’s extra data and verify it’s the correct hash for the email address. How do we do monitoring now, though? We need to add a new API that provides the VRF hash (and proof) for an email address. On the client side, we use that API to obtain the VRF proof, we verify it, and we look for the VRF hash in the log instead of looking for the email address. Attackers can still enumerate email addresses by hitting the public lookup or monitor API, but they’ve always been able to do that: serving such a public API is the point of the keyserver! With VRFs, we restored the original status quo: enumeration requires brute-forcing the online, rate-limited API, instead of having a full list of email addresses in the tlog (or hashes that can be brute-forced offline). VRFs have a further benefit: if a user requests to be deleted from the service, we can’t remove their entries from the tlog, but we can stop serving the VRF for their email address 4 from the lookup and monitor APIs. This makes it impossible to obtain the key history for that user, or even to check if they ever used the keyserver, but doesn’t impact monitoring for other users. The full change adding VRFs involves 3 files changed, 125 insertions(+), 13 deletions(-) , plus tests. 
We have one last marginal risk to mitigate: since we can’t ever remove entries from the tlog, what if someone inserts some unsavory message in the log by smuggling it in as a public key? Protecting against this risk is called anti-poisoning. The risk to our log is relatively small: public keys have to be Bech32-encoded and short, so an attacker can’t usefully embed images or malware. Still, it’s easy enough to neutralize it: instead of the public keys, we put their hashes in the tlog entry, keeping the original public keys in a new table in the database, and serving them as part of the monitor API. It’s very important that we persist the original key in the database before adding the entry to the tlog. Losing the original key would be indistinguishable from refusing to provide a malicious key to monitors. On the client side, to do a lookup we just hash the public key when verifying the inclusion proof. When monitoring, we match the hashes against the list of original public keys provided by the server through the monitor API. Our final log entry thus combines the VRF hash of the email address with the hash of the public key. Designing the tlog entry is the most important part of deploying a tlog: it needs to include enough information to let monitors isolate all the entries relevant to them, but not enough information to pose privacy or poisoning threats. The full change providing anti-poisoning involves 2 files changed, 93 insertions(+), 19 deletions(-), plus tests.

We’re almost done! There’s still one thing to fix, and it used to be the hardest part. To get the delayed, collective verification we need, all clients and monitors must see consistent views of the same log, where the log maintains its append-only property. This is called non-equivocation, or split-view protection. In other words, how do we stop the log operator from showing an inclusion proof for log A to a client, and then a different log B to the monitors?
Just like logging without a monitoring story is like signing without verification, logging without a non-equivocation story is just a complicated signature algorithm with no strong transparency properties. This is the hard part because in the general case you can’t do it alone . Instead, the tlog ecosystem has the concept of witness cosigners : third-party operated services which cosign a checkpoint to attest that it is consistent with all the other checkpoints the witness observed for that log. Clients check these witness cosignatures to get assurance that—unless a quorum of witnesses is colluding with the log—they are not being presented a split-view of the log. These witnesses are extremely efficient to operate: the log provides the O(log N) consistency proof when requesting a cosignature, and the witness only needs to store the O(1) latest checkpoint it observed. All the potentially intensive verification is deferred and delegated to monitors, which can be sure to have the same view as all clients thanks to the witness cosignatures. This efficiency makes it possible to operate witnesses for free as public benefit infrastructure. The Witness Network collects public witnesses and maintains an open list of tlogs that the witnesses automatically configure. For the Geomys instance of the keyserver, I generated a tlog key and then I sent a PR to the Witness Network to add the following lines to the testing log list. This got my log configured in a handful of witnesses , from which I picked three to build the default keyserver witness policy. The policy format is based on Sigsum’s policies , and it encodes the log’s public key and the witnesses’ public keys (for the clients) and submission URLs (for the log). Tessera supports these policies directly. When minting a new checkpoint, it will reach out in parallel to all the witnesses, and return the checkpoint once it satisfies the policy. Configuration is trivial, and the added latency is minimal (less than one second). 
On the client side, we can use Torchwood to parse the policy and use it directly with VerifyProof in place of the policy we were manually constructing from the log’s public key. Again, if you squint you can see that just like tlog proofs are spicy signatures, the policy is a spicy public key. Verification is a deterministic, offline function that takes a policy/public key and a proof/signature, just like digital signature verification! The policies are a DAG that can get complex to match even the strictest uptime requirements. For example, you can require 3 out of 10 witness operators to cosign a checkpoint, where each operator can use any 1 out of N witness instances to do so. Note however that in that case you will need to periodically provide to monitors all cosignatures from at least 8 out of 10 operators, to prevent split-views. The full change implementing witnessing involves 5 files changed, 43 insertions(+), 11 deletions(-), plus tests.

We started with a simple centralized email-authenticated 5 keyserver, and we turned it into a transparent, privacy-preserving, anti-poisoning, and witness-cosigned service. We did that in four small steps using Tessera, Torchwood, and various C2SP specifications. Overall, it took less than 500 lines: 7 files changed, 472 insertions(+), 9 deletions(-). The UX is completely unchanged: there are no keys for users to manage, and the web UI and CLI work exactly like they did before. The only difference is the new functionality of the CLI, which allows holding the log operator accountable for all the public keys it could ever have presented for an email address. The result is deployed live at keyserver.geomys.org.

This tlog system still has two limitations:

- To monitor the log, the monitor needs to download it all. This is probably fine for our little keyserver, and even for the Go Checksum Database, but it’s a scaling problem for the Certificate Transparency / Merkle Tree Certificates ecosystem.
- The inclusion proof guarantees that the public key is in the log, not that it’s the latest entry in the log for that email address. Similarly, the Go Checksum Database can’t efficiently prove the Go Modules Proxy response is complete.

We are working on a design called Verifiable Indexes which plugs on top of a tlog to provide verifiable indexes or even map-reduce operations over the log entries. We expect VI to be production-ready before the end of 2026, while everything above is ready today. Even without VI, the tlog provides strong accountability for our keyserver, enabling a secure UX that would have simply not been possible without transparency. I hope this step-by-step demo will help you apply tlogs to your own systems. If you need help, you can join the Transparency.dev Slack. You might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected].

Growing up, I used to drive my motorcycle around the hills near my hometown, trying to reach churches I could spot from hilltops. This was one of my favorite spots.

Geomys, my Go open source maintenance organization, is funded by Smallstep, Ava Labs, Teleport, Tailscale, and Sentry. Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement.) Here are a few words from some of them!

Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews.
Ava Labs — We at Ava Labs , maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network ), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team. age is not really meant to encrypt messages to strangers, nor does it encourage long-term keys. Instead, keys are simple strings that can be exchanged easily through any semi-trusted (i.e. safe against active attackers) channel. Still, a keyserver could be useful in some cases, and it will serve as a decent example for what we are doing today.  ↩ I like to use the SQLite built-in JSON support as a simple document database, to avoid tedious table migrations when adding columns.  ↩ Ok, one thing is special, but it doesn’t have anything to do with transparency. I strongly prefer email magic links that authenticate your original tab, where you have your browsing session history, instead of making you continue in the new tab you open from the email. However, intermediating that flow via a server introduces a phishing risk: if you click the link you risk authenticating the attacker’s session. This implementation uses the JavaScript Broadcast Channel API to pass the auth token locally to the original tab , if it’s open in the same browser, and otherwise authenticates the new tab. Another advantage of this approach is that there are no authentication cookies.  ↩ Someone who stored the VRF for that email address could continue to match the tlog entries, but since we won’t be adding any new entries to the tlog for that email address, they can’t learn anything they didn’t already know.  ↩ Something cool about tlogs is that they are often agnostic to the mechanism by which entries are added to the log. 
For example, instead of email identities and verification we could have used OIDC identities, with our centralized server checking OIDC bearer tokens, held accountable by the tlog. Everything would have worked exactly the same.  ↩

0 views
DHH 2 months ago

The O'Saasy License

One of my favorite parts of the early web was how easy it was to see how the front-end was built. Before View Source was ruined by minification, transpiling, and bundling, you really could just right-click on any web page and learn how it was all done. It was glorious. But even back then, this only ever applied to the front-end. At least with commercial applications, the back-end was always kept proprietary. So learning how to write great web applications still meant piecing together lessons from books, tutorials, and hello-world-style code examples, not from production-grade commercial software. The O'Saasy License seeks to remedy that. It's basically the do-whatever-you-want MIT license, but with the commercial rights to run the software as a service (SaaS) reserved for the copyright holder, thus encouraging more code to be open source while allowing the original creators to see a return on their investment. We need more production-grade code to teach juniors and LLMs alike. A view source that extends to the back-end along with the open source invitation to fix bugs, propose features, and run the system yourself for free (if your data requirements or interests make that a sensible choice over SaaS). This is what we're doing with Fizzy, but now we've also given the O'Saasy License a home to call its own at osaasy.dev. The license is yours to download and apply to any project where it makes sense. I hope to read a lot more production-grade SaaS code as a result!

1 view
Binary Igor 2 months ago

Authentication: who are you? Proofs are passwords, codes and keys

In many systems, various actions can only be performed as some kind of Identity. We must authenticate ourselves by proving who we are. Authentication fundamentally is just an answer to this question: who are you and can you prove it is true?

0 views
matduggan.com 2 months ago

SQLite for a REST API Database?

When I wrote the backend for my Firefox time-wasting extension ( here ), I assumed I was going to be setting up Postgres. My setup is boilerplate and pretty boring, with everything running in Docker Compose for personal projects and then persistence happening in volumes. However when I was working with it locally, I obviously used SQLite since that's always the local option that I use. It's very easy to work with, nice to back up and move around and in general is a pleasure to work with. As I was setting up the launch, I realized I really didn't want to set up a database. There's nothing wrong with having a Postgres container running, but I'd like to skip it if it's possible. So my limited understanding of SQLite before I started this was "you can have one writer and many readers". I had vaguely heard of SQLite "WAL" but my understanding of WAL is more in the context of shipping WAL between database servers. You have one primary, many readers, you ship WAL from the primary to the readers and then you can promote a reader to the primary position once it has caught up on WAL. My first attempt at setting up SQLite for a REST API died immediately in exactly this way. So by default SQLite:

- Only one writer at a time
- Writers block readers during transactions

This seems to be caused by SQLite having a rollback journal and using strict locking. Which makes perfect sense for the use-case that SQLite is typically used for, but I want to abuse that setup for something it is not typically used for. So after doing some Googling I ended up with these as the sort of "best recommended" options. I'm 95% sure I copy/pasted the entire block. What is this configuration doing?

Switches SQLite from rollback journal to Write-Ahead Logging (WAL). Default behavior is Write -> Copy original data to journal -> Modify database -> Delete journal. WAL mode is Write -> Append changes to WAL file -> Periodically checkpoint to main DB.

So here you have 4 options to toggle for how often SQLite syncs to disk. OFF is SQLite lets the OS handle it. NORMAL is the SQLite engine still syncs, but less often than FULL. WAL mode is safe from corruption with NORMAL typically. FULL uses the xSync method of the VFS (don't feel bad I've never heard of it before either: https://sqlite.org/vfs.html ) to ensure everything is written to disk before moving forward. EXTRA: I'm not 100% sure what this exactly does but it sounds extra. "EXTRA synchronous is like FULL with the addition that the directory containing a rollback journal is synced after that journal is unlinked to commit a transaction in DELETE mode. EXTRA provides additional durability if the commit is followed closely by a power loss. Without EXTRA, depending on the underlying filesystem, it is possible that a single transaction that commits right before a power loss might get rolled back upon reboot. The database will not go corrupt. But the last transaction might go missing, thus violating durability, if EXTRA is not set."

= please wait up to 60 seconds.

This one threw me for a loop. Why is it a negative number? If you set it to a positive number, you mean pages. SQLite page size is 4kb by default, so 2000 = 8MB. A negative number means KB which is easier to reason about than pages. I don't really know what a "good" cache_size is here. 64MB feels right given the kind of data I'm throwing around and how small it is, but this is guesswork.

= write to memory, not disk. Makes sense for speed.

However my results from load testing sucked. Now this is under heavy load (simulating 1000 active users making a lot of requests at the same time, which is more than I've seen), but still this is pretty bad. The cause of it was, of course, my fault. My "blacklist" is mostly just sites that publish a ton of dead links. However I had the setup wrong and was making a database query per website to see if it matched the blacklist. Stupid mistake. Once I fixed that. Great! Or at least "good enough from an unstable home internet connection with some artificial packet loss randomly inserted". So should you use SQLite as the backend database for a FastAPI setup? Well it depends on how many users you are planning on having. Right now I can handle between 1000 and 2000 requests per second if they're mostly reads, which is exponentially more than I will need for years of running the service. If at some point in the future that no longer works, it's thankfully very easy to migrate off of SQLite onto something else. So yeah overall I'm pretty happy with it as a design.
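The post doesn't reproduce the exact pragma block, so here is a representative sketch of a WAL configuration matching the settings described above, using Python's built-in sqlite3 module. The specific values (60-second timeout, 64MB cache) follow the numbers discussed in the text, but treat the whole block as a typical recommendation rather than the author's exact configuration:

```python
import sqlite3

def connect(path: str) -> sqlite3.Connection:
    # One connection tuned for a read-heavy REST API workload.
    conn = sqlite3.connect(path, timeout=60)   # Python-side wait on a locked DB
    conn.execute("PRAGMA journal_mode=WAL")    # append to WAL instead of rollback journal
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; safe from corruption with WAL
    conn.execute("PRAGMA cache_size=-64000")   # negative means KB: ~64MB page cache
    conn.execute("PRAGMA temp_store=MEMORY")   # temp tables/indices in RAM, not disk
    conn.execute("PRAGMA busy_timeout=60000")  # "please wait up to 60 seconds" (ms)
    return conn
```

Note that `journal_mode=WAL` is persistent (stored in the database file), while the other pragmas are per-connection and must be set every time you open a connection.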

0 views
Karboosx 2 months ago

Building Your Own Web Framework - The Basics

Ever wondered what happens under the hood when you use frameworks like Symfony or Laravel? We'll start building our own framework from scratch, covering the absolute basics - how to handle HTTP requests and responses. This is the foundation that everything else builds on.

0 views
iDiallo 2 months ago

We Should Call Them Macroservices

I love the idea of microservices. When there's a problem on your website, you don't need to fix and redeploy your entire codebase. If the issue only affects your authentication service, you can deploy just that one component and call it a day. You've isolated the authentication feature into an independent microservice that can be managed and maintained on its own. That's the theory. The reality is often different. Microservices are a software architecture style where an application is built as a collection of small, independent, and loosely coupled services that communicate with each other. The "micro" in the name implies they should be small, and they usually start that way. When you first adopt this philosophy, all services are genuinely small and build incredibly fast. At this stage, you start questioning why you ever thought working on a monolith was a good idea. I love working on applications where the time between pushing a change and seeing its effect is minimal. The feedback loop is tight, deployments are quick, and each service feels manageable. But I've worked long enough in companies adopting this style to watch the transformation. Small becomes complex. Fast becomes extremely slow. Cheap becomes resource-intensive. Microservices start small, then they grow. And grow. And the benefits you once enjoyed start to vanish. For example, your authentication service starts with just login and logout. Then you add password reset. Then OAuth integration. Then multi-factor authentication. Then session management improvements. Then API key handling. Before you know it, your "micro" service has ballooned to thousands of lines of code, multiple database tables, and complex business logic. When you find yourself increasing the memory allocation on your Lambda functions by 2x or 3x, you've reached this stage. The service that once spun up in milliseconds now takes seconds to cold start. The deployment that took 30 seconds now takes 5 minutes. 
If speed were the only issue, I could live with it. But as services grow and get used, they start to depend on one another. When using microservices, we typically need an orchestration layer that consumes those services. Not only does this layer grow over time, but it's common for the microservices themselves to accumulate application logic that isn't easy to externalize. A service that was supposed to be a simple data accessor now contains validation rules, business logic, and workflow coordination. Imagine you're building an e-commerce checkout flow. You might have:

- An inventory service to check stock
- A pricing service to calculate totals
- A payment service to process transactions
- A shipping service to calculate delivery options
- A notification service to send confirmations

Where does the logic live that says "only charge the customer if all items are in stock"? Or "apply the discount before calculating shipping"? This orchestration logic has to live somewhere, and it often ends up scattered across multiple services or duplicated in various places. As microservices grow, it's inevitable that they grow teams around them. A team specializes in managing a service and becomes the domain expert. Not a bad thing on its own, but it becomes an issue when someone debugging a client-side problem discovers the root cause lies in a service only another team understands. A problem that could have been solved by one person now requires coordination, meetings, and permissions to identify and resolve. For example, a customer reports that they're not receiving password reset emails. The frontend developer investigates and confirms the request is being sent correctly. The issue could be:

- The account service isn't triggering the email request properly
- The email service is failing to send messages
- The email service is sending to the wrong queue
- The notification preferences service has the user marked as opted-out
- The rate limiting service is blocking the request

Each of these components is owned by a different team. What should be a 30-minute investigation becomes a day-long exercise in coordination. The feature spans across several microservices, but domain experts only understand how their specific service works. There's a disconnect between how a feature functions end-to-end and the teams that build its components. When each microservice requires an actual HTTP request (or message queue interaction), things get relatively slower. Loading a page that requires data from several dependent services, each taking 50-100 milliseconds, means those latencies quickly compound. Imagine for a second you are displaying a user profile page. Here is the data that's being loaded:

- User account details (Account Service: 50ms)
- Recent orders (Order Service: 80ms)
- Saved payment methods (Payment Service: 60ms)
- Personalized recommendations (Recommendation Service: 120ms)
- Notification preferences (Settings Service: 40ms)

If these calls happen sequentially, you're looking at 350ms just for service-to-service communication, before any actual processing happens. Even with parallelization, you're paying the network tax multiple times over. In a monolith, this would be a few database queries totaling perhaps 50ms. There are some real benefits to microservices, especially when you have good observability in place. When a bug is identified via distributed tracing, the team that owns the affected service can take over the resolution process. Independent deployment means that a critical security patch to your authentication service doesn't require redeploying your entire application. Different services can use different technology stacks suited to their specific needs. These address real pain points that people have and are why we are attracted to this architecture in the first place. But microservices are not a solution to every architectural problem. I always say if everybody is "holding it wrong," then they're not the problem, the design is. Microservices have their advantages, but they're just one option among many architectural patterns. To build a good system, we don't have to exclusively follow one style. Maybe what many organizations actually need isn't microservices at all, but what I'd call "macroservices". Larger, more cohesive service boundaries that group related functionality together. Instead of separate services for user accounts, authentication, and authorization, combine them into an identity service. Instead of splitting notification into separate services for email, SMS, and push notifications, keep them together where the shared logic and coordination naturally lives. The goal should be to draw service boundaries around business capabilities and team ownership, not around technical functions. Make your services large enough that a feature can live primarily within one service, but small enough that a team can own and understand the entire thing. Microservices promised us speed and independence. What many of us got instead were distributed monoliths, all the complexity of a distributed system with all the coupling of a monolith.
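The latency arithmetic above can be sketched with a toy simulation. The service names and millisecond figures come from the profile-page example; `asyncio.sleep` stands in for the network calls:

```python
import asyncio
import time

# Per-service latencies from the profile-page example, in milliseconds.
LATENCY_MS = {
    "account": 50, "orders": 80, "payments": 60,
    "recommendations": 120, "preferences": 40,
}

async def call(service: str) -> str:
    await asyncio.sleep(LATENCY_MS[service] / 1000)  # stand-in for an HTTP request
    return f"{service}-data"

async def load_profile_sequentially() -> float:
    start = time.monotonic()
    for service in LATENCY_MS:
        await call(service)
    return time.monotonic() - start  # ~0.35s: the latencies sum

async def load_profile_in_parallel() -> float:
    start = time.monotonic()
    await asyncio.gather(*(call(s) for s in LATENCY_MS))
    return time.monotonic() - start  # ~0.12s: bounded by the slowest call
```

Even the parallel version pays the 120ms of its slowest dependency on every page load, which is the "network tax" that a monolith's local queries largely avoid.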

0 views
Jack Vanlightly 2 months ago

The Durable Function Tree - Part 2

In part 1 we covered how durable function trees work mechanically and the importance of function suspension. Now let's zoom out and consider where they fit in broader system architecture, and ask what durable execution actually provides us. Durable function trees are great, but they aren’t the only kid in town. In fact, they’re like the new kid on the block, trying to prove themselves against other more established kids. Earlier this year I wrote Coordinated Progress , a conceptual model exploring how event-driven architecture, stream processing, microservices and durable execution fit into architecture, within the context of multi-step business processes, aka, workflows. I also wrote about responsibility boundaries , exploring how multi-step work is made reliable inside and across boundaries. I’ll revisit that now, with this function tree model in mind. In these works I described how reliable triggers not only initiate work but also establish responsibility boundaries. A reliable trigger could be a message in a queue or a function backed by a durable execution engine. The reliable trigger ensures that the work is retriggered should it fail. Fig 1. A tree of work kicked off by a root reliable trigger, for example a queue message kicks off a consumer that executes a tree of synchronous HTTP calls. Should any downstream nodes fail (despite in situ retries), the whole tree must be re-executed from the top. Where a reliable trigger exists, a new boundary is created, one where that trigger becomes responsible for ensuring the eventual execution of the sub-graph of work downstream of it. A tree of work can be arbitrarily split up into different responsibility boundaries based on the reliable triggers that are planted. Fig 2. Nodes A, B, C, and E form a synchronous flow of execution. Synchronous flows don’t benefit from balkanized responsibility boundaries. 
Typically, synchronous work involves a single responsibility boundary, where the root caller is the reliable trigger. Nodes D and F are kicked off by messages placed on queues, each functioning as a reliable trigger. Durable function trees also operate in this concept of responsibility boundaries. Each durable function in the tree has its own reliable trigger (managed by the durable execution engine), creating a local fault domain. Fig 3. A durable function tree from part 1 As I explained in part 1 : If func3 crashes, only func3 needs to retry, func2 remains suspended with its promise unresolved, func4 's completed work is preserved, and func1 doesn't even know a failure occurred.  The tree structure creates natural fault boundaries where failures are contained to a single branch and don't cascade upward unless that branch exhausts its retries or reaches a defined timeout. These boundaries are nested like an onion: each function owns its immediate work and the completion of its direct children. Fig 4. A function tree consists of an outer responsibility boundary that wraps nested boundaries based on reliable triggers (one per durable function). When each of these nodes is a fully fledged function (rather than a local-context side effect), A’s boundary encompasses B’s boundary, which in turn encompasses C's and so on. Each function owns its invocation of child functions and must handle their outcomes, but the DEE drives the actual execution of child functions and their retries. This creates a nested responsibility model where parents delegate execution of children to the DEE but remain responsible for reacting to results. In the above figure, if C exhausts retries, that error propagates up to B, which must handle it (perhaps triggering compensation logic) and resolving its promise to A (possibly with an error in turn). Likewise, as errors propagate up, cancellations propagate down the tree. 
This single outer boundary model contrasts sharply with choreographed, event-driven architectures (EDA). In choreography, each node in the execution graph has its own reliable trigger, and so each node owns its own recovery. The workflow as a whole emerges from the collective behavior of independent services reacting to events as reliable triggers. Fig 5. The entire execution graph is executed asynchronously, with each node existing in its own boundary with a Kafka topic or queue as its reliable trigger. EDA severs responsibility completely: once the event is published, the producer has no responsibility for consumer outcomes. The Kafka topic itself is the guarantor in its role as the reliable trigger for each consumer that has subscribed to it. This creates fine-grained responsibility boundaries with decoupling. Services can be deployed independently, failures are isolated, and the architecture scales naturally as new event consumers are added. If we zoom into any one node, which might carry out multiple local-context side effects, including the publishing of an event, we can view the boundaries as follows: Fig 6. Each consumer is invoked by a topic event (a reliable trigger) and executes a number of local-context side effects. If a failure occurs in one of the local side effects, the event is not acknowledged and can be processed again. But without durable execution’s memoization , the entire sequence of local side effects inside a boundary must either be idempotent or tolerate multiple executions. This can be more difficult to handle than implementing idempotency or duplication tolerance at the individual side effect level (as with durable execution). The bigger the responsibility boundary, the larger the graph of work it encompasses, the more tightly coupled things get. You can’t wrap an entire architecture in one nested responsibility boundary. As the boundary grows, so does the frequency of change, making coordination and releases increasingly painful. 
Large function trees are an anti-pattern. The larger the function tree the wider the net of coupling goes, the more reasons for a given workflow to change, with more frequent versioning. The bigger the tree the greater scope for non-determinism to creep in, causing failures and weird behaviors. Ultimately, you can achieve multi-step business processes through other means, such as via queues and topics. You can wire up SpringBoot with annotations and Kafka. We can even wire up compensation steps. Kafka acts as the reliable trigger for each step in the workflow. I think that’s why I see many people asking what makes durable execution valuable. What is the value-add? I can do reliable workflow already, I can even make it look quite procedural, as each step can be programmed procedurally even if the wider flow is reactive. The way I see it is that:

- EDA focuses on step-level reliability (each consumer handles retries, each message is durable), which results in step decoupling. Because Kafka is reliable, we can build reliable workflows from reliable steps. Because each node in the graph of work is independent, we get a decoupled architecture.
- Durable execution focuses on workflow-level reliability. The entire business process is an entity itself (creating step coupling). It executes from the root function down to the leaves, with visibility and control over the process as a whole. But it comes with the drawback of greater coupling and the thorn of determinism.

As long as progress is made by re-executing a function from the top using memoization, the curse of determinism will remain. Everything else can hopefully be abstracted. We can build reliable workflows the event-driven way or the orchestration way. For durable execution engines to be widely adopted they need to make durability invisible, letting you write code that looks synchronous but survives failures, retries, and even migration across machines. 
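The "curse of determinism" falls directly out of how progress is made. Here is a minimal sketch of re-execution + memoization, a toy model rather than any particular engine's API: step results are journaled, and when the workflow is re-executed from the top, already-journaled steps replay their recorded results instead of re-running the side effect. This is exactly why the surrounding code must be deterministic; the replay assumes the same steps occur in the same order every time.

```python
from typing import Any, Callable

class ReplayContext:
    """Toy durable-execution core: re-execution + memoization.

    The first execution runs each side effect and journals its result.
    After a crash, the workflow function is re-executed from the top,
    but journaled steps replay their recorded results, so completed
    side effects are never repeated.
    """

    def __init__(self) -> None:
        self.journal: list[Any] = []
        self._cursor = 0

    def run(self, workflow: Callable[["ReplayContext"], Any]) -> Any:
        self._cursor = 0  # re-execute the whole function from the top
        return workflow(self)

    def step(self, effect: Callable[[], Any]) -> Any:
        if self._cursor < len(self.journal):  # replay: result already recorded
            result = self.journal[self._cursor]
        else:                                 # first time: run and journal it
            result = effect()
            self.journal.append(result)
        self._cursor += 1
        return result
```

Crash the workflow after its first step, call `run` again, and the side effect fires only once while the workflow still completes, because the second execution replays the journaled result.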
Allowing developers to write normal looking code (that magically can be scheduled across several servers, suspending and resuming when needed) is nice. But more than that, durable execution as a category should make workflows more governable—that is the true value-add in my opinion. In practice, many organizations could benefit from a hybrid coordination model. As I argued in the Coordinated Progress series, orchestration (such as durable functions) should focus on the direct edges (the critical path steps that must succeed for the business goal to be achieved). An orders workflow consisting of payment processing, inventory reservation, and order confirmation form a tightly coupled workflow where failure at any step means the whole operation fails. It makes sense to maintain this coupling. But orchestration shouldn't try to control everything. Indirect edges (such as triggering other related workflows or any number of auxiliary actions) are better handled through choreography. Workflows directly invoking other workflows only expands the function tree. Instead an orchestrated order workflow can emit an OrderCompleted event that any number of decoupled services and workflows can react to without the orchestrator needing to know or care. Fig 7. Orchestration employed in bounded contexts (or just core business workflow) with events as the wider substrate. Note also that workflows invoking other workflows directly can also be a result of the constrained workflow→step/activity model. Sometimes it might make sense to split up a large monolithic workflow into a child workflow, yet, both workflows essentially form the critical path of a single business process. 
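The hybrid shape described above can be sketched as follows. All names here are hypothetical, and the in-process `publish` stands in for emitting to a Kafka topic: the orchestrated function owns the critical path, then emits a single event that choreographed consumers react to without the orchestrator knowing about them.

```python
from collections import defaultdict
from typing import Callable

# Stand-in event bus: in a real system this would be a Kafka topic.
_subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    _subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    for handler in _subscribers[topic]:
        handler(event)

# Critical-path steps (stubs). In a durable execution engine each would be
# a child function or activity whose outcome the orchestrator must handle.
def process_payment(order_id: str) -> None: ...
def reserve_inventory(order_id: str) -> None: ...
def confirm_order(order_id: str) -> None: ...

def order_workflow(order_id: str) -> None:
    # Orchestrated direct edges: all must succeed for the business goal.
    process_payment(order_id)
    reserve_inventory(order_id)
    confirm_order(order_id)
    # Choreographed indirect edges: auxiliary workflows react to one event.
    publish("OrderCompleted", {"order_id": order_id})
```

New consumers (loyalty points, analytics, email receipts) subscribe to `OrderCompleted` without the workflow growing, which keeps the function tree narrow.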
The durable function tree in summary:

- Functions call functions, each returning a durable promise
- Execution flows down; promise resolution flows back up
- Local side effects run synchronously; remote side effects enable function suspension
- Continuations are implemented via re-execution + memoization
- Nested fault boundaries:
  - Each function ensures its child functions are invoked
  - The DEE drives progress
  - Parent functions handle the outcomes of their children

The durable function tree offers a distinct set of tradeoffs compared to event-driven choreography. Both can build reliable multi-step workflows; the question is which properties matter more for a given use case. Event-driven architecture excels at decoupling: services evolve independently, failures are isolated, new consumers can be added without touching existing producers. With this decoupling comes fragmented visibility as the workflow emerges from many independent handlers, making it harder to reason about the critical path or enforce end-to-end timeouts. Durable function trees excel at governance of the workflow as an entity: the workflow is explicit, observable as a whole, and subject to policies that span all steps. But this comes with coupling as the orchestrated code must know about all services in the critical path. Plus the curse of determinism that comes with replay + memoization based execution. The honest truth is you don't need durable execution. Event-driven architecture also has the same reliability from durability. You can wire up a SpringBoot application with Kafka and build reliable workflows through event-driven choreography. Many successful systems do exactly this. The real value-add of durable execution, in my opinion, is treating a workflow as a single governable entity. For durable execution to be successful as a category, it has to be more than just allowing developers to write normal-ish looking code that can make progress despite failures. 
If we only want procedural code that survives failures, then I think the case for durable execution is weak. When durable execution is employed, keep it narrow, aligned to specific core business flows where the benefits of seeing the workflow as a single governable entity make it worth it. Then use events to tie the rest of the architecture together as a whole.

0 views