Posts in Api (20 found)
Jefferson Heard 1 week ago

The best worst hack that saved our bacon

No one really likes engineering war stories, but this one's relevant because there's a moral to it. I've talked before about defining technical debt as technical decisions that provide immediate value, but with long-term negative impact if they aren't cleaned up. Sometimes introducing technical debt is necessary and you do it consciously to avoid a disaster. As long as you provide yourself enough room to clean it up, it's just part of the regular course of business when millions of people count on your software to get through their days.

Twelve years of calendar appointments on our platform, and the data model was starting to show some wear and tear. Specifically, our occurrence table was created with a plain integer primary key, and we were approaching two billion occurrences on the calendar. More precisely, the primary key was rapidly approaching 2,147,483,647 – the magic number that is the maximum value for a signed 32-bit integer.

We had actually known about this for some time, and we had done most of the work to fix it already. Our backend code was upgraded to bigints and the actual column itself had a migration set to upgrade it to a big integer. The plan had been in the works for a month and a half, and we almost ran with it. But then, roughly a week before we were going to deploy it (and maybe only a month before the keys ran out), someone, maybe me, I don't recall, noticed that these integer keys were exposed in one of our public APIs.

You can count on one thing in SaaS software: if you provide an integration API to your customers or vendors and it exposes an attribute, that attribute is crucial to someone, somewhere. And in our case the people using the integrations often had to rely on their university's IT department to do the integration itself. Those backlogs are counted in months, and so we couldn't deploy something that would potentially break customer integrations.

What to do? Well, Postgres integer primary keys are signed, so there's this WHOLE other half of the 32-bit word that you're not using if you're just auto-incrementing keys. My simple (read: stupid) solution, which absolutely worked, was to set the sequence on that primary key to -2,147,483,648 and let it continue to auto-increment, taking up the other half of that integer space. It was so dumb that I think we met like three times together with SRE to say things like, "Is it really this simple? Is this really likely to work? Are we really doing something this dumb?" and the conclusion was yes, that it would buy us up to 3 years of time to migrate, but that we would do it within 6-8 months so all IT departments could make alternative arrangements for their API integrations.

The long-term solution was the BigInt, yes, but it was also to expose all keys as opaque handles rather than integers, to avoid dictionary attacks and so that we could use any type we needed to on the backend without API users having to account for it. It was also to work through the Customer Success team and make sure no one counted on the integer-ness (integrality?) of the keys, or better, that no one was using the occurrence IDs at all.

In the end we had a smooth transition because of quick thinking and willingness to apply a bald-faced hack to our production (and staging) database. We had a fixed timeline we all acknowledged by which the tech debt had to be addressed, and we'd firmly scoped out the negative consequences of not addressing it.
It wasn't hard, but it meant that no matter who was in charge or what team changes were made, the cleanup would get done in time and correctly. It was the right thing to do. A few customers had been counting on those IDs and we were able to advise their IT departments about how to change their code and to show them what the new API response would look like long before they actually were forced to use it. In the meantime, everything just worked. Do I advise that you use negative primary keys to save room on your database? No. Was it the right choice of technical debt for the time? Absolutely.
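For the curious, the production fix boiled down to a single sequence change. Here is a minimal sketch of that change in Postgres, driven from Python; the connection string and sequence name are hypothetical stand-ins, not the platform's actual schema:

```python
import psycopg2

# Hypothetical DSN and sequence name; substitute your own.
conn = psycopg2.connect("dbname=calendar")

with conn, conn.cursor() as cur:
    # Lower the sequence's floor to the bottom of the signed 32-bit range and
    # restart it there, so new rows count upward from -2,147,483,648 through
    # the previously unused negative half of the integer space.
    cur.execute("""
        ALTER SEQUENCE occurrence_id_seq
            MINVALUE -2147483648
            RESTART WITH -2147483648;
    """)
```

The ALTER SEQUENCE statement is the whole trick; the Python wrapper is only there to make the example runnable.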

0 views
Lea Verou 2 weeks ago

In the economy of user effort, be a bargain, not a scam

Alan Kay [source] One of my favorite product design principles is Alan Kay’s “Simple things should be simple, complex things should be possible” . [1] I had been saying it almost verbatim long before I encountered Kay’s quote. Kay’s maxim is deceptively simple, but its implications run deep. It isn’t just a design ideal — it’s a call to continually balance friction, scope, and tradeoffs in service of the people using our products. This philosophy played a big part in Prism’s success back in 2012, helping it become the web’s de facto syntax highlighter for years, with over 2 billion npm downloads. Highlighting code on a page took including two files. No markup changes. Styling used readable CSS class names. Even adding new languages — the most common “complex” use case — required far less knowledge and effort than alternatives. At the same time, Prism exposed a deep extensibility model so plugin authors could patch internals and dramatically alter behavior. These choices are rarely free. The friendly styling API increased clash risk, and deep extensibility reduced encapsulation. These were conscious tradeoffs, and they weren’t easy. Simple refers to use cases that are simple from the user’s perspective , i.e. the most common use cases. They may be hard to implement, and interface simplicity is often inversely correlated with implementation simplicity. And which things are complex , depends on product scope . Instagram’s complex cases are vastly different than Photoshop’s complex cases, but as long as there is a range, Kay’s principle still applies. Since Alan Kay was a computer scientist, his quote is typically framed as a PL or API design principle, but that sells it short. It applies to a much, much broader class of interfaces. This class hinges on the distribution of use cases . Products often cut scope by identifying the ~20% of use cases that drive ~80% of usage — aka the Pareto Principle . Some products, however, have such diverse use cases that Pareto doesn’t meaningfully apply to the product as a whole. There are common use cases and niche use cases, but no clean 20-80 split. The long tail of niche use cases is so numerous, it becomes significant in aggregate . For lack of a better term, I’ll call these long‑tail UIs . Nearly all creative tools are long-tail UIs. That’s why it works so well for programming languages and APIs — both are types of creative interfaces. But so are graphics editors, word processors, spreadsheets, and countless other interfaces that help humans create artifacts — even some you would never describe as creative. Yes, programming languages and APIs are user interfaces . If this surprises you, watch my DotJS 2024 talk titled “API Design is UI Design” . It’s only 20 minutes, but covers a lot of ground, including some of the ideas in this post. I include both code and GUI examples to underscore this point; if the API examples aren’t your thing, skip them and the post will still make sense. You wouldn’t describe Google Calendar as a creative tool, but it is a tool that helps humans create artifacts (calendar events). It is also a long-tail product: there is a set of common, conceptually simple cases (one-off events at a specific time and date), and a long tail of complex use cases (recurring events, guests, multiple calendars, timezones, etc.). Indeed, Kay’s maxim has clearly been used in its design. The simple case has been so optimized that you can literally add a one hour calendar event with a single click (using a placeholder title). 
A different duration can be set after that first click through dragging [2] . But almost every edge case is also catered to — with additional user effort. Google Calendar is also an example of an interface that digitally encodes real-life, demonstrating that complex use cases are not always power user use cases . Often, the complexity is driven by life events. E.g. your taxes may be complex without you being a power user of tax software, and your family situation may be unusual without you being a power user of every form that asks about it. The Pareto Principle is still useful for individual features , as they tend to be more narrowly defined. E.g. there is a set of spreadsheet formulas (actually much smaller than 20%) that drives >80% of formula usage. While creative tools are the poster child of long-tail UIs, there are long-tail components in many transactional interfaces such as e-commerce or meal delivery (e.g. result filtering & sorting, product personalization interfaces, etc.). Filtering UIs are another big category of long-tail UIs, and they involve so many tradeoffs and tough design decisions you could literally write a book about just them. Airbnb’s filtering UI here is definitely making an effort to make simple things easy with (personalized! 😍) shortcuts and complex things possible via more granular controls. Picture a plane with two axes: the horizontal axis being the complexity of the desired task (again from the user’s perspective, nothing to do with implementation complexity), and the vertical axis the cognitive and/or physical effort users need to expend to accomplish their task using a given interface. Following Kay’s maxim guarantees these two points: But even if we get these two points — what about all the points in between? There are a ton of different ways to connect them, and they produce vastly different overall user experiences. How does your interface fare when a use case is only slightly more complex? Are users yeeted into the deep end of interface complexity (bad), or do they only need to invest a proportional, incremental amount of effort to achieve their goal (good)? Meet the complexity-to-effort curve , the most important usability metric you’ve never heard of. For delightful user experiences, making simple things easy and complex things possible is not enough — the transition between the two should also be smooth. You see, simple use cases are the spherical cows in space of product design . They work great for prototypes to convince stakeholders, or in marketing demos, but the real world is messy . Most artifacts that users need to create to achieve their real-life goals rarely fit into your “simple” flows completely, no matter how well you’ve done your homework. They are mostly simple — with a liiiiitle wart here and there. For a long-tail interface to serve user needs well in practice , we also need to design the curve, not just its endpoints . A model with surprising predictive power is to treat user effort as a currency that users are spending to buy solutions to their problems. Nobody likes paying it; in an ideal world software would read our mind and execute perfectly with zero user effort. Since we don’t live in such a world, users are typically willing to pay more in effort when they feel their use case warrants it. Just like regular pricing, actual user experience often depends more on the relationship between cost and expectation (budget) than on the absolute cost itself. If you pay more than you expected, you feel ripped off. 
You may still pay it because you need the product in the moment, but you'll be looking for a better deal in the future. And if you pay less than you expected, you feel like you got a bargain, with all the delight and loyalty that entails. Incremental user effort cost should be proportional to incremental value gained.

Suppose you're ordering pizza. You want a simple cheese pizza with ham and mushrooms. You use the online ordering system, and you notice that adding ham to your pizza triples its price. We're not talking some kind of fancy ham where the pigs were fed on caviar and bathed in champagne, just a regular run-of-the-mill pizza topping. You may still order it if you're starving and no other options are available, but how does it make you feel? It's not that different when the currency is user effort. The all too familiar "But I just wanted to _________, why is it so hard?".

When a slight increase in complexity results in a significant increase in user effort cost, we have a usability cliff. Usability cliffs make users feel resentful, just like the customers of our fictitious pizza shop. A usability cliff is when a small increase in use case complexity requires a large increase in user effort. Usability cliffs are very common in products that make simple things easy and complex things possible through entirely separate flows with no integration between them: a super high-level one that caters to the most common use case with little or no flexibility, and a very low-level one that is an escape hatch: it lets users do whatever, but they have to recreate the solution to the simple use case from scratch before they can tweak it.

Simple things are certainly easy: all we need to get a video with a nice sleek set of controls that work well on every device is a single attribute: `controls`. We just slap it on our `<video>` element and we're done with a single line of HTML: `<video src="..." controls></video>`. Now suppose use case complexity increases just a little. Maybe I want to add buttons to jump 10 seconds back or forwards. Or a language picker for subtitles. Or just to hide the volume control on a video that has no audio track. None of these are particularly niche, but the default controls are all-or-nothing: the only way to change them is to reimplement the whole toolbar from scratch, which takes hundreds of lines of code to do well. Simple things are easy and complex things are possible. But once use case complexity crosses a certain (low) threshold, user effort abruptly shoots up. That's a usability cliff.

For Instagram's photo editor, the simple use case is canned filters, whereas the complex ones are those requiring tweaking through individual low-level controls. However, they are implemented as separate flows: you can tweak the filter's intensity, but you can't see or adjust the primitives it's built from. You can layer both types of edits on the same image, but they are additive, which doesn't work well. Ideally, the two panels would be integrated, so that selecting a filter would adjust the low-level controls accordingly, which would facilitate incremental tweaking AND would serve as a teaching aid for how filters work.

My favorite end-user facing product that gets this right is Coda, a cross between a document editor, a spreadsheet, and a database. All over its UI, it supports entering formulas instead of raw values, which makes complex things possible. To make simple things easy, it also provides the GUI you'd expect even without a formula language.
But here's the twist: these presets generate formulas behind the scenes that users can tweak! Whenever users need to go a little beyond what the UI provides, they can switch to the formula editor and adjust what was generated — far easier than writing it from scratch. Another nice touch: "And" is not just communicating how multiple filters are combined, but is also a control that lets users edit the logic. Defining high-level abstractions in terms of low-level primitives is a great way to achieve a smooth complexity-to-effort curve, as it allows you to expose tweaking at various intermediate levels and scopes. The downside is that it can sometimes constrain the types of high-level solutions that can be implemented. Whether the tradeoff is worth it depends on the product and use cases.

If you like eating out, this may be a familiar scenario:
— I would like the rib-eye please, medium-rare.
— Thank you sir. How would you like your steak cooked?

Keep user effort close to the minimum necessary to declare intent.

Annoying, right? And yet, this is how many user interfaces work; expecting users to communicate the same intent multiple times in slightly different ways. If incremental value should require incremental user effort, an obvious corollary is that things that produce no value should not require user effort. Using the currency model makes this obvious: who likes paying without getting anything in return? Respect user effort. Treat it as a scarce resource — just like regular currency — and keep it close to the minimum necessary to declare intent. Do not require users to do work that confers them no benefit, and could have been handled by the UI. If it can be derived from other input, it should be derived from other input.

Source: NNGroup (adapted).

A once ubiquitous example that is thankfully going away is the credit card form which asks for the type of credit card in a separate dropdown. Credit card numbers are designed so that the type of credit card can be determined from the first four digits. There is zero reason to ask for it separately. Beyond wasting user effort, duplicating input that can be derived introduces an unnecessary error condition that you now need to handle: what happens when the entered type is not consistent with the entered number?

User actions that meaningfully communicate intent to the interface are signal. Any other step users need to take to accomplish their goal is noise. This includes communicating the same input more than once, providing input separately that could be derived from other input with complete or high certainty, transforming input from their mental model to the interface's mental model, and any other demand for user effort that does not serve to communicate new information about the user's goal. Some noise is unavoidable. The only way to have a 100% signal-to-noise ratio would be if the interface could read minds. But too much noise increases friction and obfuscates signal.

A short yet demonstrative example is the web platform's methods for programmatically removing an element from the page. To signal intent in this case, the user needs to communicate two things: (a) what they want to do (remove an element), and (b) which element to remove. Anything beyond that is noise. The modern DOM method, `element.remove()`, has an extremely high signal-to-noise ratio. It's hard to imagine a more concise way to signal intent. However, the older method that it replaced had much worse ergonomics. It required two parameters: the element to remove, and its parent.
But the parent is not a separate source of truth — it would always be the child node's parent! As a result, its actual usage involved boilerplate, where developers had to write the much noisier `element.parentNode.removeChild(element)` [3]. Boilerplate is repetitive code that users need to include without thought, because it does not actually communicate intent. It's the software version of red tape: hoops you need to jump through to accomplish your goal, that serve no obvious purpose in furthering said goal except for the fact that they are required. In this case, the amount of boilerplate may seem small, but when viewed as a percentage of the total amount of code, the difference is staggering. The exact ratio (81% vs 20% here) varies based on specifics such as variable names, but when the difference is meaningful, it transcends these types of low-level details. Of course, it was usually encapsulated in utility functions, which provided a similar signal-to-noise ratio as the modern method. However, user-defined abstractions don't come for free; there is an effort (and learnability) tax there, too.

Improving signal-to-noise ratio is also why the front-end web industry gravitated towards component architectures: they increase signal-to-noise ratio by encapsulating boilerplate. As an exercise for the reader, try to calculate the signal-to-noise ratio of a Bootstrap accordion (or any other complex Bootstrap component).

Users are much more vocal about things not being possible than about things being hard.

When pointing out friction issues in design reviews, I have sometimes heard "users have not complained about this". This reveals a fundamental misunderstanding about the psychology of user feedback. Users are much more vocal about things not being possible, than about things being hard. The reason becomes clear if we look at the neuroscience of each. Friction is transient in working memory (prefrontal cortex). After completing a task, details fade. The negative emotion persists and accumulates, but filing a complaint requires prefrontal engagement that is brief or absent. Users often can't articulate why the software feels unpleasant: the specifics vanish; the feeling remains. Hard limitations, on the other hand, persist as conscious appraisals. The trigger doesn't go away, since there is no workaround, so it's far more likely to surface in explicit user feedback. Both types of pain points cause negative emotions, but friction is primarily processed by the limbic system (emotion), whereas hard limitations remain in the prefrontal cortex (reasoning). This also means that when users finally do reach the breaking point and complain about friction, you better listen.

Friction is primarily processed by the limbic system, whereas hard limitations remain in the prefrontal cortex.

Second, user complaints are filed when there is a mismatch in expectations. Things are not possible but the user feels they should be, or interactions cost more user effort than the user had budgeted, e.g. because they know that a competing product offers the same feature for less (work). Often, users have been conditioned to expect poor user experiences, either because all options in the category are high friction, or because the user is too novice to know better [4]. So they begrudgingly pay the price, and don't think they have the right to complain, because it's just how things are.
You might ask, “If all competitors are equally high-friction, how does this hurt us?” An unmet need is a standing invitation to disruption that a competitor can exploit at any time. Because you’re not only competing within a category; you’re competing with all alternatives — including nonconsumption (see Jobs‑to‑be‑Done ). Even for retention, users can defect to a different category altogether (e.g., building native apps instead of web apps). Historical examples abound. When it comes to actual currency, a familiar example is Airbnb : Until it came along, nobody would complain that a hotel of average price is expensive — it was just the price of hotels. If you couldn’t afford it, you just couldn’t afford to travel, period. But once Airbnb showed there is a cheaper alternative for hotel prices as a whole , tons of people jumped ship. It’s no different when the currency is user effort. Stripe took the payment API market by storm when it demonstrated that payment APIs did not have to be so high friction. iPhone disrupted the smartphone market when it demonstrated that no, you did not have to be highly technical to use a smartphone. The list goes on. Unfortunately, friction is hard to instrument. With good telemetry you can detect specific issues (e.g., dead clicks), but there is no KPI to measure friction as a whole. And no, NPS isn’t it — and you’re probably using it wrong anyway . Instead, the emotional residue from friction quietly drags many metrics down (churn, conversion, task completion), sending teams in circles like blind men touching an elephant . That’s why dashboards must be paired with product vision and proactive, first‑principles product leadership . Steve Jobs exemplified this posture: proactively, aggressively eliminating friction presented as “inevitable.” He challenged unnecessary choices, delays, and jargon, without waiting for KPIs to grant permission. Do mice really need multiple buttons? Does installing software really need multiple steps? Do smartphones really need a stylus? Of course, this worked because he had the authority to protect the vision; most orgs need explicit trust to avoid diluting it. So, if there is no metric for friction, how do you identify it? Reducing friction rarely comes for free, just because someone had a good idea. These cases do exist, and they are great, but it usually takes sacrifices. And without it being an organizational priority, it’s very hard to steer these tradeoffs in that direction. The most common tradeoff is implementation complexity. Simplifying user experience is usually a process of driving complexity inwards and encapsulating it in the implementation. Explicit, low-level interfaces are far easier to implement, which is why there are so many of them. Especially as deadlines loom, engineers will often push towards externalizing complexity into the user interface, so that they can ship faster. And if Product leans more data-driven than data-informed, it’s easy to look at customer feedback and conclude that what users need is more features ( it’s not ) . The first faucet is a thin abstraction : it exposes the underlying implementation directly, passing the complexity on to users, who now need to do their own translation of temperature and pressure into amounts of hot and cold water. It prioritizes implementation simplicity at the expense of wasting user effort. The second design prioritizes user needs and abstracts the underlying implementation to support the user’s mental model. 
It provides controls to adjust the water temperature and pressure independently, and internally translates them to the amounts of hot and cold water. This interface sacrifices some implementation simplicity to minimize user effort. This is why I’m skeptical of blanket calls for “simplicity.”: they are platitudes. Everyone agrees that, all else equal, simpler is better. It’s the tradeoffs between different types of simplicity that are tough. In some cases, reducing friction even carries tangible financial risks, which makes leadership buy-in crucial. This kind of tradeoff cannot be made by individual designers — it requires usability as a priority to trickle down from the top of the org chart. The Oslo airport train ticket machine is the epitome of a high signal-to-noise interface. You simply swipe your credit card to enter and you swipe your card again as you leave the station at your destination. That’s it. No choices to make. No buttons to press. No ticket. You just swipe your card and you get on the train. Today this may not seem radical, but back in 2003, it was groundbreaking . To be able to provide such a frictionless user experience, they had to make a financial tradeoff: it does not ask for a PIN code, which means the company would need to simply absorb the financial losses from fraudulent charges (stolen credit cards, etc.). When user needs are prioritized at the top, it helps to cement that priority as an organizational design principle to point to when these tradeoffs come along in the day-to-day. Having a design principle in place will not instantly resolve all conflict, but it helps turn conflict about priorities into conflict about whether an exception is warranted, or whether the principle is applied correctly, both of which are generally easier to resolve. Of course, for that to work everyone needs to be on board with the principle. But here’s the thing with design principles (and most principles in general): they often seem obvious in the abstract, so it’s easy to get alignment in the abstract. It’s when the abstract becomes concrete that it gets tough. The Web Platform has its own version of this principle, which is called Priority of Constituencies : “User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.” This highlights another key distinction. It’s more nuanced than users over developers; a better framing is consumers over producers . Developers are just one type of producer. The web platform has multiple tiers of producers: Even within the same tier there are producer vs consumer dynamics. When it comes to web development libraries, the web developers who write them are producers and the web developers who use them are consumers. This distinction also comes up in extensible software, where plugin authors are still consumers when it comes to the software itself, but producers when it comes to their own plugins. It also comes up in dual sided marketplace products (e.g. Airbnb, Uber, etc.), where buyer needs are generally higher priority than seller needs. In the economy of user effort, the antithesis of overpriced interfaces that make users feel ripped off are those where every bit of user effort required feels meaningful and produces tangible value to them. The interface is on the user’s side, gently helping them along with every step, instead of treating their time and energy as disposable. 
The user feels like they're getting a bargain: they get to spend less than they had budgeted for! And we all know how motivating a good bargain is. User effort bargains don't have to be radical innovations; don't underestimate the power of small touches. A zip code input that auto-fills city and state, a web component that automatically adapts to its context without additional configuration, a pasted link that automatically defaults to the website title (or the selected text, if any), a freeform date that is correctly parsed into structured data, a login UI that remembers whether you have an account and which service you've used to log in before, an authentication flow that takes you back to the page you were on before. Sometimes many small things can collectively make a big difference. In some ways, it's the polar opposite of death by a thousand paper cuts: life by a thousand sprinkles of delight! 😀

In the end, "simple things simple, complex things possible" is table stakes. The key differentiator is the shape of the curve between those points. Products win when user effort scales smoothly with use case complexity, cliffs are engineered out, and every interaction declares a meaningful piece of user intent. That doesn't just happen by itself. It involves hard tradeoffs, saying no a lot, and prioritizing user needs at the organizational level. Treating user effort like real money forces you to design with restraint. A rule of thumb is to place the pain where it's best absorbed by prioritizing consumers over producers. Do this consistently, and the interface feels delightful in a way that sticks. Delight turns into trust. Trust into loyalty. Loyalty into product-market fit.

[1] Kay himself replied on Quora and provided background on this quote. Don't you just love the internet? ↩︎
[2] Yes, typing can be faster than dragging, but minimizing homing between input devices improves efficiency more; see KLM. ↩︎
[3] Yes, today it would have been `element.parentNode?.removeChild(element)`, which is a little less noisy, but this was before the optional chaining operator. ↩︎
[4] When I was running user studies at MIT, I often had users exclaim "I can't believe it! I tried to do the obvious simple thing and it actually worked!" ↩︎

1 view
マリウス 4 weeks ago

📨🚕

📨🚕 (MSG.TAXI) is a multi-protocol push notification router. You post to it via a webhook URL and it flings that data to your configured targets. It’s the missing glue between your code and your notification channels, whether that’s your smart home, your CI pipeline, your RPG guild’s Matrix room, or just your phone at 3AM when your server falls over (again). Push notifications from anything, to anything.
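To make the webhook model concrete, here is a rough sketch of what a call might look like from Python; the URL format and JSON fields are made-up placeholders, not MSG.TAXI's documented schema:

```python
import requests

# Placeholder webhook URL and payload shape; check the real docs for the
# actual endpoint format and supported fields.
WEBHOOK_URL = "https://msg.taxi/hook/your-token-here"

resp = requests.post(
    WEBHOOK_URL,
    json={
        "title": "Server down",
        "message": "web-01 stopped responding at 03:00",
    },
    timeout=10,
)
resp.raise_for_status()  # the router then fans the message out to your configured targets
```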

0 views
Sean Goedecke 1 month ago

The whole point of OpenAI's Responses API is to help them hide reasoning traces

About six months ago, OpenAI released their Responses API , which replaced their previous /chat/completions API for inference. The old API was very simple: you pass in an array of messages representing a conversation between the model and a user, and get the model’s next response back. The new Responses API is more complicated
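For context, the older completions-style call the excerpt describes looked roughly like this; a sketch of the pre-Responses request shape, with a placeholder model name and prompt:

```python
import os

import requests

# The old /chat/completions contract: send the conversation as a list of
# messages, get the model's next message back.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o",  # placeholder model name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Summarize this article in one sentence."},
        ],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```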

0 views
Sean Goedecke 1 month ago

An unofficial FAQ for Stripe's new "Tempo" blockchain

Stripe just announced Tempo , a “L1 blockchain” for “stablecoin payments”. What does any of this mean. In 2021, I was interested enough in blockchain to write a simple explainer and a technical description of Bitcoin specifically . But I’ve never been a blockchain fan . Both my old and new “what kind of work I want” posts state that I’m ethically opposed to proof-of-work blockchain

0 views
DuckTyped 1 month ago

An Illustrated Guide to OAuth

OAuth was first introduced in 2007. It was created at Twitter because Twitter wanted a way to allow third-party apps to post tweets on users' behalf. Take a second to imagine designing something like that today. How would you do it? One way would just be to ask the user for their username and password. So you create an unofficial Twitter client, and present the user a login screen that says "log in with Twitter". The user does so, but instead of logging into Twitter, they're actually sending their data to you, this third-party service which logs into Twitter for them. This is bad for a lot of reasons. Even if you trust a third-party app, what if they don't store your password correctly and someone steals it? You should never give your password to a third-party website like this. Another way you might be thinking is, what about API keys? Because you're hitting Twitter's API to post data for a user, and for an API , you use API keys . But API keys are general. What you need is an API key specific to a user. To solve these problems, OAuth was created. You'll see how it solves all these problems, but the crux of OAuth is an access token , which is sort of like an API key for a specific user. An app gets an access token, and then they can use that to take actions on the user's behalf, or access data for a user. OAuth can be used in a lot of different ways, one of the reasons it is so hard to understand. In this post, we’re going to look at a typical OAuth flow. The example I'm going to use is YNAB. If you haven't used it, YNAB is like a paid version of Mint. You connect it to a bank account, and then it pulls all your transactions from that account, and shows them to you with very pretty charts. You can categorize your spending, and then it tells you for example, hey, you're spending too much on groceries. It helps you manage your finances. So, I want to use YNAB, and I want to connect it to Chase Bank, but I don't want to give it my Chase password. So instead, I'm going to use OAuth. Let's look at the flow first, and then let's understand what's going on. We're actually going to look at the flow twice , because I think you need to look through an OAuth flow at least two times to understand what's going on. So to start, I'm at YNAB, and I want to connect Chase as a source. The OAuth flow looks like this: YNAB redirects me to Chase. At Chase, I log in with my username and password. Chase shows me a screen saying "YNAB wants to connect to Chase. Pick what accounts you want to give YNAB access to". It'll show me a list of all my accounts. Let's say I pick just my checking account, to give YNAB read access to this account, and hit OK. From Chase, I'm redirected back to YNAB, and now, magically, YNAB is connected to Chase. This is the experience from a user's perspective. But what happened there? What magic happened in the background, so that YNAB somehow has access to my data on Chase? Remember, the end goal of OAuth is for YNAB to end up with an access token , so it can access my data from Chase. Somehow, as I went through this flow, YNAB ended up with an access token. I'll spoil the surprise by telling you how it got the access token, and then I'll walk you through what happened in more detail. How does Chase give YNAB the access token? When you were redirected from Chase back to YNAB, Chase could have just added the access token in the URL. It could have redirected you back to a URL like this: and then YNAB would be able to get the access token. 
An access token is supposed to be secret, but URLs can end up in your browser's history or some server logs, in which case it's easy for anyone to see your access token. So Chase could technically redirect you back to YNAB with the access token in the URL, and then YNAB would have the access token. End of OAuth flow. But we don’t do it this way, because sending an access token in the URL is not secure.

When you were redirected from Chase back to YNAB, Chase sent YNAB an authorization code in the URL. An authorization code is not an access token! Chase sends YNAB an authorization code, and YNAB exchanges the authorization code for an access token. It does this by making a backend request to Chase, a backend POST request over HTTPS, which means no one can see the access token. And then YNAB has the access token. End of OAuth flow. OAuth success.

Let's talk about what we just saw. At a high level, there are two parts to an OAuth flow. The first is the user consent flow, which is where you, the user, log in and pick what to give access to. This is a critical part of OAuth, because in OAuth, we always want the user to be actively involved and in control. The other part is the authorization code flow. This is the flow where YNAB actually gets this access token. Let's talk about more details of exactly how this works.

And let's also talk about some terminology, because OAuth has very specific terminology. Instead of user, we say resource owner. Instead of app, we say OAuth client or OAuth app. The server where you log in is called the authorization server. The server where you get user data from is called the resource server (this could be the same as the authorization server). On the authorization server, when the user picks what's allowed, those are called scopes. I'll try to use that terminology, because you'll need to get familiar with it if you're going to read more OAuth documentation. So let’s look at this high level again, with the new terms.

You have OAuth clients. An OAuth client wants to access data on a resource server, and the data belongs to the resource owner. To do that, the OAuth client redirects to the authorization server. The user logs in, the user agrees to scopes (what this token is allowed to access), and the user gets redirected back to the OAuth client with an authorization code in the URL. On the back end, the OAuth client sends the authorization code and client secret (we'll talk about client secrets shortly) to the authorization server, and the authorization server responds with the access token. That's the exact same flow, but using the new terminology we just discussed. Now let's talk specifics. We've seen what this flow looks like from the user's point of view; let's look at what it looks like from the developer's point of view.

To use OAuth, you first need to register a new app. So for example, GitHub provides OAuth. If you want to create a new app for GitHub, you first register it. Different services require different types of data in the app registration, but every service will require at least:

- An app name, because when the user goes to GitHub, for example, GitHub needs to be able to say "Amazon Web Services is requesting read access to your repos and gists".
- A redirect URI. We'll talk about what that is shortly.

GitHub will respond with:

- A client ID. This is a public ID that you'll be using to make requests.
- A client secret. You'll be using this to authenticate your requests.

Awesome, you have registered your OAuth application.
Let's say your app is YNAB, and one of your users wants to connect to Chase. So you start a new OAuth flow... your very first one!

Step one: you will redirect them to Chase's authorization server's OAuth endpoint, passing these parameters in the URL:

- Client ID, which we just talked about.
- The redirect URI. Once the user is done on Chase, this is where Chase will redirect them back to. This will be a YNAB URL, since you're the YNAB app.
- Response type, which is usually "code", because we usually want to get back an authorization code, not an access token (which would be less secure).
- Scopes. So what scopes are we requesting? i.e. what user data do we want to access?

This is enough information for the authorization server to validate the request and show the user a message like "YNAB is requesting read access to your accounts". How does the authorization server validate the request? Well, if the client ID isn't valid, the request is invalid right away. If the client ID is valid, the authorization server needs to check the redirect URI. Basically, since the client ID is public, anyone could go get the YNAB client ID, and create their own OAuth flow that hits Chase, but then returns the user back to, let's say, evildude.com. But that's why when you register your app, you have to tell Chase what a valid redirect URI looks like. At that point, you would tell Chase that only YNAB.com URIs are valid, thus preventing this evildude.com scenario.

If everything is valid, the authorization server can use the client ID to get the app name, maybe the app icon, and then show a user consent screen. The user will click which accounts they want to give YNAB access to, and hit okay. Chase will redirect them back to the redirect URI that you gave, let's say ynab.com/oauth-callback?authorization_code=xyz.

Side note: you might be wondering, what is the difference between URI and URL? Because I'm kind of using both. Well, a URL is any website URL that we know and love. URI is more general. URL is a type of URI, but there are many other types of URIs. The reason I'm saying redirect URI instead of redirect URL is because mobile apps won't have a URL. They'll just have a URI, which is a protocol they have made up that might look something like ynab://oauth-callback. So if you're only doing web work, whenever you read URI, you can read it as URL. And if you're doing mobile work, you can read URI and know that yes, your use case is supported too.

So the user is redirected back to ynab.com/oauth-callback?authorization_code=xyz, and now your app has an authorization code. You send that authorization code to the Chase authorization server, along with your client secret. Why include the client secret? Because again, the authorization code is in the URL. So anyone can see it and anyone could try to exchange it for the access token. That's why we need to send the client secret, so Chase's server can say "Oh yes, I remember I had generated this code for this client ID, and the client secret matches. This is a valid request." And then it returns the access token.

Note how in every step of the OAuth flow, they have thought through how someone could exploit the flow, and added safeguards.* That is a big reason why it's so complicated.

*I'm reliably informed by a friend in security that the OAuth designers learned a bunch of lessons the hard way, and that is another reason why it is so complicated: because it had to be patched repeatedly.

The other big reason is because we want the user to be involved.
That makes it complicated because all the user stuff has to be on the frontend, which is insecure, because anyone can see it. And then all the secure stuff has to be on the back end. I keep saying frontend and back end, but in the OAuth docs, they say front-channel and back-channel instead. Let's talk about why.

Front-channel and back-channel

So, OAuth doesn't use the terms frontend and backend, it uses front-channel and back-channel. Front-channel means GET requests, where anyone can see the parameters in the URL, and back-channel means POST requests, where that data is hidden in the POST body (and encrypted in transit over HTTPS). The reason OAuth doesn't use frontend or backend is because you could make POST requests using JavaScript! So, theoretically, you could exchange your authorization code for an access token right on the frontend, in JavaScript, by making a POST fetch request. Now, there is a big problem with this, which is you also need the client secret to make that request. And of course, once the secret is on the frontend and accessible in JavaScript, it's not secret anymore. Anyone can access it.

So, instead of using the client secret, there's a different way to do it called PKCE, spelled P-K-C-E, pronounced “pixie” (seriously). It's not as secure as doing it on the backend with the client secret, but if a backend is not an option for you, you can do it using PKCE. So just know that if you have an app without a back end, you can still do OAuth. I may cover PKCE in a future post, as it is now recommended for the standard flow as well, since it helps protect against auth code interception.

Same problem for mobile apps. Unless you have a mobile app that has a backend component, like a backend server somewhere, if you're putting your client secret in a mobile app, well, anyone can get that because there are tons of tools to extract strings from mobile apps. So, instead of including your client secret in your app, you should again use PKCE to get that access token. So those are two other terms that are good to know: front-channel and back-channel.

At this point, you've seen what the OAuth flow looks like from the user's perspective, and from the developer's perspective, and you have seen the components that make it secure. One last thing I want to mention is that OAuth can look a lot of different ways. I covered the main recommended OAuth flow above, but some people may do OAuth by passing back an access token in the redirect instead of the authorization code (doing that is called the "implicit flow"). Some people may do it using PKCE. There's even a way to do OAuth without the user consent part, but that really is not recommended. The other part of OAuth we didn't cover is that tokens expire and you need to refresh them. And that happens through a refresh flow.

Also, OAuth is all about authorization, but some workflows use OAuth to log in, such as when you use a "sign in with Google" feature. This uses OpenID Connect, or OIDC, which is a layer on top of OAuth that also returns user data instead of just an access token. I'm mentioning this here because when you look for OAuth on the web, you'll see a lot of different flows, and you may be confused as to why they're all different. And the reason is, OAuth is not straightforward like HTTP; OAuth can look a lot of different ways.

Now you're good to go out and do your own OAuthing. Good luck!
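To tie the two halves of the flow together, here is a rough Python sketch of what the OAuth client (YNAB in the running example) does: build the front-channel authorization URL, then exchange the code on the back channel. The values use standard OAuth 2.0 parameter names, but the endpoints, scopes, and credentials are illustrative placeholders, not Chase's actual API:

```python
import secrets
from urllib.parse import urlencode

import requests

CLIENT_ID = "ynab-client-id"          # issued at app registration (placeholder)
CLIENT_SECRET = "ynab-client-secret"  # placeholder; keep this on the backend only
REDIRECT_URI = "https://ynab.com/oauth-callback"

# Front channel: send the user's browser to the authorization server.
state = secrets.token_urlsafe(16)  # random value to check when the user comes back
authorize_url = "https://auth.chase.example/oauth/authorize?" + urlencode({
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "accounts:read",
    "state": state,
})
# ...the user logs in, consents, and is redirected to REDIRECT_URI?code=xyz&state=...

# Back channel: exchange the authorization code for an access token.
def exchange_code(code: str) -> str:
    resp = requests.post(
        "https://auth.chase.example/oauth/token",
        data={
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": REDIRECT_URI,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```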

0 views
Sean Goedecke 1 month ago

Everything I know about good API design

Most of what modern software engineers do involves APIs: public interfaces for communicating with a program, like this one from Twilio. I’ve spent a lot of time working with APIs, both building and using them

0 views
Can ELMA 2 months ago

Postmortem: How I Crashed an API with a Cloudflare Compression Rule

Sometimes the most valuable lessons come from our biggest mistakes. This is the story of how a single misconfigured Cloudflare compression rule broke our Server-Sent Events (SSE) streaming and brought down an entire API for several hours.

Date: August 15, 2025
Duration: 4 hours 23 minutes
Impact: ~20% API downtime, 15,000+ affected users
Root Cause: Cloudflare compression rule breaking SSE streaming

I was working on performance optimization for our API endpoints. The goal was to reduce bandwidth usage and improve response times by enabling Cloudflare's compression features, so I enabled a Cloudflare compression rule. The issue wasn't immediately apparent. The rule looked safe, but I had forgotten a critical detail: our API used Server-Sent Events (SSE) for real-time streaming, and Cloudflare's compression breaks SSE.

Cloudflare compression breaking SSE: the compression rule was enabled without understanding that it buffers data, breaking real-time streaming. This incident taught us that compression isn't always beneficial — it can break real-time protocols like SSE. The key lesson is to understand how infrastructure changes affect your specific use cases, especially streaming protocols.
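One common application-side mitigation (not necessarily what this team did) is to mark SSE responses so intermediaries are less inclined to compress or buffer them, and to scope edge compression rules so they skip streaming paths entirely. A minimal Flask sketch with a hypothetical /events endpoint; whether any particular proxy honors these headers should be checked against its documentation:

```python
import time

from flask import Flask, Response

app = Flask(__name__)

@app.get("/events")
def events():
    def stream():
        # Emit one SSE-formatted event per second.
        while True:
            yield f"data: heartbeat {int(time.time())}\n\n"
            time.sleep(1)

    return Response(
        stream(),
        mimetype="text/event-stream",
        headers={
            # Ask caches and proxies not to transform (e.g. compress) or cache the stream.
            "Cache-Control": "no-cache, no-transform",
            # Nginx-specific hint to disable response buffering; ignored elsewhere.
            "X-Accel-Buffering": "no",
        },
    )
```

At the edge, the more direct fix is usually to scope the compression rule so it simply does not apply to the streaming endpoints at all.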

0 views
W. Jason Gilmore 3 months ago

Notes On the Present State of MCP Servers

I've had the opportunity to spend the last several days immersed in researching the Model Context Protocol and the present state of MCP servers. My early conclusion is this technology is for real and has the potential to entirely change how we use the Internet. That said, like any emerging technology it is most definitely in a state of rapid evolution and so I've compiled a few points here that may be useful to others exploring this topic. It is presently a messy and chaotic space, with both server and client implementations unable to keep up with the rapidly evolving spec. A great example of this is Anthropic deprecating and then removing SSE from transport options ( https://modelcontextprotocol.io/specification/2025-06-18/basic/transports ) while simultaneously advertising their partner extensions which are SSE-based ( https://www.anthropic.com/engineering/desktop-extensions ). That said, I don't think anybody cares, including the major tech companies listed in that partner link, whether their extensions are presently SSE- or Streamable HTTP-based. It is just noise in the grand scheme of things, however SSE will eventually unquestionably be phased out, and doesn't even show up in the latest spec version. MCP client support for critical server features remains uneven. What works in VS Code (server Prompts) does not presently work in Cursor. My personal experiments show Prompts to be a fascinating feature which introduce opportunities for user interactivity not otherwise possible using solely Tools. Not for lack of trying, it remains unclear to me (and apparently almost everybody else, including AWS architects , how OAuth is implemented in MCP servers. Claude Desktop seems to have the best support, as evidenced by the directory they launched a few days ago. Other MCP clients have varying support, and require the use of experimental hacks such as mcp-remote for certain use cases. That said, the exploding mcp-remote weekly download chart is indicative of just how strong the demand presently is for at least experimenting with this new technology. And further, given the obvious advantages OAuth has to offer for enterprises it will only be a matter of time before OAuth is standard. You can already see Anthropic moving in this direction thanks to their recent publication of documents such as this . API key-based authentication works very well across popular clients (VS Code, Cursor, Claude Desktop, etc), and when coupled with a capable authorization solution such as DreamFactory it's already possible to build some really compelling and practical extensions to existing products. To see a concrete example of what I'm talking about, check out this great video by my friend and colleague Terence Bennett. While adding API keys (and MCP servers for that matter) to most clients presently requires a minimal level of technical expertise (modifying a JSON file), my experiments with Claude Desktop extensions (next point) shows installation woes will shortly be a thing of the past. Anthropic (Claude) is emerging as the clear leader in all things MCP which is no surprise considering they invented the concept. Among other things their new Desktop extension spec ( https://www.anthropic.com/engineering/desktop-extensions ) is very cool and I've already successfully built one. I'd love to see this approach adopted on a wider scale because it dramatically lowers the barrier-of-entry in terms of installing MCP servers. Somebody has already started an Awesome Claude Desktop Extensions page which is worth a look. 
The pace of evolution is such that if you're reading this even a few weeks or months after the publication date, then some or possibly all of what is stated above is outdated. Follow me on Twitter for ongoing updates as I expect to remain immersed in this topic for the foreseeable future.

0 views

Claude Code with Kimi K2

It looks like Moonshot AI has an Anthropic-compatible API endpoint for their new open frontier model, K2. Since Anthropic lets you set a custom base URL for their API, it's relatively straightforward to set up Claude Code to use K2. Some folks on GitHub put together a workflow to set things up, but...it's a little bit sketchy (and is broken for me). Also, I'm not that excited about instructions that tell you to run commands that pipe scripts from entities with 'red team' in their names straight into your shell. It also doesn't work that well if you're already a Claude Code user, because Claude Code isn't really built to let you swap between different API providers in different sessions, and there's no easy way to move its systemwide config directory. Thankfully, on Unix-like operating systems, it's pretty easy to...just swap the directory out from under the OS. Head on over to https://platform.moonshot.ai/console and sign up for an account. As of this moment, you'll get $5 in credit for free. Make a directory for its homedir, then make a shell script:
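Since the post's script itself isn't shown, here is a rough sketch of the same idea in Go rather than shell: a wrapper that points Claude Code at a dedicated home directory and at Moonshot's endpoint. The directory name, the endpoint URL, and the MOONSHOT_API_KEY variable are my own placeholders; ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are, as I understand it, the environment variables Claude Code consults for a custom endpoint and key.

    // kimi-claude.go — a sketch of "swap the directory out from under the OS".
    // All names and the endpoint URL are assumptions; adjust to taste.
    package main

    import (
        "log"
        "os"
        "os/exec"
        "path/filepath"
        "strings"
    )

    func main() {
        realHome, err := os.UserHomeDir()
        if err != nil {
            log.Fatal(err)
        }
        // Dedicated home directory so K2 sessions keep their own ~/.claude
        // config, separate from your normal Claude Code setup.
        kimiHome := filepath.Join(realHome, "kimi-claude-home") // hypothetical name
        if err := os.MkdirAll(kimiHome, 0o755); err != nil {
            log.Fatal(err)
        }

        // Rebuild the environment with HOME swapped out.
        env := []string{}
        for _, kv := range os.Environ() {
            if !strings.HasPrefix(kv, "HOME=") {
                env = append(env, kv)
            }
        }
        env = append(env,
            "HOME="+kimiHome,
            "ANTHROPIC_BASE_URL=https://api.moonshot.ai/anthropic", // assumed endpoint; confirm in the console
            "ANTHROPIC_AUTH_TOKEN="+os.Getenv("MOONSHOT_API_KEY"),  // your key from the Moonshot console
        )

        cmd := exec.Command("claude", os.Args[1:]...)
        cmd.Env = env
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }

Build it, put the binary on your PATH under some other name, and run it exactly as you would claude; your regular Claude Code configuration stays untouched.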

0 views
Fakeman Show 3 months ago

Lessons learned by building my own cookiecutter for REST APIs

During my university days, I avoided building CRUD apps, not because I couldn’t, but because they felt boring. To me, it was just a client (web or mobile) talking to a server that ran some SQL queries. I dodged them in every project I could. Fast forward three years, after graduating and working at Oracle, I’ve realized that almost everything in software is just a fancy CRUD app. And you know what?

0 views
Neil Madden 3 months ago

No, no, no. You’re still not doing REST right!

OK, so you’ve made your JSON-over-HTTP API. Then someone told you that it’s not “really” REST unless it’s hypertext-driven. So now all your responses contain links, and you’re defining mediatypes properly and all that stuff. But I’m here to tell you that you’re still not doing it right. What you’re doing now is just “HYPE”. Now I’ll let you in on the final secret to move from HYPE to REST. OK, I’m joking here. But there is an aspect of REST that doesn’t seem to ever get discussed despite the endless nitpicking over what is and isn’t really REST. And it’s an odd one, because it’s literally the name: Representational State Transfer. I remember this being quite widely discussed in the early 2000s when REST was catching on, but it seems to have fallen by the wayside in favour of discussion of other architectural decisions. If you’re familiar with OO design, then when you come to design an API you probably think of some service that encapsulates a bunch of state. The service accepts messages (method calls) that manipulate the internal state, moving it from one consistent state to another. That internal state remains hidden and the service just returns bits of it to clients as needed. Clients certainly don’t directly manipulate that state. If you need to perform multiple manipulations then you make multiple requests (multiple method calls). But the idea of REST is to flip that on its head. If a client wants to update the state, it makes a request to the server, which generates a representation of the state of the resource and sends it to the client. The client then locally makes whatever changes it wants and sends the updated representation back to the server. Think of checking out a file from Git, making changes and then pushing the changes back to the server. (Can you imagine instead having to send individual edit commands to make changes?) This was a stunning “strange inversion of reasoning” to me at the time, steeped as I was in OO orthodoxy. My first reaction was largely one of horror. But I’d missed the key word “representation” in the description. Returning a representation of the state doesn’t mean it has to directly represent the state as it is stored on the server, it just has to be some logically appropriate representation. And that representation doesn’t have to represent every detail: it can be a summary, or a more abstract representation. Is it a good idea? I’ll leave that for you to decide. I think it makes sense in some cases, not in others. I’m more just interested in how this whole radical aspect of REST never gets mentioned anymore. It suggests to me a much more declarative conception of API design, whereas even the most hypertext-driven APIs I see tend to still have a very imperative flavour. Thoughts?
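To make the representation-transfer idea concrete, here is a minimal sketch in Go. The URL, the fields, and the ETag/If-Match handling are my own assumptions, not the author's: the client GETs the current representation, edits it locally, and PUTs the whole thing back instead of issuing a series of edit commands.

    // A hedged sketch of "representational state transfer": fetch a
    // representation, change it locally, send the whole thing back.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    func main() {
        const url = "https://example.com/api/articles/42" // hypothetical resource

        // 1. GET a representation of the resource's current state.
        resp, err := http.Get(url)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var article map[string]any
        if err := json.NewDecoder(resp.Body).Decode(&article); err != nil {
            panic(err)
        }
        etag := resp.Header.Get("ETag")

        // 2. Edit the representation locally — no "edit commands" are sent.
        article["title"] = "You're still not doing REST right"
        article["tags"] = append(toSlice(article["tags"]), "rest")

        // 3. PUT the updated representation back, guarding against concurrent
        //    edits with If-Match (like a git push that fails if the remote moved on).
        body, _ := json.Marshal(article)
        req, _ := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
        req.Header.Set("Content-Type", "application/json")
        if etag != "" {
            req.Header.Set("If-Match", etag)
        }
        putResp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer putResp.Body.Close()
        fmt.Println("server said:", putResp.Status)
    }

    // toSlice coerces a decoded JSON array (or nil) into a []any.
    func toSlice(v any) []any {
        s, _ := v.([]any)
        return s
    }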

0 views

Analyzing API Design via Algebraic Laws

The other day, someone asked: Why doesn’t [the Data.Map function] allow for different value types the way does? This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation. To answer the original question, what would the type of a version with a different value type look like? It would be something in the flavor of: But this new parameter is somewhat lossy, in that it gives the impression that it could be called with as parameters, which doesn’t fit into the vibe of being a “union.” So instead we could restrict that possibility by using : which seems reasonable enough. But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philosophical question of what even IS a map? A is a particularly efficient implementation of functions with type . But why is this here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be . In fact, it helps us make sense of the right bias present in , where we see: This equality is hard to justify under the normal understanding of being an encoding of a function . But under the general monoid interpretation, we get a nice semigroup homomorphism: where the monoid in question has been specialized to be . Of course, we also have a monoid homomorphism: Let’s re-evaluate the original question in terms of this newly-generalized . Now that we’ve removed all of the unnecessary baggage of , we can again think about the desired type of : which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with as a parameter! We can give some semantics as to what ought to do again by relating it to the observation . The relevant law here seems like it ought to be: By choosing a degenerate function , say, , where is some value that is not , we can see the beginnings of a problem: Regardless of the key we look up in our ed , we need to get back . How can we implement such a thing? I see only two ways: #1 is clearly a non-starter, given that we want our s to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this: Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism: Without loss of generality, take (where is just with a constant function). This gives us: Making this thing efficient is a further complication! We again have two options: #1 clearly requires O(n) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire . For example, consider a constructed à la: Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of . I find this quite an interesting result. I always assumed that (or at least, ) didn’t have an instance because it would require a constraint on its output—but that’s not the sort of thing we can express in Haskell. But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of , even if we could constrain the result.
What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of (except to know that it’s implemented as a balanced BST). All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.

0 views
Corrode 5 months ago

Svix

We don’t usually think much about Webhooks – at least I don’t. It’s just web requests after all, right? In reality, there is a lot of complexity behind routing webhook requests through the internet. What if a webhook request gets lost? How do you know it was received in the first place? Can it be a security issue if a webhook gets handled twice? (Spoiler alert: yes)
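Those questions are where the hidden complexity lives. As a generic illustration (not Svix's actual scheme — the header names, signing format, and in-memory store here are all assumptions), a receiver typically verifies an HMAC signature over the raw payload and deduplicates by message ID, so that a redelivered webhook can't trigger its side effects twice:

    // A generic webhook receiver sketch: verify an HMAC signature, then drop
    // duplicates by message ID so a retried delivery is only handled once.
    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "encoding/hex"
        "io"
        "net/http"
        "sync"
    )

    var secret = []byte("webhook-signing-secret") // shared secret from the provider (placeholder)

    var seen = struct {
        sync.Mutex
        ids map[string]bool
    }{ids: map[string]bool{}}

    func handleWebhook(w http.ResponseWriter, r *http.Request) {
        body, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, "bad body", http.StatusBadRequest)
            return
        }

        // 1. Authenticate: recompute the HMAC over the raw body and compare
        //    it in constant time with the signature header.
        mac := hmac.New(sha256.New, secret)
        mac.Write(body)
        want := hex.EncodeToString(mac.Sum(nil))
        if !hmac.Equal([]byte(want), []byte(r.Header.Get("X-Webhook-Signature"))) {
            http.Error(w, "bad signature", http.StatusUnauthorized)
            return
        }

        // 2. Deduplicate: retries reuse the same message ID, so a second
        //    "payment succeeded" delivery must not ship a second order.
        id := r.Header.Get("X-Webhook-Id")
        seen.Lock()
        duplicate := seen.ids[id]
        seen.ids[id] = true
        seen.Unlock()
        if duplicate {
            w.WriteHeader(http.StatusOK) // acknowledge, but do nothing
            return
        }

        // 3. Only now run the side effect.
        // processEvent(body)
        w.WriteHeader(http.StatusOK)
    }

    func main() {
        http.HandleFunc("/webhooks", handleWebhook)
        http.ListenAndServe(":8080", nil)
    }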

0 views
Joel Drapper 6 months ago

LSP-driven API design

Embracing the limitations of Ruby LSP in Ruby API design.

0 views

Posting through it

I'm posting this from a very, very rough cut at a bespoke blogging client I've been having my friend Claude build out over the past couple days. I've long suspected that "just edit text files on disk to make blog posts" is, to a certain kind of person, a great sounding idea...but not actually the way to get me to blog. The problem is that my blog is...a bunch of text files in a git repository that's compiled into a website by a tool called "Eleventy" that runs whenever I put a file in a certain directory of this git repository and push that up to GitHub. There's no API because there's no server. And I've never learned Swift/Cocoa/etc, so building macOS and iOS tooling to create a graphical blogging client has felt...not all that plausible. Over the past year or two, things have been changing pretty fast. We have AI agents that have been trained on...well, pretty much everything humans have ever written. And they're pretty good at stringing together software. So, on a whim, I asked Claude to whip me up a blogging client that talks to GitHub in just the right way. This is the very first post using that new tool, which I'm calling "Post Through It." Ok, technically, this is the fourth post. But it's the first one I've actually been able to add any content to.

0 views
Andre Garzia 8 months ago

Creating a simple posting interface

I recently noticed that I haven't been posting much on my blog, which surprised me because blogging has always been among my favourite activities. The main obstacle that has prevented me from posting more often is that I didn't have an easy-to-use interface or app for doing so. When this blog was done with Racket + NodeJS, I implemented a MetaWeblog API server and thus could use apps such as MarsEdit to post to my blog. Once I rebuilt it using Lua, I didn't finish implementing that API — I got it halfway done — and thus couldn't use that app anymore. I implemented the Micropub API in Lua but have yet to find an app I like to use that supports that spec. Thankfully, Micropub is such an easy spec to implement that creating a little client for it can be done in hours if not minutes. Today, in about two hours, I made a small single-file HTML editor for my blog. It allows me to create new posts with ease, including file uploads. It is actually the interface I'm using to write this post right now. It is a simple HTML form with 137 lines of vanilla JavaScript. All the JS does is simple cosmetics such as disabling buttons when posting or uploading is happening (so I don't press them twice) and using the fetch API to send data to the server. Of course this editor is super simple. There's barely any error checking and most of the errors will just be console messages, but it is enough for my day-to-day usage. It serves its purpose, which is to provide an easy way for me to make posts. I wonder what new features I'll implement as the week moves on.
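For anyone curious what the Micropub side of this looks like, a create-post request is about as small as web APIs get: a form-encoded POST with an access token. Here is a sketch in Go rather than the post's browser JavaScript; the endpoint URL and token are placeholders.

    // A minimal Micropub "create post" request: an x-www-form-urlencoded
    // POST with h=entry, authenticated with a bearer token.
    package main

    import (
        "fmt"
        "net/http"
        "net/url"
        "strings"
    )

    func main() {
        endpoint := "https://example.com/micropub" // your site's Micropub endpoint
        token := "YOUR_ACCESS_TOKEN"               // obtained via IndieAuth

        form := url.Values{}
        form.Set("h", "entry")
        form.Set("content", "Posted from a tiny hand-rolled client.")

        req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
        if err != nil {
            panic(err)
        }
        req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
        req.Header.Set("Authorization", "Bearer "+token)

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // A successful create usually answers 201 with a Location header
        // pointing at the new post.
        fmt.Println(resp.Status, resp.Header.Get("Location"))
    }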

0 views
Daniel Mangum 10 months ago

This Website is Hosted on Bluesky

Well, not this one. But this one is! How? Let’s take a closer look at Bluesky and the AT Protocol that underpins it. Note: I communicated with the Bluesky team prior to the publishing of this post. While the functionality described is not the intended use of the application, it is known behavior and does not constitute a vulnerability disclosure process. My main motivation for reaching out to them was because I like the folks and don’t want to make their lives harder.

0 views
Bill Mill 1 years ago

Serving a billion web requests with boring code

When I worked as a contractor to the US government at ad hoc, I was fortunate enough to get the opportunity to design large parts of a relaunch of medicare plan compare, the US government site through which hundreds of thousands of medicare recipients purchase their health care plans each year. We released after about a year of development, and were able to help millions of people find and purchase health care - if you're in the US, you are pretty likely to know somebody who used this system. Though the US health care system is incredibly awful in many respects, I'm very proud of what the team I worked with was able to build in a short time frame and under a lot of constraints. The team of people that I worked with - managers, designers, engineers and business analysts - were excellent from top to bottom. I've never before experienced such a dedicated, collaborative, trusting team, and I learned a tremendous amount from them. I especially want to share this story because I want to show that quality software can get written under government constraints. It can! And if we all start believing that it can, we're more likely to produce it. I worked on this system for about two and a half years, from the very first commit through two open enrollment periods. The API system served about 5 million requests on a normal weekday, with < 10 millisecond average request latency and a 95th percentile latency of less than 100 milliseconds. It served a very low baseline rate of errors, mostly spurious errors due to vulnerability scrapers. I'm proud that I can count the number of times an engineer was woken up by an emergency page on one hand. I was amazed at how far you can get by leaning on postgres and golang, keeping your system as organized and simple as possible, and putting in work every day for a long period of time. My north star when building the system was to keep it as boring as possible, in the Dan McKinley sense. (Go read "Choose Boring Technology" if you haven't yet, it's required reading. It's better than this article.) There is a concept of "Innovation tokens" in that article, and I was explicit, when choosing the pieces I used to build the site, about how I spent them. There are many valid criticisms of react; this piece is an example, and I was aware of the issues already in 2018 when I was building the site. The main thrust is that it tends towards large application bundles, which take a long time to download and execute, especially on the cheap mobile phones that are the main link to the internet for so many people. In building a piece of infrastructure for the government, it was especially concerning that the application be available to as many people as possible. We took accessibility seriously both in the sense that the site needed to have proper design for users with disabilities and also in the sense that people with many devices needed to connect to it. Nevertheless, I chose an SPA architecture and react for the site. I would have loved to have done differently, but I worried that choosing to use a multi-page architecture or a different library would have slowed us down enough that we wouldn't have delivered on our tight timeline. I didn't have enough trust in any of the alternatives available to me at the time to make me believe we could choose them safely enough. The result fell prey after a few years to a common failure mode of react apps, and became quite heavy and loaded somewhat slowly.
I still think I made the right choice at the time, but it's unfortunate that I felt I had to make it and I wish I had known of a nice clean way to avoid it. Golang was overall a joy to build this project in. It runs efficiently both at build time and at run time, and having binary executable artifacts that build quickly makes it easy to deploy rapidly. Developers new to the language (our team of engineers grew from 2 to 15) were able to get onboard quickly and understand the language with no trouble. Error handling being both immediate and verbose is, in my opinion, a great feature for building systems that are resilient. Every time you do something that might fail, you are faced with handling the error case, and once you develop patterns they are consistent and predictable. (I know this is a big topic, I should probably write more about it) The day that we were able to switch to go modules, a big pain point went away for us. We hit a few bumps in the road as very early adopters but it was worth it. My biggest gripe with the golang ecosystem was that the documentation generation for projects that are not public sucks. For a long time, the documentation generator didn't even support projects that used modules. That said, I was overwhelmingly happy with my choice here and never regretted it. I made two architectural bets that I was less confident of than the others: I split the backend up into three parts; they all lived in the same repository but were designed such that they could be pulled apart and given to a new team if necessary. Each component had its own postgres database (which were physically co-located, but never intertwined) and strictly used gRPC to communicate between themselves. The split was largely based around data access patterns: One thing the site needed to be able to do was estimate the cost of any packaging variation of any drug at any pharmacy on any health insurance plan. Easy, right? This combinatorial explosion (I once calculated how many trillions of possibilities this was) necessitated a very carefully-indexed database, a large amount of preprocessing, and a commitment to engineering with performance in mind. It took a long time and a ton of government health system reverse engineering to figure out how to get even close to right with this part. I'm forever indebted to my colleagues who dove deep into the depths of CMS bureaucracy, and then turned it into documentation and code. The main purpose of the site was for people to search for and purchase medicare part C and part D health care plans. Every day, we'd get a new dump of detailed health care plan information from CMS; this module would load the information into a new postgres database, and then we'd deploy a new version pointing at the new day's data. Both and had entirely immutable databases in this way; their only job was to serve an API based on the most recent data for each. In the insurance argot, a person on a health care plan is a "beneficiary" of the plan. It sounds a little self-important to me, but that's just what it is I suppose. (One thing I tried to do throughout my work on this project was to use the proper industry jargon wherever possible rather than something more familiar to myself. I felt it was part of the commitment to boringness to keep the jargon friction down to a minimum.) The job of the module was to store information about plan customers, and was the only part of the application where the database was long-lived and mutable. 
We strove to store as little data as possible here, to minimize the risk should there be any data leakage, but there was no way around storing a very scary amount of Personally Identifiable Information (PII). We were as serious as possible about this data, and the risk of losing control of it kept me nervous at all times. Overall, gRPC was not as great for us as I'd hoped when beginning the project. The best benefit of using it, and the driver behind my choice to use it, was that every interface was specified in code. This was very useful; we could generate tools and interfaces that changed in lockstep. The biggest pain points were all related to the tooling. I maintained a set of very hairy makefiles with eldritch commands to build all the interfaces into go files that we could use, and debugging those was always painful. Not being able to curl the system, as we would if it were a JSON API, was a pain in the butt. An equivalent tool existed, and we used it, but it was not nearly as nice. grpc-gateway was the best part of the ecosystem I used; it served more than a billion requests for us and was never once the source of a problem. It enabled us to do what gRPC ought to have been able to do from the start: serve requests to web clients. I loved having interface schemas, but we used so few of gRPC's features and the code generation was so complicated that we probably would have been slightly better off without it. We followed a strict backwards-compatibility requirement, and only added and never removed fields from our interfaces. Once a field was exposed in the public API, it would be exposed forever unless it became a security problem (which, thankfully, never happened to us in the years I worked on this project). We did the same with the databases as the API; columns were added and rarely removed. If a column actually merited removal, the process was to add a column, remove all references to the old one, wait a few weeks to make sure we didn't need to roll back, then finally remove the column from the database. Our discipline with backwards compatibility gave us freedom to keep up a high rate of changes and maintain confidence that local changes would not have negative downstream consequences. A core principle of the app was to rely on postgres whenever possible, and also to be stupid instead of clever whenever possible. Faceted search was an excellent example of both of those properties. We could have reached for elasticsearch, and we also could have tried to use an ORM, or built a little faceting language. We implemented faceted search by having a well-indexed table of plans, and building a SQL query string by tacking on clauses based on a long series of conditions. The function which implemented the core part of this scheme is a single 250-line function, heavily commented, which lays out the logic in a nearly flat way. The focus is kept squarely on business requirements, instead of on fancy code. We stored the database schemas in a series of files with leading numbers, so that they could be loaded in order at database creation time. For and , there were no migrations, because the database was recreated every day. Instead, there was a number stored in both the database and the application. Given that we (tried very hard to) never make backwards incompatible changes to the database, the apps would check at startup that their database schema version number was greater than or equal to the database version number stored in the database, and refuse to start if they were not.
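That startup check is simple enough to sketch. Something in this spirit (my illustration, not their code; the table name, driver, and version numbers are made up) is all it takes:

    // A sketch of "refuse to start if the schema version doesn't line up":
    // the app knows the schema version it was built against and compares it
    // to a version number stored in the database itself.
    package main

    import (
        "database/sql"
        "log"
        "os"

        _ "github.com/lib/pq" // one common postgres driver, registered for database/sql
    )

    // The schema version this build of the app expects, bumped whenever a
    // numbered schema file is added.
    const expectedSchemaVersion = 42

    func main() {
        db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
        if err != nil {
            log.Fatalf("open database: %v", err)
        }
        defer db.Close()

        var dbVersion int
        if err := db.QueryRow(`SELECT version FROM schema_version LIMIT 1`).Scan(&dbVersion); err != nil {
            log.Fatalf("read schema version: %v", err)
        }

        // Per the pattern described above: the app's expected schema version
        // must be >= the version recorded in the database; otherwise this
        // binary predates the schema it is pointed at, so fail fast.
        if expectedSchemaVersion < dbVersion {
            log.Fatalf("app expects schema version %d but database is at %d; refusing to start",
                expectedSchemaVersion, dbVersion)
        }

        log.Printf("schema version %d OK, starting up", dbVersion)
        // ... start the API server here ...
    }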
This was a general pattern: if an app encountered any unexpected or missing configuration, it refused to start and threw noticeable, hopefully clear, errors. I tried hard to make it so that if the apps actually started up, they would have everything they needed to run properly. There were occasional instances where we accidentally rolled out backwards-incompatible changes to the database, and in those cases we generally rolled back the data update and rebuilt it. The part of the system I'm most proud of, and on which I spent the most effort, is the ETL process. We had a series of shell scripts for each data source we ingested (there were many), which would pull the data and put it in an s3 bucket. Then, early in the morning, a cron job would spin up an EC2 instance, which would pull in the latest ETL code and all the data files. It would spin up a new database in our RDS instance, and begin the ETL process. If things went well, right about the time the east coasters got into work, a new database would be rotating into service. My recollections are not exact, but it took something like two to four hours to generate a new database with more than 250 million rows out of several gigabytes of text files in various formats. The code to insert data into the database heavily utilized postgres' COPY statement, avoiding INSERTs as much as possible in favor of generating batches of collections that could be COPYed into the database. We used the xo library to connect to the database and generate models, along with heavily customized templates. The templates themselves, and the code to create the models from them, were hairy. Thankfully it mostly only had to be written once and occasionally edited. Here was my biggest mistake: I invested a great deal of time and effort in creating sql-mock tests for data that changed regularly. These tests needed constant, tedious maintenance. I should instead have tested against a live database, especially given that we were working mostly with immutable databases and wouldn't have had to deal with recreating it for each test. Each table in the database had an accompanying script that would generate a subset of the data for use in local development, since the final database was too large to run on a developer's machine. This let each developer work with a live, local copy of the database and enabled efficient development of changes. I highly recommend building in this tooling from the start; it saves you from either trying to add it in once your database grows large, or having your team connect to a remote database, making development slower. We had a CLI tool, written mostly as a bunch of shell scripts, with a ton of available commands that performed all kinds of utility functions related to observability and operations. It was mostly written by an excellent coworker, and working with it is where I learned to write effective shell scripts. (Thanks to Nathan and shellcheck for this vital skill). Having this available from the start served as a really useful place for utility features to coalesce; without it they tend to scatter to more places further afield, or just live in a developer's shell history. One fun bit of tooling I built was the ability to generate graphs from splunk (our log aggregation service) via slack commands, which was particularly helpful in incident handling, as you could easily share graphs with your coworkers. Every request that entered the backend got a request id as soon as it hit the system, and that request id was carried around with it wherever it went.
Middleware gave the request its id, and then another middleware constructed a sub-logger with the request id embedded into it that was attached to the request context, so that all logs always had the request ID attached. The system logged on entry and exit, with as much detail as was safe. Any other logging that was above the debug level was supposed to be exceptional, although we weren't super strict about that. We used zerolog, and it worked great for us. At some point, I converted the markdown docs I and others had written in github into a book about how the system worked, using sphinx-book-theme. Miraculously, this book gained traction, and I got great contributions from teammates and everybody knew where to look to find system documentation. I have started documentation websites for many other projects, and no other ones have ever worked as successfully, and I wish that I had any idea why that was, but I don't. It proudly featured our mascot (the corgi) showing off its most notable feature. Our client frequently wanted us to add queries that would operate from the browser, and I was fortunate to be able to push back and turn many of those into build-time requests instead. One place where our performance got killed at our clients' request was with render-blocking analytics scripts; it seemed every team wanted a different script run to get the analytics that they "needed". I advised them against it and tried to demonstrate the performance and download size problems they incurred, but the client was not interested in my arguments. There are so many more parts of a system like this that I haven't covered here; I mostly wanted to write down a bit about a bunch of the pieces while I still remember what they are. I was very fortunate to be able to work with such a positive, creative, and engaged team that made the space for such a successful project to be possible. An article about the social factors and personalities that made the team go, and the site happen, would be a second article as long as this one is.
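To make the request-ID-and-sub-logger pattern described above concrete, here is a rough sketch of how it can be wired up with zerolog and net/http — my own illustration of the pattern, not code from the project:

    // Request-ID middleware: assign an ID on entry, embed it in a zerolog
    // sub-logger, attach that logger to the request context, and log on
    // entry and exit. Illustrative only.
    package main

    import (
        "net/http"
        "os"
        "time"

        "github.com/google/uuid"
        "github.com/rs/zerolog"
    )

    func requestID(base zerolog.Logger, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            id := uuid.NewString()

            // Sub-logger with the request ID baked in; every line logged
            // through it carries the ID automatically.
            logger := base.With().Str("request_id", id).Logger()

            // Attach the logger to the context so downstream handlers can
            // retrieve it with zerolog.Ctx(r.Context()).
            ctx := logger.WithContext(r.Context())

            start := time.Now()
            logger.Info().Str("method", r.Method).Str("path", r.URL.Path).Msg("request start")
            next.ServeHTTP(w, r.WithContext(ctx))
            logger.Info().Dur("elapsed", time.Since(start)).Msg("request end")
        })
    }

    func main() {
        base := zerolog.New(os.Stdout).With().Timestamp().Logger()

        mux := http.NewServeMux()
        mux.HandleFunc("/plans", func(w http.ResponseWriter, r *http.Request) {
            // Any handler can pull the request-scoped logger back out.
            zerolog.Ctx(r.Context()).Debug().Msg("searching plans")
            w.Write([]byte("ok"))
        })

        http.ListenAndServe(":8080", requestID(base, mux))
    }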

0 views
Ginger Bill 1 years ago

String Type Distinctions

[Originally from a Twitter Thread] Original Twitter Post One thing many languages & API designers get wrong is the concept of a string. I try to make a firm distinction between:
1. string value ( or )
2. string builder ( or )
3. backing buffer for a string ( or )
They are not equivalent even if you can theoretically use them as such, and so many garbage-collected languages use them as such. They have different use cases which don’t actually overlap in practice. Most of the issues with strings come from trying to merge the concepts into one. In Odin, the distinction between a string value and a byte array is very important. Odin's string type is semantically a string and not an array of 8-bit unsigned integers. There is an implied character encoding (UTF-8) as part of the value. It is also an immutable value in Odin. Having a string be immutable allows for a lot of optimizations, but in practice, you never want to mutate the string value itself once it has been created. And when you do mutate it, it is most definitely a bug. This is why it is important to make a distinction between #1 and #3 and separate the concepts. Another way to conceptualize the ideas is as the following: (3) is the “backing data”, an arena of sorts (fixed or dynamic); (2) are the operations on that buffer (fixed or dynamic); (1) is the final value that points to (3) and is produced by (2). Coupled with Run-Time Type Information (RTTI), having a distinction between []byte and string allows for a lot of really decent (de)serialization tooling, especially for “magical” printing (e.g. fmt.println). P.S. Even C makes a distinction between a string and an array of integers.
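Go draws a similar, if weaker, line, and it makes a handy illustration of the three concepts — this mapping is mine, not the author's: string is the immutable value (UTF-8 by convention), []byte is the mutable backing data, and strings.Builder is the set of operations that produce a string.

    // Mapping the three concepts onto Go, as an illustration:
    //   (1) string          — immutable value, UTF-8 by convention
    //   (2) strings.Builder — operations that produce a string
    //   (3) []byte          — mutable backing data
    package main

    import (
        "fmt"
        "strings"
        "unicode/utf8"
    )

    func main() {
        // (3) Backing data: freely mutable bytes, no encoding implied.
        buf := []byte("héllo")
        buf[0] = 'H'

        // (2) Builder: the operations that assemble a value...
        var b strings.Builder
        b.Write(buf)
        b.WriteString(", world")

        // (1) ...and the final immutable string value they produce.
        s := b.String()
        // s[0] = 'x' // compile error: strings cannot be mutated in place

        fmt.Println(s, len(s), utf8.RuneCountInString(s)) // byte length vs rune count
    }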

0 views