Posts in Api (20 found)
iDiallo Yesterday

Demerdez-vous: A response to Enshittification

There is an RSS reader that I used often in the past and have become very reliant on. I would share the name with you, but as they grew more popular, they decided to follow the enshittification route. They've changed their UI, hidden several popular links behind multilayered menus, and revamped their API. Features that I used to rely on have disappeared, and the API is close to useless. My first instinct was to find a new app that would satisfy my needs. But being so familiar with this reader, I decided to test a few things in the API first. Even though their documentation doesn't mention older versions anymore, I discovered that the old API is still active. All I had to do was add a version number to the URL. It's been over 10 years, and that API is still very much alive. I'm sorry I won't share it here, but this has served as a lesson for me when it comes to software that becomes worse over time. Don't let them screw you, unscrew yourself!

We talk a lot about "enshittification" these days. I've even written about it a couple of times. It's about how platforms start great, get greedy, and slowly turn into user-hostile sludge. But what we rarely talk about is the alternative. What do you do when the product you rely on rots from the inside? The French have a phrase for this: Demerdez-vous. The literal translation is "unshit yourself". What it actually means is to find a way, even if no one is helping you.

When a company becomes too big to fail, or simply becomes dominant in its market, drip by drip, it starts to become worse. You don't even notice it at first. It changes in ways that most people tolerate because the cost of switching is high, and the vendor knows it. But before you despair, before you give up, before you let the system drag you into its pit, try to unscrew yourself with the tools available. If the UI changes, try to find the old UI. Patch the inconvenience. Disable the bullshit. Bend the app back into something humane.
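The versioned-URL trick from the story above generalises. A minimal sketch (the reader's domain, paths, and version names here are all hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def versioned_urls(url, versions=("v1", "v2", "v3")):
    """Build candidate URLs with an explicit API version segment prepended
    to the path: old endpoints may still answer even when the docs no
    longer mention them."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    return [urlunsplit((scheme, netloc, f"/{v}{path}", query, fragment))
            for v in versions]

# Hypothetical reader API endpoint, just for illustration.
candidates = versioned_urls("https://reader.example.com/feeds/starred")
```

From there you would issue a GET against each candidate and keep whichever one still answers with a 200.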
It might sound impossible at first, but the tools to accomplish this exist and are widely used. Sometimes the escape hatch is sitting right there, buried under three layers of "Advanced" menus. On the web, I hate auto-playing videos, I don't want to receive twelve notifications a day from an app, and I don't care about personalization. But for the most part, these can be disabled. When I download an app, I actually spend time going through its settings. If I care enough to download an app, or if I'm forced to, I'll spend the extra time to ensure that the app works to my advantage, not the other way around.

When that RSS reader removed features from the UI, but not from their code, I was still able to keep using them. Another example of this is reddit. Their new UI is riddled with dark patterns, infinite scroll, and popups. But go to old.reddit.com, and you are greeted with that old UI that may not look fancy, but was designed with the user in mind, not the company's metrics. Another example: YouTube removed the dislike button. While it might be hurtful for content creators to see the number of dislikes, as a consumer, this piece of data served as a filter for lots of spam content. For that, of course, there is the "Return YouTube Dislike" browser extension. Extensions can often help you regain control when popular websites remove functionality that is useful to users but that the service no longer wants to support. There are several tools that enhance YouTube, fix Twitter, and of course uBlock.

It's not always possible to combat enshittification. Sometimes the developer actively enforces their new annoying features and prevents anyone from removing them. In cases like these, there is still something that users can do. They can walk away. You don't have to stay in an abusive relationship. You are allowed to leave. When you do, you'll discover that there was an open-source alternative. Or that a small independent app survived quietly in a corner of the internet.
Or even, sometimes, you'll find that you don't need the app at all. You break your addiction. In the end, "Demerdez-vous" is a reminder that we still have agency in a world designed to take it away. Enshittification may be inevitable, but surrender isn't. There's always a switch to flip, a setting to tweak, a backdoor to exploit, or a path to walk away entirely. Companies may keep trying to box us in, but as long as we can still think, poke, and tinker, we don't have to live with the shit they shovel. At the end of the day, "On se demerde".

0 views
iDiallo 2 weeks ago

What Actually Defines a Stable Software Version?

As a developer, you'll hear these terms often: "stable software," "stable release," or "stable version." Intuitively, it just means you can rely on it. That's not entirely wrong, but when I was new to programming, I didn't truly grasp the technical meaning. For anyone learning, the initial, simple definition of "it works reliably" is a great starting point. But if you're building systems for the long haul, that definition is incomplete.

The intuitive definition: a stable version is software that works and that you can rely on not to crash. The technical definition: a stable version is one whose API will not change unexpectedly in future updates. A stable version is essentially a guarantee from the developers that the core interface, such as the functions, class names, data structures, and overall architecture you interact with, will remain consistent throughout that version's lifecycle. This means that if your code works with version 1.0.0, it should also work flawlessly with version 1.0.1, 1.0.2, and 1.1.0. Future updates will focus on bug fixes, security patches, and performance improvements, not on introducing breaking changes that force you to rewrite your existing code.

My initial misunderstanding was thinking stability was about whether the software was bug-free or not, similar to how we expect bugs to be present in a beta version. But there was still an upside to this confusion. It helped me avoid the hype cycle, especially with certain JavaScript frameworks. I remember being hesitant to commit to new versions of certain tools (like early versions of React and Angular, though this is true of many fast-moving frameworks and SDKs). Paradigms would shift rapidly from one version to the next. A key concept I'd mastered one month would be deprecated or replaced the next. While those frameworks sit at the cutting edge of innovation, they can also be the antithesis of stability. Stability is about long-term commitment.
Rapid shifts force users to constantly evolve with the framework, making it difficult to stay on a single version without continual, large-scale upgrades. A truly stable software version is one you can commit to for a significant amount of time. The classic example of stability is Python 2. Yes, I know many wanted it to die by fire, but it was first released in 2000 and remained active, receiving support and maintenance until its final update in 2020. That's two decades of stability! I really enjoyed being able to pick up old scripts and run them without any fuss. While I'm not advocating that every tool should last that long, I do think that when we're building APIs or stable software, we should adopt the mindset that this is the last version we'll ever make. This forces us to carefully consider the long-term design of our software. Whenever I see LTS (Long-Term Support) next to an application, I know that the maintainers have committed to supporting, maintaining, and keeping it backward compatible for a defined, extended period. That's when I know I'm working with both reliable and stable software.
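The guarantee described above, that code built against 1.0.0 keeps working on 1.0.1 and 1.1.0, is essentially the semantic-versioning contract. A minimal sketch of that compatibility check (the version strings are illustrative):

```python
def parse(v):
    """Parse a MAJOR.MINOR.PATCH string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def still_compatible(built_against, installed):
    """The stability guarantee: within one stable major version, any
    equal-or-newer release should not break existing callers."""
    b, i = parse(built_against), parse(installed)
    return i[0] == b[0] and i >= b

# Code written against 1.0.0 keeps working across the 1.x line...
assert still_compatible("1.0.0", "1.0.2")
assert still_compatible("1.0.0", "1.1.0")
# ...but a new major version signals breaking changes.
assert not still_compatible("1.0.0", "2.0.0")
```

An LTS release is, in effect, a promise that this check will keep returning True for the whole support window.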

0 views
Neil Madden 2 weeks ago

Were URLs a bad idea?

When I was writing Rating 26 years of Java changes , I started reflecting on the new HttpClient library in Java 11. The old way of fetching a URL was to use URL.openConnection() . This was intended to be a generic mechanism for retrieving the contents of any URL: files, web resources, FTP servers, etc. It was a pluggable mechanism that could, in theory, support any type of URL at all. This was the sort of thing that was considered a good idea back in the 90s/00s, but has a bunch of downsides:

- The API was forced to be lowest-common-denominator, so if you wanted to set options that are specific to a particular protocol then you had to cast the returned URLConnection to a more specific sub-class (and therefore lose generality).
- Fetching different types of URLs can have wildly different security and performance implications, and wildly different failure cases.

The new HttpClient in Java 11 is much better at doing HTTP, but it’s also specific to HTTP/HTTPS. And that seems like a good thing? In fact, in the vast majority of cases the uniformity of URLs is no longer a desirable aspect. Most apps and libraries are specialised to handle essentially a single type of URL, and are better off because of it. Are there still cases where it is genuinely useful to be able to accept a URL of any (or nearly any) scheme? Do I really want to accept a mailto: URL or a javascript: “URL” ? No, never.
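Java isn't alone in this design. Python's standard-library urlopen is similarly scheme-generic, which makes the downside easy to demonstrate outside Java: code written to fetch "a URL" will happily read local files too.

```python
import os
import tempfile
from urllib.request import urlopen

# urlopen, like Java's URL.openConnection, dispatches on the URL scheme:
# http(s), ftp, data and file are all accepted. A caller expecting "a web
# resource" can silently be handed local file contents instead.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("not actually a web resource")
    local_path = f.name

fetched = urlopen("file://" + local_path).read().decode()
os.unlink(local_path)
```

Protocol-specific clients that reject unexpected schemes, like the new HttpClient, close off exactly this class of surprise.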

0 views
xenodium 2 weeks ago

Want a WhatsApp Emacs client? Will you fund it?

Like it or not, WhatsApp is a necessity for some of us. I wish it weren't the case, but here we are. Given the circumstances, I wish I could use WhatsApp a little more on my terms. And by that, I mean from an Emacs client, of course. Surely I'm not the only one who feels this way, right? Right?! Fortunately, I'm not alone . With that in mind, I've been hard at work prototyping, exploring what's feasible. Spoiler alert: it's totally possible, though it will require a fair bit of work.

Thankfully, two wonderful projects offer a huge leg up: wuzapi and whatsmeow . wuzapi offers a REST API on top of whatsmeow , a Go library leveraging WhatsApp's multi-device web API. Last week, I prototyped sending a WhatsApp message using wuzapi's API. I got there fairly quickly by onboarding myself via its web interface and wiring shell-maker to send an HTTP message request. While these two were enough for a quick demo, they won't cut it for a polished Emacs experience.

While I can make REST work, I would like a simpler integration under the hood. REST is fine for outgoing messages, but then I need to integrate webhooks for incoming events. No biggie, it can be done, but now I have to deal with two local services opening a couple of ports. Can we simplify a little? Yes we can. You may have seen me talk about agent-shell , my Emacs package implementing Agent Client Protocol (ACP) … Why is this relevant, you may ask? Well, after building a native Emacs implementation, I learned a bit about JSON-RPC over standard I/O. The simplicity here is that we can bring bidirectional communication to an Emacs-owned process. No need for multiple channels handling incoming vs outgoing messages.

So where's this all going? I've been prototyping some patches on top of wuzapi to expose its API over standard I/O (as an alternative to REST). This prototype goes far beyond my initial experiment with sending messages, and yet the Emacs integration is considerably simpler, not to mention looking very promising.
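To make the JSON-RPC-over-stdio idea concrete, here is a minimal sketch of the dispatch core in Python. The method name and payload shapes are invented for illustration; the real protocol surface would come from wuzapi's API, and the transport would be the subprocess's stdin/stdout rather than a direct function call:

```python
import json

def handle_line(line, handlers):
    """Dispatch one JSON-RPC 2.0 request (a single line of text, as it
    would arrive on the subprocess's stdin) and return the response line
    to write back on stdout."""
    req = json.loads(line)
    handler = handlers.get(req.get("method"))
    if handler is None:
        body = {"error": {"code": -32601, "message": "method not found"}}
    else:
        body = {"result": handler(req.get("params", {}))}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), **body})

# Hypothetical method table; not wuzapi's actual method names.
handlers = {"message/send": lambda p: {"queued": True, "to": p["to"]}}
response = handle_line(
    json.dumps({"jsonrpc": "2.0", "id": 1, "method": "message/send",
                "params": {"to": "+15550100", "text": "hi from Emacs"}}),
    handlers,
)
```

One process, one bidirectional pipe, no ports: that is the whole appeal over REST plus webhooks.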
Here's a demo showing incoming WhatsApp messages, received via standard I/O, all through a single Emacs-owned process. Look ma, no ports!

These early prototypes are encouraging, but we've only scratched the surface. Before you can send and receive messages, you need to onboard users to the WhatsApp Emacs client. That is, you need to create a user, manage/connect to a session, authorize via a QR code, and more. You'll want this flow to be reliable, and that's just onboarding. From there, you'll need to manage contacts, chats, multiple message types, incoming notifications… the list goes on. That's just the Emacs side. As mentioned, I've also been patching wuzapi. My plan is to upstream these changes , rather than maintaining a fork. I've prototyped quite a few things now, including the onboarding experience with QR code scanning. At this point, I feel fairly optimistic about feasibility, which is all pretty exciting! But there's a bunch of work needed.

Since going full-time indie dev, I have the time available (for now), but it's hard to justify this effort without aiming for some level of sustainability. If you're interested in making this a reality, please consider sponsoring the effort, and please reach out to voice your interest ( Mastodon / Twitter / Reddit / Bluesky ). Reckon a WhatsApp Emacs client would help you stay focused at work (less time on your phone)? Ask your employer to sponsor it too ;-)

0 views
Jim Nielsen 4 weeks ago

Browser APIs: The Web’s Free SaaS

Authentication on the web is a complicated problem. If you’re going to do it yourself, there’s a lot you have to take into consideration. But odds are, you’re building an app whose core offering has nothing to do with auth. You don’t care about auth. It’s an implementation detail. So rather than spend your precious time solving the problem of auth, you pay someone else to solve it. That’s the value of SaaS. What would be the point of paying for an authentication service, like WorkOS, then re-implementing auth on your own? They have dedicated teams working on that problem. It’s unlikely you’re going to do it better than them and still deliver on the product you’re building.

There’s a parallel here, I think, to building stuff in the browser. Browsers provide lots of features to help you deliver good websites fast to an incredibly broad and diverse audience. Browser makers have teams of people who, day-in and day-out, spend lots of time developing and optimizing their offerings. So if you leverage what they offer you, that gives you an advantage because you don’t have to build it yourself. You could build it yourself. You could say “No thanks, I don’t want what you have. I’ll make my own.” But you don’t have to. And odds are, whatever you do build yourself is not likely to be as fast as the highly-optimized subsystems you can tie together in the browser . And the best part? Unlike SaaS, you don’t have to pay for what the browser offers you. And because you’re not paying, it can’t be turned off if you stop paying. Every one of those built-in APIs is free and will work forever. That’s a great deal. Are you taking advantage? Reply via: Email · Mastodon · Bluesky

0 views
underlap 1 month ago

IPC channel multiplexing: what next?

About three months ago, I posted IPC channel multiplexing: next steps . Since then I’ve taken a complete break from the project to make the most of the summer and to go on holiday. As I consider how to proceed, I think those next steps still make sense, but that’s not the whole story.

The basic problem the multiplexing prototype is trying to solve is as follows. If an IPC channel endpoint is sent over another IPC channel, when it is received, it consumes a file descriptor (at least on Unix variants). A new file descriptor is consumed even if the same IPC channel endpoint is received multiple times. This can crash the receiving process if it runs out of file descriptors.

The thing that has changed in the intervening gap is my motivation. I really enjoyed implementing multiplexing of IPC channels as it was relatively self-contained. Extending the API to support more Servo usecases does not feel so fun. Also, I would like more assurance that if I invest the effort to make IPC channel multiplexing suitable for adoption by Servo, there’s a reasonable chance it will actually be adopted. There seem to be relatively few Servo developers who understand IPC channel well enough to engage with adopting multiplexing. Plus they are likely to be very busy with other things. So there may simply be a lack of traction. Also, multiplexing isn’t a simple piece of code, so merging it into IPC channel will increase the project’s size and complexity and therefore its maintenance cost. There may be performance or usability issues in adopting multiplexing. I’m not aware of any such issues and I don’t anticipate these being significant if they crop up, but there’s still a risk. Currently, I’m working in isolation from the Servo team and I’d like some reassurance that the direction I’m heading in is likely to be adopted.

The advantages of continuing are:
- Capitalising on the effort already expended. [1]
- Potentially fixing the bug.
- Overcoming the difficulties involved would give a greater sense of achievement.
- I enjoy solving difficult problems and it would keep my brain active.

The disadvantages of continuing are:
- Potentially wasting more effort.
- Now may be an opportunity to retire properly from my career in software development. [2]

On balance, I think I’ll continue.

It would be possible to move the multiplexing prototype to a separate repository and crate which depends on the IPC channel crate. The advantages of this are:
- It could increase the profile of the prototype.
- I could host this on codeberg.org rather than github.com. [3]
- Ease of code navigation, since the code would be pure Rust rather than multiplatform.
- Ease of CI: Linux only.
- Ease of promotion of changes, since it wouldn’t require the involvement of IPC channel committers.
- Publication to crates.io for ease of consumption by Servo. [4]
- Documentation could be centred on multiplexing.

One possible disadvantage is that it would not be possible to reuse IPC channel internals. For example, if one of the missing features for multiplexing was essentially the same as that for vanilla IPC channel, I couldn’t just generalise the code and share it.

I think the most effective way forward is to test the Servo team’s willingness to adopt multiplexing by focussing on a usecase that is known to exhibit the bug, reproducing the bug in isolation, showing that multiplexing fixes the bug, and proposing a fix for Servo. So I’ll start by looking at the bug reports, picking one, and looking at the IPC channel usecase in Servo which hits the bug. I’ll defer the decision of whether to package the prototype as a separate repository until I start to touch the prototype code again.

[1] This is contrary to the sunk cost fallacy. ↩︎
[2] I’m not sure what else I would prefer to do with my spare mental capacity. ↩︎
[3] I really dislike Microsoft’s policy of trawling github.com to build AI models. I’m also shocked about Microsoft’s willingness to create e-waste by dead-ending Windows 10 and not supporting older hardware with Windows 11, although they have delayed the deadline with the Windows 10 Extended Security Updates (ESU) programme. (On the other hand, maybe this move will push more people to adopt Linux. See END OF 10.) ↩︎
[4] Unfortunately, and it’s a big unfortunately, this still requires the repository to be mirrored to github.com. See Non-Github account creation. ↩︎
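For the curious, the file-descriptor behaviour underlying the bug is easy to reproduce with plain SCM_RIGHTS fd passing, the Unix mechanism beneath IPC channels. A Python 3.9+ sketch (Unix only; this demonstrates the mechanism, not ipc-channel itself):

```python
import os
import socket

# Send the SAME pipe endpoint over a Unix socket twice: the receiver gets
# two NEW file descriptors, each counting against its fd limit. This is
# the per-receive fd consumption described in the post.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()  # stand-in for an IPC channel endpoint

received = []
for _ in range(2):
    socket.send_fds(parent, [b"endpoint"], [read_end])
    _msg, fds, _flags, _addr = socket.recv_fds(child, 1024, 1)
    received.extend(fds)

# Two sends of one endpoint yield two distinct descriptors on the receiver.
duplicated = len(set(received))
```

Multiplexing sidesteps this by carrying many logical channels over one real descriptor instead of passing a fresh fd per receive.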

0 views
Wreflection 1 month ago

Wrapping Your Head Around AI Wrappers

“That’s just an AI wrapper.” The put‑down is now familiar to anyone developing something new with Artificial Intelligence. The push-back is just as familiar. “Everything is a wrapper. OpenAI is a wrapper around Nvidia and Azure. Netflix is a wrapper around AWS. Salesforce is an Oracle database wrapper valued at $320 billion,” says Perplexity CEO Aravind Srinivas 1 .

For those not familiar with the term “AI wrapper,” here’s a good definition 2 : a dismissive term for a lightweight application or service that uses existing AI models or APIs to provide specific functionality, typically with minimal effort or complexity involved in its creation. A popular example of an AI wrapper is apps that let users “chat” with a PDF. This type of AI application allows users to upload a PDF document, such as a research paper, and interact with an AI model to quickly analyze and obtain answers about the specific content. In the early days of ChatGPT, uploading documents as part of the prompt or creating a custom GPT was not possible, so these apps became very popular, very fast.

AI Wrapper Meme: An API call to OpenAI under the hood.

In my view, this AI wrapper debate misses a larger point. Wrappers are not all the same. Thin tricks enjoy a brief run and last only until big platforms bundle them into their suites. But products that live where users already work, write back to a proprietary system of record , and/or can make use of proprietary data can endure. The wrapper label is a distraction from what I think actually matters: (1) Is it a feature or a product, and (2) How big is the market segment.

Feature Or Product

Begin with the earlier example of a wrapper that lets you chat with a PDF. Such a tool solves one narrow problem: answering questions about a document. It does not create new documents or edit existing ones.
It typically does not capture any unique data, or learn from user behavior. It is a means to an end; a capability rather than an end-to-end solution. As a result, this kind of feature belongs inside a document viewer or editor, or in the flagship applications of model providers. So when the foundation models themselves (OpenAI/ChatGPT, Anthropic/Claude, Google/Gemini) bundle this feature natively, the standalone tool becomes redundant. This is classic feature behavior: easy to copy, no end-to-end job, no moat or long-term defensibility. One caveat, though: even those that are features can be interesting indie businesses that make money until the platforms build it into their apps 3 . PDF.ai is at $500K MRR, PhotoAI at $77K MRR, Chatbase at $70K MRR, InteriorAI at $53K MRR 4 . Jenni AI went from $2,000 to over $333,000 MRR in just 18 months 5 .

Some wrappers are genuine products but live in market segments so large that model builders and big tech platforms cannot ignore them. Two vectors of competition come into play: (1) model access, and (2) distribution. Coding assistants illustrate both. Tools like Cursor turned a wrapper into a development environment that reads the repo, edits files, writes code, reverts changes, runs agents, and reimagines the developer experience for the AI era. The market justifies the attention. Software developers represent roughly 30% of the workforce at the world’s five largest market cap companies, all of which are technology firms as of 2025 6 . Development tools that boost productivity by even modest percentages unlock billions in value. That makes this segment a prime target for both model builders and incumbents that already own distribution channels. But Cursor and other such tools depend almost entirely on access to Anthropic, OpenAI and Gemini models. Developer forums are filled with complaints about rate limits from paying subscribers.
In my own experience, I exhausted my Claude credits in Cursor mid-project and, despite preferring Cursor’s user interface and design, I migrated to Claude Code (and pay ten times more to avoid rate limits). The interface is good, but model access proved decisive. This foundation model competition extends to every category that OpenAI’s Applications CEO flagged as strategic (Knowledge/Tutoring, Health, Creative Expression, and Shopping) as well as other large market segments such as Writing Assistants, Legal Assistants, etc.

Distribution poses the second threat. Even where model builders stay out, startups face a different race: can they build a user base faster than incumbents with existing products and distribution can add AI features? This is the classic Microsoft Teams vs. Slack dynamic 7 . The challenge is in establishing a loyal customer base before Microsoft embeds Copilot in Excel/PowerPoint, or Google weaves Gemini into Workspace, or Adobe integrates AI across its creative suite. A standalone AI wrapper for spreadsheets or presentations must overcome not just feature parity but bundling/distribution advantages and switching costs. This distribution competition from incumbents also holds in other large markets such as healthcare and law. In these markets, regulatory friction and control of systems of record favor established players such as Epic Systems in healthcare. For example, a clinical note generator that cannot write to the Electronic Health Record (EHR) will likely come up against Epic’s distribution advantages sooner or later.

Three caveats here: (1) First, speed to market can create exit options even without long-term defensibility; tools like Cursor may lack control over their core dependency (model access), but rapid growth makes them attractive targets for model builders seeking instant market presence.
(2) Second, superior execution occasionally beats structural advantage; Midjourney’s product quality convinced Meta to use it despite Meta’s substantially larger budget and distribution power. (3) Third, foundation models may avoid certain markets despite their size; regulatory burden in healthcare and legal, or reputational damage from AI companions or pornographic content, may provide opportunities for operators willing to face extreme regulatory scrutiny or controversy. The opportunity remains large 8 , but competition (and/or acquisition) can come knocking. Cursor went from zero to $100 million in recurring revenue in 18 months, and became the subject of recurring OpenAI acquisition rumors. Windsurf , another coding assistant, received a $2.4B licensing deal from Google. Gamma reached $50 million in revenue in about a year. Lovable hit $50 million in revenue in just six months. Galileo AI was acquired by Google for an undisclosed amount.

Not every market gap attracts model builders or big tech. A long tail of jobs exists that are too small for venture scale but large enough to support multimillion-dollar businesses. These niches suit frugal founders with disciplined scope and lean operations. Consider those manifestation, horoscope, or dream-interpreter AI apps. A dream interpreter that lets users record dreams each morning, generates AI videos based on them, maintains some kind of dream journal, and surfaces patterns over time solves a complete job. Yes, users could describe dreams to ChatGPT and it even stores history/memory, but a dedicated app can structure the dream capture with specific fields (recurring people, places, things, themes etc.) and integrate with sleep tracking data in ways a general chatbot likely cannot. Such a niche is small enough to avoid model attention but large enough to sustain a profitable indie business.
While the previous categories frame opportunities for new ventures, incumbents face their own strategic choices in the wrapper debate when model builders arrive. Those that navigate model builder competition share two characteristics. First, they own the outcome even when they don’t own the model. Applications already embedded in user workflows (Gmail/Calendar, Sheets, EHR/EMR, Figma) require no new habit formation, and building these platforms from scratch is much harder than adding AI capability to existing ones. When these applications ship actions directly into a proprietary system of record (managing the calendar, filing the claim, creating the purchase order, and so on), “done” happens inside the incumbent’s environment. AI becomes another input to an existing workflow rather than a replacement for it. Second, successful incumbents build proprietary data from customer usage. Corrections, edge cases, and approvals become training data that refines the product over time, that a frontier model will not have access to. Cursor, though not an incumbent and despite its dependence on external models, plans to compete by capturing developer behavior patterns as CEO Michael Truell notes in his Stratechery interview : Ben: Is that a real sustainable advantage for you going forward, where you can really dominate the space because you have the usage data, it’s not just calling out to an LLM, that got you started, but now you’re training your own models based on people using Cursor. You started out by having the whole context of the code, which is the first thing you need to do to even accomplish this, but now you have your own data to train on. Michael: Yeah, I think it’s a big advantage, and I think these dynamics of high ceiling, you can kind of pick between products and then this kind of third dynamic of distribution then gets your data, which then helps you make the product better. 
I think all three of those things were shared by search at the end of the 90s and early 2000s, and so in many ways I think that actually, the competitive dynamics of our market mirror search more than normal enterprise software markets. Both critics and defenders of AI wrappers have a point, and both miss something crucial. The critics are right that some wrappers lack defensibility and will disappear when platforms absorb their features. The defenders are right that every successful software company wraps something. But the real insight lies between these positions. Even if a new application starts as a wrapper, it can endure if it embeds itself in existing workflows, writes to proprietary systems of record, or builds proprietary data and learns from usage. These are the same traits that separate lasting products from fleeting features. Perplexity AI CEO, Aravind Srinivas pushing back on criticism about the business potential of Perplexity: https://medium.com/@alvaro_72265/the-misunderstood-ai-wrapper-opportunity-afabb3c74f31 https://ai.plainenglish.io/wrappers-win-why-your-ai-startup-doesnt-need-to-reinvent-the-wheel-6a6d59d23a9a https://aijourn.com/how-ai-wrappers-are-creating-multi-million-dollar-businesses/ https://growthpartners.online/stories/how-jenni-ai-went-from-0-to-333k-mrr Microsoft bundled Teams into Office 365 subscriptions at no extra cost, using its dominant enterprise distribution to surpass Slack’s paid standalone product within three years despite Slack’s earlier launch and product innovation. See https://venturebeat.com/ai/microsoft-teams-has-13-million-daily-active-users-beating-slack https://a16z.com/revenue-benchmarks-ai-apps/ AI Wrapper Meme: An API call to OpenAI under the hood. In my view, this AI wrapper debate misses a larger point. Wrappers are not all the same. Thin tricks enjoy a brief run and last only until big platforms bundle them into their suites. 
But products that live where users already work, write back to a proprietary system of record , and/or can make use of proprietary data can endure. The wrapper label is a distraction from what I think actually matters: (1) Is it a feature or a product, and (2) How big is the market segment. Thanks for reading Wreflection! Subscribe for free to receive new posts and support my work. Feature Or Product Begin with the earlier example of a wrapper that lets you chat with a PDF. Such a tool solves one narrow problem - answering questions about a document. It does not create new documents or edit existing ones. It typically does not capture any unique data, or learn from user behavior. It is a means to an end; a capability rather than an end-to-end solution. As a result, this kind of feature belongs inside a document viewer or editor, or in the flagship applications of model providers. So when the foundation models themselves (OpenAI/ChatGPT, Anthropic/Claude, Google/Gemini) bundle this feature natively, the standalone tool becomes redundant. This is classic feature behavior - easy to copy, no end-to-end job, no moat or long-term defensibility. One caveat though; even those that are features can be an interesting indie businesses that make money until the platforms build it into their apps 3 . PDF.ai $500K MRR, PhotoAI $77K MRR, Chatbase $70K MRR, InteriorAI $53K MRR 4 . Jenni AI went from $2,000 to over $333,000 MRR in just 18 months 5 . Cursor went from zero to $100 million in recurring revenue in 18 months, and became the subject of recurring OpenAI acquisition rumors. Windsurf , another coding assistant, received a $2.4B acquisition licensing deal from Google. Gamma reached $50 million in revenue in about a year. Lovable hit $50 million in revenue in just six months. Galileo AI acquired by Google for an undisclosed amount. First, they own the outcome even when they don’t own the model. 
Applications already embedded in user workflows (Gmail/Calendar, Sheets, EHR/EMR, Figma) require no new habit formation, and building these platforms from scratch is much harder than adding AI capability to existing ones. When these applications ship actions directly into a proprietary system of record (managing the calendar, filing the claim, creating the purchase order, and so on), “done” happens inside the incumbent’s environment. AI becomes another input to an existing workflow rather than a replacement for it. Second, successful incumbents build proprietary data from customer usage. Corrections, edge cases, and approvals become training data that refines the product over time, data that a frontier model will not have access to. Cursor, though not an incumbent and despite its dependence on external models, plans to compete by capturing developer behavior patterns, as CEO Michael Truell notes in his Stratechery interview:

Ben: Is that a real sustainable advantage for you going forward, where you can really dominate the space because you have the usage data, it’s not just calling out to an LLM, that got you started, but now you’re training your own models based on people using Cursor. You started out by having the whole context of the code, which is the first thing you need to do to even accomplish this, but now you have your own data to train on.

Michael: Yeah, I think it’s a big advantage, and I think these dynamics of high ceiling, you can kind of pick between products and then this kind of third dynamic of distribution then gets your data, which then helps you make the product better. I think all three of those things were shared by search at the end of the 90s and early 2000s, and so in many ways I think that actually, the competitive dynamics of our market mirror search more than normal enterprise software markets.

0 views
Jefferson Heard 2 months ago

The best worst hack that saved our bacon

No-one really likes engineering war stories, but this one's relevant because there's a moral to it. I've talked before about defining technical debt as technical decisions that provide immediate value, but with long-term negative impact if they aren't cleaned up. Sometimes introducing technical debt is necessary and you do it consciously to avoid a disaster. As long as you provide yourself enough room to clean it up, it's just part of the regular course of business when millions of people count on your software to get through their days. Twelve years of calendar appointments on our platform, and the data model was starting to show some wear and tear. Specifically, though, our occurrence table was created with a plain integer primary key, and we were approaching two billion occurrences on the calendar. Well, specifically, the primary key was rapidly approaching 2,147,483,647 – the magic number that is the maximum value for a signed 32-bit integer. We had actually known about this for some time, and we had done most of the work to fix it already. Our backend code was upgraded to bigints and the actual column itself had a migration set to upgrade it to a big integer. The plan had been in the works for a month and a half, and we almost ran with it. But then, roughly a week before we were going to deploy it (and maybe only a month before the keys ran out), someone, maybe me, I don't recall, noticed that these integer keys were exposed in one of our public APIs. You can count on one thing in SaaS software. If you provide an integration API to your customers or vendors and it exposes an attribute, that attribute is crucial to someone, somewhere. And in our case the people using the integrations often had to rely on their university's IT department to do the integration itself. Those backlogs are counted in months, and so we couldn't deploy something that would potentially break customer integrations. What to do? Well, Postgres integer primary keys are signed. 
So there's this WHOLE other half of the 32-bit word that you're not using if you're just auto-incrementing keys. My simple (read: stupid) solution, which absolutely worked, was to set the sequence on that primary key to -2,147,483,648 and let it continue to auto-increment, taking up the other half of that integer space. It was so dumb that I think we met like three times with SRE to say things like, "Is it really this simple? Is this really likely to work? Are we really doing something this dumb?" and the conclusion was yes, and that it would buy us up to 3 years of time to migrate, but we would do it within 6-8 months so all IT departments could make alternative arrangements for their API integrations.

The long term solution was the BigInt, yes, but it was also to expose all keys as opaque handles rather than integers, to avoid dictionary attacks and so that we could use any type we needed on the backend without API users having to account for it. It was also to work through the Customer Success team and make sure no-one counted on the integer-ness (integrality?) of the keys, or better, that no-one was using the occurrence IDs at all.

In the end we had a smooth transition because of quick thinking and a willingness to apply a bald-faced hack to our production (and staging) database. We had a fixed timeline we all acknowledged by which the tech debt had to be addressed, and we'd firmly scoped out the negative consequences of not addressing it. It wasn't hard, but it meant that no matter who was in charge or what team changes were made, the cleanup would get done in time and correctly. It was the right thing to do. A few customers had been counting on those IDs, and we were able to advise their IT departments about how to change their code and to show them what the new API response would look like long before they were actually forced to use it. In the meantime, everything just worked. Do I advise that you use negative primary keys to save room on your database? No.
Was it the right choice of technical debt for the time? Absolutely.
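For the curious, the arithmetic behind the negative-half trick is easy to sketch. A minimal Python illustration (the ~2M-inserts-per-day rate is my own assumption for illustration, not a figure from the story):

```python
# Bounds of a signed 32-bit integer (the Postgres "integer" column type)
INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   #  2,147,483,647

# Restarting the sequence at INT32_MIN opens up the entire unused
# negative half of the key space: every id from -2**31 up to 0.
negative_half_ids = 0 - INT32_MIN
print(negative_half_ids)  # 2147483648 -- about as many ids as were consumed in twelve years

# Hypothetical insertion rate, assumed purely for illustration
ids_per_day = 2_000_000
years_of_headroom = negative_half_ids / ids_per_day / 365
print(round(years_of_headroom, 1))  # 2.9 -- consistent with the "up to 3 years" estimate
```

The sequence reset itself would be a one-line `ALTER SEQUENCE ... RESTART` in Postgres; the point of the arithmetic is that the negative half buys as much runway as the previous twelve years did.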

0 views
Lea Verou 2 months ago

In the economy of user effort, be a bargain, not a scam

Alan Kay [source]

One of my favorite product design principles is Alan Kay’s “Simple things should be simple, complex things should be possible”. [1] I had been saying it almost verbatim long before I encountered Kay’s quote. Kay’s maxim is deceptively simple, but its implications run deep. It isn’t just a design ideal — it’s a call to continually balance friction, scope, and tradeoffs in service of the people using our products. This philosophy played a big part in Prism’s success back in 2012, helping it become the web’s de facto syntax highlighter for years, with over 2 billion npm downloads. Highlighting code on a page took including two files. No markup changes. Styling used readable CSS class names. Even adding new languages — the most common “complex” use case — required far less knowledge and effort than alternatives. At the same time, Prism exposed a deep extensibility model so plugin authors could patch internals and dramatically alter behavior. These choices are rarely free. The friendly styling API increased clash risk, and deep extensibility reduced encapsulation. These were conscious tradeoffs, and they weren’t easy.

Simple refers to use cases that are simple from the user’s perspective, i.e. the most common use cases. They may be hard to implement, and interface simplicity is often inversely correlated with implementation simplicity. And which things are complex depends on product scope. Instagram’s complex cases are vastly different than Photoshop’s complex cases, but as long as there is a range, Kay’s principle still applies. Since Alan Kay was a computer scientist, his quote is typically framed as a PL or API design principle, but that sells it short. It applies to a much, much broader class of interfaces. This class hinges on the distribution of use cases. Products often cut scope by identifying the ~20% of use cases that drive ~80% of usage — aka the Pareto Principle.
Some products, however, have such diverse use cases that Pareto doesn’t meaningfully apply to the product as a whole. There are common use cases and niche use cases, but no clean 20-80 split. The long tail of niche use cases is so numerous, it becomes significant in aggregate. For lack of a better term, I’ll call these long‑tail UIs. Nearly all creative tools are long-tail UIs. That’s why it works so well for programming languages and APIs — both are types of creative interfaces. But so are graphics editors, word processors, spreadsheets, and countless other interfaces that help humans create artifacts — even some you would never describe as creative.

Yes, programming languages and APIs are user interfaces. If this surprises you, watch my DotJS 2024 talk titled “API Design is UI Design”. It’s only 20 minutes, but covers a lot of ground, including some of the ideas in this post. I include both code and GUI examples to underscore this point; if the API examples aren’t your thing, skip them and the post will still make sense.

You wouldn’t describe Google Calendar as a creative tool, but it is a tool that helps humans create artifacts (calendar events). It is also a long-tail product: there is a set of common, conceptually simple cases (one-off events at a specific time and date), and a long tail of complex use cases (recurring events, guests, multiple calendars, timezones, etc.). Indeed, Kay’s maxim has clearly been used in its design. The simple case has been so optimized that you can literally add a one hour calendar event with a single click (using a placeholder title). A different duration can be set after that first click through dragging [2]. But almost every edge case is also catered to — with additional user effort. Google Calendar is also an example of an interface that digitally encodes real-life, demonstrating that complex use cases are not always power user use cases. Often, the complexity is driven by life events. E.g.
your taxes may be complex without you being a power user of tax software, and your family situation may be unusual without you being a power user of every form that asks about it. The Pareto Principle is still useful for individual features, as they tend to be more narrowly defined. E.g. there is a set of spreadsheet formulas (actually much smaller than 20%) that drives >80% of formula usage. While creative tools are the poster child of long-tail UIs, there are long-tail components in many transactional interfaces such as e-commerce or meal delivery (e.g. result filtering & sorting, product personalization interfaces, etc.). Filtering UIs are another big category of long-tail UIs, and they involve so many tradeoffs and tough design decisions you could literally write a book about just them. Airbnb’s filtering UI here is definitely making an effort to make simple things easy with (personalized! 😍) shortcuts and complex things possible via more granular controls.

Picture a plane with two axes: the horizontal axis being the complexity of the desired task (again from the user’s perspective, nothing to do with implementation complexity), and the vertical axis the cognitive and/or physical effort users need to expend to accomplish their task using a given interface. Following Kay’s maxim guarantees these two points: But even if we get these two points — what about all the points in between? There are a ton of different ways to connect them, and they produce vastly different overall user experiences. How does your interface fare when a use case is only slightly more complex? Are users yeeted into the deep end of interface complexity (bad), or do they only need to invest a proportional, incremental amount of effort to achieve their goal (good)? Meet the complexity-to-effort curve, the most important usability metric you’ve never heard of.
For delightful user experiences, making simple things easy and complex things possible is not enough — the transition between the two should also be smooth. You see, simple use cases are the spherical cows in space of product design. They work great for prototypes to convince stakeholders, or in marketing demos, but the real world is messy. Most artifacts that users need to create to achieve their real-life goals rarely fit into your “simple” flows completely, no matter how well you’ve done your homework. They are mostly simple — with a liiiiitle wart here and there. For a long-tail interface to serve user needs well in practice, we also need to design the curve, not just its endpoints.

A model with surprising predictive power is to treat user effort as a currency that users are spending to buy solutions to their problems. Nobody likes paying it; in an ideal world software would read our mind and execute perfectly with zero user effort. Since we don’t live in such a world, users are typically willing to pay more in effort when they feel their use case warrants it. Just like regular pricing, actual user experience often depends more on the relationship between cost and expectation (budget) than on the absolute cost itself. If you pay more than you expected, you feel ripped off. You may still pay it because you need the product in the moment, but you’ll be looking for a better deal in the future. And if you pay less than you expected, you feel like you got a bargain, with all the delight and loyalty that entails.

Incremental user effort cost should be proportional to incremental value gained.

Suppose you’re ordering pizza. You want a simple cheese pizza with ham and mushrooms. You use the online ordering system, and you notice that adding ham to your pizza triples its price. We’re not talking some kind of fancy ham where the pigs were fed on caviar and bathed in champagne, just a regular run-of-the-mill pizza topping.
You may still order it if you’re starving and no other options are available, but how does it make you feel? It’s not that different when the currency is user effort. The all too familiar “But I just wanted to _________, why is it so hard?”. When a slight increase in complexity results in a significant increase in user effort cost, we have a usability cliff. Usability cliffs make users feel resentful, just like the customers of our fictitious pizza shop.

A usability cliff is when a small increase in use case complexity requires a large increase in user effort.

Usability cliffs are very common in products that make simple things easy and complex things possible through entirely separate flows with no integration between them: a super high level one that caters to the most common use case with little or no flexibility, and a very low-level one that is an escape hatch: it lets users do whatever, but they have to recreate the solution to the simple use case from scratch before they can tweak it. Simple things are certainly easy: all we need to get a video with a nice sleek set of controls that work well on every device is a single attribute: controls. We just slap it on our <video> element and we’re done with a single line of HTML: <video controls src="…"></video>. Now suppose use case complexity increases just a little. Maybe I want to add buttons to jump 10 seconds back or forwards. Or a language picker for subtitles. Or just to hide the volume control on a video that has no audio track. None of these are particularly niche, but the default controls are all-or-nothing: the only way to change them is to reimplement the whole toolbar from scratch, which takes hundreds of lines of code to do well. Simple things are easy and complex things are possible. But once use case complexity crosses a certain (low) threshold, user effort abruptly shoots up. That’s a usability cliff.
For Instagram’s photo editor, the simple use case is canned filters, whereas the complex ones are those requiring tweaking through individual low-level controls. However, they are implemented as separate flows: you can tweak the filter’s intensity, but you can’t see or adjust the primitives it’s built from. You can layer both types of edits on the same image, but they are additive, which doesn’t work well. Ideally, the two panels would be integrated, so that selecting a filter would adjust the low-level controls accordingly, which would facilitate incremental tweaking AND would serve as a teaching aid for how filters work.

My favorite end-user facing product that gets this right is Coda, a cross between a document editor, a spreadsheet, and a database. All over its UI, it supports entering formulas instead of raw values, which makes complex things possible. To make simple things easy, it also provides the GUI you’d expect even without a formula language. But here’s the twist: these presets generate formulas behind the scenes that users can tweak! Whenever users need to go a little beyond what the UI provides, they can switch to the formula editor and adjust what was generated — far easier than writing it from scratch. Another nice touch: “And” is not just communicating how multiple filters are combined, but is also a control that lets users edit the logic. Defining high-level abstractions in terms of low-level primitives is a great way to achieve a smooth complexity-to-effort curve, as it allows you to expose tweaking at various intermediate levels and scopes. The downside is that it can sometimes constrain the types of high-level solutions that can be implemented. Whether the tradeoff is worth it depends on the product and use cases.

If you like eating out, this may be a familiar scenario:

— I would like the rib-eye please, medium-rare.
— Thank you sir. How would you like your steak cooked?
Keep user effort close to the minimum necessary to declare intent

Annoying, right? And yet, this is how many user interfaces work; expecting users to communicate the same intent multiple times in slightly different ways. If incremental value should require incremental user effort, an obvious corollary is that things that produce no value should not require user effort. Using the currency model makes this obvious: who likes paying without getting anything in return? Respect user effort. Treat it as a scarce resource — just like regular currency — and keep it close to the minimum necessary to declare intent. Do not require users to do work that confers them no benefit and could have been handled by the UI. If it can be derived from other input, it should be derived from other input.

Source: NNGroup (adapted).

A once ubiquitous example that is thankfully going away is the credit card form which asks for the type of credit card in a separate dropdown. Credit card numbers are designed so that the type of credit card can be determined from the first four digits. There is zero reason to ask for it separately. Beyond wasting user effort, duplicating input that can be derived introduces an unnecessary error condition that you now need to handle: what happens when the entered type is not consistent with the entered number?

User actions that meaningfully communicate intent to the interface are signal. Any other step users need to take to accomplish their goal is noise. This includes communicating the same input more than once, providing input separately that could be derived from other input with complete or high certainty, transforming input from their mental model to the interface’s mental model, and any other demand for user effort that does not serve to communicate new information about the user’s goal. Some noise is unavoidable. The only way to have 100% signal-to-noise ratio would be if the interface could mind read.
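To make the derived-input point concrete, here is a rough sketch of deriving the card network from the number itself. The prefix ranges below are deliberately simplified (e.g. newer 2-series Mastercard ranges are omitted); a real implementation would use full IIN tables and also validate the length and Luhn checksum:

```python
def card_network(number: str) -> str:
    """Guess the card network from the leading digits (simplified ranges)."""
    digits = number.replace(" ", "")
    if digits.startswith("4"):
        return "visa"
    if digits[:2] in {"34", "37"}:
        return "amex"
    if digits[:2] in {"51", "52", "53", "54", "55"}:
        return "mastercard"
    return "unknown"

# The user types the number once; a separate "card type" dropdown is redundant.
print(card_network("4111 1111 1111 1111"))  # visa
print(card_network("378282246310005"))      # amex
```

The interface can run this on every keystroke and simply display the detected network, turning a required input into a confirmation.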
But too much noise increases friction and obfuscates signal. A short yet demonstrative example is the web platform’s methods for programmatically removing an element from the page. To signal intent in this case, the user needs to communicate two things: (a) what they want to do (remove an element), and (b) which element to remove. Anything beyond that is noise. The modern DOM method, element.remove(), has an extremely high signal-to-noise ratio. It’s hard to imagine a more concise way to signal intent. However, the older method that it replaced, removeChild(), had much worse ergonomics. It required specifying two nodes: the element to remove, and its parent. But the parent is not a separate source of truth — it would always be the child node’s parent! As a result, its actual usage involved boilerplate, where developers had to write the much noisier element.parentNode.removeChild(element) [3].

Boilerplate is repetitive code that users need to include without thought, because it does not actually communicate intent. It’s the software version of red tape: hoops you need to jump through to accomplish your goal, that serve no obvious purpose in furthering said goal except for the fact that they are required. In this case, the amount of boilerplate may seem small, but when viewed as a percentage of the total amount of code, the difference is staggering. The exact ratio (81% vs 20% here) varies based on specifics such as variable names, but when the difference is meaningful, it transcends these types of low-level details. Of course, it was usually encapsulated in utility functions, which provided a similar signal-to-noise ratio as the modern method. However, user-defined abstractions don’t come for free; there is an effort (and learnability) tax there, too. Improving signal-to-noise ratio is also why the front-end web industry gravitated towards component architectures: they increase signal-to-noise ratio by encapsulating boilerplate.
As an exercise for the reader, try to calculate the signal-to-noise ratio of a Bootstrap accordion (or any other complex Bootstrap component).

Users are much more vocal about things not being possible, than things being hard.

When pointing out friction issues in design reviews, I have sometimes heard “users have not complained about this”. This reveals a fundamental misunderstanding about the psychology of user feedback. Users are much more vocal about things not being possible, than about things being hard. The reason becomes clear if we look at the neuroscience of each. Friction is transient in working memory (prefrontal cortex). After completing a task, details fade. The negative emotion persists and accumulates, but filing a complaint requires prefrontal engagement that is brief or absent. Users often can’t articulate why the software feels unpleasant: the specifics vanish; the feeling remains. Hard limitations, on the other hand, persist as conscious appraisals. The trigger doesn’t go away, since there is no workaround, so it’s far more likely to surface in explicit user feedback. Both types of pain points cause negative emotions, but friction is primarily processed by the limbic system (emotion), whereas hard limitations remain in the prefrontal cortex (reasoning). This also means that when users finally do reach the breaking point and complain about friction, you better listen.

Friction is primarily processed by the limbic system, whereas hard limitations remain in the prefrontal cortex

Second, user complaints are filed when there is a mismatch in expectations. Things are not possible but the user feels they should be, or interactions cost more user effort than the user had budgeted, e.g. because they know that a competing product offers the same feature for less (work). Often, users have been conditioned to expect poor user experiences, either because all options in the category are high friction, or because the user is too novice to know better [4].
So they begrudgingly pay the price, and don’t think they have the right to complain, because it’s just how things are. You might ask, “If all competitors are equally high-friction, how does this hurt us?” An unmet need is a standing invitation to disruption that a competitor can exploit at any time. Because you’re not only competing within a category; you’re competing with all alternatives — including nonconsumption (see Jobs‑to‑be‑Done). Even for retention, users can defect to a different category altogether (e.g., building native apps instead of web apps). Historical examples abound. When it comes to actual currency, a familiar example is Airbnb: until it came along, nobody would complain that a hotel of average price is expensive — it was just the price of hotels. If you couldn’t afford it, you just couldn’t afford to travel, period. But once Airbnb showed there was a cheaper alternative for hotel prices as a whole, tons of people jumped ship. It’s no different when the currency is user effort. Stripe took the payment API market by storm when it demonstrated that payment APIs did not have to be so high friction. The iPhone disrupted the smartphone market when it demonstrated that no, you did not have to be highly technical to use a smartphone. The list goes on.

Unfortunately, friction is hard to instrument. With good telemetry you can detect specific issues (e.g., dead clicks), but there is no KPI to measure friction as a whole. And no, NPS isn’t it — and you’re probably using it wrong anyway. Instead, the emotional residue from friction quietly drags many metrics down (churn, conversion, task completion), sending teams in circles like blind men touching an elephant. That’s why dashboards must be paired with product vision and proactive, first‑principles product leadership.
Steve Jobs exemplified this posture: proactively, aggressively eliminating friction presented as “inevitable.” He challenged unnecessary choices, delays, and jargon, without waiting for KPIs to grant permission. Do mice really need multiple buttons? Does installing software really need multiple steps? Do smartphones really need a stylus? Of course, this worked because he had the authority to protect the vision; most orgs need explicit trust to avoid diluting it.

So, if there is no metric for friction, how do you identify it? Reducing friction rarely comes for free just because someone had a good idea. Such free wins do exist, and they are great, but usually it takes sacrifices. And without it being an organizational priority, it’s very hard to steer these tradeoffs in that direction. The most common tradeoff is implementation complexity. Simplifying user experience is usually a process of driving complexity inwards and encapsulating it in the implementation. Explicit, low-level interfaces are far easier to implement, which is why there are so many of them. Especially as deadlines loom, engineers will often push towards externalizing complexity into the user interface, so that they can ship faster. And if Product leans more data-driven than data-informed, it’s easy to look at customer feedback and conclude that what users need is more features (it’s not).

Consider the classic example of two faucet designs. The first faucet is a thin abstraction: it exposes the underlying implementation directly, passing the complexity on to users, who now need to do their own translation of temperature and pressure into amounts of hot and cold water. It prioritizes implementation simplicity at the expense of wasting user effort. The second design prioritizes user needs and abstracts the underlying implementation to support the user’s mental model. It provides controls to adjust the water temperature and pressure independently, and internally translates them to the amounts of hot and cold water.
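The second faucet’s translation layer can be sketched as simple linear mixing. This is an illustrative model of my own, not anyone’s real plumbing code, and the supply temperatures are assumed values:

```python
HOT_C, COLD_C = 60.0, 10.0  # assumed hot and cold supply temperatures (°C)

def faucet(target_temp_c: float, flow: float) -> tuple[float, float]:
    """Translate the user's mental model (temperature, flow)
    into the implementation's model (hot flow, cold flow)."""
    # Clamp to what the supplies can physically deliver
    t = max(COLD_C, min(HOT_C, target_temp_c))
    hot_fraction = (t - COLD_C) / (HOT_C - COLD_C)
    hot = flow * hot_fraction
    cold = flow - hot
    return hot, cold

print(faucet(35.0, 1.0))  # (0.5, 0.5): a 35°C mix is half hot, half cold
```

The point is where the translation lives: in the thin-abstraction faucet, users compute `hot_fraction` in their heads; here, the interface does it for them.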
This interface sacrifices some implementation simplicity to minimize user effort. This is why I’m skeptical of blanket calls for “simplicity”: they are platitudes. Everyone agrees that, all else equal, simpler is better. It’s the tradeoffs between different types of simplicity that are tough. In some cases, reducing friction even carries tangible financial risks, which makes leadership buy-in crucial. This kind of tradeoff cannot be made by individual designers — it requires usability as a priority to trickle down from the top of the org chart.

The Oslo airport train ticket machine is the epitome of a high signal-to-noise interface. You simply swipe your credit card to enter and you swipe your card again as you leave the station at your destination. That’s it. No choices to make. No buttons to press. No ticket. You just swipe your card and you get on the train. Today this may not seem radical, but back in 2003, it was groundbreaking. To be able to provide such a frictionless user experience, they had to make a financial tradeoff: it does not ask for a PIN code, which means the company would need to simply absorb the financial losses from fraudulent charges (stolen credit cards, etc.).

When user needs are prioritized at the top, it helps to cement that priority as an organizational design principle to point to when these tradeoffs come along in the day-to-day. Having a design principle in place will not instantly resolve all conflict, but it helps turn conflict about priorities into conflict about whether an exception is warranted, or whether the principle is applied correctly, both of which are generally easier to resolve. Of course, for that to work everyone needs to be on board with the principle. But here’s the thing with design principles (and most principles in general): they often seem obvious in the abstract, so it’s easy to get alignment in the abstract. It’s when the abstract becomes concrete that it gets tough.
The Web Platform has its own version of this principle, which is called Priority of Constituencies: “User needs come before the needs of web page authors, which come before the needs of user agent implementors, which come before the needs of specification writers, which come before theoretical purity.” This highlights another key distinction. It’s more nuanced than users over developers; a better framing is consumers over producers. Developers are just one type of producer. The web platform has multiple tiers of producers. Even within the same tier there are producer vs consumer dynamics. When it comes to web development libraries, the web developers who write them are producers and the web developers who use them are consumers. This distinction also comes up in extensible software, where plugin authors are still consumers when it comes to the software itself, but producers when it comes to their own plugins. It also comes up in dual-sided marketplace products (e.g. Airbnb, Uber, etc.), where buyer needs are generally higher priority than seller needs.

In the economy of user effort, the antithesis of overpriced interfaces that make users feel ripped off are those where every bit of user effort required feels meaningful and produces tangible value to them. The interface is on the user’s side, gently helping them along with every step, instead of treating their time and energy as disposable. The user feels like they’re getting a bargain: they get to spend less than they had budgeted for! And we all know how motivating a good bargain is. User effort bargains don’t have to be radical innovations; don’t underestimate the power of small touches.
A zip code input that auto-fills city and state, a web component that automatically adapts to its context without additional configuration, a pasted link that automatically defaults to the website title (or the selected text, if any), a freeform date that is correctly parsed into structured data, a login UI that remembers whether you have an account and which service you’ve used to log in before, an authentication flow that takes you back to the page you were on before. Sometimes many small things can collectively make a big difference. In some ways, it’s the polar opposite of death by a thousand paper cuts: Life by a thousand sprinkles of delight! 😀

In the end, “simple things simple, complex things possible” is table stakes. The key differentiator is the shape of the curve between those points. Products win when user effort scales smoothly with use case complexity, cliffs are engineered out, and every interaction declares a meaningful piece of user intent. That doesn’t just happen by itself. It involves hard tradeoffs, saying no a lot, and prioritizing user needs at the organizational level. Treating user effort like real money forces you to design with restraint. A rule of thumb is to place the pain where it’s best absorbed by prioritizing consumers over producers. Do this consistently, and the interface feels delightful in a way that sticks. Delight turns into trust. Trust into loyalty. Loyalty into product-market fit.

Kay himself replied on Quora and provided background on this quote. Don’t you just love the internet? ↩︎ Yes, typing can be faster than dragging, but minimizing homing between input devices improves efficiency more; see KLM. ↩︎ Yes, today it would have been element.parentNode?.removeChild(element), which is a little less noisy, but this was before the optional chaining operator. ↩︎ When I was running user studies at MIT, I often had users exclaim “I can’t believe it! I tried to do the obvious simple thing and it actually worked!” ↩︎

1 views
マリウス 2 months ago

📨🚕

📨🚕 ( MSG.TAXI ) is a multi-protocol push notification router. You post to it via a webhook URL and it flings that data to your configured targets . It’s the missing glue between your code and your notification channels, whether that’s your smart home, your CI pipeline, your RPG guild’s Matrix room, or just your phone at 3AM when your server falls over (again). Push notifications from anything, to anything. Intro Updates Website

0 views
Sean Goedecke 2 months ago

The whole point of OpenAI's Responses API is to help them hide reasoning traces

About six months ago, OpenAI released their Responses API , which replaced their previous /chat/completions API for inference. The old API was very simple: you pass in an array of messages representing a conversation between the model and a user, and get the model’s next response back. The new Responses API is more complicated
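For context, the “array of messages” request shape the excerpt describes can be sketched like this. This only builds the payload (it doesn’t call any API), and the model name is illustrative:

```python
import json

def build_chat_request(history, user_message, model="gpt-4o"):
    """Build a request body in the older completions style: the caller
    owns the conversation state and resends the whole history each turn."""
    messages = history + [{"role": "user", "content": user_message}]
    return {"model": model, "messages": messages}

body = build_chat_request(
    [{"role": "system", "content": "You are a helpful assistant."}],
    "Hello!",
)
print(json.dumps(body, indent=2))
```

The key property is that the client, not the provider, carries the conversation state between turns.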

0 views
Sean Goedecke 2 months ago

An unofficial FAQ for Stripe's new "Tempo" blockchain

Stripe just announced Tempo , a “L1 blockchain” for “stablecoin payments”. What does any of this mean? In 2021, I was interested enough in blockchain to write a simple explainer and a technical description of Bitcoin specifically . But I’ve never been a blockchain fan . Both my old and new “what kind of work I want” posts state that I’m ethically opposed to proof-of-work blockchain

0 views
DuckTyped 3 months ago

An Illustrated Guide to OAuth

OAuth was first introduced in 2007. It was created at Twitter because Twitter wanted a way to allow third-party apps to post tweets on users' behalf. Take a second to imagine designing something like that today. How would you do it? One way would just be to ask the user for their username and password. So you create an unofficial Twitter client, and present the user a login screen that says "log in with Twitter". The user does so, but instead of logging into Twitter, they're actually sending their credentials to you, this third-party service which logs into Twitter for them. This is bad for a lot of reasons. Even if you trust a third-party app, what if they don't store your password correctly and someone steals it? You should never give your password to a third-party website like this. Another approach you might be thinking of is API keys: because you're hitting Twitter's API to post data for a user, and for an API, you use API keys. But API keys are general. What you need is an API key specific to a user. To solve these problems, OAuth was created. You'll see how it solves all these problems, but the crux of OAuth is an access token, which is sort of like an API key for a specific user. An app gets an access token, and then it can use that to take actions on the user's behalf, or access data for a user. OAuth can be used in a lot of different ways, which is one of the reasons it is so hard to understand. In this post, we’re going to look at a typical OAuth flow. The example I'm going to use is YNAB. If you haven't used it, YNAB is like a paid version of Mint. You connect it to a bank account, and then it pulls all your transactions from that account and shows them to you with very pretty charts. You can categorize your spending, and then it tells you, for example: hey, you're spending too much on groceries. It helps you manage your finances. So, I want to use YNAB, and I want to connect it to Chase Bank, but I don't want to give it my Chase password. 
So instead, I'm going to use OAuth. Let's look at the flow first, and then let's understand what's going on. We're actually going to look at the flow twice , because I think you need to look through an OAuth flow at least two times to understand what's going on. So to start, I'm at YNAB, and I want to connect Chase as a source. The OAuth flow looks like this: YNAB redirects me to Chase. At Chase, I log in with my username and password. Chase shows me a screen saying "YNAB wants to connect to Chase. Pick what accounts you want to give YNAB access to". It'll show me a list of all my accounts. Let's say I pick just my checking account, to give YNAB read access to this account, and hit OK. From Chase, I'm redirected back to YNAB, and now, magically, YNAB is connected to Chase. This is the experience from a user's perspective. But what happened there? What magic happened in the background, so that YNAB somehow has access to my data on Chase? Remember, the end goal of OAuth is for YNAB to end up with an access token , so it can access my data from Chase. Somehow, as I went through this flow, YNAB ended up with an access token. I'll spoil the surprise by telling you how it got the access token, and then I'll walk you through what happened in more detail. How does Chase give YNAB the access token? When you were redirected from Chase back to YNAB, Chase could have just added the access token in the URL. It could have redirected you back to a URL like this: and then YNAB would be able to get the access token. An access token is supposed to be secret, but URLs can end up in your browser's history or some server logs, in which case it's easy for anyone to see your access token. So Chase could technically redirect you back to YNAB with the access token in the URL , and then YNAB would have the access token. End of OAuth flow. But we don’t do it this way, because sending an access token in the URL is not secure. 
When you were redirected from Chase back to YNAB, Chase sent YNAB an authorization code in the URL. An authorization code is not an access token! Chase sends YNAB an authorization code, and YNAB exchanges the authorization code for an access token . It does this by making a backend request to Chase, a backend POST request over HTTPS, which means no one can see the access token. And then YNAB has the access token. End of OAuth flow. OAuth success. Let's talk about what we just saw. At a high level, there are two parts to an OAuth flow. The first is the user consent flow , which is where you, the user, log in and pick what to give access to. This is a critical part of OAuth, because in OAuth, we always want the user to be actively involved and in control. The other part is the authorization code flow . This is the flow where YNAB actually gets this access token . Let's talk about more details of exactly how this works. And let's also talk about some terminology, because OAuth has very specific terminology. Instead of user, we say resource owner . Instead of app, we say OAuth client or OAuth app . The server where you log in is called the authorization server . The server where you get user data from is called the resource server (This could be the same as the authorization server). On the authorization server, when the user picks what's allowed, those are called scopes . I'll try to use that terminology, because you'll need to get familiar with it if you're going to read more OAuth documentation. So let’s look at this high level again, with the new terms. You have OAuth clients. An OAuth client wants to access data on a resource server, and the data belongs to the resource owner. To do that, the OAuth client redirects to the authorization server. The user logs in, user agrees to scopes (what this token is allowed to access), and the user gets redirected back to the OAuth client with an authorization code in the URL. 
On the back end, the OAuth client sends the authorization code and client secret (we'll talk about client secrets shortly) to the authorization server, and the authorization server responds with the access token. That's the exact same flow, but using the new terminology we just discussed. Now let's talk specifics. We've seen what this flow looks like from the user's point of view; let's look at what it looks like from the developer's point of view. To use OAuth, you first need to register a new app. So for example, GitHub provides OAuth. If you want to create a new app for GitHub, you first register it. Different services require different types of data in the app registration, but every service will require at least: An app name, because when the user goes to GitHub, for example, GitHub needs to be able to say "Amazon Web Services is requesting read access to your repos and gists". A redirect URI, and we'll talk about what that is shortly. GitHub will respond with: A client ID. This is a public ID that you'll be using to make requests. A client secret. You'll be using this to authenticate your requests. Awesome, you have registered your OAuth application. Let's say your app is YNAB, and one of your users wants to connect to Chase. So you start a new OAuth flow... your very first one! Step one: you will redirect them to Chase's authorization server's OAuth endpoint, passing these parameters in the URL: The client ID, which we just talked about. The redirect URI. Once the user is done on Chase, this is where Chase will redirect them back to. This will be a YNAB URL, since you're the YNAB app. The response type, which is usually "code", because we usually want to get back an authorization code rather than an access token directly (which is less secure). Scopes. What scopes are we requesting? I.e., what user data do we want to access? 
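Concatenated into a URL, that first redirect might look like this sketch. The endpoint, client ID, and scope name are hypothetical; the parameter names follow the standard OAuth 2.0 authorization request:

```python
from urllib.parse import urlencode

# Hypothetical authorization endpoint; real providers publish their own.
AUTHORIZE_ENDPOINT = "https://auth.chase.example/oauth/authorize"

params = {
    "client_id": "ynab-client-id",                      # public ID from registration
    "redirect_uri": "https://ynab.com/oauth-callback",  # must match what was registered
    "response_type": "code",                            # ask for an authorization code
    "scope": "accounts:read",                           # what user data we want
}

authorize_url = f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"
print(authorize_url)
```

YNAB would send the user's browser to `authorize_url`; everything in it is public, which is why the client secret is not included here.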
This is enough information for the authorization server to validate the request and show the user a message like "YNAB is requesting read access to your accounts". How does the authorization server validate the request? Well, if the client ID isn't valid, the request is invalid right away. If the client ID is valid, the authorization server needs to check the redirect URI. Basically, since the client ID is public, anyone could go get the YNAB client ID, and create their own OAuth flow that hits Chase, but then returns the user back to, let's say, evildude.com. But that's why when you register your app, you have to tell Chase what a valid redirect URI looks like. At that point, you would tell Chase that only YNAB.com URIs are valid, thus preventing this evildude.com scenario. If everything is valid, the authorization server can use the client ID to get the app name, maybe the app icon, and then show a user consent screen. The user will click which accounts they want to give YNAB access to, and hit okay. Chase will redirect them back to the redirect URI that you gave, let's say ynab.com/oauth-callback?authorization_code=xyz. Side note: you might be wondering, what is the difference between URI and URL? Because I'm kind of using both. Well, a URL is any website URL that we know and love. URI is more general. A URL is a type of URI, but there are many other types of URIs. The reason I'm saying redirect URI instead of redirect URL is because mobile apps won't have a URL. They'll just have a URI, which is a protocol they have made up that might look something like . So if you're only doing web work, whenever you read URI, you can read it as URL. And if you're doing mobile work, you can read URI and know that yes, your use case is supported too. So the user is redirected back to ynab.com/oauth-callback?authorization_code=xyz, and now your app has an authorization code. You send that authorization code to the Chase authorization server, along with your client secret. 
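That back-channel exchange is a single server-to-server POST. A hedged sketch (hypothetical endpoint and credentials; the field names follow the standard OAuth 2.0 form encoding):

```python
from urllib.parse import urlencode

# Hypothetical token endpoint; real providers publish their own.
TOKEN_ENDPOINT = "https://auth.chase.example/oauth/token"

def build_token_request(auth_code):
    """Return the form body YNAB's backend would POST over HTTPS
    to swap the short-lived authorization code for an access token."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": auth_code,                                  # from the redirect URL
        "redirect_uri": "https://ynab.com/oauth-callback",  # must match the first request
        "client_id": "ynab-client-id",
        "client_secret": "ynab-client-secret",              # never exposed to the browser
    })

body = build_token_request("xyz")
print(body)
```

Because this request happens on YNAB's backend, the client secret and the returned access token never pass through the user's browser.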
Why include the client secret? Because again, the authorization code is in the URL, so anyone can see it, and anyone could try to exchange it for the access token. That's why we need to send the client secret, so Chase's server can say "Oh yes, I remember I generated this code for this client ID, and the client secret matches. This is a valid request." And then it returns the access token. Note how in every step of the OAuth flow, they have thought through how someone could exploit the flow, and added safeguards*. That is a big reason why it's so complicated. *I'm reliably informed by a friend in security that the OAuth designers learned a bunch of lessons the hard way, and that is another reason why it is so complicated: because it had to be patched repeatedly. The other big reason is that we want the user to be involved. That makes it complicated because all the user-facing stuff has to happen on the frontend, which is insecure, because anyone can see it. And then all the secure stuff has to be on the backend. I keep saying frontend and backend, but in the OAuth docs, they say front-channel and back-channel instead. Let's talk about why. Front-channel and back-channel So, OAuth doesn't use the terms frontend and backend; it uses front-channel and back-channel. Front-channel means GET requests, where anyone can see the parameters in the URL, and back-channel means POST requests, where the data travels in the POST body, encrypted in transit over HTTPS. The reason OAuth doesn't use frontend or backend is because you could make POST requests using JavaScript! So, theoretically, you could exchange your authorization code for an access token right on the frontend, in JavaScript, by making a POST fetch request. Now, there is a big problem with this, which is that you also need the client secret to make that request. And of course, once the secret is on the frontend and accessible in JavaScript, it's not secret anymore. Anyone can access it. 
So, instead of using the client secret, there's a different way to do it called PKCE , spelled P-K-C-E, pronounced “pixie” (seriously). It's not as secure as doing it on the backend with the client secret, but if a backend is not an option for you, you can do it using PKCE. So just know that if you have an app without a backend, you can still do OAuth. I may cover PKCE in a future post, as it is now recommended for the standard flow as well, since it helps protect against auth code interception. Same problem for mobile apps. Unless your mobile app has a backend component, like a backend server somewhere, if you're putting your client secret in a mobile app, well, anyone can get it, because there are tons of tools to extract strings from mobile apps. So, instead of including your client secret in your app, you should again use PKCE to get that access token. So those are two other terms that are good to know: front-channel and back-channel . At this point, you've seen what the OAuth flow looks like from the user's perspective and from the developer's perspective, and you have seen the components that make it secure. One last thing I want to mention is that OAuth can take a lot of different forms. I covered the main recommended OAuth flow above, but some people may do OAuth by passing back an access token in the redirect instead of an authorization code (doing that is called the "implicit flow"). Some people may do it using PKCE. There's even a way to do OAuth without the user consent part, but that really is not recommended. The other part of OAuth we didn't cover is that tokens expire and you need to refresh them. And that happens through a refresh flow. Also, OAuth is all about authorization, but some workflows use OAuth to log in, such as when you use a “sign in with Google” feature. This uses OpenID Connect, or OIDC, which is a layer on top of OAuth that also returns user identity data in addition to an access token. 
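For the curious, the core PKCE mechanic is a small hash commitment. A minimal sketch of the S256 method described in RFC 7636:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """PKCE (RFC 7636): the client invents a per-flow secret instead of
    shipping a long-lived client secret."""
    # 1. A random, URL-safe code_verifier the client keeps to itself.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # 2. code_challenge = BASE64URL(SHA256(verifier)), sent with the
    #    authorization request. The verifier itself is only revealed later,
    #    in the token request, proving both requests came from the same client.
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
print(challenge)
```

An attacker who intercepts the authorization code still can't redeem it, because they never saw the verifier that matches the challenge.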
I'm mentioning this here because when you look for OAuth on the web, you'll see a lot of different flows, and you may be confused as to why they're all different. And the reason is, OAuth is not straightforward like HTTP, OAuth can look a lot of different ways. Now you're good to go out and do your own OAuthing. Good luck!

0 views
Sean Goedecke 3 months ago

Everything I know about good API design

Most of what modern software engineers do 1 involves APIs: public interfaces for communicating with a program, like this one from Twilio. I’ve spent a lot of time working with APIs, both building and using them

0 views

AI Can't Read Your Docs

By now, nearly every engineer has seen an AI assistant write a perfect unit test or churn out flawless boilerplate. For simple, greenfield work, these tools are incredibly effective. But ask it to do something real, like refactor a core service that orchestrates three different libraries, and a frustrating glass ceiling appears. The agent gets lost, misses context, and fails to navigate the complex web of dependencies that make up a real-world system. Faced with this complexity, our first instinct is to write more documentation. We build mountains of internal documents, massive s, and detailed READMEs, complaining that the AI is "not following my docs" when it inevitably gets stuck. This strategy is a trap. It expects the AI to learn our messy, human-centric systems, putting an immense load on the agent and dooming it to fail. To be clear, documentation is a necessary first step , but it's not sufficient to make agents effective. Claude Code figuring out your monorepo. Image by ChatGPT. The near-term, most effective path isn’t about throwing context at the AI to be better at navigating our world; it’s about redesigning our software, libraries, and APIs with the AI agent as the primary user. This post 1 applies a set of patterns learned from designing and deploying AI agents in complex environments to building software for coding agents like Claude Code. You may also be interested in a slightly higher level article on AI-powered Software Engineering . The core principle is simple: reduce the need for external context and assumptions. An AI agent is at its best when the next step is obvious and the tools are intuitive. This framework builds from the most immediate agent interaction all the way up to the complete system architecture. This isn’t to say today's agents can’t reason or do complex things. 
But to unlock the full potential of today’s models—to not just solve problems, but do so consistently—these are your levers. In an agentic coding environment, every interaction with a tool is a turn in a conversation. The tool's output—whether it succeeds or fails—should be designed as a helpful, guiding prompt for the agent's next turn. A traditional CLI command that succeeds often returns very little: a resource ID, a silent exit code 0, or a simple "OK." For an agent, this is a dead end. An AI-friendly successful output is conversational. It not only confirms success but also suggests the most common next steps, providing the exact commands and IDs needed to proceed. Do (AI-Friendly): This is the other side of the same coin. For an AI agent, an error message must be a prompt for its next action. A poorly designed error is a dead end; a well-designed one is a course correction. A perfect, AI-friendly error message contains three parts: What went wrong: A clear, readable description of the failure. How to resolve it: Explicit instructions for fixing the issue, like a direct command to run or the runbook you already wrote but documented somewhere else. What to do next: Guidance on the next steps after resolution. By designing both your successful and failed outputs as actionable prompts, you transform your tools from simple utilities into interactive partners that actively guide the agent toward its goal. The best documentation is the documentation the agent doesn't need to read. If an error message is the agent's reactive guide, embedded documentation is its proactive one. When intuition isn't enough, integrate help as close to the point of use as possible. The CLI: Every command should have a comprehensive flag that serves as the canonical source of truth. This should be detailed enough to replace the need for other usage documentation. Claude already knows is where it should start first. 
The Code: Put a comment block at the top of critical files explaining their purpose, key assumptions, and common usage patterns. This not only helps the agent while exploring the code but also enables IDE-specific optimizations like codebase indexing. If an agent has to leave its current context to search a separate knowledge base, you’ve introduced a potential point of failure. Keep the necessary information local. After establishing what we communicate to the agent, we must define how we communicate. The protocol for agent interaction is a critical design choice. CLI ( Command-line interface ): This is a flexible, raw interface that is powerful for advanced agents like Claude Code that have strong scripting abilities. The agent can pipe commands, chain utilities, and perform complex shell operations. CLI-based tools can also be context-discovered rather than being exposed directly to the agent via its system prompt (which limits the max total tools in the MCP case). The downside is that it's less structured, and the agent may need multiple tool calls to get the syntax right. MCP ( Model Context Protocol ): It provides a structured, agent-native way to expose your tools directly to the LLM's API. This gives you fine-grained control over the tool's definition as seen by the model and is better for workflows that rely on well-defined tool calls. This is particularly useful for deep prompt optimization, security controls, and taking advantage of some of the more recent fancy UX features that MCP provides. MCP today can also be a bit trickier for end-users to install and authorize compared to existing install setups for CLI tools (e.g. or just adding a new to your ). Overall, I’m starting to come to the conclusion that for developer tools—agents that can already interact with the file system and run commands—CLI-based is often the better and easier approach 2 . LLMs have a deep, pre-existing knowledge of the world’s most popular software. 
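Circling back to the failure-output pattern: the three-part error message might look like this sketch, where the tool name, commands, and service names are all invented for illustration:

```python
# Hypothetical sketch of an "error message as prompt" for a coding agent.
# "mytool", its subcommands, and the service/region names are made up.
def deploy_error(service, region):
    return (
        # 1. What went wrong: a clear, readable description of the failure.
        f"ERROR: deploy of '{service}' to '{region}' failed: image not found.\n"
        # 2. How to resolve it: the exact command to run, inlined.
        f"To resolve, build and push the image first:\n"
        f"    mytool build {service} && mytool push {service}\n"
        # 3. What to do next: guidance after the fix.
        f"Next: re-run 'mytool deploy {service} --region {region}', then "
        f"verify with 'mytool status {service}'."
    )

print(deploy_error("billing-api", "us-east-1"))
```

An agent that receives this output has its next two tool calls spelled out, instead of a bare non-zero exit code.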
You can leverage this massive prior by designing your own tools as metaphors for these well-known interfaces. Building a testing library? Structure your assertions and fixtures to mimic . Creating a data transformation tool? Make your API look and feel like . Designing an internal deployment service? Model the CLI commands after the or syntax. When an agent encounters a familiar pattern, it doesn't need to learn from scratch. It can tap into its vast training data to infer how your system works, making your software exponentially more useful. This is logical for a human developer who can hold a complex mental map, but it’s inefficient for an AI agent, which excels at making localized, sequential changes (and for a human developer who isn't a domain expert). An AI-friendly design prioritizes workflows. The principle is simple: co-locate code that changes together. Here’s what this looks like in practice: Monorepo Structure: Instead of organizing by technical layer ( , ), organize by feature ( ). When an agent is asked to "add a filter to search," all the relevant UI and API logic is in one self-contained directory. Backend Service Architecture: Instead of a strict N-tier structure ( , , ), group code by domain. A directory would contain , , and , making the common workflow of "adding a new field to a product" a highly localized task. Frontend Component Files: Instead of separating file types ( , , ), co-locate all assets for a single component. A directory should contain , , and . This is best applied to organization-specific libraries and services. Being too aggressive with this type of optimization when it runs counter to well-known industry standards (e.g., completely changing the boilerplate layout of a Next.js app) can lead to more confusion. For a human, a message is a signal to ask for a code review. For an AI agent, it's often a misleading signal of completion. Unit tests are not enough. 
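One concrete way to go beyond a bare pass/fail boolean is a structured, multi-layered report the agent can act on. The field names below are invented for illustration, not from any real tool:

```python
# Illustrative shape of a rich verification report for an agent.
# All field names and values are hypothetical.
def run_verification():
    return {
        "passed": False,
        "unit_tests": {"passed": 412, "failed": 1},
        "e2e_workflows": {"checkout_flow": "ok", "search_flow": "broken"},
        "logs_tail": ["AssertionError: expected 3 results, got 0"],
        "perf": {"p95_latency_ms": 180, "budget_ms": 250},
        "recording": "artifacts/search_flow.webm",  # headless-browser capture
    }

report = run_verification()
# The boolean alone says "failed"; the rest tells the agent *where* and
# *why*, which is what lets it self-correct instead of stopping.
print(report["e2e_workflows"])
```

The design choice is to make the report itself the agent's next prompt: every failing entry names the workflow and carries the evidence needed to debug it.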
To trust an AI’s contribution enough to merge it, you need automated assurance that is equivalent to a human’s review. The goal is programmatic verification that answers the question: "Is this change as well-tested as if I had done it myself?" This requires building a comprehensive confidence system that provides the agent with rich, multi-layered evidence of correctness: It must validate not just the logic of individual functions, but also the integrity of critical user workflows from end-to-end . It must provide rich, multi-modal feedback. Instead of just a boolean , the system might return a full report including logs, performance metrics, and even a screen recording of the AI’s new feature being used in a headless browser . When an AI receives this holistic verification, it has the evidence it needs to self-correct or confidently mark its work as complete, automating not just the implementation, but the ever-increasing bottleneck of human validation on every change. How do you know if you've succeeded? The ultimate integration test for an AI-friendly codebase is this: Can you give the agent a real customer feature request and have it successfully implement the changes end-to-end? When you can effectively "vibe code" a solution—providing a high-level goal and letting the agent handle the implementation, debugging, and validation—you've built a truly AI-friendly system. The transition won't happen overnight. It starts with small, low-effort changes. For example: Create CLI wrappers for common manual operations. Improve one high frequency error message to make it an actionable prompt. Add one E2E test that provides richer feedback for a key user workflow. This is a new discipline, merging the art of context engineering with the science of software architecture. The teams that master it won't just be 10% more productive; they'll be operating in a different league entirely. 
The future of software isn't about humans writing code faster; it's about building systems that the next generation of AI agents can understand and build upon.

Thanks for reading Shrivu's Substack! Subscribe for free to receive new posts and support my work.

In the spirit of reducing the manual effort to write posts while preserving quality, I used a new AI workflow for writing this post. Using Superwhisper and Gemini, I gave a voice-recorded lecture on all the things I thought would be useful to include in the post and had Gemini clean that up. I then had Gemini grill me on things that didn't make sense (prompting it to give me questions and then voice-recording my interview back to it), and then I grilled Gemini based on the draft of the post it wrote. I did this a few times until I was happy with the post and reduced the time-to-draft from ~5 hours to ~1 hour. If folks have feedback on the formatting of this post in particular (too much AI smell, too verbose, etc.), please let me know!

I'm not knocking MCP generally; I think the CLI-based approach works because these developer agents already have access to the codebase and can run these types of commands, and Claude just happens to be great at this. For non-coding agent use cases, MCP is critical for bridging the gap between agent interfaces (e.g., ChatGPT) and third-party data/context providers. Although who knows, maybe the future of tool-calling is bash scripting.

Claude Code figuring out your monorepo. Image by ChatGPT.

The near-term, most effective path isn't about throwing context at the AI to be better at navigating our world; it's about redesigning our software, libraries, and APIs with the AI agent as the primary user. This post applies a set of patterns learned from designing and deploying AI agents in complex environments to building software for coding agents like Claude Code. You may also be interested in a slightly higher-level article on AI-powered Software Engineering.
Thanks for reading Shrivu's Substack! Subscribe for free to receive new posts and support my work.

Six Patterns for AI-Friendly Design

The core principle is simple: reduce the need for external context and assumptions. An AI agent is at its best when the next step is obvious and the tools are intuitive. This framework builds from the most immediate agent interaction all the way up to the complete system architecture. This isn't to say today's agents can't reason or do complex things. But to unlock the full potential of today's models—to not just solve problems, but do so consistently—these are your levers.

Pattern 1: Every Output is a Prompt

In an agentic coding environment, every interaction with a tool is a turn in a conversation. The tool's output—whether it succeeds or fails—should be designed as a helpful, guiding prompt for the agent's next turn.

The Successful Output: A traditional CLI command that succeeds often returns very little: a resource ID, a silent exit code 0, or a simple "OK." For an agent, this is a dead end. An AI-friendly successful output is conversational. It not only confirms success but also suggests the most common next steps, providing the exact commands and IDs needed to proceed.

The Failure Output: This is the other side of the same coin. For an AI agent, an error message must be a prompt for its next action. A poorly designed error is a dead end; a well-designed one is a course correction. A perfect, AI-friendly error message contains three parts:

- What went wrong: a clear, readable description of the failure.
- How to resolve it: explicit instructions for fixing the issue, like a direct command to run or the runbook you already wrote but documented somewhere else.
- What to do next: guidance on the next steps after resolution.

The CLI: Every command should have a comprehensive flag that serves as the canonical source of truth. This should be detailed enough to replace the need for other usage documentation.
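The three-part failure output can be sketched in a few lines. The tool name, commands, and error text below are invented for illustration, not from the post:

```python
# Sketch of an AI-friendly error: what went wrong, how to resolve it,
# and what to do next — so the message doubles as the agent's next prompt.
def ai_friendly_error(what: str, fix_command: str, next_step: str) -> str:
    return (
        f"ERROR: {what}\n"
        f"To resolve: run `{fix_command}`\n"
        f"Then: {next_step}"
    )

msg = ai_friendly_error(
    what="deploy failed: environment 'staging' has no active database",
    fix_command="mytool db provision --env staging",
    next_step="re-run `mytool deploy --env staging`, then verify with `mytool status`",
)
print(msg)
```

Compare that to a bare `exit 1` with "deploy failed": the agent has nothing to act on, so it guesses — and guesses burn turns.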
Claude already knows is where it should start first. The Code: Put a comment block at the top of critical files explaining the file's purpose, key assumptions, and common usage patterns. This not only helps the agent while exploring the code but also enables IDE-specific optimizations like codebase indexing.

CLI (command-line interface) via : This is a flexible, raw interface that is powerful for advanced agents like Claude Code with strong scripting abilities. The agent can pipe commands, chain utilities, and perform complex shell operations. CLI-based tools can also be context-discovered rather than being exposed directly to the agent via its system prompt (which limits the max total tools in the MCP case). The downside is that it's less structured, and the agent may need multiple tool calls to get the syntax right.

MCP (Model Context Protocol): This provides a structured, agent-native way to expose your tools directly to the LLM's API. It gives you fine-grained control over the tool's definition as seen by the model and is better for workflows that rely on well-defined tool calls. This is particularly useful for deep prompt optimization, security controls, and taking advantage of some of the more recent fancy UX features that MCP provides. MCP today can also be a bit trickier for end-users to install and authorize compared to existing install setups for CLI tools (e.g., or just adding a new to your ).

Can ELMA 3 months ago

Postmortem: How I Crashed an API with a Cloudflare Compression Rule

Sometimes the most valuable lessons come from our biggest mistakes. This is the story of how a single misconfigured Cloudflare compression rule broke our Server-Sent Events (SSE) streaming and brought down an entire API for several hours.

Date: August 15, 2025
Duration: 4 hours 23 minutes
Impact: ~20% API downtime, 15,000+ affected users
Root Cause: Cloudflare compression rule breaking SSE streaming

I was working on performance optimization for our API endpoints. The goal was to reduce bandwidth usage and improve response times by enabling Cloudflare's compression features. I enabled the Cloudflare compression rule:

The issue wasn't immediately apparent. The compression rule looked safe, but I had forgotten a critical detail: our API used Server-Sent Events (SSE) for real-time streaming, and Cloudflare's compression breaks SSE.

Cloudflare Compression Breaking SSE: The compression rule was enabled without understanding that it buffers data, breaking real-time streaming. This incident taught us that compression isn't always beneficial — it can break real-time protocols like SSE. The key lesson is to understand how infrastructure changes affect your specific use cases, especially streaming protocols.
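One defensive pattern for this class of failure (a sketch, not the author's actual remediation) is to mark SSE responses with `Cache-Control: no-transform` — the standard directive telling intermediaries not to modify the payload — and verify whether your CDN honors it for compression:

```python
# Sketch: response headers that ask intermediaries (CDNs, proxies) not to
# transform an SSE stream. `no-transform` is standard Cache-Control; whether
# a particular CDN honors it for compression should be checked in its docs.
def sse_headers() -> dict:
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache, no-transform",  # no-transform: don't compress/rewrite
        "Connection": "keep-alive",
        "X-Accel-Buffering": "no",  # also disables buffering in nginx-style proxies
    }

def format_sse_event(data: str, event: str = "message") -> str:
    """SSE wire format: field lines terminated by a blank line per event."""
    return f"event: {event}\ndata: {data}\n\n"

print(sse_headers()["Cache-Control"])
print(format_sse_event("hello"), end="")
```

Compression breaks SSE because the compressor buffers output until it has a full chunk to encode, so individual events never reach the client in real time — the headers above are about opting streams out of that path.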

W. Jason Gilmore 4 months ago

Notes On the Present State of MCP Servers

I've had the opportunity to spend the last several days immersed in researching the Model Context Protocol and the present state of MCP servers. My early conclusion is this technology is for real and has the potential to entirely change how we use the Internet. That said, like any emerging technology it is most definitely in a state of rapid evolution, and so I've compiled a few points here that may be useful to others exploring this topic. It is presently a messy and chaotic space, with both server and client implementations unable to keep up with the rapidly evolving spec. A great example of this is Anthropic deprecating and then removing SSE from transport options ( https://modelcontextprotocol.io/specification/2025-06-18/basic/transports ) while simultaneously advertising their partner extensions, which are SSE-based ( https://www.anthropic.com/engineering/desktop-extensions ). That said, I don't think anybody cares, including the major tech companies listed in that partner link, whether their extensions are presently SSE- or Streamable HTTP-based. It is just noise in the grand scheme of things; however, SSE will eventually, unquestionably, be phased out, and doesn't even show up in the latest spec version. MCP client support for critical server features remains uneven. What works in VS Code (server Prompts) does not presently work in Cursor. My personal experiments show Prompts to be a fascinating feature which introduces opportunities for user interactivity not otherwise possible using solely Tools. Not for lack of trying, it remains unclear to me (and apparently almost everybody else, including AWS architects) how OAuth is implemented in MCP servers. Claude Desktop seems to have the best support, as evidenced by the directory they launched a few days ago. Other MCP clients have varying support, and require the use of experimental hacks such as mcp-remote for certain use cases.
That said, the exploding mcp-remote weekly download chart is indicative of just how strong the demand presently is for at least experimenting with this new technology. And further, given the obvious advantages OAuth has to offer for enterprises, it will only be a matter of time before OAuth is standard. You can already see Anthropic moving in this direction thanks to their recent publication of documents such as this. API key-based authentication works very well across popular clients (VS Code, Cursor, Claude Desktop, etc.), and when coupled with a capable authorization solution such as DreamFactory it's already possible to build some really compelling and practical extensions to existing products. To see a concrete example of what I'm talking about, check out this great video by my friend and colleague Terence Bennett. While adding API keys (and MCP servers, for that matter) to most clients presently requires a minimal level of technical expertise (modifying a JSON file), my experiments with Claude Desktop extensions (next point) show that installation woes will shortly be a thing of the past. Anthropic (Claude) is emerging as the clear leader in all things MCP, which is no surprise considering they invented the concept. Among other things, their new Desktop extension spec ( https://www.anthropic.com/engineering/desktop-extensions ) is very cool and I've already successfully built one. I'd love to see this approach adopted on a wider scale because it dramatically lowers the barrier to entry for installing MCP servers. Somebody has already started an Awesome Claude Desktop Extensions page which is worth a look. The pace of evolution is such that if you're reading this even a few weeks or months after the publication date, then some or possibly all of what is stated above is outdated. Follow me on Twitter for ongoing updates, as I expect to remain immersed in this topic for the foreseeable future.

tonsky.me 4 months ago

Gaslight-driven development

Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions—not because they particularly want to or even need to. They do it because that's what the computer told them to do. Like it or not, we are already serving the machines. Well, now there is a new way to serve our silicon overlords. LLMs have started to have opinions on how your API should look, and since 90% of all code will be written by AI come September, we have no choice but to oblige. You might've heard the story of Soundslice adding a feature because ChatGPT kept telling people it exists. We see the same at Instant: for example, we used for both inserting and updating entities, but LLMs kept writing instead. Guess what: we now have , too. Is it good or is it bad? It definitely feels strange. In a sense, it's helpful: LLMs here have seen millions of other APIs and are suggesting the most obvious thing, something every developer would think of first, too. It's also a unique testing device: if developers use your API wrong, they blame themselves, read the documentation, and fix their code. In the end, you might never learn that they even had the problem. But with ChatGPT, you yourself can experience the "newbie's POV" at any time. Of course, this approach doesn't work if you are trying to do something new and unique. LLMs just won't "get it". But how many of us are doing something new and unique? Maybe the API is not the place to get clever? Maybe, for most cases, it's truly best if you did the most obvious thing? So welcome to the new era. AI is not just using tools we gave it. It now has opinions about how these tools should've been made. And instead of asking nicely, it gaslights everybody into thinking that's how it's always been.


Claude Code with Kimi K2

It looks like Moonshot AI have an Anthropic-compatible API endpoint for their new open frontier model, K2. Since Anthropic lets you set a custom base URL for their API, it's relatively straightforward to set up Claude Code to use K2. Some folks on GitHub put together a workflow to set things up, but...it's a little bit sketchy (and is broken for me). Also, I'm not that excited about instructions that tell you to run commands that pipe to from entities with 'red team' in their names. It also doesn't work that well if you're already a Claude Code user, because Claude Code isn't really built to let you swap between different API providers in different sessions. They don't have an easy way to move the systemwide config directory. Thankfully, on Unix-like operating systems, it's pretty easy to...just swap your directory out from under the OS. Head on over to [https://platform.moonshot.ai/console](https://platform.moonshot.ai/console) and sign up for an account. As of this moment, you'll get $5 in credit for free. Make a directory for 's homedir. Make a shell script:
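The general shape of the swap, sketched in Python rather than as a shell script — note that the endpoint URL and the environment variable names are assumptions here; check Claude Code's and Moonshot's current documentation before relying on them:

```python
# Hypothetical launcher: run Claude Code against an Anthropic-compatible
# endpoint with its own config dir, leaving your normal sessions untouched.
import os

def kimi_env(api_key: str, config_dir: str = "~/.claude-kimi") -> dict:
    """Build environment overrides for a K2-backed Claude Code session."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = "https://api.moonshot.ai/anthropic"  # assumed endpoint
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    env["CLAUDE_CONFIG_DIR"] = os.path.expanduser(config_dir)        # keeps state separate
    return env

# Usage (not run here): subprocess.run(["claude"], env=kimi_env(my_key))
```

Keeping the override in a per-session environment rather than editing the global config is the point: your regular Anthropic-backed sessions keep working unchanged.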

Fakeman Show 4 months ago

Lessons learned by building my own cookiecutter for REST APIs

During my university days, I avoided building CRUD apps, not because I couldn't, but because they felt boring. To me, it was just a client (web or mobile) talking to a server that ran some SQL queries. I dodged them in every project I could. Fast forward three years, after graduating and working at Oracle, I've realized that almost everything in software is just a fancy CRUD app. And you know what?
