Latest Posts (20 found)
Zak Knill Yesterday

SSE token streaming is easy, they said

I wrote about AI having ‘durable sessions’ to support async agentic applications, and in the comments everyone said: “Token streaming over SSE is easy”. …so I figured I’d dig into that claim. Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.

Zak Knill 4 days ago

All your agents are going async

Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks. For most of the time LLMs have been around, you use them by opening a chat-style window and typing a prompt. The LLM streams the response back token-by-token. It’s how ChatGPT, claude.ai, and Claude Code work. It’s also how the demos work for basically every AI SDK or AI Library. It’s easy to think that LLM chatbots are the ‘art of the possible’ for AI right now. But that’s not the case.

Zak Knill 1 month ago

You are the bottleneck

The agent can produce code faster than you can review it. That’s the bottleneck now, not the keyboard, not the compiler. You. Before agents, the constraint was how fast you could write code. Now it’s how fast you can review it. The agent ships. You approve. And the agent is faster than you. You’re not the producer anymore. You’re the reviewer. And that changes everything about how you should spend your time.

Zak Knill 1 month ago

If code is cheap, intent is the currency

Apparently writing code is cheap now. So since the barrier to producing code is gone, the intent behind the code is the most important bit. Intent is the new scarce resource, and commit messages are where that intent lives. Agents are still, for now, working inside human processes. The software development lifecycle (I’m getting flashbacks to every agile coach ever!) is still the same: we still have commits, pull requests, code review. We still have humans responsible for the agent’s output. But generating the code is cheaper, so the code review carries more of the weight and responsibility for good code.

Zak Knill 2 months ago

A chatbot's worst enemy is page refresh

How is it possible that we’ve made incredible gains in the performance of models, but virtually no gains in the infrastructure that supports them? ...or what I like to call: the worst enemy of chatbots is page refresh. There are some large GIFs in this article, let them load :) If a picture speaks a thousand words, here is a GIF of the Claude UI taken on 11th Feb 2026.

Zak Knill 2 months ago

Only use agents for tasks you already know how to do

We’ve all seen the complaints. The burden of reviewing AI ‘output’ is shifting onto project maintainers and team members. Folks can easily generate lots of code using AI, and that code might even be functional (in that it passes the tests also written by the AI). But that doesn’t necessarily make the code good or correct. So if you want to be a good team member, here’s my rule for coding with AI agents:

Zak Knill 4 months ago

SSE sucks for transporting LLM tokens

I’m just going to cut to the chase here. SSE as a transport mechanism for LLM tokens is naff. It’s not that it can’t work, obviously it can, because people are using it and SDKs are built around it. But it’s not a great fit for the problem space. The basic SSE flow goes something like this: (1) Client makes an HTTP POST request to the server with a prompt. (2) Server responds with a 200 OK and keeps the connection open. (3) Server streams tokens back to the client as they are generated, using the SSE format. (4) Client processes the tokens as they arrive on the long-lived HTTP connection. Sure, the approach has some benefits, like simplicity and compatibility with existing HTTP infrastructure. But it still sucks.
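
The wire format in that flow can be sketched without any network at all. What follows is a minimal illustration, not the post's own code: `sse_encode` and `sse_decode` are hypothetical names, the `id:` field is an addition for illustration, and the decoder only handles the `data:` field. It assumes tokens arrive as plain strings and that the client sees the response body in arbitrarily sized chunks.

```python
from typing import Iterable, Iterator


def sse_encode(tokens: Iterable[str]) -> Iterator[str]:
    """Server side (hypothetical helper): wrap each token in one SSE event."""
    for event_id, token in enumerate(tokens):
        # A token containing newlines must be split across several data: lines.
        data_lines = "".join(f"data: {part}\n" for part in token.split("\n"))
        # An event is a group of "field: value" lines closed by a blank line.
        yield f"id: {event_id}\n{data_lines}\n"


def sse_decode(chunks: Iterable[str]) -> Iterator[str]:
    """Client side (hypothetical helper): rebuild tokens from raw chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A blank line marks the end of an event; chunks may split events anywhere.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data = []
            for line in raw_event.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # Only a single leading space after the colon is dropped.
                    data.append(value[1:] if value.startswith(" ") else value)
            if data:
                # Multiple data: lines in one event are joined with newlines.
                yield "\n".join(data)
```

Feeding the encoder's output to the decoder in small slices, rather than one event at a time, is a quick way to check that the buffering copes with events split across reads. Note that nothing in this sketch survives a dropped connection; the client would have to reconnect and ask for everything after the last `id` it saw.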

Zak Knill 5 months ago

So you want to build AI agent group chat?

Disclaimer: I work for Ably, so I’m intimately familiar with the tech I mention here. Opinions are my own, etc. On Nov 13th OpenAI announced the pilot of group chats in ChatGPT. This post looks at the existing patterns for interacting with models, and how they make it hard to build similar features. The OpenAI group chat feature allows multiple users to join a chat with an AI model, and have a conversation together. Responses from each user are visible to all participants, and the model responds to the entire group. Building this with existing model and SDK transport patterns is hard.

Zak Knill 1 year ago

Patterns for building realtime features

Realtime features make apps feel modern, collaborative, and up-to-date. The features predominantly require sharing changes triggered by one user to other users, as the changes are happening. This typically means your server needs to send data to some set of clients, where those clients don’t know they are missing the data. These patterns rely on a connection between the client and the server, where the server can notify the client of some data. This connection could be websockets, SSE, event-streams, or polling (long or short). The connection just needs to allow the server to send data to the client without the client knowing that there is new data.

Zak Knill 1 year ago

Phone call asymmetry

You get a phone call, but you’re away from your phone or you can’t answer it right at that moment. You call the number back and hear an automated voice say: Thank you for calling [some business], for accounts press 1, to place a new order press 2…. Perhaps, by sheer luck (or skill) you manage to navigate the labyrinth of options and talk to a real human (sidebar: there’s a circle of hell reserved for the flow-chart designer that creates a branch that ends up in them hanging up on you).

Zak Knill 1 year ago

Every programmer should know

Programmers should know a lot.. apparently: Programming paradigms; Lockless concurrency; Floating point; More algorithms; More latency; More more latency; More more more latency; More more more more latency; More memory; Regular expressions; Programming; Vim commands; Time complexities; Optical fibre.

Zak Knill 2 years ago

How to adopt Realtime updates in your app

…and why you really should! Realtime updates rely on two main technologies: Websockets, a stateful, persistent, bi-directional ‘channel’ of communication; and Server sent events (SSE), built on top of HTTP, which opens a long-running HTTP connection where multiple independent messages are written to the response over time. You might also think of polling or long polling as a mechanism for fetching ‘Realtime’ data from your backend. Polling is not Realtime.

Zak Knill 2 years ago

You don't need CRDTs for collaborative experiences

You don’t need CRDTs for collaborative experiences. First let’s get the ‘what-about-ery’ out the way… Hold on.. that all sounds great, but.. Offline first – this is wayy harder to get useful behaviour without CRDTs. If you don’t use them, you’re pretty much destined to have LWW (which is actually a CRDT behaviour), and one user is likely to overwrite the changes of another. This isn’t a great experience for anyone involved. Text editing – everyone’s gonna say “but hey, Google Docs uses operational transform not CRDTs”.. OK yes, but you are not Google. Martin Kleppmann has a great round-up of the various people who thought they implemented OT correctly, but actually didn’t. The reason that you need CRDTs for text editing collaboration is that it’s a really extreme example of collaboration. The nature of text editing is that any tiny errors in the placement of characters by the convergence algorithm are going to create incorrect words, and incorrect words are incredibly obvious. Text editing has a high rate of edits (as you type), and the edits need to interleave perfectly or you get incorrect words, and errors in the interleaving are super obvious (incorrect words)!
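
To make the LWW (last-write-wins) point concrete, here is a small sketch of a last-write-wins register. It is an illustration under assumptions, not code from the post: the `LWWRegister` class, its fields, and the tie-break on `replica_id` are all hypothetical choices, and it assumes each replica stamps its writes with a comparable timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """One replica's view of a single value (hypothetical illustration)."""

    value: str
    timestamp: float
    replica_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # The write with the highest timestamp wins outright. Ties are broken
        # by replica id so every replica picks the same winner and converges.
        return max(self, other, key=lambda r: (r.timestamp, r.replica_id))


# Two users edit the same field while offline, then sync.
alice = LWWRegister(value="Q3 plan (draft)", timestamp=100.0, replica_id="alice")
bob = LWWRegister(value="Q3 roadmap", timestamp=101.0, replica_id="bob")

merged = alice.merge(bob)
```

Both replicas converge on the same value regardless of merge order, but the earlier edit is discarded whole rather than combined with the later one. That is the overwrite behaviour the excerpt describes.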

Zak Knill 2 years ago

Giving up my smartphone - Duoqin F22 Pro

I was first attracted to dumbphones after seeing a series of articles on Hacker News. I like the idea of using - and relying on - my phone less and less. No plan survives contact with the enemy, and I knew that I wouldn’t manage in life with a stripped down phone that could only do calls, texts, and maybe some music. Eventually I stumbled across the dumbphones subreddit (/r/dumbphones). On this subreddit I discovered ‘transition phones’, that is, phones that can do some smartphone things, but with dumbphone characteristics. I found that you could have a dumbphone form factor but still install all the smartphone apps you might need.

Zak Knill 2 years ago

Do developers really want to give over their data?

There’s a rise in hosted database companies like Supabase, Neon, Turso, etc. When I look at those companies, here’s the thing I’ve been struggling with: Do developers really want to give over their data? You’re making a trade-off by choosing one of these companies, and the trade-off is this: They will solve some boring infrastructure and security problems, and in return, they get all your data. Not in the Cambridge-Analytica/Facebook style of “get all your data”. More like the S3 style; where the cost (in dollars 💲) or the cost (in time/effort 🕔) is high enough to dissuade you from trying to leave. There’s a strong lock-in effect.

Zak Knill 2 years ago

So you want to build Miro and Figma style collaboration?

Miro and Figma have a bunch of collaboration features. In this post I’m going to break down two of those features and look at what you’d have to think about when building these into your own apps. Disclaimer: I work for a company in this product space, which is why I care about these problems. Let’s start with.. Collaborative cursors allow multiple users to interact on the same page of a website, and for each participant to see where the other participants are pointing or moving their cursors.

Zak Knill 2 years ago

Streaming data aggregation

Imagine you’re presented with this problem: Design a system that can show the top 10 most popular songs over the last 10 seconds on the homepage of a music streaming service. You have access to a queue of events representing song ‘plays’ with a tuple. The data should update, and be as fresh as possible. We are given this to work with, we need to design a system that satisfies the requirements, replacing the “❓”:
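
One simple way to approach the problem statement above is a sliding window over the event queue. This is only a sketch under stated assumptions, not the design the post arrives at: the `TopSongsWindow` class is hypothetical, each play event is assumed to be a `(song_id, timestamp)` pair (the excerpt does not show the tuple's fields), and events are assumed to arrive in timestamp order on a single process.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class TopSongsWindow:
    """Counts plays per song over a trailing time window (hypothetical sketch)."""

    def __init__(self, window_seconds: float = 10.0, top_n: int = 10) -> None:
        self.window_seconds = window_seconds
        self.top_n = top_n
        self._events: Deque[Tuple[float, str]] = deque()  # oldest play first
        self._counts: Counter = Counter()

    def record_play(self, song_id: str, timestamp: float) -> None:
        # Assumes timestamps are non-decreasing, so the deque stays sorted.
        self._events.append((timestamp, song_id))
        self._counts[song_id] += 1

    def _evict(self, now: float) -> None:
        # Drop plays that are at least window_seconds old and undo their counts.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, song_id = self._events.popleft()
            self._counts[song_id] -= 1
            if self._counts[song_id] == 0:
                del self._counts[song_id]

    def top(self, now: float) -> List[Tuple[str, int]]:
        """Return up to top_n (song_id, play_count) pairs, most played first."""
        self._evict(now)
        return self._counts.most_common(self.top_n)
```

Keeping the raw events in a deque makes eviction cheap, at the cost of memory proportional to the number of plays in the window. At real streaming-service volume you would likely need bucketed counts or partitioning across workers, which this sketch does not attempt.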

Zak Knill 2 years ago

The egg test: a model for reversible and irreversible decision making

Some talk of one way and two way doors, but I prefer the ‘egg test’. Do you want to know if it’s an egg? … Drop it, if it cracks and you see an egg, then it’s an egg. Clearly the problem here is that it’s now not possible to use that egg. This is an irreversible decision. That egg is never going to be whole again. And it’s going to be super hard to separate the egg, the shell, and whatever else was on the surface you dropped the egg on.

Zak Knill 2 years ago

Standard lib structured logging in Go 1.21

Go 1.21 includes structured logging (with levels) as a standard library package. In this post we look at the package, and what it provides. TL;DR - here’s a package to make using easier: github.com/zknill/slogmw comes with 4 levels: The functions providing each level come with the same pair of exported method calls. Here’s the example for :

Zak Knill 2 years ago

SQLedge - Postgres on the edge

SQLedge uses Postgres logical replication to stream the changes in a source Postgres database to an SQLite database that can run on the edge. SQLedge serves reads from its local SQLite database, and forwards writes to the upstream Postgres server that it’s replicating from. This lets you run your apps on the edge, and have local, fast, and eventually consistent access to your data. Check out the repo on GitHub.
