Posts in Bash (20 found)
André Arko 3 days ago

Announcing `rv` 0.2

With the help of many new contributors, and after many late nights wrestling with make, we are happy to (slightly belatedly) announce the 0.2 release of `rv`! This version dramatically expands support for Rubies, shells, and architectures.

- Rubies: we have added Ruby 3.3, and re-compiled all Ruby 3.3 and 3.4 versions with YJIT. On Linux, YJIT raises our minimum glibc version to 2.35, which means most distro releases from 2022 or later should work; please let us know if you run into any problems.
- Shells: we have added support for bash, fish, and nushell, in addition to zsh.
- Architectures: we have added Ruby compiled for macOS on x86 (in addition to Apple Silicon) and Ruby compiled for Linux on ARM (in addition to x86).

Special thanks to the newest member of the maintainers' team, @adamchalmers, for improving code and tests, adding code coverage and fuzzing, heroic amounts of issue triage, and nushell support. Additional thanks are due to all the new contributors in version 0.2, including @Thomascountz, @lgarron, @coezbek, and @renatolond. To upgrade, run the upgrade command, or check the release notes for other options.


LLMs Eat Scaffolding for Breakfast

We just deleted thousands of lines of code. Again. Each time a new LLM comes out, it's the same story. LLMs have limitations, so we build scaffolding around them. Each new model introduces capabilities that make the old scaffolding obsolete, so it gets deleted and new scaffolding gets added. But as we move closer to superintelligence, less scaffolding is needed. This post is about what it takes to build successfully in AI today.

Every line of scaffolding is a confession: the model wasn't good enough.

- LLMs can't read PDFs? Let's build a complex system to convert PDFs to markdown.
- LLMs can't do math? Let's build a compute engine to return accurate numbers.
- LLMs can't handle structured output? Let's build complex JSON validators and regex parsers.
- LLMs can't read images? Let's use a specialized image-to-text model to describe the image to the LLM.
- LLMs can't read more than 3 pages? Let's build a complex retrieval pipeline with a search engine to feed the best content to the LLM.
- LLMs can't reason? Let's build chain-of-thought logic with forced step-by-step breakdowns, verification loops, and self-consistency checks.

Etc, etc... millions of lines of code to add external capabilities to the model. But look at models today: GPT-5 is solving frontier mathematics, Grok-4 Fast can read 3,000+ pages with its 2M-token context window, Claude Sonnet 4.5 can ingest images and PDFs, and all the major models have native reasoning capabilities and support structured outputs. The once-essential scaffolding is now obsolete. Those capabilities are baked into the models.
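As a deliberately minimal illustration of the "just put the docs in the prompt" pattern that replaces a retrieval pipeline, here's a sketch using the OpenAI Python SDK; the model name is a placeholder for whichever large-context model you use.

```python
# Minimal sketch: no RAG, no PDF-parsing pipeline -- put the document in the prompt.
# Assumes the OpenAI Python SDK; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

document = open("handbook.txt").read()  # hundreds of pages fit in modern context windows

response = client.chat.completions.create(
    model="gpt-5",  # placeholder; use whichever large-context model you have access to
    messages=[
        {"role": "system", "content": "Answer questions using only the provided document."},
        {"role": "user", "content": f"{document}\n\nQuestion: What is our refund policy?"},
    ],
)
print(response.choices[0].message.content)
```

The point isn't that this is always the right design (cost and latency still matter), but that the retrieval machinery is no longer a prerequisite.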
It's nearly impossible to predict which scaffolding will become obsolete, and when. What looks like essential infrastructure and industry best practice today can turn into legacy technical debt within months.

The best way to grasp how fast LLMs are eating scaffolding is to look at their system prompts (the top-level instructions that tell the AI how to behave). Comparing the prompt used in Codex, OpenAI's coding agent, across the o3 and GPT-5 models is mind-blowing: the o3-era prompt was 310 lines; the GPT-5 prompt is 104 lines. That's 206 lines removed, a 66% reduction. GPT-5 needs far less handholding. The old prompt had detailed instructions on how to behave as a coding agent (personality, preambles, when to plan, how to validate). The new prompt assumes GPT-5 already knows this and only specifies the Codex-specific technical requirements (sandboxing, tool usage, output formatting). It drops the detailed guidance about autonomously resolving queries, coding guidelines, and git usage, and it's less prescriptive: instead of "do this and this," it says "here are the tools at your disposal." As we move closer to superintelligence, the models require more freedom and leeway (scary, lol!).

Advanced models require simple instructions and tooling. Claude Code, the most sophisticated agent today, relies on a plain filesystem instead of a complex index, and uses bash commands (find, read, grep, glob) instead of complex tools.

It moves fast. Each model introduces a new paradigm shift, and if you miss a paradigm shift, you're dead. Having an edge in building AI applications requires deep technical understanding, insatiable curiosity, and low ego. And because everything changes, it's good to focus on what won't change.

Context window is how much text you can feed the model in a single conversation. Early models could only handle a couple of pages; now it's thousands of pages, and it's growing fast. Dario Amodei, the founder of Anthropic, expects 100M+ token context windows, while Sam Altman has hinted at billions of context tokens. The more the model can see on its own, the less scaffolding, like retrieval-augmented generation, you need.

- November 2022: GPT-3.5 could handle 4K tokens of context
- November 2023: GPT-4 Turbo with 128K context
- June 2024: Claude 3.5 Sonnet with 200K context
- June 2025: Gemini 2.5 Pro with 1M context
- September 2025: Grok-4 Fast with 2M context

Models used to stream at 30-40 tokens per second. Today's fastest models, like Gemini 2.5 Flash and Grok-4 Fast, hit 200+ tokens per second, a 5x improvement. On specialized AI chips (LPUs), providers like Cerebras push open-source models to 2,000 tokens per second. We're approaching real-time LLMs: full responses to complex tasks in under a second.

LLMs are becoming dramatically smarter. With every new model, benchmarks get saturated; on the path to AGI, every benchmark will get saturated. As with humans, a key factor in intelligence is the ability to use tools to accomplish an objective. That is the current frontier: how well a model can use tools such as reading, writing, and searching to accomplish a task over a long period of time. This is important to grasp. Models will not improve much further at, say, language translation (they are already near the ceiling), but they will improve at how they chain translation tasks over time to accomplish a goal. For example, you can say, "Translate this blog post into every language on Earth," and the model will work for a couple of hours on its own to make it happen. Tool use and long-horizon tasks are the new frontier.

The uncomfortable truth: most engineers are maintaining infrastructure that shouldn't exist. Models will make it obsolete, and the survival of AI apps depends on how fast you can adapt to the new paradigm. That's where startups have an edge over big companies; big companies are late by at least two paradigms. Some examples of scaffolding on the decline:

- Vector databases: companies paying thousands per month when they could now just put the docs in the prompt, or use agentic search instead of RAG (my article on the topic).
- LLM frameworks: these frameworks solved real problems in 2023. In 2025, they're abstraction layers that slow you down; the best practice now is to use the model API directly.
- Prompt engineering teams: companies hiring "prompt engineers" to craft perfect prompts, when current models just need clear instructions and open-ended tools.
- Model fine-tuning: teams spending months fine-tuning models, only for the next generation of out-of-the-box models to outperform their fine-tune (cf. my 2024 article on that).
- Custom caching layers: building Redis-backed semantic caches that add latency and complexity when prompt caching is built into the API.
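On that last point, here's a rough sketch of what relying on built-in prompt caching can look like, using Anthropic's cache_control markers as one example (the model name is a placeholder; check your provider's docs for current details):

```python
# Minimal sketch of built-in prompt caching: mark the large, reused part of the
# prompt as cacheable instead of maintaining a separate semantic-cache service.
# Assumes the Anthropic Python SDK; the model name is a placeholder.
import anthropic

client = anthropic.Anthropic()
big_reference_doc = open("policies.md").read()

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": big_reference_doc,
            "cache_control": {"type": "ephemeral"},  # provider caches this prefix between calls
        }
    ],
    messages=[{"role": "user", "content": "Summarize the vacation policy."}],
)
print(response.content[0].text)
```

Subsequent calls that reuse the same prefix hit the provider-side cache, which is roughly the job the custom Redis layer used to do.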
This cycle accelerates with every model release. The best AI teams share a few critical skills:

- Deep model awareness: they understand exactly what today's models can and cannot do, and build only the minimal scaffolding needed to bridge capability gaps.
- Strategic foresight: they distinguish between infrastructure that solves today's problems and infrastructure that will survive the next model generation.
- Frontier vigilance: they treat model releases like breaking news. Missing a single capability announcement from OpenAI, Anthropic, or Google can render months of work obsolete.
- Ruthless iteration: they celebrate deleting code. When a new model makes their infrastructure redundant, they pivot in days, not months.

It's not easy. Teams are fighting powerful forces:

- Lack of awareness: teams don't realize models have improved enough to eliminate scaffolding (this is massive, by the way).
- Sunk cost fallacy: "We spent 3 years building this RAG pipeline!"
- Fear of regression: "What if the new approach is simpler but doesn't work as well on certain edge cases?"
- Organizational inertia: getting approval to delete infrastructure is harder than building it.
- Resume-driven development: "RAG pipeline with vector DB and reranking" looks better on a resume than "put files in prompt."

In AI, the best teams build for fast obsolescence and stay at the edge. Software engineering sits on top of a complex stack: more layers, more abstractions, more frameworks. Complexity was treated as sophistication. A simple web form in 2024? React for UI, Redux for state, TypeScript for types, Webpack for bundling, Jest for testing, ESLint for linting, Prettier for formatting, Docker for deployment...

AI is inverting this. The best AI code is simple and close to the model. Experienced engineers look at modern AI codebases and think: "This can't be right. Where's the architecture? Where's the abstraction? Where's the framework?" The answer: the model ate it, bro. Get over it. The worst AI codebases are the ones that were best practice 12 months ago. As models improve, the scaffolding becomes technical debt, the sophisticated architecture becomes the liability, and the framework becomes the bottleneck. LLMs eat scaffolding for breakfast, and the trend is accelerating.

Thanks for reading! Subscribe for free to receive new posts and support my work.

baby steps 1 week ago

SymmACP: extending Zed's ACP to support Composable Agents

This post describes SymmACP – a proposed extension to Zed's Agent Client Protocol that lets you build AI tools like Unix pipes or browser extensions. Want a better TUI? Found some cool slash commands on GitHub? Prefer a different backend? With SymmACP, you can mix and match these pieces and have them all work together without knowing about each other. This is pretty different from how AI tools work today, where everything is a monolith – if you want to change one piece, you're stuck rebuilding the whole thing from scratch. SymmACP lets you build out new features and modes of interaction in a layered, interoperable way. This post explains how SymmACP would work by walking through a series of examples.

Right now, SymmACP is just a thought experiment. I've sketched these ideas to the Zed folks, and they seemed interested, but we still have to discuss the details in this post. My plan is to start prototyping in Symposium – if you think the ideas I'm discussing here are exciting, please join the Symposium Zulip and let's talk!

I'm going to explain the idea of "composable agents" by walking through a series of features. We'll start with a basic CLI agent1 tool – basically a chat loop with access to some MCP servers so that it can read/write files and execute bash commands. Then we'll show how you could add several features on top:

- Addressing time-blindness by helping the agent know what time it is.
- Injecting context and "personality" into the agent.
- Spawning long-running, asynchronous tasks.
- A copy of Q CLI's tangent mode that lets you do a bit of "off the books" work that gets removed from your history later.
- Implementing Symposium's interactive walkthroughs, which give the agent a richer vocabulary for communicating with you than just text.
- Smarter tool delegation.

The magic trick is that each of these features will be developed as a separate repository. What's more, they could be applied to any base tool you want, so long as it speaks SymmACP. And you could also combine them with different front-ends, such as a TUI, a web front-end, builtin support from Zed or IntelliJ, etc. Pretty neat. My hope is that if we can centralize on SymmACP, or something like it, then we could move from everybody developing their own bespoke tools to an interoperable ecosystem of ideas that can build off of one another.

SymmACP begins with ACP, so let's explain what ACP is. ACP is a wonderfully simple protocol that lets you abstract over CLI agents. Imagine using an agentic CLI tool except that, instead of communicating over the terminal, the CLI tool communicates with a front-end over JSON-RPC messages, currently sent via stdin/stdout. When you type something into the GUI, the editor sends a JSON-RPC message to the agent with what you typed. The agent responds with a stream of messages containing text and images. If the agent decides to invoke a tool, it can request permission by sending a JSON-RPC message back to the editor. And when the agent has completed, it responds to the editor with an "end turn" message that says "I'm ready for you to type something else now."

OK, let's tackle our first feature. If you've used a CLI agent, you may have noticed that they don't know what time it is – or even what year it is. This may sound trivial, but it can lead to some real mistakes. For example, they may not realize that some information is outdated. Or when they do web searches, they can search for the wrong thing: I've seen CLI agents search the web for "API updates in 2024," for example, even though it is 2025. To fix this, many CLI agents inject some extra text along with your prompt, something like "the current date and time is…". This gives the LLM the context it needs. So how could you use ACP to build that? The idea is to create a proxy. This proxy would wrap the original ACP server, take every "prompt" message it receives, and decorate it with the date and time. Simple, right?
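Here's a minimal sketch of what such a proxy could look like: a small Python shim that sits between the editor and the real agent on stdin/stdout. The wrapped agent binary and the `session/prompt` method and field names are placeholders rather than the exact ACP schema.

```python
# Minimal sketch of an ACP-style proxy that injects the current date/time into
# every prompt before forwarding it. Method and field names are illustrative.
import json
import subprocess
import sys
import threading
from datetime import datetime

# Spawn the real agent; we sit between the editor (our stdin/stdout) and it.
agent = subprocess.Popen(["my-acp-agent"], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, text=True)

def forward_editor_to_agent():
    for line in sys.stdin:
        msg = json.loads(line)
        # Decorate user prompts with the current date/time before forwarding.
        if msg.get("method") == "session/prompt":  # illustrative method name
            for block in msg.get("params", {}).get("prompt", []):
                if block.get("type") == "text":
                    block["text"] += f"\n\n(Current date/time: {datetime.now().isoformat()})"
        agent.stdin.write(json.dumps(msg) + "\n")
        agent.stdin.flush()

def forward_agent_to_editor():
    for line in agent.stdout:
        sys.stdout.write(line)  # pass agent responses through untouched
        sys.stdout.flush()

threading.Thread(target=forward_agent_to_editor, daemon=True).start()
forward_editor_to_agent()
```

Because the proxy speaks the same protocol on both sides, neither the editor nor the agent needs to know it exists.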
And of course this can be used with any editor and any ACP-speaking tool.

Let's look at another feature that basically "falls out" of ACP: injecting personality. Most agents give you the ability to configure "context" in various ways – or what Claude Code calls memory. This is useful, but I and others have noticed that if what you want is to change how Claude behaves – i.e., to make it more collaborative – it's not really enough. You really need to kick off the conversation by reinforcing that pattern. In Symposium, the "yiasou" prompt (also available as "hi", for those of you who don't speak Greek 😛) is meant to be run as the first thing in the conversation. But there's nothing an MCP server can do to ensure that the user actually kicks off the conversation with "yiasou" or something similar. Of course, if Symposium were implemented as an ACP server, we absolutely could do that.

Some of you may be saying, "hmm, isn't that what hooks are for?" And yes, you could do this with hooks, but there are two problems with that. First, hooks are non-standard, so you have to do it differently for every agent. Second, hooks are fundamentally limited to what the hook designer envisioned you might want. You only get hooks at the places in the workflow that the tool gives you, and you can only control what the tool lets you control. The next feature starts to show what I mean: as far as I know, it cannot readily be implemented with hooks the way I would want it to work.

Let's move on to our next feature, long-running asynchronous tasks. This feature is going to have to go beyond the current capabilities of ACP into the expanded "SymmACP" feature set. Right now, when the server invokes an MCP tool, it executes in a blocking way. But sometimes the task being performed might be long and complicated. What you would really like is a way to "start" the task and then go back to working; when the task is complete, you (and the agent) would be notified.

This comes up for me a lot with "deep research." A big part of my workflow is that, when I get stuck on something I don't understand, I deploy a research agent to scour the web for information. Usually I will ask the agent I'm collaborating with to prepare a research prompt summarizing the things we tried, what obstacles we hit, and other details that seem relevant. Then I'll pop over to claude.ai or Gemini Deep Research and paste in the prompt. This will run for 5-10 minutes and generate a markdown report in response. I'll download that and give it to my agent. Very often this lets us solve the problem.2

This research flow works well, but it is tedious and requires me to copy and paste. What I would ideally want is an MCP tool that does the search for me and, when the results are done, hands them off to the agent so it can start processing immediately. But in the meantime, I'd like to be able to continue working with the agent while we wait. Unfortunately, the protocol for tools provides no mechanism for asynchronous notifications like this, from what I can tell. So how would I do it with SymmACP? Well, I would want to extend the ACP protocol as it is today in two ways:

- I'd like the ACP proxy to be able to provide tools that the proxy itself will execute. Today, the agent is responsible for executing all tools; the ACP protocol only comes into play when requesting permission. But it'd be trivial to have MCP tools where, to execute the tool, the agent sends back a message over ACP instead.
- I'd like to have a way for the agent to initiate responses to the editor. Right now, the editor always initiates each communication with a prompt; but, in this case, the agent might want to send messages back unprompted.

In that case, we could implement our Research Proxy as another layer in the chain. What's cool about this is that the proxy encapsulates the entire flow: it knows how to do the research, and it manages notifying the various participants when the research completes. (Also, this leans on one detail I left out, which is that …)
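To make the second extension concrete, here's a rough sketch of the kind of unprompted, agent-initiated message the Research Proxy might push back to the editor when the report is ready. The method name and payload shape are made up for illustration; they are not part of ACP today.

```python
# Illustrative only: the proxy notifies the editor that a long-running research
# task has finished, without waiting to be prompted. The JSON-RPC method name
# and payload are hypothetical, not the actual ACP schema.
import json
import sys

def notify_research_complete(session_id: str, report_markdown: str) -> None:
    notification = {
        "jsonrpc": "2.0",
        "method": "session/update",          # illustrative method name
        "params": {
            "sessionId": session_id,
            "update": {
                "type": "research_report",   # custom, proxy-defined update kind
                "content": report_markdown,
            },
        },
    }
    sys.stdout.write(json.dumps(notification) + "\n")
    sys.stdout.flush()
```

The same message could also be folded into the agent's context, so the agent picks up the report the next time it takes a turn.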
Let's explore our next feature, Q CLI's tangent mode. This feature is interesting because it's a simple (but useful!) example of history editing. The way tangent mode works is that, when you first enter it, Q CLI saves your current state. You can then continue as normal, but when you next leave the tangent, your state is restored to where it was. This, as the name suggests, lets you explore a side conversation without polluting your main context.

The basic idea for supporting tangent in SymmACP is that the proxy is going to (a) intercept the tangent prompt and remember where it began; (b) allow the conversation to continue as normal; and then (c) when it's time to end the tangent, create a new session and replay the history up until the point of the tangent.3 You can almost implement "tangent" in ACP as it is, but not quite. In ACP, the agent always owns the session history. The editor can create a new session or load an older one; when loading an older one, the agent "replays" the events so that the editor can reconstruct the GUI. But there is no way for the editor to "replay" or construct a session for the agent. Instead, the editor can only send prompts, which will cause the agent to reply. In this case, what we want is to be able to say "create a new chat in which I said this and you responded that" so that we can set up the initial state. This way we could easily create a new session that contains the messages from the old one. So that's how this would work.

One of the nicer features of Symposium is the ability to do interactive walkthroughs. These consist of an HTML sidebar as well as inline comments in the code. Right now, this is implemented by a kind of hacky dance:

- The agent invokes an MCP tool and sends it the walkthrough in markdown. This markdown includes comments meant to be placed on particular lines, identified not by line number (agents are bad at line numbers) but by symbol names or search strings.
- The MCP tool parses the markdown, determines the line numbers for comments, and creates HTML.
- It sends that HTML over IPC to the VSCode extension.
- The VSCode extension receives the IPC message, displays the HTML in the sidebar, and creates the comments in the code.

It works, but it's a giant Rube Goldberg machine. With SymmACP, we would structure the walkthrough mechanism as a proxy. Just as today, it would provide an MCP tool to the agent to receive the walkthrough markdown. It would then convert that into the HTML to display on the side, along with the various comments to embed in the code. But this is where things are different. Instead of sending that content over IPC, what I would want to do is make it possible for proxies to deliver extra information along with the chat. This is relatively easy to do in ACP as is, since it provides for various capabilities, but I think I'd want to go one step further: I would have a proxy layer that manages walkthroughs. As we saw before, it would provide a tool. But there'd be one additional thing: beyond just a chat history, it would be able to convey additional state. I think the basic conversation structure is like:

- Conversation
  - Turn
    - User prompt(s) – could be zero or more
    - Response(s) – could be zero or more
    - Tool use(s) – could be zero or more

but I think it'd be useful to (a) be able to attach metadata to any of those things, e.g., to add extra context about the conversation or about a specific turn (or even a specific prompt), but also additional kinds of events. For example, tool approvals are an event. And presenting a walkthrough and adding annotations are events too.

The way I imagine it, one of the core things in SymmACP would be the ability to serialize your state to JSON. You'd be able to ask a SymmACP participant to summarize a session. It would in turn ask any delegates to summarize and then add its own metadata along the way. You could also send the request in the other direction – e.g., the agent might present its state to the editor and ask it to augment it. This would mean a walkthrough proxy could add extra metadata to the chat transcript, like "the current walkthrough" and "the current comments that are in place."
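For illustration, a serialized session enriched by the walkthrough proxy might look roughly like this; the shape is hypothetical, and SymmACP would need to define the real schema.

```python
# Hypothetical sketch of an "enriched" serialized session: the plain conversation
# plus proxy-specific metadata that an aware editor can render specially.
serialized_session = {
    "turns": [
        {
            "prompts": [{"type": "text", "text": "Add a walkthrough for the parser changes"}],
            "responses": [{"type": "text", "text": "Here's a walkthrough of the changes..."}],
            "toolUses": [{"name": "present_walkthrough", "status": "completed"}],
        }
    ],
    "metadata": {
        "walkthrough": {
            "html": "<h1>Parser changes</h1>...",
            "comments": [
                {"file": "src/parser.rs", "symbol": "parse_expr", "text": "New lookahead here"}
            ],
        }
    },
}
```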
Then the editor would either know about that metadata or not. If it doesn't, you wouldn't see it in your chat. Oh well – or perhaps we do something HTML-like, where there's a way to "degrade gracefully" (e.g., the walkthrough could be presented as a regular "response," but with some metadata that, if you know to look for it, tells you to interpret it differently). But if the editor DOES know about the metadata, it interprets it specially, throwing the walkthrough up in a panel and adding the comments into the code. With enriched histories, I think we can even say that in SymmACP the ability to load, save, and persist sessions itself becomes an extension, something that can be implemented by a proxy; the base protocol only needs the ability to conduct and serialize a conversation.

Let me sketch out another feature that I've been noodling on that I think would be pretty cool. It's well known that LLMs get confused when there are too many MCP tools available. They get distracted. And that's sensible: so would I, if I were given a phonebook-sized list of possible things I could do and asked to figure something out. I'd probably just ignore it. But how do humans deal with this? Well, we don't take the whole phonebook – we get a shorter list of categories of options and then we drill down. So I go to the File menu and then I get a list of options, not a flat list of commands. I wanted to try building an MCP tool for IDE capabilities that was similar. There's a bajillion things a modern IDE can "do." It can find references. It can find definitions. It can get type hints. It can do renames. It can extract methods. In fact, the list is even open-ended, since extensions can provide their own commands. I don't know what all those things are, but I have a sense for the kinds of things an IDE can do – and I suspect models do too. What if you gave them a single tool, "IDE operation," and they could use plain English to describe what they want (e.g., asking it to find all references to a given function)? Hmm, this is sounding a lot like a delegate, or a sub-agent. Because now you need a second LLM to interpret that request – you probably want to do something like give it a list of suggested IDE capabilities, plus the ability to find out full details, and ask it to come up with a plan (or maybe directly execute the tools) to find the answer. As it happens, MCP has a capability that enables tools to do this – it's called (somewhat oddly, in my opinion) "sampling." It allows for "callbacks" from the MCP tool to the LLM. But literally nobody implements it, from what I can tell.4 But sampling is kind of limited anyway. With SymmACP, I think you could do much more interesting things. The key is that ACP already permits a single agent to "serve up" many simultaneous sessions. So that means that if I have a proxy, perhaps one supplying an MCP tool definition, I could use it to start fresh sessions – combine that with the "history replay" capability I mentioned above, and the tool can control exactly what context to bring over into that session, which is very cool (that's a challenge for MCP servers today: they don't get access to the conversation history).

OK, this post sketched a variant on ACP that I call SymmACP. SymmACP extends ACP with:

- the ability for either side to provide the initial state of a conversation, not just the server
- the ability for an "editor" to provide an MCP tool to the "agent"
- the ability for agents to respond without an initial prompt
- the ability to serialize conversations and attach extra state (already kind of present)

Most of these are modest extensions to ACP, in my opinion, and easily doable in a backwards-compatible fashion just by adding new capabilities. But together they unlock the ability for anyone to craft extensions to agents and deploy them in a composable way. I am super excited about this. This is exactly what I wanted Symposium to be all about.
It's worth noting the old adage: "with great power comes great responsibility." These proxies and ACP layers I've been talking about are really like IDE extensions. They can effectively do anything you could do. There are obvious security concerns. Though I think that approaches like Microsoft's Wassette are key here – it'd be awesome to have a "capability-based" notion of what a "proxy layer" is, where everything compiles to WASM, and where users can tune what a given proxy can actually do.

I plan to start sketching a plan to drive this work in Symposium and elsewhere. My goal is to have a completely open and interoperable client, one that can be based on any agent (including local ones) and where you can pick and choose which parts you want to use. I expect to build out lots of custom functionality to support Rust development (e.g., explaining and diagnosing trait errors using the new trait solver is high on my list… and macro errors…) but also to have other features like walkthroughs, collaborative interaction style, etc. that are all language-independent – and I'd love to see language-focused features for other languages, especially Python and TypeScript (because "the new trifecta") and Swift and Kotlin (because mobile). If that vision excites you, come join the Symposium Zulip and let's chat!

One question I've gotten when discussing this is how it compares to the host of other protocols out there. Let me give a brief overview of the related work and how I understand its pros and cons:
- Model Context Protocol (MCP): The queen of them all. A protocol that provides a set of tools, prompts, and resources up to the agent. Agents can invoke tools by supplying appropriate parameters, which are JSON. Prompts are shorthands that users can invoke using special commands; they are essentially macros that expand "as if the user typed it" (but they can also have parameters and be dynamically constructed). Resources are just data that can be requested. MCP servers can either be local or hosted remotely; remote MCP has only recently become an option, and auth in particular is limited. Comparison to SymmACP: MCP provides tools that the agent can invoke. SymmACP builds on it by allowing those tools to be provided by outer layers in the proxy chain. SymmACP is oriented at controlling the whole chat "experience."
- Zed's Agent Client Protocol (ACP): The basis for SymmACP. Allows editors to create and manage sessions. Focused only on local sessions, since your editor runs locally. Comparison to SymmACP: That's what this post is all about! SymmACP extends ACP with new capabilities that let intermediate layers manipulate history, provide tools, and provide extended data upstream to support richer interaction patterns than just chat. (PS: I expect we may want to support more remote capabilities, but that's kind of orthogonal in my opinion – e.g., I'd like to be able to work with an agent running on a cloud-hosted workstation, but I'd probably piggyback on ssh for that.)
- Google's Agent-to-Agent Protocol (A2A) and IBM's Agent Communication Protocol (ACP)5: From what I can tell, Google's agent-to-agent protocol is kind of like a mix of MCP and OpenAPI. You can ping agents that are running remotely and get them to send you "agent cards," which describe what operations they can perform, how you authenticate, and other stuff like that. It looks quite similar to MCP except that it has richer support for remote execution, and in particular supports things like long-running communication, where an agent may need to go off and work for a while and then ping you back on a webhook. Comparison to MCP: To me, A2A looks like a variant of MCP that is more geared to remote execution. MCP has a method for tool discovery where you ping the server to get a list of tools; A2A has a similar mechanism with Agent Cards. MCP can run locally, which A2A cannot afaik, but A2A has more options for auth. MCP can only be invoked synchronously, whereas A2A supports long-running operations, progress updates, and callbacks. It seems like the two could be merged into a single whole. Comparison to SymmACP: I think A2A is orthogonal to SymmACP. A2A is geared to agents that provide services to one another. SymmACP is geared towards building new development tools for interacting with agents. It's possible you could build something like SymmACP on A2A, but I don't know what you would really gain by it (and I think it'd be easy to do later).

1. Everybody uses agents in various ways. I like Simon Willison's "agents are models using tools in a loop" definition; I feel that an "agentic CLI tool" fits that definition, it's just that part of the loop is reading input from the user. I think "fully autonomous" agents are a subset of all agents – many agent processes interact with the outside world via tools etc. From a certain POV, you can view the agent "ending the turn" as invoking a tool for "gimme the next prompt". ↩︎
2. Research reports are a major part of how I avoid hallucination. You can see an example of one such report I commissioned on the details of the Language Server Protocol here; if we were about to embark on something that required detailed knowledge of LSP, I would ask the agent to read that report first. ↩︎
3. Alternatively: clear the session history and rebuild it, but I kind of prefer the functional view of the world, where a given session never changes. ↩︎
4. I started an implementation for Q CLI but got distracted – and, for reasons that should be obvious, I've started to lose interest. ↩︎
5. Yes, you read that right. There is another ACP. Just a mite confusing when you google search. =) ↩︎

Robin Moffatt 1 week ago

Stumbling into AI: Part 5—Agents

A short series of notes for myself as I learn more about the AI ecosystem as of Autumn [Fall] 2025. The driver for all this is understanding more about Apache Flink's Flink Agents project, and Confluent's Streaming Agents.

I started off this series—somewhat randomly, with hindsight—looking at Model Context Protocol (MCP). It's a helper technology to make things easier to use and provide a richer experience. Next I tried to wrap my head around Models—mostly LLMs, but also with an addendum discussing other types of model too. Along the lines of MCP, Retrieval Augmented Generation (RAG) is another helper technology that on its own doesn't do anything, but combined with an LLM gives it added smarts. I took a brief moment in part 4 to try and build a clearer understanding of the difference between ML and AI. So whilst RAG and MCP combined make for a bunch of nice capabilities beyond models such as LLMs alone, what I'm really circling around here is what we can do when we combine all these things: Agents! But… what is an Agent, both conceptually and in practice? Let's try and figure it out.

Let's begin with Wikipedia's definition:

In computer science, a software agent is a computer program that acts for a user or another program in a relationship of agency.

We can get more specialised if we look at Wikipedia's entry for an Intelligent Agent:

In artificial intelligence, an intelligent agent is an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge.

Citing Wikipedia is perhaps the laziest ever blog author's trick, but I offer no apologies 😜. Behind all the noise and fuss, this is what we're talking about: a bit of software that's going to go and do something for you (or your company) autonomously.

LangChain have their own definition of an Agent, explicitly identifying the use of an LLM:

An AI agent is a system that uses an LLM to decide the control flow of an application.

The blog post from LangChain as a whole gives more useful grounding in this area and is worth a read. In fact, if you want to really get into it, the LangChain Academy is free and the Introduction to LangGraph course gives a really good primer on Agents and more. Meanwhile, the Anthropic team have a chat about their definition of an Agent. In a blog post Anthropic differentiates between Workflows (that use LLMs) and Agents:

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

Independent researcher Simon Willison also uses the LLM word in his definition:

An LLM agent runs tools in a loop to achieve a goal.

He explores the definition in a recent blog post, "I think 'agent' may finally have a widely enough agreed upon definition to be useful jargon now", in which Josh Bickett's meme demonstrates how much of a journey this definition has been on. That there's still discussion and ambiguity nearly two years after this meme was created is telling.

My colleague Sean Falconer knows a lot more about this than I do. He was a guest on a recent podcast episode in which he spells things out:

[Agentic AI] involves AI systems that can reason, dynamically choose tasks, gather information, and perform actions as a more complete software system. [1]
[Agents] are software that can dynamically decide its own control flow: choosing tasks, workflows, and gathering context as needed. Realistically, current enterprise agents have limited agency […]. They're mostly workflow automations rather than fully autonomous systems. [2]

In many ways […] an agent [is] just a microservice. [3]

A straightforward software Agent might do something like: order more biscuits when there are only two left. The pseudo-code amounts to: check the biscuit count, and if it has dropped to two, place an order. We take this code, stick it on a server, and leave it to run. One happy Agent, done. An AI Agent could look more like this: instead of a hard-coded rule, an LLM decides when and how to reorder, calling whatever tools it needs along the way.

Other examples of AI Agents include:

- Coding Agents. Everyone's favourite tool (when used right). They can reason about code, write code, and review PRs. One of the trends that I've noticed recently (October 2025) is the use of Agents to help with some of the up-front jobs in software engineering (such as data modelling and writing tests), rather than full-blown code that's going to ship to production. That's not to say that coding Agents aren't being used for that, but by using AI to accelerate certain tasks whilst retaining human oversight (a.k.a. HITL) it makes it easier to review the output rather than just trusting to luck that reams and reams of code are correct. There's a good talk from Uber on how they're using AI in the development process, including code conversion and testing.
- Travel booking. Perhaps you tell it when you want to go, the kind of vacation you like, and what your budget is; it then goes and finds where it's nice at that time of year, figures out travel plans within your budget, and either proposes an itinerary or even books it for you. Another variation could be that you tell it where, and then it integrates with your calendar to figure out the when. This is a canonical example that is oft-cited; I'd be interested if anyone can point me to an actual implementation of it, even if just a toy one.

I saw this in a blog post from Simon Willison that made me wince, but am leaving the above in anyway, just to serve as an example of the confusion/hype that exists in this space.

"Agentic" comes from "agent" plus the suffix "-ic", the latter meaning of, relating to, or characterised by. So Agentic AI is simply AI that is characterised by an Agent, or agency. Contrast that with AI where you sit at the ChatGPT prompt asking it to draw pictures of a duck dressed as a clown: nothing Agentic about that—just a human-led and human-driven interaction. "AI Agents" becomes a bit of a mouthful with the qualifier, so much of the current industry noise is simply around "Agents". That said, "Agentic AI" sounds cool, so it gets used as the marketing term in place of "AI" alone.

So we've muddled our way through to some kind of understanding of what an Agent is, and what we mean by Agentic AI. But how do we actually build one? All we need is an LLM (such as access to the API for OpenAI or Claude), something to call that API (there are worse choices than…!), and a way to call external services (e.g. MCP servers) if the LLM determines that it needs to use them. So in theory we could build an Agent with some lines of bash, some API calls, and a bunch of sticky-backed plastic. This is a grossly oversimplified example (and is missing elements such as memory), but it hopefully illustrates what we're building at the core of an Agent.
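To make that "LLM calling tools in a loop" core concrete, here is a minimal sketch using the OpenAI Python SDK with a single made-up `order_biscuits` tool. The model name and the tool itself are placeholders, and real code would add error handling, memory, and guardrails.

```python
# Minimal sketch of an agent: an LLM deciding whether to call a tool, in a loop.
# Assumes the OpenAI Python SDK; model name and the order_biscuits tool are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def order_biscuits(quantity: int) -> str:
    # Hypothetical side effect; a real agent might call a shopping API here.
    return f"Ordered {quantity} packets of biscuits."

tools = [{
    "type": "function",
    "function": {
        "name": "order_biscuits",
        "description": "Order more biscuits",
        "parameters": {
            "type": "object",
            "properties": {"quantity": {"type": "integer"}},
            "required": ["quantity"],
        },
    },
}]

messages = [{"role": "user", "content": "We have 2 biscuits left. Keep us stocked."}]

while True:
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:          # the model is done; no more tools to run
        print(msg.content)
        break
    for call in msg.tool_calls:     # run each requested tool and feed the result back
        args = json.loads(call.function.arguments)
        result = order_biscuits(**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The loop is the Agent: the LLM chooses the control flow, and the surrounding code just executes whatever tools it asks for.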
On top of this goes all the general software engineering requirements of any system that gets built (suitable programming language and framework, error handling, LLM output validation, guardrails, observability, tests, etc.). The other nuance that I've noticed is that whilst the above simplistic diagram is 100% driven by an LLM (it decides what tools to call, it decides when to iterate), there are plenty of cases where an Agent is to some degree rules-driven. So perhaps the LLM does some of the autonomous work, but then there's a bunch of good ol' `if` statements in there too. This is also borne out by the notion of "Workflows" when people talk about Agents. An Agent doesn't wake up in the morning and set out on its day serving only to fulfill its own goals and enrichment. More often than not, an Agent is going to be tightly bound into a pre-defined path with a limited range of autonomy.

What if you want to actually build this kind of thing for real? That's where tools like LangGraph and LangChain come in. Here's a notebook with an example of an actual Agent built with these tools. LlamaIndex is another framework, with details of building an Agent in their docs.

As we build up from the so-simple-it-is-laughable strawman example of an Agent above, one of the features we'll soon encounter is the concept of memory. The difference between a crappy response and a holy-shit-that's-magic response from an LLM is often down to context. The richer the context, the better a chance it has at generating a more accurate output. So if an Agent can look back on what it did previously, determining what worked well and what didn't, perhaps even taking into account human feedback, it can then generate a more successful response the next time. You can read a lot more about memory in this chapter of Agentic Design Patterns by Antonio Gulli. This blog post from "The BIG DATA guy" is also useful: Agentic AI, Agent Memory, & Context Engineering. This diagram from Generative Agents: Interactive Simulacra of Human Behavior (J.S. Park, J.C. O'Brien, C.J. Cai, M.R. Morris, P. Liang, M.S. Bernstein) gives a good overview of a much richer definition of an Agent's implementation. The additional concepts include memory (discussed briefly above), planning, and reflection. Also check out Paul Iusztin's talk from QCon London 2025 on The Data Backbone of LLM Systems. Around the 35-minute mark he goes into some depth on Agent architectures.

Just as you can build computer systems as monoliths (everything done in one place) or microservices (multiple programs, each responsible for a discrete operation or domain), you can also have one big Agent trying to do everything (probably not such a good idea) or individual Agents, each good at their particular thing, that are then hooked together into what's known as a Multi-Agent System (MAS). Sean Falconer's family meal planning demo is a good example of a MAS. One Agent plans the kids' meals, one the adults' meals, another combines the two into a single plan, and so on.

Human in the loop (HITL) is a term you'll come across referring to the fact that Agents might be pretty good, but they're not infallible. In the travel booking example above, do we really trust the Agent to book the best holiday for us? Almost certainly we'd want—at a minimum—the option to sign off on the booking before it goes ahead and sinks £10k on an all-inclusive trip to Bognor Regis.
Then again, we're probably happy enough for an Agent to access our calendars without asking permission, and whether it needs permission to create a meeting is up to us and how much we trust it. When it comes to coding, having an Agent write code, test it, fix the broken tests, compare it to a spec, and iterate is really neat. On the other hand, letting it decide to run … less so 😅. Every time an Agent requires HITL, it reduces its autonomy and/or responsiveness to situations. As well as simply using smarter models that make fewer mistakes, there are other things an Agent can do to reduce the need for HITL, such as using guardrails to define acceptable parameters. For example, an Agent is allowed to book travel, but only up to a defined threshold. That way the user gets to trade off convenience (no HITL) against risk (an unintended first-class flight to Hawaii).

- 📃 Generative Agents: Interactive Simulacra of Human Behavior
- 🎥 Paul Iusztin - The Data Backbone of LLM Systems - QCon London 2025
- 📖 Antonio Gulli - Agentic Design Patterns
- 📖 Sean Falconer - https://seanfalconer.medium.com/

Dayvster 1 week ago

Is Odin Just a More Boring C?

## Why I Tried Odin

### Background

My recent posts have been diving deep into Zig and C, a shift from my earlier focus on React and JavaScript. This isn't a pivot but a return to my roots. I started programming at 13 with C and C++, and over the years I've built a wide range of projects in systems programming languages like C, C++, Rust, and now Zig. From hobby experiments and custom Linux utilities to professional embedded systems work (think vehicle infotainment, tracking solutions, and low-level components), I've always been drawn to the power and precision of systems programming. Alongside this, I've crafted tools for my own environment and tackled plenty of backend engineering, blending my full-stack expertise with a passion for low-level control.

### Why Odin Caught My Eye

I, like many others, initially dismissed Odin as that language primarily intended for game development. It took me a moment, or should I say many moments, to realize just how stupid that notion was. Let's analyze what game development actually means: it means building complex systems that need to be efficient, performant, and reliable. It means working with graphics, physics, input handling, networking, and more. It means dealing with concurrency, memory management, and low-level optimizations. In other words, game development is a perfect fit for a systems programming language like Odin. So basically, if it's intended for game development, it should be a great fit for general systems programming and desktop applications, and since game dev usually means manual memory management without a garbage collector, it should also be possible, to some extent, to use it for embedded systems. So after I'd given myself a good slap on the forehead for being a bit of an idiot, I decided to give Odin a fair shot and build something useful with it.

## The Project

Now, I may have been a bit liberal with the word "useful" there. What I actually decided to build was something that I usually like to build whenever I wanna try out a new language, namely a tiny key-value store with a pub/sub system. It won't win any awards for originality, and I'm pretty sure the folks over at Redis aren't exactly shaking in their boots. It is the most basic, most barebones implementation of both, lacking any real useful features that would make it usable in a production environment. But it is a good exercise in understanding the language and its capabilities, mainly because it involves a few different aspects of programming that are relevant to systems programming: data structures, memory management, concurrency, and networking. And even if you create something as basic and lacking as I have in this example, you still have room for experimentation and exploration to add more features.

### Building a Tiny KV Store With Pub/Sub

My initial minimal proof of concept was simple and straightforward.
```odin
package main

import "core:fmt"
import "core:time"

KVStore :: struct {
	store: map[string]string,
}

kvstore_init :: proc() -> KVStore {
	return KVStore{store = map[string]string{}}
}

kv_put :: proc(kv: ^KVStore, key: string, value: string) {
	kv.store[key] = value
}

kv_get :: proc(kv: ^KVStore, key: string) -> string {
	if value, ok := kv.store[key]; ok {
		return value
	}
	return ""
}

PubSub :: struct {
	subscribers: map[string][]proc(msg: string),
}

pubsub_init :: proc() -> PubSub {
	return PubSub{subscribers = map[string][]proc(msg: string){}}
}

subscribe :: proc(ps: ^PubSub, topic: string, handler: proc(msg: string)) {
	if arr, ok := ps.subscribers[topic]; ok {
		new_arr := make([]proc(msg: string), len(arr)+1);
		for i in 0..<len(arr) {
			new_arr[i] = arr[i];
		}
		new_arr[len(arr)] = handler;
		ps.subscribers[topic] = new_arr;
	} else {
		ps.subscribers[topic] = []proc(msg: string){handler};
	}
}

publish :: proc(ps: ^PubSub, topic: string, msg: string) {
	if handlers, ok := ps.subscribers[topic]; ok {
		for handler in handlers {
			handler(msg);
		}
	}
}

kv: KVStore;

main :: proc() {
	kv = kvstore_init();
	ps := pubsub_init();

	handler1 :: proc(msg: string) {
		fmt.println("Sub1 got:", msg);
		kv_put(&kv, "last_msg", msg);
	}
	handler2 :: proc(msg: string) {
		fmt.println("Sub2 got:", msg);
	}
	handler3 :: proc(msg: string) {
		fmt.println("Sub3 got:", msg);
	}

	subscribe(&ps, "demo", handler1);
	subscribe(&ps, "demo", handler2);
	subscribe(&ps, "demo", handler3);

	publish(&ps, "demo", "Welcome to dayvster.com");
	time.sleep(2 * time.Second);
	publish(&ps, "demo", "Here's another message after 2 seconds");

	last := kv_get(&kv, "last_msg");
	fmt.println("Last in kvstore:", last);
}
```

As you can see, it currently lacks any real error handling, concurrency, and persistence. But it does demonstrate the basic functionality of a key-value store with pub/sub capabilities. What I have done is create two main structures, `KVStore` and `PubSub`. The `KVStore` structure contains a map to store key-value pairs and provides functions to put and get values. The `PubSub` structure contains a map of subscribers for different topics and provides functions to subscribe to topics and publish messages. The `main` function initializes the key-value store and pub/sub system, defines a few handlers for incoming messages, subscribes them to a topic, and then publishes some messages to demonstrate the functionality. From this basic example we've explored how to handle memory management in Odin, how to work with data structures like maps and slices, and how to define and use procedures.

### Memory Management

Like C and Zig, Odin employs manual memory management, but it offers user-friendly utilities to streamline the process, much like Zig, in contrast to C's more rudimentary approach. For instance, the `make` function in Odin enables the creation of slices with a defined length and capacity, akin to Zig's slice allocation. In the code above, `make([]proc(msg: string), len(arr)+1)` creates a slice of procedure pointers with a length of `len(arr)+1`. Essentially, it allocates memory on the heap and returns a slice header, which includes a pointer to the allocated memory along with the length and capacity of the slice.

**But how and when is that memory freed?**

In this code, memory allocated by `make` (e.g., for the slice in `subscribe`) and for maps (e.g., `kv.store` and `ps.subscribers`) is not explicitly freed. Since this is a short-lived program, the memory is reclaimed by the operating system when the program exits.
However, in a long-running application you'd need to use Odin's `delete` procedure to free slices and maps explicitly. For example:

```odin
kvstore_deinit :: proc(kv: ^KVStore) {
	delete(kv.store);
}

pubsub_deinit :: proc(ps: ^PubSub) {
	for topic, handlers in ps.subscribers {
		delete(handlers);
	}
	delete(ps.subscribers);
}
```

So let's add that to the `main` function before it exits, to ensure we clean up properly:

```odin
// ... existing code ...
main :: proc() {
	// ... existing code ...
	pubsub_deinit(&ps);
	kvstore_deinit(&kv);
} // end of main
```

Well, would you look at that: we just added proper memory management to our tiny KV store with pub/sub system, and all it took was a few lines of code. I'm still a huge fan of C, but this does feel nice and clean, not to mention really readable and easy to understand. Is our code now perfect and fully memory safe? Not quite; it still needs error handling and thread safety (way later) for production use, but it's a solid step toward responsible memory management.

### Adding Concurrency

To make our pub/sub system more realistic, we've introduced concurrency to the `publish` procedure using Odin's `core:thread` library. This allows subscribers to process messages simultaneously, mimicking real-world pub/sub behavior. Since `handler1` modifies `kv.store` via `kv_put`, we've added a mutex to `KVStore` to ensure thread-safe access to the shared map. Here's how it works:

- **Concurrent execution with threads**: The `publish` procedure now runs each handler in a separate thread created with `thread.create`. Each thread receives the handler and message via `t.user_args`, and `thread.start` kicks off execution. Threads are collected in a dynamic array (`threads`), which is cleaned up using `defer delete(threads)`. The `thread.join` call ensures the program waits for all threads to finish, and `thread.destroy` frees thread resources. This setup enables `handler1`, `handler2`, and `handler3` to process messages concurrently, with output order varying based on thread scheduling.
- **Thread safety with a mutex**: Since `handler1` updates `kv.store` via `kv_put`, concurrent access could lead to race conditions, as Odin's maps aren't inherently thread-safe. To address this, a `sync.Mutex` is added to `KVStore`. The `kv_put` and `kv_get` procedures lock the mutex during map access, ensuring only one thread modifies or reads `kv.store` at a time. The mutex is initialized in `kvstore_init` and destroyed in `kvstore_deinit`.

```odin
publish :: proc(ps: ^PubSub, topic: string, msg: string) {
	if handlers, ok := ps.subscribers[topic]; ok {
		threads := make([dynamic]^thread.Thread, 0, len(handlers))
		defer delete(threads)

		// Allocate ThreadArgs for each handler
		thread_args := make([dynamic]^ThreadArgs, 0, len(handlers))
		defer {
			for args in thread_args {
				free(args)
			}
			delete(thread_args)
		}

		for handler in handlers {
			msg_ptr := new(string)
			msg_ptr^ = msg

			t := thread.create(proc(t: ^thread.Thread) {
				handler := cast(proc(msg: string)) t.user_args[0]
				msg_ptr := cast(^string) t.user_args[1]
				handler(msg_ptr^)
				free(msg_ptr)
			})
			t.user_args[0] = rawptr(handler)
			t.user_args[1] = rawptr(msg_ptr)
			thread.start(t)
			append(&threads, t)
		}

		for t in threads {
			thread.join(t)
			thread.destroy(t)
		}
	}
}
```

This implementation adds concurrency by running each handler in its own thread, allowing parallel message processing. The mutex ensures thread safety for `kv.store` updates in `handler1`, preventing race conditions. Odin's `core:thread` library simplifies thread management, offering a clean, pthread-like experience.
Odin’s threading feels a bit like C’s pthreads but without the usual headache, and it’s honestly a breeze to read and write. For this demo, the mutex version keeps everything nice and tidy. However, in a real application, you'd still want to consider more robust error handling, possibly a thread pool for efficiency, and some way to handle thread lifecycle and errors, and so on...

## Adding Persistence

I haven't added persistence to this code block personally because I feel that would quickly spiral the demo, which I wanted to keep simple and focused, into something much more complex. But if you wanted to add persistence, you could use Odin's `core:file` library to read and write the `kv.store` map to a file. You would need to serialize the map to a string format (like `JSON` or `CSV`) when saving and deserialize it when loading. Luckily Odin has `core:encoding/json` and `core:encoding/csv` libraries that can help with this, which should at the very least make that step fairly trivial. So if you feel like it, give it a shot and let me know how it goes. Do note that this step is a lot harder than it may seem, especially if you want to do it properly and performantly.

## Now to Compile and Run

Now here's the thing: the first time I ran `odin build .` I thought I messed up somewhere, because it basically took a split second and produced no output, no warnings, no nothing. But I did see that a binary was produced, named after the folder I was in. So I ran it:

```bash
❯ ./kvpub
Sub1 got: Welcome to dayvster.com
Sub2 got: Welcome to dayvster.com
Sub3 got: Welcome to dayvster.com
Sub1 got: Here's another message after 2 seconds
Sub2 got: Here's another message after 2 seconds
Sub3 got: Here's another message after 2 seconds
Last in kvstore: Here's another message after 2 seconds
```

And there you have it, a tiny key-value store with pub/sub capabilities built in Odin. That compiled bizarrely fast; in fact I used a util ([pulse](https://github.com/dayvster/pulse)) I wrote to benchmark processes and their execution time, and it clocked in at a blazing 0.4 seconds to compile:

```bash
❯ pulse --benchmark --cmd 'odin build .' --runs 3
┌──────────────┬──────┬─────────┬─────────┬─────────┬───────────┬────────────┐
│ Command      ┆ Runs ┆ Avg (s) ┆ Min (s) ┆ Max (s) ┆ Max CPU%  ┆ Max RAM MB │
╞══════════════╪══════╪═════════╪═════════╪═════════╪═══════════╪════════════╡
│ odin build . ┆ 3    ┆ 0.401   ┆ 0.401   ┆ 0.401   ┆ 0.00      ┆ 0.00       │
└──────────────┴──────┴─────────┴─────────┴─────────┴───────────┴────────────┘
```

Well I couldn't believe that, so I ran it again, this time with `--runs 16` to get a better average, and it still came in at a very respectable `0.45` (MAX) seconds. **OK that is pretty impressive.** But suspiciously consistent; maybe my tool is broken? I'm not infallible after all. So I re-confirmed it with `hyperfine` and it came out at:

```bash
❯ hyperfine "odin build ."
Benchmark 1: odin build .
  Time (mean ± σ):     385.1 ms ±  12.5 ms    [User: 847.1 ms, System: 354.6 ms]
  Range (min … max):   357.3 ms … 400.1 ms    10 runs
```

God damn that is fast. Now, I know the program is tiny and simple, but still, that is impressive and makes me wonder how it would handle a larger codebase. Please, if you have any feedback or insights on this, let me know, I am really curious. Just for sanity's sake I also ran `time odin build .` and it came out at, you've guessed it, `0.4` seconds.

### Right so it's fast, but how's the experience?

Well I have to say it was pretty smooth overall.
The compiler is fast and the error messages are generally clear and helpful, if perhaps a bit... verbose for my taste. **For example**, I've intentionally introduced a simple typo in the `map` keyword and named it `masp` to showcase what I mean:

```bash
❯ odin build .
/home/dave/Workspace/TMP/odinest/main.odin(44:31) Error: Expected an operand, got ]
subscribers: masp[string][]proc(msg: string),
                          ^
/home/dave/Workspace/TMP/odinest/main.odin(44:32) Syntax Error: Expected '}', got 'proc'
subscribers: masp[string][]proc(msg: string),
                           ^
/home/dave/Workspace/TMP/odinest/main.odin(44:40) Syntax Error: Expected ')', got ':'
subscribers: masp[string][]proc(msg: string),
                                   ^
/home/dave/Workspace/TMP/odinest/main.odin(44:41) Syntax Error: Expected ';', got identifier
subscribers: masp[string][]proc(msg: string),
                                     ^
```

I chose this typo specifically because I wanted to showcase how Odin handles errors when you try to build. It could simply say `Error: Unknown type 'masp'`, but instead it goes on to produce 4 separate errors that all stem from the same root cause. This is obviously because the parser gets confused and can't make sense of the code anymore. So essentially you get every single error that results from the initial mistake, even if they are on the same line. Now, would I love to see them condensed into a single error message, because they stem from the same line and the same root cause? Yes I would. But that's just my personal preference.

## Where Odin Shines

### Simplicity and Readability

Odin kinda feels like a modernized, somehow even more boring C, but in the best way possible. It's simple, straightforward and easy to read. It does not try to have some sort of clever syntax or fancy features; it really feels like a no-nonsense, no-frills language that wants you to start coding and being productive as quickly as possible. In fact this brings me to my next point.

### The Built-in Libraries Galore

I was frankly blown away with just how much is included in the standard and vendored (more on that later) libraries. I mean, it has everything you'd expect from a modern systems programming language, but it also comes with a ton of complete data structures, algorithms and utilities that you would usually have to rely on third-party libraries for in C or even Zig. For more info just look at [Odin's Core Library](https://pkg.odin-lang.org/core/), and I mean really look at it and read it, do not just skim it. Here's an example: [flags](https://pkg.odin-lang.org/core/flags/), which is a complete command line argument parser, or even [rbtree](https://pkg.odin-lang.org/core/container/rbtree/), which is a complete implementation of a red-black tree data structure that you can just import and use right away.

But what really blew me away was

### The Built-in Vendor Libraries / Packages

Odin comes with a set of vendor libraries that basically give you useful bindings to stuff like `SDL2/3`, `OpenGL`, `Vulkan`, `Raylib`, `DirectX` and more. This is really impressive because it means you can start building games or graphics applications right away without having to worry about setting up bindings or dealing with C interop. Now, I'm not super sure if these vendor bindings are all maintained and created by the Odin team; from what I could gather so far it would certainly seem so, but I could be wrong. If you know more about this please let me know. But all that aside, these bindings are really well done and easy to use.
For example here's how you can create a simple window with SDL2 in Odin:

```odin
package main

import sdl "vendor:sdl2"

main :: proc() {
    sdl.Init(sdl.INIT_VIDEO)
    defer sdl.Quit()

    window := sdl.CreateWindow(
        "Odin SDL2 Black Window",
        sdl.WINDOWPOS_CENTERED,
        sdl.WINDOWPOS_CENTERED,
        800,
        600,
        sdl.WINDOW_SHOWN,
    )
    defer sdl.DestroyWindow(window)

    renderer := sdl.CreateRenderer(window, -1, sdl.RENDERER_ACCELERATED)
    defer sdl.DestroyRenderer(renderer)

    event: sdl.Event
    running := true
    for running {
        for sdl.PollEvent(&event) {
            if event.type == sdl.EventType.QUIT {
                running = false
            }
        }
        sdl.SetRenderDrawColor(renderer, 0, 0, 0, 255)
        sdl.RenderClear(renderer)
        sdl.RenderPresent(renderer)
    }
}
```

This code creates a simple window with a black background using SDL2. It's pretty straightforward and easy to understand, especially if you're already familiar with SDL2 or SDL3.

### C Interop

Odin makes it trivially easy to interop with C libraries. This is done via `foreign import`, where you create an import name and link it to the library file, and `foreign` blocks to declare the individual functions or types you want to use. I could explain it with examples here, but Odin's own documentation does a way better job and will keep this post from getting even longer than it already is. So please check out [Odin's C interop](https://odin-lang.org/news/binding-to-c/) documentation for more info.

## Where Odin Feels Awkward

### Standard Library Gaps

While Odin's standard library is quite comprehensive, there are still some gaps and missing features that can make certain tasks more cumbersome. For example, while it has basic file I/O capabilities, it lacks more advanced features like file watching or asynchronous I/O. Additionally, while it has a decent set of data structures, it lacks some more specialized ones like tries or bloom filters; I'd also love to see a B+ tree implementation in the core library. But those are at most nitpicks, and finding third-party libraries or writing your own implementations is usually straightforward. However...

### No Package Manager

I really like languages that come with their own package manager; it makes it so much easier to discover, install and manage third-party libraries / dependencies. Odin currently lacks a built-in package manager, which means you have to manually download and include third-party libraries in your projects. This can be a bit of a hassle, especially, I'd imagine, for larger projects with multiple dependencies.

### Smaller Nitpicks

- **dir inconsistencies**: I love how it auto-named my binary after the folder I was in, but I wish it did the same whenever I ran `odin run` and `odin build`; I had to explicitly specify `odin run .` and `odin build .`. That felt a bit inconsistent to me, because if it knows the folder we are in, why not just use that as the default when we want to tell it to run or build in the current directory?
- **Error messages**: As mentioned earlier, while Odin's error messages are generally clear, they can sometimes be overly verbose, especially when multiple errors stem from a single root cause. It would be nice to see more concise error reporting in such cases. To fix this I'd love to either see error messages collapsed into a single message with an array of messages from the same line, or somehow grouped together into blocks.

### Pointers are ^ and not *

I'm on a German keyboard and the `^` character is a bit of a pain to type, especially when compared to the `*` character, which is right next to the `Enter` key on my keyboard.
I get that Odin wants to differentiate itself from C and C++, but this small change feels unnecessary and adds a bit of friction to the coding experience. These are, as the title says, just minor nitpicks and in no way detract from the overall experience of using Odin; just minor annoyances that I personally had while using the language. Your experience may differ vastly and none of these may even bother you.

## So is Odin just a More Boring C?

In a way, yes, kind of. I mean it's very similar in approach and philosophy, but with more "guard rails" and helpful utilities to make the experience smoother and more enjoyable, and what I so far assume are first-party bindings to popular libraries via the vendor packages really makes it stand out in a great way: you get a lot more consistency and predictability than you would if you were to use C with those same libraries. And I guess that's the strength of Odin: it's so boring that it just lets you be a productive programmer without getting in your way or trying to be too clever or fancy. I use boring here in an affectionate way; if you've ever read any of my other posts you'll know that I do not appreciate complexity and unnecessary cleverness in programming, which is why I suppose I'm often quite critical of Rust even though I do like it for certain use cases. In this case I'd say Odin is very similar to Go: both are fantastic boring languages that let you get stuff done without much fuss or hassle. The only difference is that Go decided to ship with a garbage collector and Odin did not, which honestly, for me personally, makes Odin vastly more appealing.

### Syntax and Ergonomics

Odin’s syntax is like C with a modern makeover: clean, readable, and less prone to boilerplate. It did take me quite a while to get used to replacing my muscle memory for `*` with `^` for pointers and `func`, `fn`, `fun`, `function` with `proc` for functions. But once I got over that initial hump, it felt pretty natural. Also, `::` for type declarations is a bit unusual and took me longer to adjust to than I care to admit, as I'm fairly used to `::` being used for scope resolution in languages like C++ and Rust. But again, once I got used to it, it felt fine. Everything else about the syntax felt pretty intuitive and straightforward.

## Who Odin Might Be Right For

### Ideal Use Cases

- **Game Development**: Honestly I totally see where people are coming from when they say Odin is great for game development. The built-in vendor libraries for SDL2/3, OpenGL, Vulkan, Raylib and more make it super easy to get started with game development. Plus the language's performance and low-level capabilities are a great fit for the demands of game programming.
- **Systems Programming**: Odin's manual memory management, low-level access, and performance make it a solid choice for systems programming tasks like writing operating systems, device drivers, or embedded systems. I will absolutely be writing some utilities for my Linux setup in Odin in the near future.
- **Desktop Applications**: Again this is where those vendor libraries shine, making it easy to build cross-platform desktop applications with graphical interfaces, as long as you're fine with doing some manual drawing of components. I'd love to see a binding for something like `GTK` or `Qt` in the vendor packages in the future.
- **General Purpose Programming**: This brings me back to my intro where I said that it took me a while to realize that if Odin is good for game development, then realistically it should basically be good for anything and everything you wish to create with it. So yeah, give it a shot and make something cool with it.

### Where It’s Not a Good Fit Yet

- **Web Development**: The Net library is pretty darn nice and very extensive; however, it does seem like it's maybe a bit more fit for general networking tasks rather than simplifying your life as a web backend developer. I'm sure there's already a bunch of third-party libraries for this, but if you're a web dev you are almost spoiled for choice at the moment by languages that support web development out of the box with all the fixings and doodads.

## Final Thoughts

### Would I Use It Again?

Absolutely, in fact I will; I've already started planning some small utilities for my Linux setup in Odin. I really like the simplicity and readability of the language, as well as the comprehensive standard and vendor libraries. The performance is also impressive, especially the fast compile times.

### Source Code and Feedback

You can find the complete source code for the tiny key-value store with pub/sub capabilities on my GitHub: [dayvster/odin-kvpub](https://github.com/dayvster/odin-kvpubsub)

If you create anything cool with it I'd love to see it, so do hit me up on any of my socials. I'd love to hear your thoughts and experiences with Odin, whether you've used it before or are considering giving it a try. Feel free to leave a comment or reach out to me on Twitter [@dayvster](https://twitter.com/dayvsterdev). Appreciate the time you took to read this post, and happy coding!

1 views
Nick Khami 2 weeks ago

Use the Accept Header to serve Markdown instead of HTML to LLMs

Agents don't need to see websites with markup and styling; anything other than plain Markdown is just wasted money spent on context tokens. I decided to make my Astro sites more accessible to LLMs by having them return Markdown versions of pages when the header has or preceding . This was very heavily inspired by this post on X from bunjavascript . Hopefully this helps SEO too, since agents are a big chunk of my traffic. The Bun team reported a 10x token drop for Markdown and frontier labs pay per token, so cheaper pages should get scraped more, be more likely to end up in training data, and give me a little extra lift from assistants and search. Note: You can check out the feature live by running or in your terminal. Static site generators like Astro and Gatsby already generate a big folder of HTML files, typically in a or folder through an command. The only thing missing is a way to convert those HTML files to markdown. It turns out there's a great CLI tool for this called html-to-markdown that can be installed with and run during a build step using . Here's a quick Bash script an LLM wrote to convert all HTML files in to Markdown files in , preserving the directory structure (a sketch of such a script is included further down): Once you have the conversion script in place, the next step is to make it run as a post-build action. Here's an example of how to modify your scripts section: Moving all HTML files to first is only necessary if you're using Cloudflare Workers, which will serve existing static assets before falling back to your Worker. If you're using a traditional reverse proxy, you can skip that step and just convert directly from to . Note: I learned after I finished the project that I could have added to my so I didn't have to move any files around. That field forces the worker to always run first. Shoutout to the kind folks on reddit for telling me. I pushed myself to go out of my comfort zone and learn Cloudflare Workers for this project since my company uses them extensively. If you're using a traditional reverse proxy like Nginx or Caddy, you can skip this section (and honestly, you'll have a much easier time). If you're coming from traditional reverse proxy servers, Cloudflare Workers force you into a different paradigm. What would normally be a simple Nginx or Caddy rule becomes custom configuration, moving your entire site to a shadow directory so Cloudflare doesn't serve static assets by default, writing JavaScript to manually check headers and using to serve files. SO MANY STEPS TO MAKE A SIMPLE FILE SERVER! This experience finally made Next.js 'middleware' click for me. It's not actually middleware in the traditional sense of a REST API; it's more like 'use this where you would normally have a real reverse proxy.' Both Cloudflare Workers and Next.js Middleware are essentially JavaScript-based reverse proxies that intercept requests before they hit your application. While I'd personally prefer Terraform with a hyperscaler or a VPS for a more traditional setup, new startups love this pattern, so it's worth understanding. Here's an example of a working file to refer to a new worker script and also bind your build output directory as a static asset namespace: Below is a minimal worker script that inspects the header and serves markdown when requested, otherwise falls back to HTML: Pro tip: make the root path serve your sitemap.xml instead of markdown content for your homepage such that an agent visiting your root URL can see all the links on your site. 
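Here is a sketch of the post-build conversion script described earlier. The directory names (`dist/`, `dist-md/`) and the `html2markdown` binary name are assumptions on my part, so adjust them to your setup:

```bash
#!/usr/bin/env bash
set -euo pipefail

SRC=dist        # where the static build writes its HTML (stand-in name)
DEST=dist-md    # where the Markdown copies will live (stand-in name)

# Walk every HTML file in the build output and write a .md file
# at the same relative path under the destination directory.
find "$SRC" -name '*.html' | while read -r file; do
  rel="${file#"$SRC"/}"
  out="$DEST/${rel%.html}.md"
  mkdir -p "$(dirname "$out")"
  html2markdown < "$file" > "$out"
done
```

Hooked into a `postbuild` step, this keeps a Markdown mirror of the site in sync with every build.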
It's likely much easier to set this system up with a traditional reverse proxy file server like Caddy or Nginx. Here's a simple Caddyfile configuration that does the same thing (a sketch is included at the end of this post): I will leave Nginx configuration as an exercise for the reader or perhaps the reader's LLM of choice. By serving lean, semantic Markdown to LLM agents, you can achieve a 10x reduction in token usage while making your content more accessible and efficient for the AI systems that increasingly browse the web. This optimization isn't just about saving money; it's about GEO (Generative Engine Optimization) for a changed world where millions of users discover content through AI assistants. Astro's flexibility made this implementation surprisingly straightforward. It only took me a couple of hours to get both the personal blog you're reading now and patron.com to support this feature. If you're ready to make your site agent-friendly, I encourage you to try this out. For a fun exercise, copy this article's URL and ask your favorite LLM to "Use the blog post to write a Cloudflare Worker for my own site." See how it does! You can also check out the source code for this feature at github.com/skeptrunedev/personal-site to get started. I'm excited to see the impact of this change on my site's analytics and hope it inspires others. If you implement this on your own site, I'd love to hear about your experience! Connect with me on X or LinkedIn .
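For reference, a Caddyfile along the lines described above might look something like the following. The site address, the paths, and the assumption that pre-converted `.md` files sit in a parallel directory are all illustrative:

```Caddyfile
example.com {
	# Requests whose Accept header mentions text/markdown get the Markdown mirror
	@wantsMarkdown header_regexp Accept text/markdown

	handle @wantsMarkdown {
		root * /srv/site/dist-md
		try_files {path} {path}.md {path}/index.md
		file_server
	}

	# Everyone else gets the regular HTML build
	handle {
		root * /srv/site/dist
		file_server
	}
}
```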

1 views
iDiallo 1 months ago

Which LLM Should I Use as a Developer?

Early in my career, I worked alongside a seasoned C programmer who had finally embraced web development. The company had acquired a successful website built in Perl by someone who, while brilliant, wasn't entirely technical . The codebase was a fascinating mess. commands interwoven with HTML, complex business logic, database calls, and conditional statements all tangled together in ways that would make you want to restart from scratch! Then came the JavaScript requirements. Whenever frontend interactions needed to be added, animations, form validations, dynamic content, I became the go-to person. Despite his deep understanding of programming fundamentals, JavaScript simply didn't click for this experienced developer. The event-driven nature, the prototype-based inheritance, the asynchronous callbacks? They made no sense to him. I was young when I picked up JavaScript, still malleable enough to wrap my head around its quirks. Looking back, I sometimes wonder: if my introduction to programming had been through React or Vue rather than simple loops and conditionals, I would have chosen a different career path entirely. Fast-forward to today, and that same experienced programmer would likely sail through JavaScript challenges without breaking a sweat. Large Language Models have fundamentally changed the game for developers working with unfamiliar languages and frameworks. LLMs can scaffold entire functions, components, or even small applications in languages you've never touched. More importantly, if you're not lazy about it , you can read through the generated code and understand the patterns and idioms of that language. When you inevitably write buggy code in an unfamiliar syntax, LLMs excel at debugging and explaining what went wrong. They're like having a patient mentor who never gets tired of explaining the same concept in different ways. For example, I'm not very good with the awk command line tool, but I can write what I want in JavaScript. So I would often write how I want to parse content in JavaScript and ask an LLM to convert it to awk. Let's say I want to extract just the user agent from Apache log files. In JavaScript, I might think about it like this: The LLM converts this to: It uses the quote character as a field separator and prints the second-to-last field (since the last field after the final quote is empty). This kind of translation between mental models is where LLMs truly shine. Until recently, new developers would approach me with the question: "What's the best programming language to learn?" My response was always pragmatic: "It depends on what you're trying to build. It doesn't really matter." But they'd follow up with their real opinion: "I think Python is the best." The concept of a "best programming language" never entirely made sense to me. I might love Python's elegance, but I still need JavaScript for frontend work, SQL for databases, and bash for deployment scripts. The job dictates the tool, not the other way around. Today, something fascinating is happening. Developers aren't asking about the best programming language anymore. Instead, I hear: "Which is the best LLM for coding?" My answer remains the same: It doesn't really matter. The internet is awash with benchmarks, coding challenges, and elaborate metrics attempting to crown the ultimate coding LLM. These tests typically focus on "vibe coding". They try to generate complete solutions to isolated problems from scratch. If that's your primary use case, fine. 
But in my experience building real applications that serve actual users, I've rarely found opportunities to generate entire projects. Instead, I find myself asking questions about old pieces of code to figure out why the original developer decided to implement a function one way. I generate util functions with an LLM, and I ask it to generate those extremely annoying TypeScript interfaces (life is too short to manually write those out). To my knowledge, all LLMs can perform these tasks at an acceptable level that I can immediately test and validate. You don't need AGI for this. After programming for over three decades, I've learned that picking up new languages isn't particularly challenging anymore. The fundamental goal of making computers do useful things remains constant. What changes is syntax, idioms, and the philosophical approaches different language communities embrace. But when you're starting out, it genuinely feels like there must be one "optimal" language to learn. This illusion persists because beginners conflate syntax familiarity with programming competence. The truth is more nuanced. Even expert C programmers can struggle with JavaScript, not because they lack skill, but because each language embodies different mental models. The barriers between languages are dissolving, and the cost of experimenting with new technologies is approaching zero. The LLM you choose matters far less than developing the judgment to evaluate the code it produces. Pick any reputable LLM, learn to prompt it effectively, and focus on building things that matter. The rest is just syntax. Don't start with a language; start with a problem to be solved. — Matt Mullenweg

0 views
Karboosx 1 months ago

Continuous Delivery - The easy way

Skip the complex setups. Here's how to build a simple CD pipeline for your website using nothing but a GitHub webhook and a bash script.

0 views
Armin Ronacher 1 months ago

Your MCP Doesn’t Need 30 Tools: It Needs Code

I wrote a while back about why code performs better than MCP ( Model Context Protocol ) for some tasks. In particular, I pointed out that if you have command line tools available, agentic coding tools seem very happy to use those. In the meantime, I learned a few more things that put some nuance to this. There are a handful of challenges with CLI-based tools that are rather hard to resolve and require further examination. In this blog post, I want to present the (not so novel) idea that an interesting approach is using MCP servers exposing a single tool, that accepts programming code as tool inputs. The first and most obvious challenge with CLI tools is that they are sometimes platform-dependent, version-dependent, and at times undocumented. This has meant that I routinely encounter failures when using tools on first use. A good example of this is when the tool usage requires non-ASCII string inputs. For instance, Sonnet and Opus are both sometimes unsure how to feed newlines or control characters via shell arguments. This is unfortunate but ironically not entirely unique to shell tools either. For instance, when you program with C and compile it, trailing newlines are needed. At times, agentic coding tools really struggle with appending an empty line to the end of a file, and you can find some quite impressive tool loops to work around this issue. This becomes particularly frustrating when your tool is absolutely not in the training set and uses unknown syntax. In that case, getting agents to use it can become quite a frustrating experience. Another issue is that in some agents (Claude Code in particular), there is an extra pass taking place for shell invocations: the security preflight. Before executing a tool, Claude also runs it through the fast Haiku model to determine if the tool will do something dangerous and avoid the invocation. This further slows down tool use when multiple turns are needed. In general, doing multiple turns is very hard with CLI tools because you need to teach the agent how to manage sessions. A good example of this is when you ask it to use tmux for remote-controlling an LLDB session . It’s absolutely capable of doing it, but it can lose track of the state of its tmux session. During some tests, I ended up with it renaming the session halfway through, forgetting that it had a session (and thus not killing it). This is particularly frustrating because the failure case can be that it starts from scratch or moves on to other tools just because it got a small detail wrong. Unfortunately, when moving to MCP, you immediately lose the ability to compose without inference (at least today). One of the reasons lldb can be remote-controlled with tmux at all is that the agent manages to compose quite well. How does it do that? It uses basic tmux commands such as to send inputs or to get the output, which don’t require a lot of extra tooling. It then chains commands like and to ensure it doesn’t read output too early. Likewise, when it starts to fail with encoding more complex characters, it sometimes changes its approach and might even use . The command line really isn’t just one tool — it’s a series of tools that can be composed through a programming language: bash. The most interesting uses are when you ask it to write tools that it can reuse later. It will start composing large scripts out of these one-liners. All of that is hard with MCP today. It’s very clear that there are limits to what these shell tools can do. At some point, you start to fight those tools. 
They are in many ways only as good as their user interface, and some of these user interfaces are just inherently tricky. For instance, when evaluated, tmux performs better than GNU screen , largely because the command-line interface of tmux is better and less error-prone. But either way, it requires the agent to maintain a stateful session, and it’s not particularly good at this today. What is stateful out of the box, however, is MCP. One surprisingly useful way of running an MCP server is to make it an MCP server with a single tool (the ubertool) which is just a Python interpreter that runs with retained state . It maintains state in the background and exposes tools that the agent already knows how to use. I did this experiment in a few ways now, the one that is public is . It’s an MCP that exposes a single tool called . It is, however, in many ways a misnomer. It’s not really a tool — it’s a Python interpreter running out of a virtualenv that has installed. What is ? It is the Python port of the ancient command-line tool which allows one to interact with command-line programs through scripts. The documentation describes as a “program that ‘talks’ to other interactive programs according to a script.” What is special about is that it’s old, has a stable API, and has been used all over the place. You could wrap or with lots of different MCP tools like , , , and more. That’s because the class exposes 36 different API functions! That’s a lot. But many of these cannot be used in isolation well anyway. Take this motivating example from the docs: Even the most basic use here involves three chained tool calls. And that doesn’t include error handling, which one might also want to encode. So instead, a much more interesting way to have this entire thing run is to just have the command language to the MCP be Python. The MCP server turns into a stateful Python interpreter, and the tool just lets it send Python code that is evaluated with the same state as before. There is some extra support in the MCP server to make the experience more reliable (like timeout support), but for the most part, the interface is to just send Python code. In fact, the exact script from above is what an MCP client is expected to send. The tool description just says this: This works because the interface to the MCP is now not just individual tools it has never seen — it’s a programming language that it understands very well, with additional access to an SDK ( ) that it has also seen and learned all the patterns from. We’re relegating the MCP to do the thing that it does really well: session management and guiding the tool through a built-in prompt. More importantly, the code that it writes is very similar to what it might put into a reusable script. There is so little plumbing in the actual MCP that you can tell the agent after the session to write a reusable pexpect script from what it learned in the session. That works because all the commands it ran are just Python — they’re still in the context, and the lift from that to a reusable Python script is low. Now I don’t want to bore you too much with lots of Claude output, but I took a crashing demo app that Mario wrote and asked it to debug with LLDB through . Here is what that looked like: Afterwards I asked it to dump it into a reusable Python script to be run later: And from a fresh session we can ask it to execute it once more: That again works because the code it writes into the MCP is very close to the code that it would write into a Python script. 
And the difference is meaningful. The initial debug takes about 45 seconds on my machine and uses about 7 tool calls. The re-run with the dumped playbook takes one tool call and finishes in less than 5 seconds. Most importantly: that script is standalone. I can run it as a human, even without the MCP! Now the above example works beautifully because these models just know so much about . That’s hardly surprising in a way. So how well does this work when the code that it should write is entirely unknown to it? Well, not quite as well. However, and this is the key part, because the meta input language is Python, it means that the total surface area that can be exposed from an ubertool is pretty impressive. A general challenge with MCP today is that the more tools you have, the more you’re contributing to context rot. You’re also limited to rather low amounts of input. On the other hand, if you have an MCP that exposes a programming language, it also indirectly exposes a lot of functionality that it knows from its training. For instance, one of the really neat parts about this is that it knows , , , and other stuff. Heck, it even knows about . This means that you can give it very rudimentary instructions about how its sandbox operates and what it might want to do to learn more about what is available to it as needed. You can also tell it in the prompt that there is a function it can run to learn more about what’s available when it needs help! So when you build something that is completely novel, at least the programming language is known. You can, for instance, write a tiny MCP that dumps out the internal state of your application, provides basic query helpers for your database that support your sharding setup, or provides data reading APIs. It will discover all of this anyway from reading the code, but now it can also use a stateful Python or JavaScript session to run these tools and explore more. This is also a fun feature when you want to ask the agent to debug the MCP itself. Because Python and JavaScript are so powerful, you can, for instance, also ask it to debug the MCP’s state itself when something went wrong. The elephant in the room for all things agentic coding is security. Claude mostly doesn’t delete your machine and maybe part of that is the Haiku preflight security check. But isn’t all of this a sham anyway? I generally love to watch how Claude and other agents maneuver their way around protections in pretty creative ways. Clearly it’s potent and prompt-injectable. By building an MCP that just runs , we might be getting rid of some of the remaining safety here. But does it matter? We are seemingly okay with it writing code and running tests, which is the same kind of bad as running . I’m sure the day of reckoning will come for all of us, but right now we’re living in this world where protections don’t matter and we can explore what these things can do. I’m honestly not sure how to best protect these things. They are pretty special in that they are just inherently unsafe and impossible to secure. Maybe the way to really protect them would be to intercept every system call and have some sort of policy framework/sandbox around the whole thing. But even in that case, what prevents an ever more clever LLM from circumventing all these things? It has internet access, it can be prompt-injected, and all interfaces we have for them are just too low-level to support protection well. So to some degree, I think the tail risks of code execution are here to stay. 
But I would argue that they are not dramatically worse when the MCP executes Python code. In this particular case, consider that itself runs programs. There is little point in securing the MCP if what the MCP can run is any bash command. As interesting as the case is, that was not my original motivation. What I started to look into is replacing Playwright’s MCP with an MCP that just exposes the Playwright API via JavaScript. This is an experiment I have been running for a while, and the results are somewhat promising but also not promising enough yet. If you want to play with it, the MCP is called “playwrightess” and is pretty simple. It just lets it execute JavaScript code against a sync playwright client. Same idea. Here, the tool usage is particularly nice because it gets down from ~30 tool definitions to 1: The other thing that is just much nicer about this approach is how many more ways it has to funnel data out. For instance from both the browser as well as the playwright script are forwarded back to the agent automatically. There is no need for the agent to ask for that information, it comes automatically. It also has a variable that it can use to accumulate extra information between calls which it liberally uses if you for instance ask it to collect data from multiple pages in a pagination. It can do that without any further inference, because the loop happens within JavaScript. Same with — you can easily get it to dump out a script for later that circumvents a lot of MCP calls with something it already saw. Particularly when you are debugging a gnarly issue and you need to restart the debugging more than once, that shows some promise. Does it perform better than Playwright MCP? Not in the current form, but I want to see if this idea can be taken further. It is quite verbose in the scripts that it writes, and it is not really well tuned between screenshots and text extraction.
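To make the single-tool idea concrete, here is a minimal, MCP-agnostic sketch of the core mechanism: one entry point that accepts Python source and executes it against a namespace that survives between calls. The function name and how it would be wired up as an MCP tool are my own illustration, not the actual server described above:

```python
import io
import traceback
from contextlib import redirect_stdout

# One namespace shared across calls: this is what makes the "ubertool" stateful.
_SESSION: dict = {}

def run_python(code: str) -> str:
    """Execute a snippet in the persistent session and return whatever it printed."""
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            exec(code, _SESSION)  # imports, variables, and helpers survive to the next call
    except Exception:
        buf.write(traceback.format_exc())
    return buf.getvalue()

# An MCP server would expose run_python() as its single tool. Later calls can
# build on earlier ones, which is the whole point:
print(run_python("x = 21"), end="")
print(run_python("print(x * 2)"), end="")  # prints 42: x survived from the previous call
```

Because the "command language" is plain Python, the snippets the agent sends are already most of the way to the reusable standalone scripts described earlier.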

0 views
Julia Evans 3 months ago

New zine: The Secret Rules of the Terminal

Hello! After many months of writing deep dive blog posts about the terminal, on Tuesday I released a new zine called “The Secret Rules of the Terminal”! You can get it for $12 here: https://wizardzines.com/zines/terminal , or get a 15-pack of all my zines here . Here’s the cover: Here’s the table of contents: I’ve been using the terminal every day for 20 years but even though I’m very confident in the terminal, I’ve always had a bit of an uneasy feeling about it. Usually things work fine, but sometimes something goes wrong and it just feels like investigating it is impossible, or at least like it would open up a huge can of worms. So I started trying to write down a list of weird problems I’ve run into in the terminal and I realized that the terminal has a lot of tiny inconsistencies like: If you use the terminal daily for 10 or 20 years, even if you don’t understand exactly why these things happen, you’ll probably build an intuition for them. But having an intuition for them isn’t the same as understanding why they happen. When writing this zine I actually had to do a lot of work to figure out exactly what was happening in the terminal to be able to talk about how to reason about it. It turns out that the “rules” for how the terminal works (how do you edit a command you type in? how do you quit a program? how do you fix your colours?) are extremely hard to fully understand, because “the terminal” is actually made of many different pieces of software (your terminal emulator, your operating system, your shell, the core utilities like , and every other random terminal program you’ve installed) which are written by different people with different ideas about how things should work. So I wanted to write something that would explain: Terminal internals are a mess. A lot of it is just the way it is because someone made a decision in the 80s and now it’s impossible to change, and honestly I don’t think learning everything about terminal internals is worth it. But some parts are not that hard to understand and can really make your experience in the terminal better, like: When I wrote How Git Works , I thought I knew how Git worked, and I was right. But the terminal is different. Even though I feel totally confident in the terminal and even though I’ve used it every day for 20 years, I had a lot of misunderstandings about how the terminal works and (unless you’re the author of or something) I think there’s a good chance you do too. A few things I learned that are actually useful to me: As usual these days I wrote a bunch of blog posts about various side quests: A long time ago I used to write zines mostly by myself but with every project I get more and more help. I met with Marie Claire LeBlanc Flanagan every weekday from September to June to work on this one. The cover is by Vladimir Kašiković, Lesley Trites did copy editing, Simon Tatham (who wrote PuTTY ) did technical review, our Operations Manager Lee did the transcription as well as a million other things, and Jesse Luehrs (who is one of the very few people I know who actually understands the terminal’s cursed inner workings) had so many incredibly helpful conversations with me about what is going on in the terminal. Here are some links to get the zine again: As always, you can get either a PDF version to print at home or a print version shipped to your house. The only caveat is print orders will ship in August – I need to wait for orders to come in to get an idea of how many I should print before sending it to the printer.

0 views
Julia Evans 7 months ago

Standards for ANSI escape codes

Hello! Today I want to talk about ANSI escape codes. For a long time I was vaguely aware of ANSI escape codes (“that’s how you make text red in the terminal and stuff”) but I had no real understanding of where they were supposed to be defined or whether or not there were standards for them. I just had a kind of vague “there be dragons” feeling around them. While learning about the terminal this year, I’ve learned that: So I wanted to put together a list for myself of some standards that exist around escape codes, because I want to know if they have to feel unreliable and frustrating, or if there’s a future where we could all rely on them with more confidence. Have you ever pressed the left arrow key in your terminal and seen ? That’s an escape code! It’s called an “escape code” because the first character is the “escape” character, which is usually written as , , , , or . Escape codes are how your terminal emulator communicates various kinds of information (colours, mouse movement, etc) with programs running in the terminal. There are two kinds of escape codes: Now let’s talk about standards! The first standard I found relating to escape codes was ECMA-48 , which was originally published in 1976. ECMA-48 does two things: The formats are extensible, so there’s room for others to define more escape codes in the future. Lots of escape codes that are popular today aren’t defined in ECMA-48: for example it’s pretty common for terminal applications (like vim, htop, or tmux) to support using the mouse, but ECMA-48 doesn’t define escape codes for the mouse. There are a bunch of escape codes that aren’t defined in ECMA-48, for example: I believe (correct me if I’m wrong!) that these and some others came from xterm, are documented in XTerm Control Sequences , and have been widely implemented by other terminal emulators. This list of “what xterm supports” is not a standard exactly, but xterm is extremely influential and so it seems like an important document. In the 80s (and to some extent today, but my understanding is that it was MUCH more dramatic in the 80s) there was a huge amount of variation in what escape codes terminals actually supported. To deal with this, there’s a database of escape codes for various terminals called “terminfo”. It looks like the standard for terminfo is called X/Open Curses , though you need to create an account to view that standard for some reason. It defines the database format as well as a C library interface (“curses”) for accessing the database. For example you can run this bash snippet to see every possible escape code for “clear screen” for all of the different terminals your system knows about: On my system (and probably every system I’ve ever used?), the terminfo database is managed by ncurses. I think it’s interesting that there are two main approaches that applications take to handling ANSI escape codes: Some examples of programs/libraries that take approach #2 (“don’t use terminfo”) include: I got curious about why folks might be moving away from terminfo and I found this very interesting and extremely detailed rant about terminfo from one of the fish maintainers , which argues that: [the terminfo authors] have done a lot of work that, at the time, was extremely important and helpful. My point is that it no longer is. I’m not going to do it justice so I’m not going to summarize it; I think it’s worth reading. I was just talking about the idea that you can use a “common set” of escape codes that will work for most people. But what is that set? 
Is there any agreement? I really do not know the answer to this at all, but from doing some reading it seems like it’s some combination of: and maybe ultimately “identify the terminal emulators you think your users are going to use most frequently and test in those”, the same way web developers do when deciding which CSS features are okay to use. I don’t think there are any resources like Can I use…? or Baseline for the terminal though. (in theory terminfo is supposed to be the “caniuse” for the terminal but it seems like it often takes 10+ years to add new terminal features when people invent them which makes it very limited) I also asked on Mastodon why people found terminfo valuable in 2025 and got a few reasons that made sense to me: The way that ncurses uses the environment variable to decide which escape codes to use reminds me of how webservers used to sometimes use the browser user agent to decide which version of a website to serve. It also seems like it’s had some of the same results – the way iTerm2 reports itself as being “xterm-256color” feels similar to how Safari’s user agent is “Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15”. In both cases the terminal emulator / browser ends up changing its user agent to get around user agent detection that isn’t working well. On the web we ended up deciding that user agent detection was not a good practice and to instead focus on standardization so we can serve the same HTML/CSS to all browsers. I don’t know if the same approach is the future in the terminal though – I think the terminal landscape today is much more fragmented than the web ever was as well as being much less well funded. A few more documents and standards related to escape codes, in no particular order: I sometimes see people saying that the unix terminal is “outdated”, and since I love the terminal so much I’m always curious about what incremental changes might make it feel less “outdated”. Maybe if we had a clearer standards landscape (like we do on the web!) it would be easier for terminal emulator developers to build new features and for authors of terminal applications to more confidently adopt those features so that we can all benefit from them and have a richer experience in the terminal. Obviously standardizing ANSI escape codes is not easy (ECMA-48 was first published almost 50 years ago and we’re still not there!). I don’t even know what all of the challenges are. But the situation with HTML/CSS/JS used to be extremely bad too and now it’s MUCH better, so maybe there’s hope.
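The bash snippet mentioned earlier (listing the “clear screen” sequence for every terminal type your system knows about) isn't reproduced here, but a rough equivalent, assuming the ncurses tools `toe` and `tput` are installed, looks like this:

```bash
# For every terminal type in the terminfo database, print its name and its
# "clear screen" escape sequence (control characters shown visibly via cat -v).
for term in $(toe -a | awk '{print $1}'); do
  printf '%-30s %s\n' "$term" "$(tput -T "$term" clear 2>/dev/null | cat -v)"
done
```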

0 views
Julia Evans 8 months ago

How to add a directory to your PATH

I was talking to a friend about how to add a directory to your PATH today. It’s something that feels “obvious” to me since I’ve been using the terminal for a long time, but when I searched for instructions for how to do it, I actually couldn’t find something that explained all of the steps – a lot of them just said “add this to ”, but what if you’re not using bash? What if your bash config is actually in a different file? And how are you supposed to figure out which directory to add anyway? So I wanted to try to write down some more complete directions and mention some of the gotchas I’ve run into over the years. Here’s a table of contents: If you’re not sure what shell you’re using, here’s a way to find out. Run this: Also bash is the default on Linux and zsh is the default on Mac OS (as of 2024). I’ll only cover bash, zsh, and fish in these directions. Bash has three possible config files: , , and . If you’re not sure which one your system is set up to use, I’d recommend testing this way: (there are a lot of elaborate flow charts out there that explain how bash decides which config file to use but IMO it’s not worth it to internalize them and just testing is the fastest way to be sure) Let’s say that you’re trying to install and run a program called and it doesn’t work, like this: How do you find what directory is in? Honestly in general this is not that easy – often the answer is something like “it depends on how npm is configured”. A few ideas: Once you’ve found a directory you think might be the right one, make sure it’s actually correct! For example, I found out that on my machine, is in . I can make sure that it’s the right directory by trying to run the program in that directory like this: It worked! Now that you know what directory you need to add to your , let’s move to the next step! Now we have the 2 critical pieces of information we need: Now what you need to add depends on your shell: bash instructions: Open your shell’s config file, and add a line like this: (obviously replace with the actual directory you’re trying to add) zsh instructions: You can do the same thing as in bash, but zsh also has some slightly fancier syntax you can use if you prefer: fish instructions: In fish, the syntax is different: (in fish you can also use , some notes on that further down ) Now, an extremely important step: updating your shell’s config won’t take effect if you don’t restart it! Two ways to do this: I’ve found that both of these usually work fine. And you should be done! Try running the program you were trying to run and hopefully it works now. If not, here are a couple of problems that you might run into: If the wrong version of a program is running, you might need to add the directory to the beginning of your PATH instead of the end. For example, on my system I have two versions of installed, which I can see by running : The one your shell will use is the first one listed . If you want to use the Homebrew version, you need to add that directory ( ) to the beginning of your PATH instead, by putting this in your shell’s config file (it’s instead of the usual ) or in fish: All of these directions only work if you’re running the program from your shell . If you’re running the program from an IDE, from a GUI, in a cron job, or some other way, you’ll need to add the directory to your PATH in a different way, and the exact details might depend on the situation. 
In a cron job, some options: I’m honestly not sure how to handle it in an IDE/GUI because I haven’t run into that in a long time, will add directions here if someone points me in the right direction. If you edit your path and start a new shell by running (or , or ), you’ll often end up with duplicate entries, because the shell keeps adding new things to your every time you start your shell. Personally I don’t think I’ve run into a situation where this kind of duplication breaks anything, but the duplicates can make it harder to debug what’s going on with your if you’re trying to understand its contents. Some ways you could deal with this: How to deduplicate your is shell-specific and there isn’t always a built-in way to do it so you’ll need to look up how to accomplish it in your shell. Here’s a situation that’s easy to get into in bash or zsh: This happens because in bash, by default, history is not saved until you exit the shell. Some options for fixing this: When you install (Rust’s installer) for the first time, it gives you these instructions for how to set up your PATH, which don’t mention a specific directory at all. The idea is that you add that line to your shell’s config, and their script automatically sets up your (and potentially other things) for you. This is pretty common (for example Homebrew suggests you eval ), and there are two ways to approach this: I don’t think there’s anything wrong with doing what the tool suggests (it might be the “best way”!), but personally I usually use the second approach because I prefer knowing exactly what configuration I’m changing. fish has a handy function called that you can run to add a directory to your like this: This is cool (it’s such a simple command!) but I’ve stopped using it for a couple of reasons: Hopefully this will help some people. Let me know (on Mastodon or Bluesky) if there are other major gotchas that have tripped you up when adding a directory to your PATH, or if you have questions about this post!
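For reference, the “add a line like this” step described earlier in this post typically looks like the following for each shell; `/some/dir` is a stand-in for whatever directory you found:

```bash
# bash: add to ~/.bashrc (or ~/.bash_profile / ~/.profile, whichever your setup uses)
export PATH="$PATH:/some/dir"

# zsh: add to ~/.zshrc (the bash-style export also works)
path+=(/some/dir)

# fish: add to ~/.config/fish/config.fish
set -gx PATH $PATH /some/dir

# To put the directory at the *beginning* of PATH instead (for the
# "wrong version of a program is running" problem), flip the order:
export PATH="/some/dir:$PATH"
```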

0 views
Julia Evans 9 months ago

What's involved in getting a "modern" terminal setup?

Hello! Recently I ran a terminal survey and I asked people what frustrated them. One person commented: There are so many pieces to having a modern terminal experience. I wish it all came out of the box. My immediate reaction was “oh, getting a modern terminal experience isn’t that hard, you just need to….”, but the more I thought about it, the longer the “you just need to…” list got, and I kept thinking about more and more caveats. So I thought I would write down some notes about what it means to me personally to have a “modern” terminal experience and what I think can make it hard for people to get there. Here are a few things that are important to me, with which part of the system is responsible for them: There are a million other terminal conveniences out there and different people value different things, but those are the ones that I would be really unhappy without. My basic approach is: A few things that affect my approach: What if you want a nice experience, but don’t want to spend a lot of time on configuration? Figuring out how to configure vim in a way that I was satisfied with really did take me like ten years, which is a long time! My best ideas for how to get a reasonable terminal experience with minimal config are: Personally I wouldn’t use xterm, rxvt, or Terminal.app as a terminal emulator, because I’ve found in the past that they’re missing core features (like 24-bit colour in Terminal.app’s case) that make the terminal harder to use for me. I don’t want to pretend that getting a “modern” terminal experience is easier than it is though – I think there are two issues that make it hard. Let’s talk about them! bash and zsh are by far the two most popular shells, and neither of them provide a default experience that I would be happy using out of the box, for example: And even though I love fish , the fact that it isn’t POSIX does make it hard for a lot of folks to make the switch. Of course it’s totally possible to learn how to customize your prompt in bash or whatever, and it doesn’t even need to be that complicated (in bash I’d probably start with something like , or maybe use starship ). But each of these “not complicated” things really does add up and it’s especially tough if you need to keep your config in sync across several systems. An extremely popular solution to getting a “modern” shell experience is oh-my-zsh . It seems like a great project and I know a lot of people use it very happily, but I’ve struggled with configuration systems like that in the past – it looks like right now the base oh-my-zsh adds about 3000 lines of config, and often I find that having an extra configuration system makes it harder to debug what’s happening when things go wrong. I personally have a tendency to use the system to add a lot of extra plugins, make my system slow, get frustrated that it’s slow, and then delete it completely and write a new config from scratch. In the terminal survey I ran recently, the most popular terminal text editors by far were , , and . I think the main options for terminal text editors are: The last issue is that sometimes individual programs that I use are kind of annoying. For example on my Mac OS machine, doesn’t support the keyboard shortcut. 
Fixing this to get a reasonable terminal experience in SQLite was a little complicated, I had to: I find that debugging application-specific issues like this is really not easy and often it doesn’t feel “worth it” – often I’ll end up just dealing with various minor inconveniences because I don’t want to spend hours investigating them. The only reason I was even able to figure this one out at all is that I’ve been spending a huge amount of time thinking about the terminal recently. A big part of having a “modern” experience using terminal programs is just using newer terminal programs, for example I can’t be bothered to learn a keyboard shortcut to sort the columns in top, but in htop I can just click on a column heading with my mouse to sort it. So I use htop instead! But discovering newer, more “modern” command line tools isn’t easy (though I made a list here), finding ones that I actually like using in practice takes time, and if you’re SSHed into another machine, they won’t always be there. Something I find tricky about configuring my terminal to make everything “nice” is that changing one seemingly small thing about my workflow can really affect everything else. For example right now I don’t use tmux. But if I needed to use tmux again (for example because I was doing a lot of work SSHed into another machine), I’d need to think about a few things, like: and probably more things I haven’t thought of. “Using tmux means that I have to change how I manage my colours” sounds unlikely, but that really did happen to me and I decided “well, I don’t want to change how I manage colours right now, so I guess I’m not using that feature!”. It’s also hard to remember which features I’m relying on – for example maybe my current terminal does have OSC 52 support and because copying from tmux over SSH has always Just Worked I don’t even realize that that’s something I need, and then it mysteriously stops working when I switch terminals. Personally even though I think my setup is not that complicated, it’s taken me 20 years to get to this point! Because terminal config changes are so likely to have unexpected and hard-to-understand consequences, I’ve found that if I change a lot of terminal configuration all at once it makes it much harder to understand what went wrong if there’s a problem, which can be really disorienting. So I usually prefer to make pretty small changes, and accept that changes might take me a REALLY long time to get used to. For example I switched from using ls to eza a year or two ago and while I like it (because eza prints human-readable file sizes by default) I’m still not quite sure about it. But also sometimes it’s worth it to make a big change, like I made the switch to fish (from bash) 10 years ago and I’m very happy I did. Trying to explain how “easy” it is to configure your terminal really just made me think that it’s kind of hard and that I still sometimes get confused. I’ve found that there’s never one perfect way to configure things in the terminal that will be compatible with every single other thing. I just need to try stuff, figure out some kind of locally stable state that works for me, and accept that if I start using a new tool it might disrupt the system and I might need to rethink things.
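To make the “not that complicated” bash prompt idea above concrete, here’s a minimal sketch (my own example, not the configuration elided from the post) that shows the working directory and git branch without any framework:

```bash
# ~/.bashrc: a small prompt with user, host, working directory, and git branch.
parse_git_branch() {
  git branch --show-current 2>/dev/null | sed 's/..*/ (&)/'
}
PS1='\u@\h \w$(parse_git_branch)\$ '

# Or, if you'd rather not hand-roll it and have starship installed:
# eval "$(starship init bash)"
```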

0 views
Julia Evans 10 months ago

"Rules" that terminal programs follow

Recently I’ve been thinking about how everything that happens in the terminal is some combination of: The first three (your operating system, shell, and terminal emulator) are all kind of known quantities – if you’re using bash in GNOME Terminal on Linux, you can more or less reason about how all of those things interact, and some of their behaviour is standardized by POSIX. But the fourth one (“whatever program you happen to be running”) feels like it could do ANYTHING. How are you supposed to know how a program is going to behave? This post is kind of long so here’s a quick table of contents: As far as I know, there are no real standards for how programs in the terminal should behave – the closest things I know of are: But even though there are no standards, in my experience programs in the terminal behave in a pretty consistent way. So I wanted to write down a list of “rules” that in my experience programs mostly follow. My goal here isn’t to convince authors of terminal programs that they should follow any of these rules. There are lots of exceptions to these and often there’s a good reason for those exceptions. But it’s very useful for me to know what behaviour to expect from a random new terminal program that I’m using. Instead of “uh, programs could do literally anything”, it’s “ok, here are the basic rules I expect, and then I can keep a short mental list of exceptions”. So I’m just writing down what I’ve observed about how programs behave in my 20 years of using the terminal, why I think they behave that way, and some examples of cases where that rule is “broken”. There are a bunch of common conventions that I think are pretty clearly the program’s responsibility to implement, like: But in this post I’m going to focus on things that it’s not 100% obvious are the program’s responsibility. For example it feels to me like a “law of nature” that pressing Ctrl-D should quit a REPL, but programs often need to explicitly implement support for it – even though doesn’t need to implement support, does. (more about that in “rule 3” below) Understanding which things are the program’s responsibility makes it much less surprising when different programs’ implementations are slightly different. The main reason for this rule is that noninteractive programs will quit by default on Ctrl-C if they don’t set up a signal handler, so this is kind of a “you should act like the default” rule. Something that trips a lot of people up is that this doesn’t apply to interactive programs like or or . This is because in an interactive program, Ctrl-C has a different job – if the program is running an operation (like for example a search in or some Python code in ), then Ctrl-C will interrupt that operation but not stop the program. As an example of how this works in an interactive program: here’s the code in prompt-toolkit (the library that iPython uses for handling input) that aborts a search when you press Ctrl-C. TUI programs (like or ) will usually quit when you press q. This rule doesn’t apply to any program where pressing q to quit wouldn’t make sense, like or text editors. REPLs (like or ) will usually quit when you press Ctrl-D on an empty line. This rule is similar to the Ctrl-C rule – the reason for this is that by default if you’re running a program (like ) in “cooked mode”, then the operating system will return an EOF when you press Ctrl-D on an empty line. Most of the REPLs I use (sqlite3, python3, fish, bash, etc) don’t actually use cooked mode, but they all implement this keyboard shortcut anyway to mimic the default behaviour.
For example, here’s the code in prompt-toolkit that quits when you press Ctrl-D, and here’s the same code in readline. I actually thought that this one was a “Law of Terminal Physics” until very recently because I’ve basically never seen it broken, but you can see that it’s just something that each individual input library has to implement in the links above. Someone pointed out that the Erlang REPL does not quit when you press Ctrl-D, so I guess not every REPL follows this “rule”. Terminal programs rarely use colours other than the base 16 ANSI colours. This is because if you specify colours with a hex code, it’s very likely to clash with some users’ background colour. For example if I print out some text as , it would be almost invisible on a white background, though it would look fine on a dark background. But if you stick to the default 16 base colours, you have a much better chance that the user has configured those colours in their terminal emulator so that they work reasonably well with their background color. Another reason to stick to the default base 16 colours is that it makes fewer assumptions about what colours the terminal emulator supports. The only programs I usually see breaking this “rule” are text editors, for example Helix by default will use a purple background which is not a default ANSI colour. It seems fine for Helix to break this rule since Helix isn’t a “core” program and I assume any Helix user who doesn’t like that colorscheme will just change the theme. Almost every program I use supports emacs/readline keybindings if it would make sense to do so. For example, here are a bunch of different programs and a link to where they define Ctrl-E to go to the end of the line: None of those programs actually uses readline directly, they just sort of mimic emacs/readline keybindings. They don’t always mimic them exactly: for example atuin seems to use Ctrl-A as a prefix, so Ctrl-A doesn’t go to the beginning of the line. Also all of these programs seem to implement their own internal cut and paste buffers so you can delete a line with Ctrl-U and then paste it with Ctrl-Y. The exceptions to this are: I wrote more about this “what keybindings does a program support?” question in entering text in the terminal is complicated. I’ve never seen a program (other than a text editor) where Ctrl-W doesn’t delete the last word. This is similar to the Ctrl-D rule – by default if a program is in “cooked mode”, the OS will delete the last word if you press Ctrl-W, and delete the whole line if you press Ctrl-U. So usually programs will imitate that behaviour. I can’t think of any exceptions to this other than text editors but if there are I’d love to hear about them! Most programs will disable colours when writing to a pipe. For example: Both of those programs will also format their output differently when writing to the terminal: ls will organize files into columns, and ripgrep will group matches with headings. If you want to force the program to use colour (for example because you want to look at the colour), you can use to force the program’s output to be a tty like this: I’m sure that there are some programs that “break” this rule but I can’t think of any examples right now. Some programs have a --color flag that you can use to force colour to be on, in the example above you could also do . Usually if you pass - to a program instead of a filename, it’ll read from stdin or write to stdout (whichever is appropriate).
For example, if you want to format the Python code that’s on your clipboard with and then copy it, you could run: ( is a Mac program, you can do something similar on Linux with ) My impression is that most programs implement this if it would make sense and I can’t think of any exceptions right now, but I’m sure there are many exceptions. These rules took me a long time to learn because I had to: A lot of my understanding of the terminal is honestly still in the “subconscious pattern recognition” stage. The only reason I’ve been taking the time to make things explicit at all is because I’ve been trying to explain how it works to others. Hopefully writing down these “rules” explicitly will make learning some of this stuff a little bit faster for others.
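Two of those rules are easy to see from the shell itself. Here is a small bash sketch (my own examples, not the commands elided from the post): first, how a script can follow the “no colours when writing to a pipe” rule by checking whether stdout is a terminal; second, the “- means stdin/stdout” convention in a pipeline.

```bash
#!/usr/bin/env bash
# Rule: only emit ANSI colour codes when stdout is actually a terminal.
if [ -t 1 ]; then
  red=$'\e[31m'; reset=$'\e[0m'
else
  red=""; reset=""                       # plain text when piped or redirected
fi
echo "${red}error:${reset} something went wrong"
```

```bash
# Rule: "-" stands in for stdin or stdout, so commands can be chained
# without temporary files (the URL and file name here are placeholders).
curl -sL https://example.com/archive.tar.gz | tar -tzf -   # tar reads the archive from stdin
sort words.txt | diff - words.txt                          # diff reads its first "file" from stdin
```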

0 views
Nicky Reinert 10 months ago

Advent of Code - Day 5 - Printer Updates (Bash)

(task | solution) It’s the fifth day of our outrageous adventure, and we are dealing with printer updates. To solve this riddle, we are getting back to the roots and our good ol’ friend Bash! The good news here is that I don’t need to set up any particular environment because, thankfully, Bash is …

0 views
Schneems 11 months ago

RubyConf 2024: Cloud Native Buildpack Hackday (and other Ruby deploy tools, too!)

I’ve spent the last decade+ working on Ruby deploy tooling, including (but not limited to) the Heroku classic and upcoming Cloud Native Buildpack. If you want to contribute to a Ruby deployment or packaging tool (even if it’s not one I maintain), I can help. If you want to learn more about Cloud Native Buildpacks (CNBs) and maybe get a green square on GitHub (or TWO!), keep reading for more resources. Note: This post is for an in-person hackday event at RubyConf 2024 happening on Thursday, November 14th. If you found this but are away from the event, you can still follow along, but I won’t be available for in-person collaboration. If you’re new to Cloud Native Buildpacks, it’s a way to generate OCI images (like Docker images) without a Dockerfile. Buildpacks take your application code on disk as input and inspect it to determine that it’s a Ruby app and needs to install gems with Bundler. Know before you go! Not strictly required, but it will make your life better with iffy wifi. And clone the repo and install dependencies: If you’ve never heard of a buildpack, here are some getting-started guides you can try. If you find a bug or run into questions, I can help. Once you’ve played with a buildpack, you’re ready for prime time. Below, you’ll find some sample things to hack on. You can tackle one by yourself if you’re ready, or ask me for help. A well-scoped-out task with a change example involves modifying code but requires minimal Rust knowledge. Test drive Hanami with a Ruby CNB, document the experience and suggest changes or fixes. https://github.com/heroku/buildpacks-ruby/issues/333 https://github.com/heroku/buildpacks-ruby/issues/298 No link. Write a Cloud Native Buildpack. Bash tutorial at https://buildpacks.io/docs/. For possible buildpack ideas, you can look at existing Heroku “classic” buildpacks.
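If you want to kick the tires before (or after) the event, a rough sketch with the pack CLI looks something like this; the app name, path, and builder image are examples of mine, not instructions from the post:

```bash
# Build an OCI image from a Ruby app directory -- no Dockerfile needed.
# Requires Docker plus the `pack` CLI (see https://buildpacks.io/docs/ for setup).
pack build my-ruby-app --path ./my-ruby-app --builder heroku/builder:24

# Run the resulting image like any other container.
docker run --rm -p 8080:8080 my-ruby-app
```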

0 views
Julia Evans 1 years ago

Some notes on upgrading Hugo

This seems to be discussed in the release notes for 0.57.2. I just needed to replace with in the template on the homepage as well as in my RSS feed template. I had this comment in the part of my theme where I link to the next/previous blog post: “next” and “previous” in hugo apparently mean the opposite of what I’d think they’d mean intuitively. I’d expect “next” to mean “in the future” and “previous” to mean “in the past” but it’s the opposite. It looks like they changed this in ad705aac064 so that “next” actually is in the future and “prev” actually is in the past. I definitely find the new behaviour more intuitive. Figuring out why/when all of these changes happened was a little difficult. I ended up hacking together a bash script to download all of the changelogs from github as text files, which I could then grep to try to figure out what happened. It turns out it’s pretty easy to get all of the changelogs from the GitHub API. So far everything was not so bad – there was also a change around taxonomies that I can’t quite explain, but it was all pretty manageable – but then we got to the really tough one: the markdown renderer. The blackfriday markdown renderer (which was previously the default) was removed in v0.100.0. This seems pretty reasonable: It has been deprecated for a long time, its v1 version is not maintained anymore, and there are many known issues. Goldmark should be a mature replacement by now. Fixing all of the Markdown changes was a huge pain – I ended up having to update 80 different Markdown files (out of 700) so that they would render properly, and I’m not totally sure I caught everything. The obvious question here is – why bother even trying to upgrade Hugo at all if I have to switch Markdown renderers? My old site was running totally fine and I think it wasn’t necessarily a good use of time, but the one reason I think it might be useful in the future is that the new renderer (goldmark) uses the CommonMark markdown standard, which I’m hoping will be somewhat more futureproof. So maybe I won’t have to go through this again? We’ll see. Also it turned out that the new Goldmark renderer does fix some problems I had (but didn’t know that I had) with smart quotes and how lists/blockquotes interact. The hard part of this Markdown change was even figuring out what changed. Almost all of the problems (including #2 and #3 above) just silently broke the site, they didn’t cause any errors or anything. So I had to diff the HTML to hunt them down. Here’s what I ended up doing: (the thing is searching for red/green text in the diff) This was very time consuming but it was a little bit fun for some reason so I kept doing it until it seemed like nothing too horrible was left. Here’s a list of every type of Markdown change I had to make. It’s very possible these are all extremely specific to me but it took me a long time to figure them all out so maybe this will be helpful to one other person who finds this in the future. This doesn’t work anymore (it doesn’t expand the link): I need to do this instead: This works too: I didn’t want this so I needed to configure: This doesn’t render as a nested list anymore if I only indent by 2 spaces, I need to put 4 spaces. The problem is that the amount of indent needed depends on the size of the list markers. Here’s a reference in CommonMark for this. Previously the here didn’t render as a blockquote, and with the new renderer it does.
I found a bunch of Markdown that had been kind of broken (which I hadn’t noticed) that works better with the new renderer, and this is an example of that. Lists inside blockquotes also seem to work better. Previously this didn’t render as a heading, but now it does. So I needed to replace the with . I had something which looked like this: With Blackfriday it rendered like this: and with Goldmark it rendered like this: Same thing if there was an accidental at the beginning of a line, like in this Markdown snippet. To fix this I just had to rewrap the line so that the wasn’t the first character. The Markdown is formatted this way because I wrap my Markdown to 80 characters a lot and the wrapping isn’t very context sensitive. There were a bunch of places where the old renderer (Blackfriday) was doing unwanted things in code blocks like replacing with or replacing quotes with smart quotes. I hadn’t realized this was happening and I was very happy to have it fixed. The way this gets rendered got better: Before there were two left smart quotes, now the quotes match. Previously if I had an image like this: it would get wrapped in a tag, now it doesn’t anymore. I dealt with this just by adding a to images in the CSS, hopefully that’ll make them display well enough. Previously this wouldn’t get wrapped in a tag, but now it seems to: I just gave up on fixing this though and resigned myself to maybe having some extra space in some cases. Maybe I’ll try to fix it later if I feel like another yakshave. I also needed to make a few other configuration changes. Here’s what I needed to add to my config to do all that: Maybe I’ll try to get syntax highlighting working one day, who knows. I might prefer having it off though. I also wrote a little program to compare the Blackfriday and Goldmark output for various markdown snippets, here it is in a gist. It’s not really configured the exact same way Blackfriday and Goldmark were in my Hugo versions, but it was still helpful to have to help me understand what was going on. My approach to themes in Hugo has been: So I just need to edit the theme files to fix any problems. Also I wrote a lot of the theme myself so I’m pretty familiar with how it works. Relying on someone else to keep a theme updated feels kind of scary to me; I think if I were using a third-party theme I’d just copy the code into my site’s github repo and then maintain it myself. I asked on Mastodon if anyone had used a static site generator with good backwards compatibility. The main answers seemed to be Jekyll and 11ty. Several people said they’d been using Jekyll for 10 years without any issues, and 11ty says it has stability as a core goal. I think a big factor in how appealing Jekyll/11ty are is how easy it is for you to maintain a working Ruby / Node environment on your computer: part of the reason I stopped using Jekyll was that I got tired of having to maintain a working Ruby installation. But I imagine this wouldn’t be a problem for a Ruby or Node developer. Several people said that they don’t build their Jekyll site locally at all – they just use GitHub Pages to build it. Overall I’ve been happy with Hugo – I started using it because it had fast build times and it was a static binary, and both of those things are still extremely useful to me. I might have spent 10 hours on this upgrade, but I’ve probably spent 1000+ hours writing blog posts without thinking about Hugo at all so that seems like an extremely reasonable ratio.
I find it hard to be too mad about the backwards incompatible changes, most of them were quite a long time ago, Hugo does a great job of making their old releases available so you can use the old release if you want, and the most difficult one is removing support for the Markdown renderer in favour of using something CommonMark-compliant which seems pretty reasonable to me even if it is a huge pain. But it did take a long time and I don’t think I’d particularly recommend moving 700 blog posts to a new Markdown renderer unless you’re really in the mood for a lot of computer suffering for some reason. The new renderer did fix a bunch of problems so I think overall it might be a good thing, even if I’ll have to remember to make 2 changes to how I write Markdown (4.1 and 4.3). Also I’m still using Hugo 0.54 for https://wizardzines.com so maybe these notes will be useful to Future Me if I ever feel like upgrading Hugo for that site. Hopefully I didn’t break too many things on the blog by doing this, let me know if you see anything broken!
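The changelog-downloading bash script mentioned above isn’t included in this excerpt, but the idea is simple enough to sketch with the GitHub releases API (the repo name is real; everything else here is my own approximation):

```bash
#!/usr/bin/env bash
# Fetch Hugo release notes from the GitHub API into one big text file,
# so changes can be grepped for later ("when did X change?").
set -euo pipefail

for page in 1 2 3 4 5; do
  curl -s "https://api.github.com/repos/gohugoio/hugo/releases?per_page=100&page=$page" |
    jq -r '.[] | "==== \(.tag_name)\n\(.body // "")\n"'
done > hugo-release-notes.txt

# Example: hunt for when the markdown renderer changed.
grep -n -i -E "blackfriday|goldmark" hugo-release-notes.txt | head
```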

0 views
Lambda Land 1 years ago

Lessons From Writing My First Academic Paper

I got a paper published at ECOOP this year! This is my first big paper published at a big conference. As such, I wanted to write down some things that I learned so that in the future I can remember a bit better what was hard for me. That way, should I one day advise PhD students working on their first papers, I can help them through the learning curve better. For us, this artifact took the form of a Docker container with a bash script that ran all the code examples from our paper to support the claims we made. I really like that reproducibility like this is often an option in CS. I think part of the difficulty stems from the sheer amount of new terminology and dense technical material present in a typical paper. Everything is new and therefore requires effort to comprehend. As you get familiar with the field, however, previously arcane concepts become easy to grasp. This also makes it easier to see the main idea of the paper: you can tune out the noise and focus on what is novel. Part of it comes from how unfamiliar the form of papers is. When I started research, papers felt arbitrarily formulaic. Now, I can recognize common structures in papers and use these patterns to understand the paper quicker. I actually find that papers are an efficient way for me to learn about cutting-edge research. I hoped, but did not know, that that would eventually be the case when I started. Asking for feedback can be hard—and I don’t just mean in the emotional pride-bruising sense: prompting the people you ask for feedback can be tricky. I asked someone for some help on a different piece of writing, and they weren’t able to give me much useful help and instead focused on trivial issues. Part of that is on me: I should have prompted better. But it is hard to prompt well. I find it a little curious that different kinds of people have different affinities to methods of giving feedback. With this paper, there’s a flag in LaTeX that adds line numbers to the PDF. So, if someone wants to comment on something, they can write: This seems natural to me. I like that it’s software-agnostic. I guess some fields are tied to particular technologies (e.g. suggested edits in Word) and that seems burdensome to me. Specifically, the program chair.  ↩︎
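For what it’s worth, the reproduction step for an artifact like that usually boils down to a couple of commands; the image and script names below are hypothetical, not the actual ECOOP artifact:

```bash
# Build the artifact image, then run the script that re-executes the paper's examples.
docker build -t paper-artifact .
docker run --rm paper-artifact bash run-examples.sh
```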

0 views
Gabriel Garrido 1 years ago

Simple automated deployments using git push

Using git push remains one of my favorite ways of deploying software. It’s simple, effective, and you can stretch it significantly until you need more complex workflows. I’m not referring to using git push to trigger a GitHub action which builds and deploys software. I’m talking about using git push to deploy your branch to a server that you’ve added as a named remote. I learned this workflow from Josef Strzibny’s excellent book Deployment from Scratch, which I’ve adapted somewhat. This note supposes you have SSH access to a server that has git installed. Let’s assume that said server is already configured as a host in your machine’s SSH configuration file: I keep an Ansible playbook that automates provisioning this workflow. It should be easy to derive an equivalent bash script if you’re not using Ansible. The way this works is that you keep a bare git repository in the server where you want to deploy software. A bare repository is a repository that does not have a working directory. It does not push or pull. Anyone with access and permission to the server and the directory where the git repository is created will be able to push to it to deploy. My convention is to create a directory for the project at hand in the directory. Inside, I will create two directories: a directory where the bare repository lives, and a directory where the source-controlled project files live. Then, you configure a post-receive script in the git repository’s hooks directory. This script will check out the code that was pushed to the branch into the directory. You could do other git operations here like obtaining the current hash if you’re using that somewhere in your application code. Finally, you trigger a deployment script that also lives in the project root. This deployment script, in turn, takes care of whatever is necessary to build and release the pushed code. For example, you could use it to generate a new version of a website that uses Hugo: It’s important to note that the current working directory for that script will be , so you may need to change directory or use absolute paths as I showed in the example above. Another important consideration is that the remote repository gets updated even if the hook exits because of an error. It is recommended, particularly in the deploy script, to use set -e so that the script stops if any command exits with a non-zero status. If an error occurs you’ll know right away because the stdout and stderr of the hook are piped back to the client that pushed. I also recommend writing the script such that it is not coupled to a particular push. It should work with what’s in the directory. In other words, I should be able to use the same script to manually deploy the application if I have a reason to. Here are a couple of uses for this workflow, some of which I’ve done: If you’re using PHP, or serving plain HTML files, you may even get away with not having a script given that the hook updates the source files. At this point all you need to do is create a new SSH remote for the repository in your machine: The first name above is the name of the remote, which is arbitrary. The second matches the name of the host we configured in our SSH configuration file. And finally, you push to it. Because this workflow is version-controlled, reverting or jumping to a specific version is just another git operation. I should emphasize that this workflow is pinned to the branch. Any change that you do must be reflected in that branch in order to get it live. That said, you can push other branches to this remote as it is a regular git repository.
However, note the following difference. Doing will not deploy a new version with the contents of that branch. The hook above always checks out the contents of the deployment branch. You can either merge into it on your machine and then push, or push directly from the branch to the remote: You can force push with if the remote warns you about discrepancies. Build a new binary of a Go program using go build and replace the process; build a new image of a Docker container 1 and replace the running containers; restart a Node.js server. This one is quite convenient for projects where you can get away with not using an image repository and build pipeline, and if building your image is not too process-intensive. Running periodically is necessary though.  ↩︎
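Putting the pieces above together, here is a condensed sketch of the whole workflow. The paths, host alias, remote name, and branch names are examples of mine, not the ones from the post:

```bash
# --- On the server: a bare repository to push to, plus a checkout directory ---
mkdir -p ~/apps/mysite/{repo.git,src}
git init --bare ~/apps/mysite/repo.git

# Install a post-receive hook that checks out the pushed code and runs the deploy script.
cat > ~/apps/mysite/repo.git/hooks/post-receive <<'EOF'
#!/usr/bin/env bash
set -e
GIT_WORK_TREE="$HOME/apps/mysite/src" git --git-dir="$HOME/apps/mysite/repo.git" checkout -f main
cd "$HOME/apps/mysite/src"
exec ./deploy.sh            # e.g. run `hugo` and copy the generated site into place
EOF
chmod +x ~/apps/mysite/repo.git/hooks/post-receive

# --- On your machine: add the server as a remote and deploy by pushing ---
git remote add production myserver:apps/mysite/repo.git
git push production main

# Deploy a feature branch without merging locally by pushing it to the remote's main:
git push --force production my-feature:main
```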

0 views
Matthias Endler 3 years ago

zerocal - A Serverless Calendar App in Rust Running on shuttle.rs

Every once in a while my buddies and I meet for dinner. I value these evenings, but the worst part is scheduling these events! We send out a message to the group. We wait for a response. We decide on a date. Someone sends out a calendar invite. Things finally happen. None of that is fun except for the dinner. Being the reasonable person you are, you would think: “Why don’t you just use a scheduling app?”. I have tried many of them. None of them are any good. They are all… too much ! Just let me send out an invite and whoever wants can show up. The nerdy, introvert engineer’s solution 💡 What we definitely need is yet another calendar app which allows us to create events and send out an invite with a link to that event! You probably didn’t see that coming now, did you? Oh, and I don’t want to use Google Calendar to create the event because I don’t trust them . Like any reasonable person, I wanted a way to create calendar entries from my terminal . That’s how I pitched the idea to my buddies last time. The answer was: “I don’t know, sounds like a solution in search of a problem.” But you know what they say: Never ask a starfish for directions. Show, don’t tell That night I went home and built a website that would create a calendar entry from parameters. It allows you to create a calendar event from the convenience of your command line: You can then save that to a file and open it with your calendar app. In a sense, it’s a “serverless calendar app”, haha. There is no state on the server, it just generates a calendar event on the fly and returns it. How I built it You probably noticed that the URL contains “shuttleapp.rs”. That’s because I’m using shuttle.rs to host the website. Shuttle is a hosting service for Rust projects and I wanted to try it out for a long time. To initialize the project using the awesome axum web framework, I’ve used and I was greeted with everything I needed to get started: Let’s quickly commit the changes: To deploy the code, I needed to sign up for a shuttle account. This can be done over at https://www.shuttle.rs/login . It will ask you to authorize it to access your Github account. and finally: Now let’s head over to zerocal.shuttleapp.rs : Deploying the first version took less than 5 minutes. Neat! We’re all set for our custom calendar app. Writing the app To create the calendar event, I used the icalendar crate (shout out to hoodie for creating this nice library!). iCalendar is a standard for creating calendar events that is supported by most calendar apps. Let’s create a demo calendar event: Simple enough. How to return a file!? Now that we have a calendar event, we need to return it to the user. But how do we return it as a file? There’s an example of how to return a file dynamically in axum here . Some interesting things to note here: Here is the implementation: We just create a new object and set the header to the correct MIME type for iCalendar files: . Then we return the response. Add date parsing This part is a bit hacky, so feel free to glance over it. We need to parse the date and duration from the query string. I used dateparser , because it supports sooo many different date formats . Would be nice to support more date formats like and , but I’ll leave that for another time. Let’s test it: Nice, it works! Opening it in the browser creates a new event in the calendar: Of course, it also works on Chrome, but you do support the open web , right? 
And for all the odd people who don’t use a terminal to create a calendar event, let’s also add a form to the website. Add a form I modified the function a bit to return the form if the query string is empty: After some more tweaking, we got ourselves a nice little form in all of its web 1.0 glory: The form And that’s it! We now have a little web app that can create calendar events. Well, almost. We still need to deploy it. Deploying Right, that’s all. It’s that easy. Thanks to the folks over at shuttle.rs for making this possible. The calendar app is now available at zerocal.shuttleapp.rs . Now I can finally send my friends a link to a calendar event for our next pub crawl. They’ll surely appreciate it. yeah yeah From zero to calendar in 100 lines of Rust Boy it feels good to be writing some plain HTML again. Building little apps never gets old. Check out the source code on GitHub and help me make it better! 🙏 Here are some ideas: Check out the issue tracker and feel free to open a PR! I don’t want to have to create an account for your calendar/scheduling/whatever app. I don’t want to have to add my friends. I don’t want to have to add my friends’ friends. I don’t want to have to add my friends’ friends’ friends. You get the idea: I just want to send out an invite and get no response from you. Every calendar file is a collection of events so we wrap the event in a object, which represents the collection. is a trait that allows us to return any type that implements it. is a newtype wrapper around that implements . ✅ Add location support (e.g. or ). Thanks to sigaloid . Add support for more human-readable date formats (e.g. , ). Add support for recurring events. Add support for timezones. Add Google calendar short-links ( ). Add example bash command to create a calendar event from the command line. Shorten the URL (e.g. )?
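On that last idea (an example bash command to create an event from the command line), a rough sketch with curl might look like this; the query parameter names are my guesses at the interface, not necessarily the ones zerocal actually uses:

```bash
# Request an .ics file from zerocal and hand it to the default calendar app
# (macOS `open`; use xdg-open on Linux).
curl -sG "https://zerocal.shuttleapp.rs/" \
  --data-urlencode "title=Dinner with friends" \
  --data-urlencode "start=2022-11-04 19:00" \
  --data-urlencode "duration=3h" \
  -o dinner.ics
open dinner.ics
```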

0 views