Posts in Api (20 found)
Simon Willison 1 week ago

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Meta announced Muse Spark today, their first model release since Llama 4 almost exactly a year ago. It's hosted, not open weights, and the API is currently "a private API preview to select users", but you can try it out today on meta.ai (Facebook or Instagram login required). Meta's self-reported benchmarks show it competitive with Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 on selected benchmarks, though notably behind on Terminal-Bench 2.0. Meta themselves say they "continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows".

The model is exposed as two different modes on meta.ai - "Instant" and "Thinking". Meta promise a "Contemplating" mode in the future which they say will offer much longer reasoning time and should behave more like Gemini Deep Think or GPT-5.4 Pro.

I prefer to run my pelican test via API to avoid being influenced by any invisible system prompts, but since that's not an option I ran it against the chat UI directly. Here's the pelican I got for "Instant": And this one for "Thinking": Both SVGs were rendered inline by the Meta AI interface. Interestingly, the Instant model output an SVG directly (with code comments) whereas the Thinking model wrapped it in a thin HTML shell with some unused JavaScript libraries. Which got me curious...

Clearly Meta's chat harness has some tools wired up to it - at the very least it can render SVG and HTML as embedded frames, Claude Artifacts style. But what else can it do? I asked it:

what tools do you have access to? I want the exact tool names, parameter names and tool descriptions, in the original format

It spat out detailed descriptions of 16 different tools. You can see the full list I got back here - credit to Meta for not telling their bot to hide these, since it's far less frustrating if I can get them out without having to mess around with jailbreaks. Here are highlights derived from that response:

- Browse and search. It can run a web search through an undisclosed search engine, load the full page from one of those search results, and run pattern matches against the returned page content.
- Meta content search. Can run "Semantic search across Instagram, Threads, and Facebook posts" - but only for posts the user has access to view which were created since 2025-01-01. This tool has some powerful looking parameters.
- "Catalog search" - can "Search for products in Meta's product catalog", presumably for the "Shopping" option in the Meta AI model selector.
- Image generation. Generates images from prompts, and "returns a CDN URL and saves the image to the sandbox". It has modes "artistic" and "realistic" and can return "square", "vertical" or "landscape" images.
- container.python_execution - yes! It's Code Interpreter, my favourite feature of both ChatGPT and Claude. "Execute Python code in a remote sandbox environment. Python 3.9 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV, etc." Python 3.9 is EOL these days but the library collection looks useful. I prompted "use python code to confirm sqlite version and python version" and got back Python 3.9.25 and SQLite 3.34.1 (from January 2021).
- container.create_web_artifact - we saw this earlier with the HTML wrapper around the pelican: Meta AI can create HTML+JavaScript files in its container which can then be served up as secure sandboxed iframe interactives. One kind is for websites/apps, another for vector graphics.
- container.download_meta_1p_media is interesting: "Download media from Meta 1P sources into the sandbox. Use post_id for Instagram/Facebook/Threads posts, or for catalog product images". So it looks like you can pull in content from other parts of Meta and then do fun Code Interpreter things to it in the sandbox.
- container.file_search - "Search uploaded files in this conversation and return relevant excerpts" - I guess for digging through PDFs and similar?
- Tools for editing files in the container. These look similar to Claude's text editor tool commands - they are becoming a common pattern across any file-equipped agent harness.
- container.visual_grounding - see below, this one is fun.
- subagents.spawn_agent - the sub-agent as a tool pattern. "Spawn an independent sub-agent for research, analysis, or delegation. It returns its final text response."
- third_party.link_third_party_account - "Initiate account linking for a third-party service", currently Google Calendar, Outlook Calendar, Gmail, or Outlook.

Let's talk about container.visual_grounding. Here's the description in full:

Description: Visual grounding tool that analyzes the image, identifies and labels objects, locates regions, or counts objects.

So it can take an image path and optional object names, and return results in point, bbox or count format. Sounds to me like Meta's Segment Anything. I decided to take it for a spin.

generate a photo of a raccoon sitting on a trash can wearing trash as a hat

Here you go — king of the alley, in full regalia: Coffee cup crown, banana peel brim, newspaper feather. Peak raccoon fashion. Want me to make a wilder version — maybe night-vision, or wearing sunglasses too?

My guess is that image generation in Meta AI is powered by their Emu model, or an updated version of it. This is the same pattern as ChatGPT and Gemini image generation, where images are generated by a tool call. The tool description said that it "saves the image to the sandbox", so my guess was that I'd then be able to run Python code against it. I tried:

use python OpenCV to analyze that image and find out neat things about it

Annoyingly it did not show me the code it ran, but it did create this image: Pretty neat! We can both generate images and then use them with Python tools. Next, the visual grounding tool.
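Since the chat didn't show the code it ran, here's a rough stand-in for the kind of analysis the sandbox could do. The real sandbox has OpenCV and numpy; this sketch is stdlib-only and everything in it (the tiny synthetic image, the helper name) is made up for illustration - it just computes per-channel means and a luminance estimate:

```python
# Hypothetical stand-in for the sort of analysis Meta AI's sandbox might
# run with OpenCV; stdlib-only so it runs anywhere.
from statistics import mean

def channel_stats(pixels):
    """pixels: list of rows, each row a list of (r, g, b) tuples."""
    flat = [px for row in pixels for px in row]
    r = mean(px[0] for px in flat)
    g = mean(px[1] for px in flat)
    b = mean(px[2] for px in flat)
    # Perceptual luminance approximation (ITU-R BT.601 weights)
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    return {"mean_r": r, "mean_g": g, "mean_b": b, "brightness": brightness}

# A fake 2x2 "image": two grey pixels, one red, one blue
image = [
    [(128, 128, 128), (255, 0, 0)],
    [(0, 0, 255), (128, 128, 128)],
]
stats = channel_stats(image)
```

With OpenCV in the actual sandbox you'd load the saved CDN image with `cv2.imread` and get the same kind of numbers from real pixel data.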
Knowing the names of the tools makes it easy to refer to them in prompts:

Now try the container.visual_grounding tool

The response (full response here):

Done — visual_grounding pinpointed every piece of the outfit with pixel-level precision: Want me to run it in bbox mode next to draw boxes around the fur texture, or count mode to see how many individual whiskers it can detect?

Once again, Meta AI wrapped the results from the tool in a custom HTML visualization. That's clearly one of the modes, but how about the others? I prompted:

yes draw bounding boxes, can it do masks too?

Yes — bounding boxes work great. Masks? Not directly; it only returns three formats: point, bbox, and count. No pixel-level segmentation masks. Here's the bbox output: What you're seeing: Notice how the boxes nest: face is inside raccoon, eyes inside face, hat pieces overlap each other but stay above the face box. The tool is doing real object localization, not just guessing centers. About masks: since visual_grounding doesn't output them, I can fake them with OpenCV in the container. Two quick options: Want me to generate an actual alpha mask for the raccoon (cut it out from the alley), or for the trash hat pieces separately? Takes about 10 seconds and I can give you a transparent PNG.

Here's the HTML for that bbox visualization. Finally I had it dump out the raw JSON for point mode, bbox mode and count mode. You can see that all here. The count mode is particularly interesting - here's that in full: So Meta AI has the ability to count a raccoon's whiskers baked into the default set of tools. Which means... it can count pelicans too! Here's that overlay exported as HTML.

Update: Meta's Jack Wu confirms that these tools are part of the new harness they launched alongside the new model. On Twitter Alexandr Wang said:

this is step one. bigger models are already in development with infrastructure scaling to match.
private api preview open to select partners today, with plans to open-source future versions.

I really hope they do go back to open-sourcing their models. Llama 3.1/3.2/3.3 were excellent laptop-scale model families, and the introductory blog post for Muse Spark had this to say about efficiency:

[...] we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick. This improvement also makes Muse Spark significantly more efficient than the leading base models available for comparison.

So are Meta back in the frontier model game? Artificial Analysis think so - they scored Muse Spark at 52, "behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6". Last year's Llama 4 Maverick and Scout scored 18 and 13 respectively. I'm waiting for API access - while the tool collection on meta.ai is quite strong, the real test of a model like this is still what we can build on top of it.
For reference, the visual_grounding parameters:

- object_names (array, required)
- image_path (string, default "")
- format_type ("bbox" | "point" | "count" | null, default "bbox")
- title (string | null)

And the bbox results:

- raccoon: big box from (270,240) to (892,845) - covers 62% of the width, basically the whole animal
- coffee cup: (424,115)-(610,273) - sits highest, 158px tall in the 0-1000 space
- banana peel: (472,215)-(667,291) - overlaps the cup's base, confirming it's draped over the side
- newspaper: (305,223)-(548,315) - left ear coverage
- trash can lid: (107,800)-(1000,998) - spans almost the full image width at the bottom
- face: (305,283)-(750,537) - tight crop around the mask
- eyes: both ~40×37 pixel boxes, perfectly level at y=379

And the two options it offered for faking masks:

- GrabCut - give it the raccoon bbox, it iteratively segments fur from background and returns a binary mask
- SAM-style color threshold - use the bbox plus k-means to isolate the coffee cup or banana peel
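Those 0-1000 normalized coordinates are easy to work with in plain Python. As a sketch (the coordinate convention here is my inference from the numbers above, not documented Meta behavior), here's how you could scale a box to real pixel dimensions and verify the nesting the model described:

```python
def scale_box(box, width, height):
    """Convert an (x1, y1, x2, y2) box in 0-1000 normalized space to pixels."""
    x1, y1, x2, y2 = box
    return (x1 * width // 1000, y1 * height // 1000,
            x2 * width // 1000, y2 * height // 1000)

def contains(outer, inner):
    """True if the inner box sits entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            inner[2] <= outer[2] and inner[3] <= outer[3])

# Boxes reported by visual_grounding, in 0-1000 space
raccoon = (270, 240, 892, 845)
face = (305, 283, 750, 537)

# Nesting check: the face box should sit inside the raccoon box
nested = contains(raccoon, face)

# Scale the raccoon box to a hypothetical 1024x1024 render
pixels = scale_box(raccoon, 1024, 1024)
```

From there, drawing the overlay is a couple of `cv2.rectangle` calls against the saved image.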

Martin Alderson 2 weeks ago

Telnyx, LiteLLM and Axios: the supply chain crisis

While the world's been watching physical supply chains, a different kind of supply chain attack has been escalating in the open source ecosystem. Over the past week a group of bad actors have been compromising various open source projects, pushing malicious versions of libraries which inject a trojan that collects sensitive data from systems that install the malicious version. Ironically, the first attack started with an open source package for finding security vulnerabilities. The scale of the issue is growing, and it's alarming. This wave of attacks started with some smaller libraries, then moved on to more popular packages in the supply chain, beginning with a popular package for voice and SMS integration with ~150k/week downloads on the affected package. Next came a much more popular package for calling various APIs, with ~22M/week downloads. Finally, and most concerning, an incredibly widely used npm library for calling APIs was attacked on March 31st. This has at least 100M downloads a week and is a very core piece of software that is used in millions of apps. There was a rapid reaction to each of these attacks to remove the malicious versions, but even in the hours they were up, tens of thousands of machines (and potentially far more) were likely compromised. The attackers are leveraging stolen credentials from the previous attack(s) to infect more packages in the supply chain, creating a vicious cycle of compromises that continues to grow. Equally, other systems are at risk - for every compromised system that happens to belong to a developer of another software library, there are probably thousands of other developers who have unfortunately leaked very sensitive data to the attackers. This is not a new issue: last year we saw attacks against the npm ecosystem which in two waves backdoored over 1,000 packages.
The aim of this attack appears to have been to steal crypto - with reports suggesting $8.5m was stolen. The infrastructure providers behind this supply chain did respond by putting various mitigations in place. The first was requiring packages to be published with short-lived tokens, which reduces the impact of "old" credentials being able to publish new packages. It appears this has not solved the issue, given these packages seem to have been published regardless. The more invasive mitigation is letting developers opt out of installing "brand new" packages. Instead, they get held for a time period - say 24 hours - with the idea being that the community will (hopefully) detect malicious versions within those 24 hours and revoke them before they are installed. This is a double-edged sword though, as you often need a rapid response to a vulnerable package to avoid security issues. The hold can be overridden manually, but it does introduce some overhead when responding to urgent security flaws. Finally, npm are rolling out staged publishing. This requires a separate step when publishing new versions of packages: a "trusted" human performs a check on the platform with two step verification to avoid automated attacks. However, given it seems developers' computers are being compromised, it is not implausible that the attacker could also perform this step. I'm extremely concerned about the cybersecurity risk LLMs pose, and I don't think the impact it is going to have outside of niche parts of the tech community is sufficiently priced in. While it's hard to know for sure how the initial attacks were discovered, I strongly suspect they were aided by LLMs to find the exploit(s) in the first place and develop subsequent attacks. While this is conjecture, the number of exploits being found by non-malicious actors is exploding. I found one myself - which I wrote up in a recent post, still unpatched - in less than half an hour. There are endless other examples online.
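The 24-hour hold idea is simple enough to approximate client-side too. Here's a hedged sketch, not real npm tooling: the npm registry does expose per-version publish timestamps in a package's `time` field, but the helper below and its threshold are hypothetical. It refuses any version younger than a minimum age:

```python
from datetime import datetime, timedelta, timezone

def old_enough(published_iso, min_age=timedelta(hours=24), now=None):
    """Return True if a package version was published at least min_age ago.

    published_iso: an ISO 8601 timestamp like the npm registry's `time`
    entries, e.g. "2026-03-31T12:00:00.000Z".
    """
    published = datetime.fromisoformat(published_iso.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - published >= min_age

# Example with a pinned "now" so the check is deterministic
now = datetime(2026, 4, 2, 12, 0, tzinfo=timezone.utc)
safe = old_enough("2026-03-31T12:00:00.000Z", now=now)   # published 48h ago
risky = old_enough("2026-04-02T11:00:00.000Z", now=now)  # published 1h ago
```

A wrapper like this in CI is crude - and, as the post notes, it cuts both ways when you need an urgent security fix immediately.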
So it seems to me that LLMs are acting as an accelerant. Firstly, they make finding security vulnerabilities far easier, which is what allows the whole supply chain attack cycle to start. And the leaked rumours about the new Mythos model from Anthropic being a step change better than Opus 4.6 (which is already exceptionally good at finding security issues) mean the direction of travel is only going one way. Secondly, they allow attackers to build far more sophisticated attacks far quicker than before - for example, one of the attacks in this recent wave hid an exploit in an audio file. Next, this is all happening while the infrastructure providers of the software supply chain are on the back foot with improving mitigations. Finally, so much of the software ecosystem's critical security infrastructure is maintained by volunteers who are often unpaid. As always, the above image illustrates the point far better than words can. To reiterate - it may be that this is just a well resourced group that could have done all this without LLMs. But given adoption of coding agents is so high in the broader developer community, it seems far fetched to say they wouldn't be used for nefarious means. Fundamentally, these attacks are possible because OSes are (by default) far too permissive, designed in a world where software is either trusted or not. The attempts to secure this - by trusting certain publishers - fall down for both agents and supply chain attacks, because agents can use trusted software in unexpected ways, and if the trusted authors of the software are compromised it bypasses everything. Thinking a few steps ahead here, it seems to me that the core mitigations are (mostly) insufficient, though there are some things that would help with the supply chain in particular. To me though, I keep coming back to the realisation that sandboxing agents faces very similar challenges to mitigating the impact of this security issue.
iOS and Android were designed with this approach in mind - each app has very limited access to other apps and the OS as a whole. I think we need to move desktop and server operating systems to a similar model for this new world. While this won't resolve all issues, it will dramatically reduce the "blast impact" of each attack and prevent the "virality" of many exploits from gathering traction. The OS should know that a package manager should only write package files to a certain set of folders, and reject everything else. The OS should know the baseline of services a CI/CD run uses and what network calls it makes, to avoid connections to random command and control services. And like mobile OSes, one program shouldn't be able to read another program's files and data without explicit opt in. If you've used sandbox mode in a coding agent, you will be familiar with this approach - all the pieces are there already. Qubes OS is probably the closest thing outside of mobile OSes to what I'm thinking we need to move to - a security focused Linux operating system which runs each app in a totally self-contained VM. It's an enormous undertaking to migrate the world's software to run like this, and perhaps governments should be allocating significant resources to open source projects to help them adopt it. Some caveats to the current mitigations:

- Any delay to publishing packages can backfire and introduce delays in responding to real security incidents
- There is too much software - maintained or unmaintained - which is likely to be vulnerable
- Much of this software, if it is maintained, is poorly resourced and is likely to burn out volunteers trying to resolve a flood of security issues in the near term

One thing that would genuinely help: frontier labs donating compute and tokens to automatically scan every package update for potential signs of compromise before publishing. This would be an excellent use of their leading models.
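As a toy illustration of that deny-by-default idea (the policy entries and helper are entirely made up, and this is nothing like a real LSM or sandbox profile), here's what a write-path allowlist check could look like:

```python
import os.path

def write_allowed(path, allowed_roots):
    """Deny-by-default check: a program may only write under its allowed roots.

    normpath collapses ".." components, so traversal tricks like
    "node_modules/../../etc/passwd" are resolved before the check.
    """
    target = os.path.normpath(path)
    for root in allowed_roots:
        root = os.path.normpath(root)
        if target == root or target.startswith(root + os.sep):
            return True
    return False

# Hypothetical policy for a package manager process
policy = ["/project/node_modules", "/tmp/pkg-cache"]

ok = write_allowed("/project/node_modules/left-pad/index.js", policy)
sneaky = write_allowed("/project/node_modules/../../home/user/.ssh/id_rsa", policy)
```

A real enforcement point would live in the kernel (mobile-OS style), not userspace, but the shape of the policy - explicit roots, everything else rejected - is the same.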

iDiallo 2 weeks ago

How Do We Get Developers to Read the Docs

When I reviewed this PR, I had tears in my eyes. We had done it. We had finally created the perfect API. To top it off, the senior developer who worked on it had written documentation to match. No stone was left unturned. I had the code open on one window and the doc on the other. The moment I felt hesitation in the code, the documentation reassured me. Why do we make two calls to get the... "We are fetching two types of orders to support legacy subscribers..." the documentation answered before I completed my question. This was standard number 15. The one to rule them all. But I still had one question. As the owner of the API, I read the documentation. Will other developers ever think to read it? How do I get people to want to read the documentation before they use this API? Because in my experience, nobody reads the documentation. Not to say that documentation is useless, but my mistake was thinking that the people who want to implement the API are interested in documentation at all. For every API ever built, there are two audiences to cater to, and confusing them is where most documentation goes wrong. The first group is the consumers of the API. The only thing they want to know is: do the endpoints do what I need, and what parameters do they take? They are not reading your documentation like a book. They are scanning it like a menu. They want to find the thing they need, copy the example, and move on. The second group is the maintainers of the API. The people who need to understand the why behind every decision. Why are there two calls? Why does this endpoint behave differently for legacy users? Why is this field nullable? These are the people who will be debugging at 2am, and they need the full picture. The worst thing you can do is write one document that tries to serve both audiences equally. You end up with something that's too deep for the first group to skim, and not structured enough for the second group to find useful.
For the first audience, the API should speak for itself. The best documentation you can provide is not text to read through, but a well-designed API. Follow clear, repeatable patterns where the user can anticipate, or even assume, the available features. If you have an endpoint that lists orders, the assumption should be that a sibling endpoint returns a specific order. If you add an endpoint for one operation, there should probably be a matching one for its counterpart too. When the pattern is consistent, the consumer doesn't need to read anything, they just guess correctly. When you do write documentation for this audience, resist the urge to explain your internals. They don't need to know that you're fetching from two different database tables to support legacy subscribers. What they need to know fits in one sentence. Done. I like this idiom: "Too much information and no information accomplish the same goal." This is a mistake I see most often. It's a painful one because it comes from a good place. The writer of the documentation, usually the person who built the thing, feels a sense of responsibility. They want to be thorough. They want no one to be confused. So they write everything down. The result is a documentation page that looks like this:

This endpoint retrieves orders for a given user. It was introduced in v2.3 of the API following the migration from the legacy order management system (OMS) in Q3 2021. Internally, the resolver makes two sequential calls (one to the new orders table and one to the legacy_orders table) and merges the results using the order ID as a deduplication key. Note that legacy orders may be missing a field that was not captured before 2019. If you are building a UI, you should account for this possibility. The endpoint also supports cursor-based pagination, though offset-based pagination is available for backward compatibility with clients built before v2.1. Additionally, orders in certain states may not appear immediately...
A developer scanning this page will read the first sentence, close the tab, and think about designing API standard number 16. They'll go look at the codebase instead, or ping a teammate, or just guess. The documentation existed, it just didn't get read. Which means it accomplished exactly the same thing as having no documentation at all. The same way you don't write a comment to explain every line of code, documentation doesn't benefit from too much information. My go-to solution isn't to omit information, but to write it in layers. Collapsible sections are one of the most underrated tools in documentation design. They let the consumer skim the surface: endpoint name, what it returns, a working example. And they let the maintainer dive deeper into the implementation notes, the edge cases, and the historical context. The same principle applies to how you order information. Lead with what the API does. Follow with how to use it. Bury the why at the bottom, behind a toggle or a "Details" section, available to those who need it, invisible to those who don't. Think of it like a well-designed error message. A good error message tells you what went wrong in plain language. A great error message also includes an expandable stack trace, but it doesn't show you the stack trace first. Your documentation has the same job. Give people the answer they're looking for, and then offer the depth to those willing to dig. The second audience, the maintainers, do need the full picture. The two database calls, the deduplication logic, the historical reason the field is sometimes null. This is the documentation that prevents a future developer from "fixing" something that wasn't broken, or removing what looks like redundant code. But this documentation doesn't have to live on the same page as the quick-start guide. Deep implementation notes belong in inline code comments or a separate internal wiki. The public-facing API reference should stay clean.
When you separate operational documentation (for consumers) from institutional documentation (for maintainers), both documents get better. The consumer doc gets shorter and clearer. The maintainer doc gets deeper because it's no longer trying to also be beginner-friendly. The goal of documentation isn't completeness. Completeness is what you write for yourself, to feel like you've done your job. The goal of documentation is to transfer the right information into the right person's head at the right moment. That senior developer who wrote the documentation I cried over understood this. She didn't write everything she knew. She wrote exactly what someone reading the code would need to know, at the exact moment they'd need it. And the API design allowed anyone consuming it to make correct assumptions (intuitive design) on how it works. Both groups are happy.


Walking backwards into the future – A look at descriptor heap in Granite

It seems like I can never quite escape the allure of fiddling with bits more efficiently every passing year. I recently went through the process of porting over Granite’s Vulkan backend to use VK_EXT_descriptor_heap. There wasn’t exactly a burning need to do this work, but science demands I sacrifice my limited free time for these experiments. My name may or may not be on the extension summary, and it’s important to eat your own dog food. In this post, I want to explore ways in which we can port over an old school binding model to newer APIs should the need arise. Granite’s binding model is designed for really old Vulkan. The project started in January 2017 after all, at which point Vulkan was in its infancy. Bindless was not really a thing yet, and I had to contend with really old mobile hardware. Slot-based bindings have been with us since OpenGL and early D3D. I still think it’s a fine model from a user’s perspective. It’s very friendly to tooling and validation and I just find it easy to use overall. GPU performance is great too since vendors have maximal flexibility in how to implement the API. The major downside is the relatively heavy CPU cost associated with it since there are many API calls to make. In my projects, it’s rarely a concern, but when doing heavy CPU-bound workloads like PS2 GS emulation, it did start to matter quite a bit. When SPIR-V shaders are consumed in Granite, they are automatically reflected, and I automatically generate a VkDescriptorSetLayout for each unique set, and combine these into a VkPipelineLayout as one does. Each VkDescriptorSetLayout is hash’n’cached into a DescriptorSetAllocator. The implicit assumption by shaders I write is that low-frequency updates have lower set values. This matches Vulkan’s pipeline layout compatibility rules too. Given the hardcore descriptor churn this old model can incur, UBOs originally used VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC.
Since linearly allocating new UBOs per draw is a hot path, I wanted to avoid having to allocate and write new descriptor sets all the time. This is precisely what the dynamic buffer types were designed for. I did not use it for SSBOs since DYNAMIC has some unfortunate interactions with descriptor size, since you cannot change the size, only the offset. The size of UBOs is somewhat irrelevant, and I just hardcoded in a 64K window. There are two main strategies for allocating sets from a VkDescriptorPool, both of which are kinda bad. The typical model I believe most do is the “jumbo” allocator where you create a big pool with many sets and many descriptors with different descriptor types and pray for the best. When the pool is OOM-ed, allocate another. One unfortunate thing about the jumbo pool is that you can’t really know up front exactly how to balance the descriptor types properly. It will always be a shaky heuristic. In raw Vulkan 1.0, it was straight up illegal to allocate any further once a limit had been reached, causing even more headaches. The very first maintenance extension to Vulkan fixed this and added OUT_OF_POOL_MEMORY which allows applications to just keep going until the pool is exhausted. Fun fact: some vendors would never exhaust the pool and just straight up ignore what you pass into vkCreateDescriptorPool, so that’s fun. Granite went the route of a slab allocator per VkDescriptorSetLayout instead, one allocator per thread. Allocate a group of like 64 VkDescriptorSets in one go and parcel them out as needed. The main advantage here was no need to keep calling vkAllocateDescriptorSets over and over, and in the early years, I even hash’n’cached the descriptor sets. The primary reason for doing that was that some early mobile drivers were extreeeeeeeemely slow at vkUpdateDescriptorSets for some reason. Not a great time. This slab approach led to memory bloat though.
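The slab strategy is language-agnostic enough to sketch in a few lines of Python (purely illustrative pseudologic, not Granite's actual C++): one pool-level call fetches a whole batch of sets, which then get parceled out one at a time, with freed sets recycled onto a free list instead of going back to the driver.

```python
class SlabDescriptorAllocator:
    """Toy model of a slab allocator: one vkAllocateDescriptorSets-style
    call fetches a whole batch of sets, which are then parceled out."""

    def __init__(self, slab_size=64):
        self.slab_size = slab_size
        self.free_sets = []
        self.next_handle = 0
        self.pool_calls = 0  # how many times we hit the underlying "API"

    def allocate(self):
        if not self.free_sets:
            # One "API call" produces slab_size sets at once
            self.pool_calls += 1
            self.free_sets = list(range(self.next_handle,
                                        self.next_handle + self.slab_size))
            self.next_handle += self.slab_size
        return self.free_sets.pop()

    def recycle(self, handle):
        # Returned sets go back on the free list, not to the driver
        self.free_sets.append(handle)

alloc = SlabDescriptorAllocator(slab_size=64)
handles = [alloc.allocate() for _ in range(65)]  # 65 sets, only 2 pool calls
```

The memory bloat mentioned above falls straight out of this design: every layout-times-thread combination holds a partially used slab.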
At some point VK_KHR_descriptor_update_template was added, which aims to accelerate vkUpdateDescriptorSets. Instead of having the driver parse the structs and switch on the descriptorType to write descriptors, the update template allows drivers in theory to “precompile” a highly optimized function that updates descriptors based on the template that is provided in vkCreateDescriptorUpdateTemplate. This was a nice incremental thing to add to Granite. I don’t think the promise of update templates really worked out in the end though. Most drivers I think just resorted to parsing the template at update time instead, leading to no speedup. Push descriptors were designed quite early on in Vulkan’s life, but their adoption was … spotty at best. They didn’t make it into core until Vulkan 1.4! Push descriptors solved some issues for us slot and binding troglodytes since there was simply no need to mess around with allocating sets and pools when we could just push descriptors and the driver would deal with it. The major downside is that only one descriptor set can be a push set, but in Granite’s case, I could design for that limitation when writing shaders. The last set index in a VkPipelineLayout would get assigned as a push set. After going push descriptors, I dropped the old UBO_DYNAMIC path, since push descriptors are not compatible with it, and the UBO_DYNAMIC wins were … questionable at best anyway. It took a while to move to this model though. The AMD Windows driver was infamously dragging its feet for years before finally accepting reality, and at that point I was ready to move over. It’s still not a hard requirement in Granite due to mobile concerns, but then the driver hits the slow path, and I don’t really care anymore. At some point, any modern renderer has to deal with this, and Granite hit this wall with clustered shading, where an array of shadow maps became a hard necessity.
I’m not a big fan of “everything is bindless” myself, since I think it makes debugging way more annoying and stresses tooling and validation more than it should, but sometimes the scissor juggling is necessary. When Granite reflects a shader looking like this: The set layout is converted into an UPDATE_AFTER_BIND set with VARIABLE_COUNT array length. There is also a special helper function to aid in allocating these bindless sets, where the API mostly turns into: The CPU overhead of this isn’t quite trivial either, but with the set and pool model, it’s not easy to escape this reality without a lot of rewrites. For now, I only support sampled images with bindless, and I never really had any need or desire to add more. For bindless buffers, there is the glorious buffer_device_address instead.

This model has served and keeps serving Granite well. Once this model is in place, the only real reason to go beyond it for my use cases is performance (and curiosity). VK_EXT_descriptor_buffer asks the question of what happens when we just remove the worst parts of the descriptor API: sets are now backed by a slice of memory, and pools are replaced by a big descriptor buffer that is bound to a command buffer. Some warts remain, however, as VkDescriptorSetLayout and VkPipelineLayout persist. If you’re porting from the legacy model like I was, this poses no issues at all, and actually reduces the friction. Descriptor buffers are a perfectly sound middle-ground alternative for those who aren’t complete bindless junkies yet, but want some CPU gains along the way.

In the ideal use case for descriptor buffers, we have one big descriptor buffer that is always bound. This is allocated with PCI-e BAR on dGPUs, so DEVICE_LOCAL | HOST_VISIBLE. Instead of allocating descriptor sets, the command buffer performs a linear allocation which is backed by slices allocated from the global descriptor buffer. No API calls needed.
The size to allocate for a VkDescriptorSet is queried from the set layout itself, and each descriptor is assigned an offset that the driver controls. There is a wart in the spec where the min-spec for sampler descriptor buffers is very small (4K samplers). In this case, there is a risk that just linearly allocating out of the heap will trivially OOM the entire thing, and we would have to allocate new sampler descriptor buffers all the time. In practice, this limitation is completely moot; Granite only opts into descriptor buffers if the limits are reasonable. There is supposed to be a performance hit to rebinding descriptor buffers, but in practice, no vendor actually ended up implementing descriptor buffers like that. However, since VK_EXT_descriptor_heap will be way more strict about these kinds of limitations, I designed the descriptor_buffer implementation around the single global heap model to avoid rewrites later. There is certainly a risk of going OOM when linearly allocating like this, but I’ve never hit close to the limits. It’s not hard to write an app that would break Granite in half, but I consider that a “doctor, my GPU hurts when I allocate like this” kind of situation.

This is where we should have a major win, but it’s not all that clear. For each descriptor type, I have different strategies on how to deal with them. The basic idea of descriptor buffers is that we can call vkGetDescriptorEXT to build a descriptor in raw bytes. This descriptor can now be copied around freely by the CPU with e.g. memcpy, or even on the GPU in shaders (but that’s a level of scissor juggling I am not brave enough for).

These are the simplest ones to contend with. Descriptor buffers still retain the VkImageView and VkSampler objects. The main addition I made was to allocate a small payload up front and write the descriptor once. E.g.: Instead of vkUpdateDescriptorSets, we can now replace it with a trivial memcpy.
The memcpy functions are function pointers that resolve the byte count. This is a nice optimization, since the memcpy functions can compile down to perfectly unrolled SIMD load-stores. Allocating bindless sets of sampled images with this method becomes super efficient, since it boils down to a special function that does:

I rarely use these, but they are also quite neat in descriptor buffers. VkBufferView is gone now, so we just need to create a descriptor payload once from a VkDeviceAddress, and it’s otherwise the same as above.

This descriptor type is somewhat of a relic these days, but anyone coming from a GL/GLES background instead of D3D will likely use this descriptor type out of old habit, me included. The API here is slightly more unfortunate, since there is no obvious way to create these descriptors up front. We don’t necessarily know all the samplers an image will be combined with, so we have to do it last minute, calling vkGetDescriptorEXT to create the combined descriptor.

We cannot meaningfully pre-create descriptors for UBOs and SSBOs, so we’re in a similar situation where we have to call vkGetDescriptorEXT for each buffer last minute. Unfortunately, there is no array-of-descriptors version of vkGetDescriptorEXT, so in the extreme cases, descriptor buffers can actually have worse CPU overhead than the legacy model. DXVK going via winevulkan .dll <-> .so translation overhead has been known to hit this, but for everyone else I’d expect the difference to be moot.

Since descriptor buffer is an incremental improvement over the legacy model, we retain optional support for push descriptors. This can be useful in some use cases (it’s critical for vkd3d-proton), but Granite doesn’t need it. Once we’re in descriptor buffer land, we’re locked in. Descriptor buffers are battle tested and very well supported at this point. Perhaps not on very old mobile drivers, but slightly newer devices tend to have it, so there’s that! RenderDoc has solid support these days as well.
At a quick glance, descriptor heap looks very similar to D3D12 (and it is), but there are various additions on top to make it more compatible with the various binding models that exist out there in the wild, especially for people who come from a GL/Vulkan 1.0 kind of engine design. The normal D3D12 model has some flaws if you’re not fully committed to bindless all day every day, mainly that:

This is to match how some hardware works, nothing too complicated. I allocate the supported ~1 million resource descriptors and 4096 samplers. There is a reserved region for descriptors as well, which is new to this extension. In D3D12 this is all abstracted away, since applications don’t have direct access to the descriptor heap memory.

For the resource heap, we have a 512K descriptor area which can be freely allocated from, like we did with descriptor buffer. Unlike descriptor buffer, where we hammer this arena allocator all the time, we will only rarely need to touch it with descriptor heap. The next ~500K or so descriptors are dedicated to holding the descriptor payloads for VkImageView, VkSampler and VkBufferView. All of these objects are now obsolete. When Granite creates a Vulkan::ImageView, it internally allocates a free slab index from this upper region, writes the descriptor there, and stores the heap index instead. This enables “true” bindless in a performant way. We could have done this before if we wanted to, but in descriptor buffer we would have eaten a painful indirection on a lot of hardware, which is not great. Some Vulkan drivers actually work just like this internally. You can easily tell, because some drivers report that an image descriptor is just sizeof(uint32_t). We’d have our index into the “heap”, which gets translated into yet another index into the “true” (hidden) heap. Chasing pointers is bad for perf, as we all know.
We keep a copy of the descriptor payload in CPU memory too, in case we have to write to the arena-allocated portion of the heap later. The upper region of ~10K descriptors or so (depends on the driver) is just a reserved region we bind and never touch. It’s there so that drivers can deal with CmdResolveImage, CmdBlitImage and other such special APIs that internally require descriptors.

For samplers, there is no arena allocator; the heap is so tiny. Instead, when creating a sampler, we allocate a slab index and return a dummy handle by just pointer-casting the index. We’ll make good use of the mapping APIs later to deal with this lack of arena allocation. In fact, we will never have to copy sampler descriptor payloads around, and we don’t have to mess around with static samplers either, neat! For the static sampler crowd, there is full support for embedded samplers, which function just like D3D12 static samplers, so there’s that, but Granite doesn’t use it. It was a non-trivial amount of code to get to this point, but hey, that’s what happens when you try to support 3 descriptor models at once I guess …

Core Vulkan 1.0 settled on 128 bytes of push constants being the limit. This was raised in Vulkan 1.4, but Granite keeps the old limit (I could probably live with 32 or 64 bytes to be fair). Push data expands to 256 bytes as a minimum, and the main idea behind descriptor heap is that pipeline layouts are completely gone, and we get to decide how the driver should interpret the push data space. This is similar to D3D12 root parameters, except it’s not abstracted behind a SetRootParameter() kind of interface that is called one at a time. In Vulkan, we can call CmdPushDataEXT once. VkPipelineLayout and VkDescriptorSetLayout are just gone now, poof, they do not exist at all. This is huge for usability. Effectively, we can pretend that the VkPipelineLayout is now just a push constant range of 256 bytes, and that’s it.
If you’re fully committed to going bindless, we could just do the equivalent of SM 6.6 ResourceDescriptorHeap and SamplerDescriptorHeap plus buffer_device_address to get everything done. However, Granite is still a good old slot-based system, so I need to use the mapping features to tell the driver how to translate set/binding into actual descriptors. This mapping can be different per shader too, which fixes a lot of really annoying problems with EXT_graphics_pipeline_library and EXT_shader_object if I feel like going down that path in the future.

The natural thing to do for me was to split up the space into a maximum of 128 bytes of push constants, then 32 bytes per descriptor set (I support 4 sets, Vulkan 1.0 min-spec). It’s certainly possible to parcel out the data more intelligently, but that causes some issues with set compatibility which I don’t want to deal with. For every set, I split it up into buffers and images and decide on a strategy for each. Buffers are decided first, since they have the largest impact on performance in my experience.

This is very simple. If there are 3 or fewer buffers in a set (24 bytes), we can just stuff the raw pointers into push data and tell the driver to use that pointer. This is D3D12 root descriptors in a nutshell. Especially for UBOs, this is very handy for performance. We lose robustness here, but I never rely on buffer robustness anyway. The push data layout looks something like this:

This is a new Vulkan speciality. Without modifying the shaders, we can tell the driver to load a buffer device address from a pointer in push data instead. This way we don’t have to allocate from the descriptor heap itself; we can just do a normal linear UBO allocation, write some VkDeviceAddresses in there, and have fun. Given the single indirection to load the “descriptor” here, this looks a lot like Vulkan 1.0 descriptor sets, except there’s no API necessary to write them.

This isn’t the ideal path, but sometimes we’re forced to allocate from the heap.
This can happen if we have one of these cases:

This is pretty much D3D12’s root tables, but in Vulkan we can be a bit more optimal with memory, since buffer descriptors tend to be smaller than image descriptors and we can pack them tightly. D3D12 has one global stride for any resource descriptor, while Vulkan exposes separate sizes that applications can take advantage of. vkWriteResourceDescriptorsEXT is required here to write the SSBO descriptors.

After buffers are parceled out for a descriptor set, we have some space left for images. At minimum, we have 8 bytes left (32 – 3 * sizeof(VkDeviceAddress)).

This is the common and ideal case. If we don’t have any arrays of images, we can just have a bunch of uint32_t indices directly into the heap. At image view and buffer view creation time, we already allocated a persistent index into the heap that we can refer to. No API calls required when emitting commands. Combined image samplers work quite well in this model, because Vulkan adds a special mapping mode that packs both the sampler index and the image index together. This fixes one of the annoying issues in EXT_descriptor_buffer.

If we cannot use the simple inline indices, we have two options. The preferred one right now is to just allocate space in the descriptor heap, like the descriptor buffer path, because I’m quite concerned with avoiding unnecessary indirections when possible. At least we get to copy the payloads around without API commands. This path is also used for bindless sets. Unlike the descriptor buffer path, there is a major problem, which is that linearly allocating from the sampler heap is not viable. The sampler heap is really small now, just like in D3D12. In this case, Vulkan has an answer.

This is a special Vulkan feature that functions like an indirect root table. It is similar to INDIRECT_ADDRESS in that we don’t have to allocate anything from the heap directly, and we can just stuff heap indices straight into a UBO.
Overall, I think these new mapping types allow us to reuse old shaders quite effectively, and it’s possible to start slowly rewriting shaders to take full advantage of descriptor_heap once this machinery is in place.

For GPU performance, it seemed to be on par with the other descriptor models on NVIDIA and AMD, which was expected. Granite does not really hit the cases where descriptor_heap should meaningfully improve GPU performance over descriptor_buffer, but I only did a rough glance.

For CPU performance, things were a bit more interesting, and I learned that Granite has quite significant overhead on its own, which is hardly surprising. That’s the cost of an old school slot and binding model after all, and I never did a serious optimization pass over it. A more forward-looking rendering abstraction can eliminate most, if not all, of this overhead. The numbers here are for RADV, using the pending merge request for descriptor_heap support.

– ~27 us to write 4096 image descriptors on a Ryzen 3950x with a RX 6800. This is basically exactly the same. ~13 us. This is really just a push_back and memcpy bench at this point. This case hits the optimal inline BDA case for heap. ~279 ns per dispatch. Doesn’t feel very impressive. Basically the same perf, but lots of overhead has now shifted over to Granite. Certainly things can be optimized further. GetDescriptorEXT is somehow much faster than UpdateDescriptorSetWithTemplate though. ~157 ns / dispatch now, and most of the overhead is now in Granite itself, which is ideal.

I added an extra buffer descriptor per set which hits the INDIRECT_ADDRESS path. Heap regressed significantly, but it’s all in Granite code at least. Likely related to having to page in new UBO blocks, but I didn’t look too closely. ~375 ns / dispatch, hnnnnnng. The other paths don’t change much, as expected. About ~310 ns / dispatch for legacy and descriptor buffer models. This is the happy path for descriptor heap. ~161 ns / dispatch ~166 ns.
Quite interesting that it got slower. The slab allocator for legacy sets seems to be doing its job very well. The actual descriptor copying vanished from the top list at least. ~145 ns. A very modest gain, and most of the overhead is now just Granite jank. All the paths look very similar now. ~170 ns or so.

On an RTX 4070 with 595 drivers, the improvements especially for buffers are quite large, interestingly enough. For the legacy buffer tests, it’s heavily biased towards driver overhead: For the image tests the gains are modest, which is somewhat expected given how NV implemented image descriptors before descriptor heap. It’s just some trivial u32 indices.

Overall, it’s interesting how well the legacy Vulkan 1.0 model holds up here, at least on RADV with my implementation. Descriptor buffer and heap cannot truly shine unless the abstraction using them is written with performance in mind. This sentiment is hardly new. Just porting OpenGL-style code over to Vulkan doesn’t give amazing gains, just like porting old and crusty binding models won’t magically perform with newer APIs either. Either way, this level of performance is good enough for my needs, and the days of spamming out 100k draw calls are kinda over anyway, since it’s all GPU driven with large bindless data sets these days.

Adding descriptor buffer and heap support to Granite was generally motivated by curiosity rather than a desperate need for perf, but I hope this post serves as an example of what can be done. There’s a lot of descriptor heap that hasn’t been explored here. GPU performance for heavily bindless workloads is another topic entirely, and I also haven’t really touched on how it would be more practical to start writing code like: which would side-step almost all Granite overhead. Overall I quite like what we’ve got now with descriptor heap as an API: a bastard child of descriptor buffer and D3D12 that gets the job done.
As tooling and driver support matures, I will likely just delete the descriptor buffer path, keeping the legacy stuff around for compatibility.

- VkDescriptorSet
- VkDescriptorPool
- vkUpdateDescriptorSets (kinda)
- VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
- VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
- VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT
- VK_DESCRIPTOR_TYPE_SAMPLER
- VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
- VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
- VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR

You very quickly end up having to call CopyDescriptorsSimple a LOT to shuffle descriptors into the heap. Since this is a call into the driver just to copy a few bytes around, it can quickly become a source of performance issues. In vkd3d-proton, we went to hell and back to optimize this case because in many titles, it was the number 1 performance overhead.

Dealing with samplers is a major pain. The 2K sampler heap limit can be rather limiting, since there is no good way to linearly allocate on such a small heap. Static samplers are quite common as a result, but they have other problems. Recompiling shaders because you change Aniso 4x to 8x in the settings menu is kind of a hilarious situation to be in, but some games have been known to do just that …

- The shader is using OpArrayLength on an SSBO. We need real descriptors in this case. The current implementation just scans the SPIR-V shader module for this instruction, but could be improved in theory.
- The shader is using an array of descriptors. For buffers, this should be very rare, but the PUSH_ADDRESS and INDIRECT_ADDRESS interfaces do not support this.
- Robustness is enabled.

- Test #1: Write 4096 image descriptors: 17.6 us (copies u32 indices)
- Test #2: 693 ns
- Test #3: 726 ns
- Test #4: 377 ns
- Test #5: 408 ns

- Test #1: 10.2 us (copies u32 indices)
- Test #2: 434 ns
- Test #3: 479 ns
- Test #4: 307 ns
- Test #5: 315 ns

- Test #1: 11 us (copies real 32 byte descriptors)
- Test #2: 389 ns
- Test #3: 405 ns
- Test #4: 321 ns
- Test #5: 365 ns

DHH 3 weeks ago

Basecamp becomes agent accessible

In the past 18 months, we've experimented with a ton of AI-infused features at 37signals. Fizzy had all sorts of attempts. As did Basecamp. But as Microsoft and many others have realized, it's not that easy to make something that's actually good and would be welcomed by users. So we didn't ship. In the meantime, agents have emerged as the killer app for AI. Not only are LLMs much smarter when they can check their thinking using tools, but the file system also gives them the memory implant they needed to learn between prompts. And now they can actually do stuff! So while we keep cooking on actually-useful native AI features in Basecamp, we're launching a fully agent-accessible version today. We've revamped our API, created a brand-new CLI, and wrapped it all in a skill to teach agents how best to use it all. It works remarkably well, and it's really fast too. Not only can your agent look through everything in Basecamp and summarize whatever you need, it can also set up to-do lists, post message updates, chat with humans and clankers alike, upload reference files, and arrange a project schedule. Anything you can do in Basecamp, agents can now do too. This becomes extra powerful when you combine Basecamp with all the other tools you might be using that are also agent accessible. For software development, you can use the MCP from Sentry to trawl through major sources of bugs, then have the agent summarize that in a message for Basecamp. Or you can have it download, analyze, and highlight key customer complaints by giving it access to your help desk system. All this was possible in the past with APIs, hand-written integrations, and human data scientists. But it was cumbersome, slow, and expensive, so most people just didn't. A vanishingly small portion of Basecamp customers have ever directly interacted with our API. But agents? I think adoption is going to be swift. Not because everyone is going to run OpenCode, Claude Code, or Gemini CLI.
But because agents are going to be incorporated, very quickly, into ChatGPT, Gemini, Grok, and all the other mainstream interfaces that were collectively embarrassed by OpenClaw's meteoric ascent and popularity. There's a huge demand out there for a personal agent that can act as your private executive assistant. This is where the puck is going, and we're skating to meet it with agent accessibility across the board. Basecamp is first, Fizzy is next, and we'll hit HEY before long too. Revamped APIs, comprehensive CLIs, and the skills to use them, whatever your harness or claws look like.

Dominik Weber 1 months ago

Lighthouse update February 23rd

During the past week a couple of nice improvements happened.

**Finally implemented a 2 week trial without requiring a credit card** Every user now gets the trial by default. This is a nice improvement because, from what I can observe, in B2C most people want to test the product before entering their credit card. It was also a good step towards a better first product experience.

**Finished the website to feed feature** The last remaining task was automated discovery of items. When you enter a website, it automatically checks it and tries to find relevant items. If items are found, they are highlighted and the selectors added, without users having to do anything.

**Updated blogroll editor** This is a small free tool on the Lighthouse website. It's for creating collections of feeds, websites, and newsletters. For a long time I've wanted to create collections for specific areas, for example company engineering blogs, AI labs, the JavaScript ecosystem, and so on. The reworked blogroll editor makes that much simpler to do.

## Next steps

An issue that became important is feed URLs being behind bot protection. It doesn't really make sense for feeds to be configured that way, because feed URLs are designed to be accessed by bots, but in some cases it may be difficult to configure properly. This affects only a small number of feeds, but it's enough to be noticeable, and it prevents people from moving to Lighthouse from other services. Consequently, one of the next tasks is to fix this. Besides that, the first user experience continues to be an ongoing area of improvement. I have a couple of ideas on how to make it better, and will continuously work on it.

Simon Willison 1 months ago

Two new Showboat tools: Chartroom and datasette-showboat

I introduced Showboat a week ago - my CLI tool that helps coding agents create Markdown documents that demonstrate the code that they have created. I've been finding new ways to use it on a daily basis, and I've just released two new tools to help get the best out of the Showboat pattern. Chartroom is a CLI charting tool that works well with Showboat, and datasette-showboat lets Showboat's new remote publishing feature incrementally push documents to a Datasette instance. I normally use Showboat in Claude Code for web (see note from this morning ). I've used it in several different projects in the past few days, each of them with a prompt that looks something like this: Here's the resulting document . Just telling Claude Code to run is enough for it to learn how to use the tool - the help text is designed to work as a sort of ad-hoc Skill document. The one catch with this approach is that I can't see the new Showboat document until it's finished. I have to wait for Claude to commit the document plus embedded screenshots and push that to a branch in my GitHub repo - then I can view it through the GitHub interface. For a while I've been thinking it would be neat to have a remote web server of my own which Claude instances can submit updates to while they are working. Then this morning I realized Showboat might be the ideal mechanism to set that up... Showboat v0.6.0 adds a new "remote" feature. It's almost invisible to users of the tool itself, instead being configured by an environment variable. Set a variable like this: And every time you run one of Showboat's document-building commands the resulting document fragments will be POSTed to that API endpoint, in addition to the Showboat Markdown file itself being updated. There are full details in the Showboat README - it's a very simple API format, using regular POST form variables or a multipart form upload for attached images.
It's simple enough to build a webapp to receive these updates from Showboat, but I needed one that I could easily deploy and that would work well with the rest of my personal ecosystem. So I had Claude Code write me a Datasette plugin that could act as a Showboat remote endpoint. I actually had this building at the same time as the Showboat remote feature, a neat example of running parallel agents . datasette-showboat is a Datasette plugin that adds an endpoint to Datasette for viewing documents and another for receiving updates from Showboat. Here's a very quick way to try it out: Click on the sign in as root link that shows up in the console, then navigate to http://127.0.0.1:8001/-/showboat to see the interface. Now set your environment variable to point to this instance: And run Showboat like this: Refresh that page and you should see this: Click through to the document, then start Claude Code or Codex or your agent of choice and prompt: The command assigns a UUID and title and sends those up to Datasette. The best part of this is that it works in Claude Code for web. Run the plugin on a server somewhere (an exercise left up to the reader - I use Fly.io to host mine) and set that environment variable in your Claude environment, then any time you tell it to use Showboat the document it creates will be transmitted to your server and viewable in real time. I built Rodney , a CLI browser automation tool, specifically to work with Showboat. It makes it easy to have a Showboat document load up web pages, interact with them via clicks or injected JavaScript, and capture screenshots to embed in the Showboat document to show the effects. This is wildly useful for hacking on web interfaces using Claude Code for web, especially when coupled with the new remote publishing feature.
I only got this stuff working this morning and I've already had several sessions where Claude Code has published screenshots of its work in progress, which I've then been able to provide feedback on directly in the Claude session while it's still working. A few days ago I had another idea for a way to extend the Showboat ecosystem: what if Showboat documents could easily include charts? I sometimes fire up Claude Code for data analysis tasks, often telling it to download a SQLite database and then run queries against it to figure out interesting things from the data. With a simple CLI tool that produced PNG images I could have Claude use Showboat to build a document with embedded charts to help illustrate its findings. Chartroom is exactly that. It's effectively a thin wrapper around the excellent matplotlib Python library, designed to be used by coding agents to create charts that can be embedded in Showboat documents. Here's how to render a simple bar chart: It can also do line charts, bar charts, scatter charts, and histograms - as seen in this demo document that was built using Showboat. Chartroom can also generate alt text. Adding the relevant option to the above will output the alt text for the chart instead of the image, and other options get the image tag with alt text directly. I added support for Markdown images with alt text to Showboat in v0.5.0 , to complement this feature of Chartroom. Finally, Chartroom has support for different matplotlib styles . I had Claude build a Showboat document to demonstrate these all in one place - you can see that at demo/styles.md . I started the Chartroom repository with my click-app cookiecutter template, then told a fresh Claude Code for web session: We are building a Python CLI tool which uses matplotlib to generate a PNG image containing a chart. It will have multiple sub commands for different chart types, controlled by command line options.
Everything you need to know to use it will be available in the single "chartroom --help" output. It will accept data from files or standard input as CSV or TSV or JSON, similar to how sqlite-utils accepts data - clone simonw/sqlite-utils to /tmp for reference there. Clone matplotlib/matplotlib for reference as well It will also accept data from --sql path/to/sqlite.db "select ..." which runs in read-only mode Start by asking clarifying questions - do not use the ask user tool though it is broken - and generate a spec for me to approve Once approved proceed using red/green TDD running tests with "uv run pytest" Also while building maintain a demo/README.md document using the "uvx showboat --help" tool - each time you get a new chart type working commit the tests, implementation, root level README update and a new version of that demo/README.md document with an inline image demo of the new chart type (which should be a UUID image filename managed by the showboat image command and should be stored in the demo/ folder Make sure "uv build" runs cleanly without complaining about extra directories but also ensure dist/ and uv.lock are in gitignore This got most of the work done. You can see the rest in the PRs that followed. The Showboat family of tools now consists of Showboat itself, Rodney for browser automation, Chartroom for charting and datasette-showboat for streaming remote Showboat documents to Datasette. I'm enjoying how these tools can operate together based on a very loose set of conventions. If a tool can output a path to an image Showboat can include that image in a document. Any tool that can output text can be used with Showboat. I'll almost certainly be building more tools that fit this pattern. They're very quick to knock out! 
The environment variable mechanism for Showboat's remote streaming is a fun hack too - so far I'm just using it to stream documents somewhere else, but it's effectively a webhook extension mechanism that could likely be used for all sorts of things I haven't thought of yet.

Rob Zolkos 3 months ago

So where can we use our Claude subscription then?

There’s been confusion about where we can actually use a Claude subscription. This comes after Anthropic took action to prevent third-party applications from spoofing the Claude Code harness to use Claude subscriptions. The information in this post is based on my understanding from reading various tweets, official GitHub repos and documentation (some of which may or may not be up to date). I will endeavour to keep it up to date as new information becomes available. I would love to see Anthropic themselves maintain an easily parsable page like this that shows what is and is not permitted with a Claude subscription. Anthropic's own statement:

"We've taken action to prevent third-party clients from spoofing the Claude Code agent harness to use consumer subscriptions. Consumer subscriptions and their benefits should only be used in the Anthropic experiences they support (Claude Code CLI, Claude Code web, and via sessionKey in the Agent SDK). Third-party apps can use the API."

From what I can gather, consumer subscriptions work with official Anthropic tools, not third-party applications. If you want third-party integrations, you need the API.

The consumer applications (desktop and mobile) are the most straightforward way to use your Claude subscription. Available at claude.com/download , these apps give you direct access to Claude for conversation, file uploads, and Projects.

The official command-line interface for Claude Code is fully supported with Claude subscriptions. This is the tool Anthropic built and maintains specifically for developers who want to use Claude in their development workflow. You get the full power of Claude integrated into your terminal, with access to your entire codebase, the ability to execute commands, read and write files, and use all the specialized agents that come with Claude Code.

The web version of Claude Code (accessible through your browser at claude.ai/code) provides the same capabilities as the CLI but through a browser interface. Upload your project files, or point it at a repository, and you can work with Claude on your codebase directly.

Want to experiment with building custom agents? The Claude Agent SDK lets you develop and test specialized agents powered by your Claude subscription for personal development work. The SDK is available in both Python and TypeScript , with documentation here . This is for personal experiments and development. For production deployments of agents, use the API instead of your subscription.

You can use your Claude subscription to run automated agents in GitHub Actions. The Claude Code Action lets you set up workflows that leverage Claude for code review, documentation generation, or automated testing analysis. Documentation is here .

Any other uses of Claude would require API keys. In summary, your Claude subscription gives you:

- Claude desktop and mobile apps for general use
- Claude Code CLI for terminal-based development
- Claude Code on the web for browser-based work
- The ability to build custom agents through the official SDK (for personal development)
- Claude Code GitHub Action for CI/CD integration

Let me know if you have any corrections.
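Since "third-party apps can use the API", here's a rough sketch of what the API route looks like from a third-party app in Ruby, using only the standard library. The endpoint and headers follow Anthropic's public Messages API documentation; the model name below is a placeholder, not a real model identifier.

```ruby
require "net/http"
require "json"
require "uri"

# Build (but don't send) a raw Anthropic Messages API request.
# Endpoint and header names follow Anthropic's public API docs;
# "claude-model-name" is a placeholder you'd replace with a real model id.
def build_messages_request(api_key, model, prompt)
  req = Net::HTTP::Post.new(URI("https://api.anthropic.com/v1/messages"))
  req["x-api-key"] = api_key
  req["anthropic-version"] = "2023-06-01"
  req["content-type"] = "application/json"
  req.body = JSON.generate(
    model: model,
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }]
  )
  req
end

req = build_messages_request(ENV["ANTHROPIC_API_KEY"].to_s, "claude-model-name", "Hello!")
# To actually send it:
# Net::HTTP.start("api.anthropic.com", 443, use_ssl: true) { |http| http.request(req) }
```

This is billed against API credits, not your consumer subscription — which is exactly the distinction Anthropic's statement draws.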

Rob Zolkos 3 months ago

A Month Exploring Fizzy

In their book Getting Real , 37signals talk about Open Doors — the idea that you should give customers access to their data through RSS feeds and APIs. Let them get their information when they want it, how they want it. Open up and good things happen. Fizzy takes that seriously. When 37signals released Fizzy with its full git history available , they didn’t just open-source the code — they shipped a complete API and webhook system too. The doors were wide open, baby!

So I dove in — reading the source, building tools, and sharing what I found. Every time curiosity kicked in, there was a direct path from “I wonder if…” to something I could actually try and execute. This post is a catch-all for my very bubbly month of December.

- Fizzy Webhooks: What You Need to Know — I set up a local webhook receiver to capture and document every event type Fizzy sends. The post covers the payload structures, signature verification, and ideas for what you could build on top of the webhook system.
- The Making of Fizzy, Told by Git — I prompted Claude Code to analyze the entire git history and write a documentary about the development.
- Vanilla CSS is all you need — Diving into the no-build CSS architecture across Campfire, Writebook, and Fizzy.
- Fizzy Design Evolution: A Flipbook from Git — I went through each day of commits, got the application to a bootable state, seeded the database, and took a screenshot. Then I stitched those screenshots into a flipbook video with a soundtrack made from Fizzy’s own audio files.
- Fizzy’s Pull Requests: Who Built What and How — An analysis of who owned which domains in the Fizzy codebase. The post maps contributors to their expertise areas and curates learning paths through the PRs for topics like Turbo/Hotwire, caching, AI integration, multi-tenancy, and webhooks.

The open API invited experimentation. I spotted gaps that would make integration easier for other developers, so I filled them:

- fizzy-api-client — Ruby client for the Fizzy API.
- fizzy-client-python — Python client for the Fizzy API.
- fizzy-cli — Command-line interface for the Fizzy API, built first in Ruby and then migrated to Go for portability.
- fizzy-skill — An AI agent skill for interacting with Fizzy.
- n8n-nodes-fizzy — An n8n community node that brings Fizzy into your automation workflows. Create cards, manage assignments, and react to real-time events through webhook triggers.
- Migration tools — I built these to make it easier to try Fizzy without starting from scratch. Migrating your existing issues and boards gives you an immediate sense of how it could work for you, without having to manually create test cards. You can see your real data running in Fizzy from day one, which I think makes it easier to evaluate and decide if it’s useful for you.
  - linear2fizzy — Migrate Linear issues
  - jira2fizzy — Migrate JIRA issues
  - asana2fizzy — Migrate Asana tasks
  - gh2fizzy — Migrate GitHub Issues
  - prd2fizzy — Convert PRDs to Fizzy cards

I also contributed a few small fixes back to the main repository:

- #2114 — Remove unused install.svg and its CSS class
- #2111 — Remove unpaired view-transition-name
- #2095 — Fix typo: minues → minutes
- #2094 — Fix duplicate word: use use → use
- #2093 — Add QrCodesController test
- #2088 — Fix view-transition-name typo in public card show

Fizzy is released under the O’Saasy License , which is similar in spirit to MIT but includes a restriction on offering the software as a competing hosted or SaaS product. You can modify and self-host it, but you can’t repackage it and sell it as your own hosted service. I built O’Saasy Directory to make it easy to find applications released under this license. Beyond Fizzy, the directory includes other submitted projects where the source is available to read and modify. If you have built something under the O’Saasy License, visit the submission page to add yours.

Having built the Fizzy CLI and the fizzy-api-client Rubygem, I saw some fun opportunities to build little lab experiments showing how Fizzy could be integrated with other tools - both to power up some functionality that isn’t there yet, and to create boards in some interesting ways (eg Movie Quiz). I got the idea for this on a flight to Australia with no internet. Just a pad of paper and a pen. I should probably do that more often, as a bunch of ideas for all sorts of products came out.

CarbonationLabs is not a product per se. It’s an open source Rails application designed to be run locally, where you can interact with the hosted or self-hosted versions of Fizzy. If anything, I hope it inspires creation of little problem-solving workflows for Fizzy that wouldn’t be built into the main product (the problem is too niche). The API and webhook system is really flexible, and most of your bespoke problems could be solved with some creative thinking.

Introducing Carbonation Labs - fun ways to add experiments to and extend Fizzy (repo link and demo videos below)🧵

I built carbonation.dev to bring together all the tools, libraries, and integrations that I and others in the community have created for Fizzy. It’s a directory covering API clients (Ruby, Python, JavaScript), CLI tools with packages for macOS, Arch Linux, Debian, Fedora, and Windows, integrations for Claude Code and other AI agents, n8n, Raycast, Telegram, and MCP servers, plus migration tools for GitHub, Linear, Asana, and Jira. If you’ve built something for Fizzy, I’d love to feature it. You can submit a pull request to add your tool to the directory.

Building the Fizzy CLI pushed me into some new territory. I created an AUR package for Arch Linux users, set up a Homebrew tap for macOS, published my first Python package to PyPI, and made an n8n plugin — all firsts for me. While I already knew Go, rewriting the CLI in it was a fun exercise, and building TUIs for the setup and skill commands introduced me to terminal UI libraries I hadn’t used before. Gosh it was fun!

If you want to get better at Rails, Fizzy is a great place to study real-world code. And in my view, if you want to work at 37signals as a Rails programmer, digging into Fizzy — along with Campfire and Writebook — is a solid way to learn how they approach Rails architecture and design decisions. Submitting PRs is also a good way to contribute back while learning — just be respectful of the contribution policy . The review discussions give you a window into how to reason about problems, spot opportunities, and make trade-offs.

This month pushed parts of my creative thinking that weren’t gone, but definitely weren’t being stressed. Like any muscle, use it or lose it. The direction of what to explore came from my own curiosity and a habit of poking around under the hood, and AI helped me move a lot faster once I knew where I wanted to go. Most of this information already exists somewhere — Google, Stack Overflow, documentation — but having AI right there alongside me as a partner was thrilling.

All of this was made possible because a team left the doors open. No one asked me to step inside; I decided to invest the time and do the work to see what I could build, learn and share. I do this at work too — when I can — looking for opportunities I can shape, experiment with, and get genuinely excited about. Most importantly, I had fun, and I hope you enjoyed following along.

Tenderlove Making 3 months ago

Pixoo64 Ruby Client

I bought a Pixoo64 LED Display to play around with, and I love it! It connects to WiFi and has an on-board HTTP API so you can program it. I made a Ruby client for it that even includes code to convert PNG files to the binary format the sign wants. One cool thing is that the display can be configured to fetch data from a remote server, so I configured mine to fetch PM2.5 and CO2 data for my office. Here’s what it’s looking like so far: Yes, this is how I discovered I need to open a window 😂
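The nice thing about the Pixoo64's on-board API is that driving it is just JSON over HTTP. Here's a minimal Ruby sketch of the pattern; the single /post endpoint and the "Channel/SetBrightness" command name are based on my reading of Divoom's published API docs, so treat them as assumptions and check the official command list before relying on them.

```ruby
require "json"
require "net/http"
require "uri"

# Build a JSON command payload for the Pixoo64's local HTTP API.
# The device accepts commands POSTed to a single endpoint; command
# names like "Channel/SetBrightness" come from Divoom's docs and
# should be verified against the official API reference.
def pixoo_payload(command, params = {})
  JSON.generate({ "Command" => command }.merge(params))
end

# POST a payload to the display on your local network.
def send_to_pixoo(ip, payload)
  Net::HTTP.post(URI("http://#{ip}/post"), payload, "Content-Type" => "application/json")
end

payload = pixoo_payload("Channel/SetBrightness", "Brightness" => 80)
# send_to_pixoo("192.168.1.50", payload)  # uncomment with your display's IP
```

The same shape works for the fancier commands — converting a PNG to the display's binary frame format and pushing it is just a bigger params hash.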


Streamlinear, a new MCP for Linear

I've been using Linear as the project and issue tracking tool on a new project. No wait, that's not quite right. My AI coding agents have been using Linear as the project and issue tracking tool on a new project. I've opened Linear's web interface...twice? And I'm pretty sure I've logged into the mobile client. But Claude and friends? They use Linear every day.

To date, I've been using the first-party Linear MCP and a third-party one that I'd found before Anthropic started publishing an "official" Linear plugin in partnership with Linear. It works great. There's just one problem. The official Linear MCP has 25 tools, using a total of 19,659 tokens of context on every single session. The third-party MCP is a little slimmer at 17k and change. But that's still nearly 10% of the full context window. For every context window.

This morning, after breakfast, I sat down and started chatting with Claude about what a better Linear tool might look like. We discussed just using a unix commandline tool. We discussed using a unix commandline tool + a skill . We discussed a Skill + a single-tool MCP client that was just a pure GraphQL client. I asked Claude to read my blog post on MCP design . We ended up with something nice and streamlined. It totals out at 975 tokens, including instructions for how to learn more about how to use the tool. I ended up talking Claude into making the MCP fully self-documenting by including a 'help' action. We ended up compromising on tool design. Claude really thought that it would be fine always reading the instructions and just using raw GraphQL for everything. I overruled it and decided that the most common operations (working with tickets) merited first-class actions. Everything else is GraphQL backed up by the 'help' action.

It's called Streamlinear . Ultimately, I'm responsible for the name. I didn't say no. I asked Claude to come up with a list of punny names. Everything else it suggested was being used for a Linear client already.
I asked Claude to talk about the new tool and what it's like: This is the tool loadout for the 'official' MCP: And this is what Streamlinear looks like: Give it a spin and let me know how it goes.
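The "few first-class actions, plus a raw-GraphQL escape hatch, plus a self-documenting help action" shape is easy to picture. Here's a toy Ruby sketch of that dispatch pattern; it's purely illustrative, not Streamlinear's actual implementation, and the action names are made up for the example.

```ruby
# Toy sketch of the single-tool, action-dispatch design described above:
# first-class actions for common ticket operations, a raw-GraphQL escape
# hatch for everything else, and a "help" action so the tool documents
# itself. Illustrative only; not Streamlinear's actual code.
HELP_TEXT = <<~HELP
  Actions:
    help       - show this text
    get_issue  - fetch one issue by id
    graphql    - run a raw GraphQL query for anything else
HELP

def handle_action(action, args = {})
  case action
  when "help"      then HELP_TEXT
  when "get_issue" then { id: args[:id], note: "would call Linear's GraphQL API here" }
  when "graphql"   then { query: args[:query], note: "would POST to the GraphQL endpoint" }
  else "Unknown action #{action.inspect}; try the 'help' action"
  end
end

puts handle_action("help")
```

The context win comes from the tool schema only needing to describe one tool and one `action` parameter — the long tail of operations lives behind `help` and `graphql` instead of in 25 separate tool definitions.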

Simon Willison 4 months ago

GPT-5.2

OpenAI reportedly declared a "code red" on the 1st of December in response to increasingly credible competition from the likes of Google's Gemini 3. It's less than two weeks later and they just announced GPT-5.2 , calling it "the most capable model series yet for professional knowledge work". The new model comes in two variants: GPT-5.2 and GPT-5.2 Pro. There's no Mini variant yet. GPT-5.2 is available via their UI in both "instant" and "thinking" modes, presumably still corresponding to the API concept of different reasoning effort levels.

The knowledge cut-off date for both variants is now August 31st 2025 . This is significant - GPT-5.1 and GPT-5 were both Sep 30, 2024 and GPT-5 mini was May 31, 2024. Both of the 5.2 models have a 400,000 token context window and 128,000 max output tokens - no different from 5.1 or 5.

Pricing-wise, 5.2 is a rare increase - it's 1.4x the cost of GPT-5.1, at $1.75/million input tokens and $14/million output tokens. GPT-5.2 Pro is $21.00/million input and a hefty $168.00/million output, putting it up there with their previous most expensive models, o1 Pro and GPT-4.5.

So far the main benchmark results we have are self-reported by OpenAI. The most interesting ones are a 70.9% score on their GDPval "Knowledge work tasks" benchmark (GPT-5 got 38.8%) and a 52.9% on ARC-AGI-2 (up from 17.6% for GPT-5.1 Thinking). The ARC Prize Twitter account provided this interesting note on the efficiency gains for GPT-5.2 Pro:

"A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year."

GPT-5.2 can be accessed in OpenAI's Codex CLI tool like this:

There are three new API models:

- gpt-5.2-chat-latest - the model used by ChatGPT
- gpt-5.2-pro

OpenAI have published a new GPT-5.2 Prompting Guide . One note from the announcement that caught my eye:

"GPT‑5.2 Thinking is our strongest vision model yet, cutting error rates roughly in half on chart reasoning and software interface understanding."

I had disappointing results from GPT-5 on an OCR task a while ago. I tried it against GPT-5.2 and it did much better: Here's the result from that, which cost 1,520 input tokens and 1,022 output tokens for a total of 1.6968 cents .

For my classic "Generate an SVG of a pelican riding a bicycle" test: And for the more advanced alternative test, which tests instruction following in a little more depth:
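As a sanity check, the 1.6968-cent OCR cost quoted above follows directly from the listed per-million-token prices:

```ruby
# Reproduce the quoted OCR cost from GPT-5.2's listed prices:
# $1.75 per million input tokens, $14 per million output tokens.
input_tokens  = 1_520
output_tokens = 1_022
cost_dollars = (input_tokens * 1.75 + output_tokens * 14.0) / 1_000_000
cost_cents   = (cost_dollars * 100).round(4)
puts cost_cents # 1.6968
```

Note how lopsided the split is: the 1,022 output tokens account for about 84% of the cost.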

Ankur Sethi 4 months ago

Getting a Gemini API key is an exercise in frustration

Last week, I started working on a new side-project. It’s a standard React app partly made up of run-of-the-mill CRUD views—a perfect fit for LLM-assisted programming. I reasoned that if I could get an LLM to quickly write the boring code for me, I’d have more time to focus on the interesting problems I wanted to solve. I've pretty much settled on Claude Code as my coding assistant of choice, but I'd been hearing great things about Google's Gemini 3 Pro. Despite my aversion to Google products, I decided to try it out on my new codebase. I already had Gemini CLI installed, but that only gave me access to Gemini 2.5 with rate limits. I wanted to try out Gemini 3 Pro, and I wanted to avoid being rate limited. I had some spare cash to burn on this experiment, so I went looking for ways to pay for a Gemini Pro plan, if such a thing existed. Thus began my grand adventure in trying to give Google my money.

The name “Gemini” is so overloaded that it barely means anything. Based on the context, Gemini could refer to any of a long list of products (I've collected the full list at the end of this post). To make things even more confusing, Google has at least three different products just for agentic coding: Gemini Code Assist (Gemini CLI is a part of this suite of products), Jules , and Antigravity . And then there’s a bunch of other GenAI stuff that is powered by Gemini but doesn’t have the word Gemini in the name: Vertex AI Platform , Google AI Studio , NotebookLM , and who knows what else. I just wanted to plug my credit card information into a form and get access to a coding assistant. Instead, I was dunked into an alphabet soup of products that all seemed to do similar things and, crucially, didn’t have any giant “Buy Now!” buttons for me to click.

In contrast, both Anthropic and OpenAI have two primary ways you can access their products: via their consumer offerings at claude.ai and chatgpt.com respectively, or via API credits that you can buy through their respective developer consoles .
In each case, there is a form field where you can plug in your credit card details, and a big, friendly “Buy Now!” button to click. After half an hour of searching the web, I did the obvious thing and asked the free version of Gemini (the chatbot, not one of those other Geminis) what to do: How do I pay for the pro version of Gemini so i can use it in the terminal for writing code? I specifically want to use the Gemini 3 Pro model. It thought for a suspiciously long time and told me that Gemini 3 Pro required a developer API key to use. Since the new model is still in preview, it's not yet available on any of the consumer plans. When I asked follow up questions about pricing, it told me that "Something went wrong”. Which translates to: we broke something, but we won’t tell you how to fix it. So I asked Claude for help. Between the two LLMs, I was able to figure out how to create an API key for the Gemini I wanted. Google AI Studio is supposed to be the all-in-one dashboard for Google’s generative AI models. This is where you can experiment with model parameters, manage API keys, view logs, and manage billing for your projects. I logged into Google AI Studio and created a new API key . This part was pretty straightforward: I followed the on-screen instructions and had a fresh new key housed under a project in a few seconds. I then verified that my key was working with Gemini CLI. It worked! Now all that was left to do was to purchase some API credits. Back in Google AI Studio, I saw a link titled “Set up billing” next to my key. It looked promising, so I clicked it. That’s where the fun really began. The “Set up billing” link kicked me out of Google AI Studio and into Google Cloud Console, and my heart sank. 
Every time I’ve logged into Google Cloud Console or AWS, I’ve wasted hours upon hours reading outdated documentation, gazing in despair at graphs that make no sense, going around in circles from dashboard to dashboard, and feeling a strong desire to attain freedom from this mortal coil. Turns out I can’t just put $100 into my Gemini account. Instead, I must first create a Billing Account. After I've done that, I must associate it with a project. Then I’m allowed to add a payment method to the Billing Account. And then , if I’m lucky, my API key will turn into a paid API key with Gemini Pro privileges. So I did the thing. The whole song and dance. Including the mandatory two-factor OTP verification that every Indian credit card requires. At the end of the process, I was greeted with a popup telling me I had to verify my payment method before I’d be allowed to use it. Wait. Didn’t I just verify my payment method? When I entered the OTP from my bank? Nope, turns out Google hungers for more data. Who'd have thunk it? To verify my payment method for reals , I had to send Google a picture of my government-issued ID and the credit card I’d just associated with my Billing Account. I had to ensure all the numbers on my credit card were redacted by manually placing black bars on top of them in an image editor, leaving only my name and the last four digits of the credit card number visible. This felt unnecessarily intrusive. But by this point, I was too deep in the process to quit. I was invested. I needed my Gemini 3 Pro, and I was willing to pay any price. The upload form for the government ID rejected my upload twice before it finally accepted it. It was the same exact ID every single time, just in different file formats. It wanted a PNG file. Not a JPG file, nor a PDF file, but a PNG file. Did the upload form mention that in the instructions? Of course not. 
After jumping through all these hoops, I received an email from Google telling me that my verification will be completed in a few days. A few days ? Nothing to do but wait, I suppose. At this point, I closed all my open Cloud Console tabs and went back to work. But when I was fifteen minutes into writing some code by hand like a Neanderthal, I received a second email from Google telling me that my verification was complete. So for the tenth time that day, I navigated to AI Studio. For the tenth time I clicked "Set up billing" on the page listing my API keys. For the tenth time I was told that my project wasn't associated with a billing account. For the tenth time I associated the project with my new billing account. And finally, after doing all of this, the “Quota tier” column on the page listing my API keys said “Tier 1” instead of “Set up billing”. Wait, Tier 1? Did that mean there were other tiers? What were tiers, anyway? Was I already on the best tier? Or maybe I was on the worst one? Not important. The important part was that I had my API key and I'd managed to convince Google to charge me for it. I went back to the Gemini CLI, ran the command, and turned on the "Enable experimental features" option. I ran the command, which told me that Gemini 3 Pro was now available. Success? Not yet. When I tried sending a message to the LLM, it failed with this 403 error: Is that JSON inside a string inside JSON? Yes. Yes it is. To figure out if my key was even working, I tried calling the Gemini API from JavaScript, reproducing the basic example from Google’s own documentation . No dice. I ran into the exact same error. I then tried talking to Gemini 3 Pro using the Playground inside Google AI Studio. It showed me a toast message saying The chat transcript said At this point I gave up and walked away from my computer. It was already 8pm. I’d been trying to get things to work since 5pm. I needed to eat dinner, play Clair Obscur , and go to bed. 
I had no more time to waste and no more fucks to give. Just as I was getting into bed, I received an email from Google with this subject line: Your Google Cloud and APIs billing account XXXXXX-XXXXXX-XXXXXX is in good standing at this time. With the message inside saying: Based on the information you provided and further analysis by Google, we have reinstated your billing account XXXXXX-XXXXXX-XXXXXX. Your account is in good standing, and you should now have full access to your account and related Project(s) and Service(s). I have no idea what any of this means, but Gemini 3 Pro started working correctly after I received this email. It worked in the Playground, directly by calling the API from JavaScript, and with Gemini CLI. Problem solved, I guess. Until Google mysteriously decides that my account is no longer in good standing.

This was such a frustrating experience that I still haven't tried using Gemini with my new codebase, nearly a week after I made all those sacrifices to the Gods of Billing Account. I understand why the process for getting a Gemini API key is so convoluted. It’s designed for large organizations, not individual developers trying to get work done; it serves the bureaucracy, not the people doing the work; it’s designed for maximum compliance with government regulations, not for efficiency or productivity. Google doesn’t want my money unless I’m an organization that employs ten thousand people.

In contrast to Google, Anthropic and OpenAI are much smaller and much more nimble. They’re able to make the process of setting up a developer account quick and easy for those of us who just want to get things done. Unlike Google, they haven’t yet become complacent. They need to compete for developer mindshare if they are to survive a decade into the future. Maybe they'll add the same level of bureaucracy to their processes as they become larger, but for now they're fairly easy to deal with.
I’m still going to try using Gemini 3 Pro with Gemini CLI as my coding assistant, but I’ll probably cap the experiment to a month. Unless Gemini 3 Pro is a massive improvement over its competitors, I’ll stick to using tools built by organizations that want me as a customer.

For reference, here is what "Gemini" can refer to:

- The chatbot available at gemini.google.com .
- The mobile app that lets you use the same Gemini chatbot on your iPhone or Android .
- The voice assistant on Android phones.
- The AI features built into Google Workspace , Firebase, Colab, BigQuery, and other Google products.
- Gemini CLI, an agentic coding tool for your terminal that works the same way as Claude Code or OpenAI Codex.
- The Gemini Code Assist suite of products, which includes extensions for various IDEs, a GitHub app, and Gemini CLI.
- The underlying LLM powering all these products.
- Probably three more products by the time I finish writing this blog post.

Rob Zolkos 4 months ago

Fizzy Webhooks: What You Need to Know

Fizzy is a new issue tracker ( source available ) from 37signals with a refreshingly clean UI. Beyond looking good, it ships with a solid webhook system for integrating with external services. For most teams, webhooks are the bridge between the issues you track and the tools you already rely on. They let you push events into chat, incident tools, reporting pipelines, and anything else that speaks HTTP. If you are evaluating Fizzy or planning an integration, understanding what these webhooks can do will save you time. I also put together a short PDF with the full payload structure and example code, which I link at the end of this post if you want to go deeper. I've also collected, at the end of this post, a few ideas for things you could build on top of Fizzy’s events, plus some more opinionated tools that surface insights and notify people who never log in to Fizzy.

Here is how to set it up:

Step 1. Visit a board and click the Webhook icon in the top right.
Step 2. Give the webhook a name and the payload URL, and select the events you want to be alerted to.
Step 3. Once the webhook saves, you will see a summary of how it is set up and, most importantly, the webhook secret, which you will need in your handler for securing the webhook. There is also a handy event log showing you when an event was delivered.

Since I like to tinker with these sorts of things, I built a small webhook receiver to capture and document the payload structures. Fizzy sends HTTP POST requests to your configured webhook URL when events occur. Each request includes a header containing an HMAC-SHA256 signature of the request body. The verification process is straightforward. Fizzy covers the essential card lifecycle events (listed at the end of this post).

The approach was straightforward: I wrote a small Ruby script using WEBrick to act as a webhook receiver.
The script listens for incoming POST requests, verifies the HMAC-SHA256 signature (using the webhook secret Fizzy provides when you configure webhooks), and saves each event as a separate JSON file with a timestamp and action name. This made it easy to review and compare the different event types later. To expose my local server to the internet, I used ngrok to create a temporary public URL pointing to port 4002. I then configured Fizzy’s webhook settings with this ngrok URL and selected the event types I wanted to capture. With everything set up, I went through Fizzy’s UI and manually triggered each available event: creating cards, adding comments, assigning and unassigning users, moving cards between columns and boards, marking cards as done, reopening them, postponing cards to “Not Now”, and sending cards back to triage. Each action fired a webhook that my script captured and logged. In total, I captured 13 webhook deliveries covering 10 different action types. The only event I could not capture was “Card moved to Not Now due to inactivity” — Fizzy triggers this automatically after a period of card inactivity, so it was not practical to reproduce during this test. Card body content is not included. The card object in webhook payloads only contains the , not the full description or body content. Comments include both and versions, but cards do not. Since Fizzy doesn’t have a public API ( DHH is working on it ), you can’t fetch the full card content programmatically - you’ll need to use the field to view the card in the browser. Column data is only present when relevant. The object only appears on , , and events - the events where a card actually moves to a specific column. IDs are strings, not integers. All identifiers in the payload are strings like , not numeric IDs. 
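The signature check the script performs can be sketched like this. Assumptions worth flagging: that the signature is the hex-encoded HMAC-SHA256 of the raw request body keyed by the webhook secret; confirm the exact header name and encoding against Fizzy's webhook settings page.

```ruby
require "openssl"

# Sketch of verifying a Fizzy webhook delivery, assuming the signature
# is the hex HMAC-SHA256 of the raw request body keyed by the webhook
# secret (confirm the header name and encoding in Fizzy's settings).
def valid_signature?(secret, body, signature)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, body)
  secure_equals?(expected, signature)
end

# Constant-time comparison so the check doesn't leak digest bytes
# through timing differences.
def secure_equals?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).sum { |x, y| x ^ y }.zero?
end
```

Always compare against the raw bytes of the body as received — re-serializing the parsed JSON before hashing is the classic way to get spurious verification failures.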
I created a short webhook documentation based on this research: FIZZY_WEBHOOKS.pdf It includes the full payload structure, all event types with examples, and code samples for signature verification in both Ruby and JavaScript. Hopefully this helps you get up and running with Fizzy’s webhooks. Let me know if you discover additional events or edge cases. Since the source code is available, you can also submit PRs to fix or enhance aspects of the webhook system if you find something missing or want to contribute improvements.

A few ideas for things you could build on top of Fizzy’s events:

- A team metrics dashboard that tracks how long cards take to move from to and which assignees or boards close issues the fastest.
- Personal Slack or Teams digests that send each person a daily summary of cards they created, were assigned, or closed based on , , , and events.
- A churn detector that flags cards that bounce between columns or get sent back to triage repeatedly using , , and .
- A cross-board incident view that watches to keep a separate dashboard of cards moving into your incident or escalation boards.
- A comment activity stream that ships events into a search index or knowledge base so you can search discussions across boards.

More opinionated tools that surface insights and notify people who never log in to Fizzy:

- Stakeholder status reports that email non-technical stakeholders a weekly summary of key cards: what was created, closed, postponed, or sent back to triage on their projects. You can group by label, board, or assignee and generate charts or narrative summaries from , , , and events.
- Capacity and load alerts that watch for people who are getting overloaded. For example, you could send a notification to a manager when someone is assigned more than N open cards, or when cards assigned to them sit in the same column for too long without a or event.
- SLA and escalation notifications that integrate with PagerDuty or similar tools. When certain cards (for example, labeled “Incident” or on a specific board) are not closed within an agreed time window, you can trigger an alert or automatically move the card to an escalation board using , , and .
- Customer-facing status updates that keep clients in the loop without giving them direct access to Fizzy. You could generate per-customer email updates or a small status page based on events for cards tagged with that customer’s name, combining , , and to show progress and recent discussion.
- Meeting prep packs that assemble the last week’s events for a given board into a concise agenda for standups or planning meetings. You can collate newly created cards, reopened work, and high-churn items from , , , and , then email the summary to attendees before the meeting.

The card lifecycle events Fizzy covers:

- new card created
- card moved to a column
- assignment changes
- card moved to Done
- card reopened from Done
- card moved to Not Now
- card moved back to Maybe?
- card moved to different board
- comment added to a card

Oya Studio 4 months ago

Better than JSON

An in-depth look at why Protobuf can outperform JSON for modern APIs, with practical Dart examples showing how strong typing, binary serialization, and shared schemas improve both performance and developer experience.

iDiallo 4 months ago

Demerdez-vous: A response to Enshittification

There is an RSS reader that I often used in the past and have become very reliant on. I would share the name with you, but as they grew more popular, they have decided to follow the enshittification route. They've changed their UI, hidden several popular links behind multilayered menus, and they have revamped their API. Features that I used to rely on have disappeared, and the API is close to useless. My first instinct was to find a new app that will satisfy my needs. But being so familiar with this reader, I've decided to test a few things in the API first. Even though their documentation doesn't mention older versions anymore, I've discovered that the old API is still active. All I had to do was add a version number to the URL. It's been over 10 years, and that API is still very much active. I'm sorry I won't share it here, but this has served as a lesson for me when it comes to software that becomes worse over time. Don't let them screw you, unscrew yourself!

We talk a lot about "enshittification" these days. I've even written about it a couple of times. It's about how platforms start great, get greedy, and slowly turn into user-hostile sludge. But what we rarely talk about is the alternative. What do you do when the product you rely on rots from the inside? The French have a phrase for this: Demerdez-vous. The literal translation is "unshit yourself". What it actually means is to find a way, even if no one is helping you. When a company becomes too big to fail, or simply becomes dominant in its market, drip by drip, it starts to become worse. You don't even notice it at first. It changes in ways that most people tolerate because the cost of switching is high, and the vendor knows it. But before you despair, before you give up, before you let the system drag you into its pit, try to unscrew yourself with the tools available. If the UI changes, try to find the old UI. Patch the inconvenience. Disable the bullshit. Bend the app back into something humane.
It might sound impossible at first, but the tools to accomplish this exist and are widely used. Sometimes the escape hatch is sitting right there, buried under three layers of "Advanced" menus. On the web, I hate auto-playing videos, I don't want to receive twelve notifications a day from an app, and I don't care about personalization. But for the most part, these can be disabled. When I download an app, I actually spend time going through its settings. If I care enough to download an app, or if I'm forced to, I'll spend the extra time to ensure that the app works to my advantage, not the other way around. When that RSS reader removed features from its UI, but not from its code, I was still able to keep using it. Another example of this is Reddit. Their new UI is riddled with dark patterns, infinite scroll, and popups. But go to , and you are greeted with the old UI: it may not look fancy, but it was designed with the user in mind, not the company's metrics. Or take YouTube's removal of the dislike button. While it might be hurtful for content creators to see the number of dislikes, as a consumer I found that this piece of data served as a filter for lots of spam content. For that, of course, there is the "Return YouTube Dislike" browser extension. Extensions can often help you regain control when popular websites remove functionality that is useful to users but that the service no longer wants to support. There are several tools that enhance YouTube, fix Twitter, and of course uBlock. It's not always possible to combat enshittification. Sometimes the developer actively enforces their new annoying features and prevents anyone from removing them. In cases like these, there is still something users can do. They can walk away. You don't have to stay in an abusive relationship. You are allowed to leave. When you do, you'll discover that there is an open-source alternative. Or that a small independent app has survived quietly in a corner of the internet.
Or sometimes you'll even find that you don't need the app at all. You break your addiction. In the end, "Demerdez-vous" is a reminder that we still have agency in a world designed to take it away. Enshittification may be inevitable, but surrender isn't. There's always a switch to flip, a setting to tweak, a backdoor to exploit, or a path to walk away entirely. Companies may keep trying to box us in, but as long as we can still think, poke, and tinker, we don't have to live with the shit they shovel. At the end of the day, "On se demerde."

0 views
iDiallo 5 months ago

What Actually Defines a Stable Software Version?

As a developer, you'll hear these terms often: "stable software," "stable release," or "stable version." Intuitively, it just means you can rely on it. That's not entirely wrong, but when I was new to programming, I didn't truly grasp the technical meaning. For anyone learning, the initial, simple definition of "it works reliably" is a great starting point. But if you're building systems for the long haul, that definition is incomplete. The intuitive definition: a stable version is software that works and that you can rely on not to crash. The technical definition: a stable version is one whose API will not change unexpectedly in future updates. A stable version is essentially a guarantee from the developers that the core interface, such as the functions, class names, data structures, and overall architecture you interact with, will remain consistent throughout that version's lifecycle. This means that if your code works with version 1.0.0, it should also work flawlessly with version 1.0.1, 1.0.2, and 1.1.0. Future updates will focus on bug fixes, security patches, and performance improvements, not on introducing breaking changes that force you to rewrite your existing code. My initial misunderstanding was thinking stability was about whether the software was bug-free, similar to how we expect bugs to be present in a beta version. But there was still an upside to this confusion. It helped me avoid the hype cycle, especially with certain JavaScript frameworks. I remember being hesitant to commit to new versions of certain tools (like early versions of React and Angular, though this is true of many fast-moving frameworks and SDKs). Paradigms would shift rapidly from one version to the next. A key concept I'd mastered one month would be deprecated or replaced the next. While those frameworks sit at the cutting edge of innovation, they can also be the antithesis of stability. Stability is about long-term commitment.
Rapid shifts force users to constantly evolve with the framework, making it difficult to stay on a single version without continual, large-scale upgrades. A truly stable software version is one you can commit to for a significant amount of time. The classic example of stability is Python 2. Yes, I know many wanted it to die by fire, but it was first released in 2000 and remained active, receiving support and maintenance until its final update in 2020. That's two decades of stability! I really enjoyed being able to pick up old scripts and run them without any fuss. While I'm not advocating that every tool should last that long, I do think that when we're building APIs or stable software, we should adopt the mindset that this is the last version we'll ever make. This forces us to carefully consider the long-term design of our software. Whenever I see LTS (Long-Term Support) next to an application, I know that the maintainers have committed to supporting, maintaining, and keeping it backward compatible for a defined, extended period. That's when I know I'm working with both reliable and stable software.
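The version-number guarantee described above can be sketched in code. Here's a minimal, hypothetical compatibility check in Java, assuming the common semantic-versioning convention (MAJOR.MINOR.PATCH, where only a MAJOR bump may break the API). It's an illustration of the idea, not any particular package manager's actual algorithm:

```java
// Hypothetical sketch of a semantic-versioning compatibility check:
// code built against `required` should keep working on any `installed`
// version with the same MAJOR number and an equal-or-newer MINOR/PATCH.
public class SemVer {

    // Parse "1.2.3" into {1, 2, 3}.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {
            Integer.parseInt(parts[0]),  // MAJOR: breaking changes
            Integer.parseInt(parts[1]),  // MINOR: backward-compatible additions
            Integer.parseInt(parts[2])   // PATCH: bug fixes
        };
    }

    static boolean compatible(String installed, String required) {
        int[] i = parse(installed), r = parse(required);
        if (i[0] != r[0]) return false;       // different MAJOR: no guarantee
        if (i[1] != r[1]) return i[1] > r[1]; // newer MINOR is fine, older is not
        return i[2] >= r[2];                  // same MINOR: need equal-or-newer PATCH
    }

    public static void main(String[] args) {
        System.out.println(compatible("1.0.2", "1.0.0")); // patch update: true
        System.out.println(compatible("1.1.0", "1.0.0")); // minor update: true
        System.out.println(compatible("2.0.0", "1.0.0")); // major bump: false
    }
}
```

Under this convention, staying on a stable 1.x line means any 1.y.z upgrade should be a drop-in replacement; only the jump to 2.0.0 is allowed to break you.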

0 views
Neil Madden 5 months ago

Were URLs a bad idea?

When I was writing Rating 26 years of Java changes , I started reflecting on the new HttpClient library in Java 11. The old way of fetching a URL was to use URL.openConnection() . This was intended to be a generic mechanism for retrieving the contents of any URL: files, web resources, FTP servers, etc. It was a pluggable mechanism that could, in theory, support any type of URL at all. This was the sort of thing that was considered a good idea back in the 90s/00s, but it has a bunch of downsides. The API was forced to be lowest-common-denominator, so if you wanted to set options specific to a particular protocol you had to cast the returned URLConnection to a more specific sub-class (and therefore lose generality). Fetching different types of URLs can also have wildly different security and performance implications, and wildly different failure cases. Do I really want to accept a mailto: URL or a javascript: “URL”? No, never. The new HttpClient in Java 11 is much better at doing HTTP, but it’s also specific to HTTP/HTTPS. And that seems like a good thing? In fact, in the vast majority of cases the uniformity of URLs is no longer a desirable aspect. Most apps and libraries are specialised to handle essentially a single type of URL, and are better off because of it. Are there still cases where it is genuinely useful to be able to accept a URL of any (or nearly any) scheme?
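The lowest-common-denominator problem is easy to demonstrate. A quick sketch (example.com is a stand-in host; neither call below actually opens a network connection):

```java
// Contrast the generic pre-Java-11 API with the HTTP-specific one.
import java.net.URI;
import java.net.URLConnection;
import java.net.HttpURLConnection;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class UrlApis {
    public static void main(String[] args) throws Exception {
        // Old way: openConnection() returns the generic URLConnection,
        // so HTTP-specific options require a downcast that loses generality.
        URLConnection generic = URI.create("https://example.com/").toURL().openConnection();
        HttpURLConnection http = (HttpURLConnection) generic; // would fail for ftp:, file:, ...
        http.setRequestMethod("HEAD");
        System.out.println("needed a cast: " + (generic instanceof HttpURLConnection));

        // Java 11+: HttpClient is HTTP/HTTPS-only by design, no casting needed.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println("request method: " + request.method());
    }
}
```

The cast is the tell: the generic abstraction couldn't express protocol-specific options, so every real program ended up reaching through it anyway.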

0 views
xenodium 5 months ago

Want a WhatsApp Emacs client? Will you fund it?

Like it or not, WhatsApp is a necessity for some of us. I wish it weren't the case, but here we are. Given the circumstances, I wish I could use WhatsApp a little more on my terms. And by that, I mean from an Emacs client, of course. Surely I'm not the only one who feels this way, right? Right?! Fortunately, I'm not alone . With that in mind, I've been hard at work prototyping, exploring what's feasible. Spoiler alert: it's totally possible, though it will require a fair bit of work. Thankfully, two wonderful projects offer a huge leg up: wuzapi and whatsmeow . wuzapi offers a REST API on top of whatsmeow , a Go library leveraging WhatsApp's multi-device web API. Last week, I prototyped sending a WhatsApp message using 's API. I got there fairly quickly by onboarding myself using its web interface and wiring shell-maker to send an HTTP message request via . While these two were enough for a quick demo, they won't cut it for a polished Emacs experience. While I can make REST work, I would like a simpler integration under the hood. REST is fine for outgoing messages, but then I need to integrate webhooks for incoming events. No biggie, it can be done, but now I have to deal with two local services opening a couple of ports. Can we simplify a little? Yes we can. You may have seen me talk about agent-shell , my Emacs package implementing the Agent Client Protocol (ACP) … Why is this relevant, you may ask? Well, after building a native Emacs implementation, I learned a bit about JSON-RPC over standard I/O. The simplicity here is that we get bidirectional communication through a single Emacs-owned process. No need for multiple channels handling incoming vs outgoing messages. So where's this all going? I've been prototyping some patches on top of wuzapi to expose over standard I/O (as an alternative to ). This prototype goes far beyond my initial experiment with sending messages, and yet the Emacs integration is considerably simpler, not to mention looking very promising.
Here's a demo showing incoming WhatsApp messages, received via , all through a single Emacs-owned process. Look ma, no ports! These early prototypes are encouraging, but we've only scratched the surface. Before you can send and receive messages, you need to onboard users to the WhatsApp Emacs client. That is, you need to create a user, manage/connect to a session, authorize via a QR code, and more. You'll want this flow to be reliable, and that's just onboarding. From there, you'll need to manage contacts, chats, multiple message types, incoming notifications… the list goes on. And that's just the Emacs side. As mentioned, I've also been patching . My plan is to upstream these changes , rather than maintaining a fork. I've prototyped quite a few things now, including the onboarding experience with QR code scanning. At this point, I feel fairly optimistic about feasibility, which is all pretty exciting! But there's a bunch of work needed. Since going full-time indie dev, I have the time available (for now), but it's hard to justify this effort without aiming for some level of sustainability. If you're interested in making this a reality, please consider sponsoring the effort , and please reach out to voice your interest ( Mastodon / Twitter / Reddit / Bluesky ). Reckon a WhatsApp Emacs client would help you stay focused at work (less time on your phone)? Ask your employer to sponsor it too ;-)
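To make the JSON-RPC-over-stdio idea concrete, here is a toy sketch of the shape of such a service (my own illustration, not code from wuzapi or agent-shell; the "ping" method and the hand-rolled JSON strings are assumptions for the example). It reads newline-delimited JSON-RPC requests on stdin and writes responses to stdout, so a parent process like Emacs can own it and talk in both directions without any open ports:

```java
// Toy JSON-RPC-over-stdio loop (illustrative; real code would use a JSON library).
// A parent process (e.g. Emacs via make-process) spawns this, writes requests to
// its stdin, and reads responses from its stdout: bidirectional, one process, no ports.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class StdioRpc {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(System.out, true); // autoflush each response
        String line;
        while ((line = in.readLine()) != null) {
            // Crude dispatch: a real implementation would parse the JSON properly.
            if (line.contains("\"method\":\"ping\"")) {
                out.println("{\"jsonrpc\":\"2.0\",\"id\":1,\"result\":\"pong\"}");
            } else {
                out.println("{\"jsonrpc\":\"2.0\",\"id\":1,\"error\":{\"code\":-32601,\"message\":\"method not found\"}}");
            }
        }
    }
}
```

Feeding it {"jsonrpc":"2.0","id":1,"method":"ping"} on stdin yields a pong result on stdout. Because requests and responses share one pipe pair, there is no separate webhook channel to wire up, which is exactly the simplification over REST-plus-webhooks described above.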

0 views