GreatReads - Blog Aggregator · Phoenix Framework

JSON


Simon Willison 1 week ago

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the "Preview" modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:

3.5 Flash is available today to billions of people globally:
- For everyone via the Gemini app and AI Mode in Google Search
- For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio
- For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise.

As usual with Gemini, the most interesting details are tucked away in the What's new in Gemini 3.5 Flash developer documentation. It mostly has the same set of platform features as the previous Gemini 3.x series, albeit with no computer use. The knowledge cut-off is January 2025, and it supports 1,048,576 input tokens and 65,536 maximum output tokens.

Google are also pushing a new Interactions API, currently in beta, which looks to me like their version of the patterns introduced by OpenAI Responses - in particular server-side history management.

Gemini 3.5 Flash is accompanied by a notable price bump. The previous models in the "Flash" family were Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite. The new 3.5 Flash is 3x the price of 3 Flash Preview and 6x the price of 3.1 Flash-Lite (see price comparison here). At $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12. The Gemini team promise that 3.5 Pro will roll out "next month" - presumably at an even higher price.

This fits a trend: OpenAI's GPT-5.5 was 2x the price of GPT-5.4, and Claude Opus 4.7 is around 1.46x the price of 4.6 when you take the new tokenizer into account. Given the price increase it's interesting to see Google roll it out for so many of their own free-to-consumer products. It feels like all three of the major AI labs are starting to probe the price tolerance of their API customers.

Artificial Analysis publish the cost to run their proprietary benchmark against models, which is a useful way to take things like tokenization and increased volume of reasoning tokens into account. Some numbers worth comparing:

- Gemini 3.5 Flash (high): $1,551.60
- Gemini 3.1 Pro Preview: $892.28
- Gemini 3 Flash Preview (Reasoning): $278.26
- Gemini 3.1 Flash-Lite Preview: $93.60

Running the benchmark for 3.5 Flash (high) cost significantly more than 3.1 Pro Preview! Here are some numbers from other vendors:

- Claude Opus 4.7 (Adaptive Reasoning, Max Effort): $5,117.14
- Claude Opus 4.7 (Non-reasoning, High Effort): $1,217.23
- GPT-5.5 (xhigh): $3,357.00
- GPT-5.5 (medium): $1,199.14

I ran "Generate an SVG of a pelican riding a bicycle" against the Gemini API and got back this pelican, which is a lot:

From the code comments:

hedgehog on Hacker News:

That pelican looks like it's in Miami for a crypto conference.

That one cost me 11 input tokens and 14,403 output tokens, for a total cost of just under 13 cents.

You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.
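The "just under 13 cents" figure follows directly from the list prices quoted above. Below is a minimal Python sketch of that arithmetic. It assumes all 14,403 output tokens, including any thinking tokens among them, are billed at the single $9/million output rate, and it uses Decimal to avoid float rounding noise. It also computes the ratio between the two Artificial Analysis benchmark costs.

```python
from decimal import Decimal

# Gemini 3.5 Flash list prices from the post, in USD per million tokens.
INPUT_PER_MILLION = Decimal("1.50")
OUTPUT_PER_MILLION = Decimal("9")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost of one request, assuming all output tokens bill at the output rate."""
    return (input_tokens * INPUT_PER_MILLION + output_tokens * OUTPUT_PER_MILLION) / MILLION


pelican = request_cost(11, 14_403)
print(f"pelican: ${pelican:.4f}")  # pelican: $0.1296

# Artificial Analysis benchmark run costs from the post: 3.5 Flash (high) vs 3.1 Pro Preview.
ratio = Decimal("1551.60") / Decimal("892.28")
print(f"3.5 Flash (high) / 3.1 Pro Preview: {ratio:.2f}x")  # 3.5 Flash (high) / 3.1 Pro Preview: 1.74x
```

The pelican comes to about 12.96 cents, almost all of it output. The benchmark run for 3.5 Flash (high) cost roughly 1.74x that of 3.1 Pro Preview.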


Unsung 3 weeks ago

“There seems to be a file that is just filled with undecipherable Morse.”

On April Fools’ Day in 2021, the popular xkcd comic ran Checkbox, which was a Morse code puzzle in disguise. (It’s interesting to see the community trying to figure out what it actually does.) Engineer Max Goodhart built the front-end and wrote a summary of the whole project:

This year was a doozy. We specced and scrapped several different ideas in the months leading up to today. We finally settled on today’s concept just 3 days ago. The need to do something simple was a really useful constraint, and we leaned into the idea of making something primitive but deep.

The team seems to have had a lot of fun with it, even encoding JavaScript in Morse code (the link in the blog post no longer works, but you can still see it on the Internet Archive). Goodhart also wrote about the immense challenge of adjusting the Morse tapping speed to the user, which counterintuitively ended up needing… adjusting the user to the speed.

But the best part is that the server communications used Morse code in URLs, as well:

We took great pains to make the API for this project use morse code in the transport. If you take a look at the network inspector, you’ll notice that the URLs requested have morse code in them. This worked for every combination of letters imaginable, with two oddly specific exceptions: a solitary E, and a solitary I.

I liked this description of what transpired next, which would have made me think I was going insane, too:

Then, an even stranger thing happened. I copied and pasted the correct URL into my browser and pressed Enter, and right before my eyes, it deleted the “.” from the end of the URL and returned a different result.

I was delighted to discover an answer here, not only because in retrospect it’s such an obvious thing that was staring us all in the face for decades, but also because it has interesting URL construction consequences.

#bugs #encoding #web
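The answer sits behind the link. A likely explanation, which is an assumption here and not a quote from either post, fits the two exceptions exactly. In Morse, E is a single dot and I is two dots. URL handling treats a path segment that is exactly "." or ".." as a relative-path marker: the first is dropped, and the second steps up one directory. This is RFC 3986's remove_dot_segments algorithm; browsers apply equivalent rules from the WHATWG URL Standard. The sketch below is a Python transcription of the RFC 3986 algorithm, assuming the Morse string formed its own path segment. The /morse/ path is made up for illustration.

```python
def remove_dot_segments(path: str) -> str:
    """Transcription of remove_dot_segments from RFC 3986, section 5.2.4."""
    inp = path
    out = []  # output buffer: one entry per segment, each keeping its leading "/"
    while inp:
        if inp.startswith("../"):      # rule A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):     # rule A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):    # rule B: "/./" becomes "/"
            inp = "/" + inp[3:]
        elif inp == "/.":              # rule B: a trailing "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):   # rule C: "/../" becomes "/" and pops a segment
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":             # rule C: a trailing "/.." becomes "/" and pops
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):       # rule D: a lone dot segment vanishes
            inp = ""
        else:                          # rule E: move the first segment to the output
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


for letter, code in [("E", "."), ("I", ".."), ("S", "..."), ("A", ".-")]:
    print(letter, remove_dot_segments("/morse/" + code))
# E /morse/
# I /
# S /morse/...
# A /morse/.-
```

Only E and I are rewritten. S ("...") and A (".-") survive intact, because only segments made up entirely of one or two dots are special.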


Josh Comeau 1 month ago

Scroll-Driven Animations

The new Animation Timeline API allows us to create dynamic scroll animations without any JavaScript! It’s honestly a very lovely API, and in this blog post, we’ll explore some of the super cool things we can do with it.


Simon Willison 1 month ago

A pelican for GPT-5.5 via the semi-official Codex backdoor API

GPT-5.5 is out. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for!

There's one notable omission from today's release - the API:

API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale. We'll bring GPT‑5.5 and GPT‑5.5 Pro to the API very soon.

When I run my pelican benchmark I always prefer to use an API, to prevent hidden system prompts in ChatGPT or other agent harnesses from impacting the results.

One of the ongoing tension points in the AI world over the past few months has concerned how agent harnesses like OpenClaw and Pi interact with the APIs provided by the big providers. Both OpenAI and Anthropic offer popular monthly subscriptions which provide access to their models at a significant discount to their raw API.

OpenClaw integrated directly with this mechanism, and was then blocked from doing so by Anthropic. This kicked off a whole thing. OpenAI - who recently hired OpenClaw creator Peter Steinberger - saw an opportunity for an easy karma win and announced that OpenClaw was welcome to continue integrating with OpenAI's subscriptions via the same mechanism used by their (open source) Codex CLI tool.

Does this mean anyone can write code that integrates with OpenAI's Codex-specific APIs to hook into those existing subscriptions? The other day Jeremy Howard asked:

Anyone know whether OpenAI officially supports the use of the endpoint that Pi and Opencode (IIUC) uses?

It turned out that on March 30th OpenAI's Romain Huet had tweeted:

We want people to be able to use Codex, and their ChatGPT subscription, wherever they like! That means in the app, in the terminal, but also in JetBrains, Xcode, OpenCode, Pi, and now Claude Code. That’s why Codex CLI and Codex app server are open source too! 🙂

And Peter Steinberger replied to Jeremy that:

OpenAI sub is officially supported.

So... I had Claude Code reverse-engineer the openai/codex repo, figure out how authentication tokens were stored and build me llm-openai-via-codex, a new plugin for LLM which picks up your existing Codex subscription and uses it to run prompts!

(With hindsight I wish I'd used GPT-5.4 or the GPT-5.5 preview; it would have been funnier. I genuinely considered rewriting the project from scratch using Codex and GPT-5.5 for the sake of the joke, but decided not to spend any more time on this!)

Here's how to use it:

- Install Codex CLI, buy an OpenAI plan, log in to Codex
- Install LLM
- Install the new plugin
- Start prompting

All existing LLM features should also work: attaching an image, starting an ongoing chat, viewing logged conversations and trying it out with tool support.

Let's generate a pelican! Here's what I got back:

I've seen better from GPT-5.4, so I tagged on xhigh reasoning effort and tried again:

That one took almost four minutes to generate, but I think it's a much better effort. If you compare the SVG code (default, xhigh) the xhigh one took a very different approach, which is much more CSS-heavy - as demonstrated by those gradients. The xhigh run used 9,322 reasoning tokens where the default used just 39.

One of the most notable things about GPT-5.5 is the pricing. Once it goes live in the API it's going to be priced at twice the cost of GPT-5.4 - $5 per 1M input tokens and $30 per 1M output tokens, where 5.4 is $2.5 and $15. GPT-5.5 Pro will be even more: $30 per 1M input tokens and $180 per 1M output tokens. GPT-5.4 will remain available. At half the price of 5.5 this feels like 5.4 is to 5.5 as Claude Sonnet is to Claude Opus.

Ethan Mollick has a detailed review of GPT-5.5 where he put it (and GPT-5.5 Pro) through an array of interesting challenges. His verdict: the jagged frontier continues to hold, with GPT-5.5 excellent at some things and challenged by others in a way that remains difficult to predict.
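The "twice the cost" claim holds for any mix of tokens, because both the input and output rates double. Here is a small sketch of the arithmetic using the per-million prices quoted above. The dictionary keys are illustrative labels, not official API model IDs. The reasoning-token line assumes reasoning tokens are billed at the output rate, as OpenAI does for its reasoning models on the API. The post's runs went through a Codex subscription rather than the metered API, so that dollar figure is hypothetical.

```python
from decimal import Decimal

# (input, output) list prices in USD per 1M tokens, as quoted in the post.
# The keys are illustrative labels, not official API model IDs.
PRICES = {
    "gpt-5.4": (Decimal("2.5"), Decimal("15")),
    "gpt-5.5": (Decimal("5"), Decimal("30")),
    "gpt-5.5-pro": (Decimal("30"), Decimal("180")),
}
MILLION = Decimal(1_000_000)


def cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """USD cost at list price; reasoning tokens are assumed to be counted in output_tokens."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / MILLION


# Both rates double, so the ratio is exactly 2 regardless of the token mix.
for mix in [(1_000, 2_000), (50_000, 500), (11, 14_403)]:
    print(mix, cost("gpt-5.5", *mix) / cost("gpt-5.4", *mix))

# Hypothetical API price of the extra reasoning in the xhigh run (9,322 vs 39 tokens).
extra = cost("gpt-5.5", 0, 9_322 - 39)
print(f"extra reasoning: ${extra:.4f}")  # extra reasoning: $0.2785
```

At GPT-5.5 API rates, the extra reasoning in the xhigh run would come to roughly 28 cents.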

CSS


Zak Knill 1 month ago

SSE token streaming is easy, they said

I wrote about AI having ‘durable sessions’ to support async agentic applications, and in the comments everyone said: “Token streaming over SSE is easy” . …so I figured I’d dig into that claim. Agents used to be a thing you talked to synchronously. Now they’re a thing that runs in the background while you work. When you make that change, the transport breaks.
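For context on the "easy" part, the Server-Sent Events wire format itself is simple. Each event is a few "field: value" lines ended by a blank line. An id: field lets a reconnecting browser send the standard Last-Event-ID header so the server can resume. The sketch below assumes a hypothetical in-memory TokenLog standing in for whatever durable store a background agent would need. It only shows the framing and resume-from-id logic; it is not Knill's design and does not address the problems the post says arise once agents run in the background.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class TokenLog:
    """Hypothetical append-only log of tokens for one agent run (stand-in for durable storage)."""
    tokens: List[str] = field(default_factory=list)

    def append(self, token: str) -> None:
        self.tokens.append(token)

    def events_since(self, last_event_id: Optional[str]) -> Iterator[str]:
        """Yield SSE-framed events after last_event_id (None means replay from the start)."""
        start = 0 if last_event_id is None else int(last_event_id) + 1
        for i in range(start, len(self.tokens)):
            # A token containing newlines must be split over several "data:" lines;
            # the client rejoins them with "\n".
            data = "".join(f"data: {line}\n" for line in self.tokens[i].split("\n"))
            yield f"id: {i}\n{data}\n"


log = TokenLog()
for tok in ["Hello", ",", " world"]:
    log.append(tok)

# First connection sees everything; a reconnect with Last-Event-ID: 0 resumes after it.
print("".join(log.events_since(None)), end="")
print("".join(log.events_since("0")), end="")
```

Note the double space in "data:  world". SSE clients strip exactly one space after the colon, so the token's own leading space survives.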


Ed Zitron's Where's Your Ed At 1 month ago

News: Anthropic Removes Claude Code From $20-A-Month "Pro" Subscription Plan For New Users (Developing)

In developing news, Anthropic appears to have removed access to AI coding tool Claude Code from its $20-a-month "Pro" accounts. This is likely another cost-cutting move that follows a recent change (per The Information) that forced enterprise users to pay a per-million-token rate rather than having rate limits that were, based on researchers' findings, often much higher than the cost of the subscription.

Previously, users were able to access Claude using their Pro subscriptions via a command-line interface and both the web and desktop Claude apps. Instead of paying on a per-million-token basis, users were allowed to use their subscription to access Claude Code, but will likely now have to pay for API access.

Anthropic's Claude Code support documents (as recently as this April 10th archived page) previously read "Using Claude Code with your Pro or Max plan." The page now reads "Using Claude Code with your Max plan." Pricing on Anthropic's website reflects the removal of Claude Code on both mobile and desktop.

Some Pro users report that they are still able to access Claude Code via the web app and Command-Line Interface. It is unclear at this time whether this change is retroactive or only applies to new Pro subscribers, or whether Anthropic intends to entirely remove access to Claude Code (without paying for API tokens) from every Pro customer. I have requested a comment from Anthropic, and will update this piece when I receive it, or if Anthropic confirms this move otherwise.

If you liked this news hit and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI’s finances, and the AI bubble writ large.
I recently put out the timely and important Hater’s Guide To The SaaSpocalypse, another on How AI Isn't Too Big To Fail, a deep (17,500-word) Hater’s Guide To OpenAI, and just last week put out the massive Hater’s Guide To Private Credit. Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.

- Anthropic appears to have removed access to Claude Code for its $20-a-month "Pro" Plans.
- Current Pro users appear to still have access via the Claude web app.
- Claude Code support documents exclusively refer to accessing Claude Code via "your Max Plan," after previously saying you could access "with your Pro or Max Plan."

Business


Unsung 1 month ago

Raycast’s confetti cannon

Among many genuinely useful deeplinks you can use to control Raycast from afar in a simple way, I just spotted an interesting one:

This is what it does:

Despite it being a confetti cannon and nothing more, I think it goes deeper than stuff like Asana’s “celebration creatures”, and it deserves recognition for three actually kinda serious reasons:

You can use it to quickly test whether you’re wiring deeplinks correctly. It’s clever the Raycast team put it at the beginning of the doc page; I think every API or complex connection method should have a simple and delightful “success scenario” for two reasons: to celebrate you establishing that connection, and to have something so simple it cannot itself be misbehaving (this way you know that if you can’t get confetti to work, you for sure messed up something elsewhere).

Once you know how to invoke it from far away, it’s also great for testing other things. Sounds can be muted. In JavaScript, a console log can be too buried if you don’t have a console open or visible, and an alert dialog is kind of depressingly old-school and steals focus. This HUD-like thing feels like a modern way of approaching this: you know you’ll notice it when it fires away, and it will leave no lasting damage. (Okay, fair, it does steal focus too, so that’d be one thing to improve.)

It has great production value. I hate perhaps all of Google’s search easter eggs because they’re built so extremely cheaply – try searching for “do a barrel roll” or “askew” (and no, I’m not going to dignify them with links because links are my love language). It’s rare and worth celebrating when something that could very well be an internal joke or a test feature for nerds is actually something you want to use because it’s so well-made. (See also: Linear’s internal testing UI.)

#above and beyond #coding #easter eggs #internal ui

Testing


Simon Willison 1 month ago

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Meta announced Muse Spark today, their first model release since Llama 4 almost exactly a year ago. It's hosted, not open weights, and the API is currently "a private API preview to select users", but you can try it out today on meta.ai (Facebook or Instagram login required).

Meta's self-reported benchmarks show it competitive with Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 on selected benchmarks, though notably behind on Terminal-Bench 2.0. Meta themselves say they "continue to invest in areas with current performance gaps, such as long-horizon agentic systems and coding workflows".

The model is exposed as two different modes on meta.ai - "Instant" and "Thinking". Meta promise a "Contemplating" mode in the future which they say will offer much longer reasoning time and should behave more like Gemini Deep Think or GPT-5.4 Pro.

I prefer to run my pelican test via API to avoid being influenced by any invisible system prompts, but since that's not an option I ran it against the chat UI directly. Here's the pelican I got for "Instant":

And this one for "Thinking":

Both SVGs were rendered inline by the Meta AI interface. Interestingly, the Instant model output an SVG directly (with code comments) whereas the Thinking model wrapped it in a thin HTML shell with some unused JavaScript libraries. Which got me curious...

Clearly Meta's chat harness has some tools wired up to it - at the very least it can render SVG and HTML as embedded frames, Claude Artifacts style. But what else can it do? I asked it:

what tools do you have access to? I want the exact tool names, parameter names and tool descriptions, in the original format

It spat out detailed descriptions of 16 different tools. You can see the full list I got back here - credit to Meta for not telling their bot to hide these, since it's far less frustrating if I can get them out without having to mess around with jailbreaks. Here are highlights derived from that response:

- Browse and search. One tool can run a web search through an undisclosed search engine, another can load the full page from one of those search results, and a third can run pattern matches against the returned page content.
- Meta content search. A tool can run "Semantic search across Instagram, Threads, and Facebook posts" - but only for posts the user has access to view which were created since 2025-01-01. This tool has some powerful-looking parameters.
- "Catalog search" - a tool can "Search for products in Meta's product catalog", presumably for the "Shopping" option in the Meta AI model selector.
- Image generation. A tool generates images from prompts, and "returns a CDN URL and saves the image to the sandbox". It has modes "artistic" and "realistic" and can return "square", "vertical" or "landscape" images.
- container.python_execution - yes! It's Code Interpreter, my favourite feature of both ChatGPT and Claude. "Execute Python code in a remote sandbox environment. Python 3.9 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV, etc. Files persist at …" Python 3.9 is EOL these days but the library collection looks useful. I prompted "use python code to confirm sqlite version and python version" and got back Python 3.9.25 and SQLite 3.34.1 (from January 2021).
- container.create_web_artifact - we saw this earlier with the HTML wrapper around the pelican: Meta AI can create HTML+JavaScript files in its container which can then be served up as secure sandboxed iframe interactives. "Set kind to … for websites/apps or … for vector graphics."
- container.download_meta_1p_media is interesting: "Download media from Meta 1P sources into the sandbox. Use post_id for Instagram/Facebook/Threads posts, or … for catalog product images". So it looks like you can pull in content from other parts of Meta and then do fun Code Interpreter things to it in the sandbox.
- container.file_search - "Search uploaded files in this conversation and return relevant excerpts" - I guess for digging through PDFs and similar?
- Tools for editing files in the container. These look similar to Claude's text editor tool commands - these are becoming a common pattern across any file-equipped agent harness.
- container.visual_grounding - see below, this one is fun.
- subagents.spawn_agent - the sub-agent as a tool pattern. "Spawn an independent sub-agent for research, analysis, or delegation. It returns its final text response."
- third_party.link_third_party_account - "Initiate account linking for a third-party service", currently Google Calendar, Outlook Calendar, Gmail, or Outlook.

Let's talk about container.visual_grounding. Here's the description in full:

Description: Visual grounding tool that analyzes the image, identifies and labels objects, locates regions, or counts objects.

Parameters:
- object_names (array, required)
- image_path (string, default "")
- format_type ("bbox"|"point"|"count"|null, default "bbox")
- title (string|null)[string]

So it can take an image path, optional object names and return things in point, bbox or count format. Sounds to me like Meta's Segment Anything. I decided to take it for a spin.

generate a photo of a raccoon sitting on a trash can wearing trash as a hat

Here you go — king of the alley, in full regalia: Coffee cup crown, banana peel brim, newspaper feather. Peak raccoon fashion. Want me to make a wilder version — maybe night-vision, or wearing sunglasses too?

My guess is that image generation in Meta AI is powered by their Emu model, or an updated version of it. This is the same pattern as ChatGPT and Gemini image generation, where images are generated by a tool call.

The tool description said that it "saves the image to the sandbox", so my guess was that I'd then be able to run Python code against it. I tried:

use python OpenCV to analyze that image and find out neat things about it

Annoyingly it did not show me the code it ran, but it did create this image:

Pretty neat! We can both generate images and then use them with Python tools.

Next, the visual_grounding tool. Knowing the names of the tools makes it easy to refer to them in prompts:

Now try the container.visual_grounding tool

The response (full response here):

Done — visual_grounding pinpointed every piece of the outfit with pixel-level precision:

Want me to run it in bbox mode next to draw boxes around the fur texture, or count mode to see how many individual whiskers it can detect?

Once again, Meta AI wrapped the results from the tool in a custom HTML visualization. That's clearly point mode, but how about the others? I prompted:

yes draw bounding boxes, can it do masks too?

Yes — bounding boxes work great. Masks? Not directly in visual_grounding; it only returns three formats: point, bbox, and count. No pixel-level segmentation masks. Here's the bbox output:

What you’re seeing:
- raccoon: big box from (270,240) to (892,845) — covers 62% of the width, basically the whole animal
- coffee cup: (424,115)-(610,273) — sits highest, 158px tall in the 0-1000 space
- banana peel: (472,215)-(667,291) — overlaps the cup's base, confirming it's draped over the side
- newspaper: (305,223)-(548,315) — left ear coverage
- trash can lid: (107,800)-(1000,998) — spans almost the full image width at the bottom
- face: (305,283)-(750,537) — tight crop around the mask
- eyes: both ~40×37 pixel boxes, perfectly level at y=379

Notice how the boxes nest: face is inside raccoon, eyes inside face, hat pieces overlap each other but stay above the face box. The tool is doing real object localization, not just guessing centers.

About masks: since visual_grounding doesn't output them, I can fake them with OpenCV in the container. Two quick options:
- GrabCut — I give it the raccoon bbox, it iteratively segments fur from background and returns a binary mask
- SAM-style color threshold — use the bbox plus k-means to isolate the coffee cup or banana peel

Want me to generate an actual alpha mask for the raccoon (cut it out from the alley), or for the trash hat pieces separately? Takes about 10 seconds and I can give you a transparent PNG.

Here's the HTML for that bbox visualization.

Finally I had it dump out the raw JSON for point mode, bbox mode and count mode. You can see that all here. The count mode is particularly interesting - here's that in full:

So Meta AI has the ability to count a raccoon's whiskers baked into the default set of tools.

Which means... it can count pelicans too!

Here's that overlay exported as HTML.

Update: Meta's Jack Wu confirms that these tools are part of the new harness they launched alongside the new model.

On Twitter Alexandr Wang said:

this is step one. bigger models are already in development with infrastructure scaling to match. private api preview open to select partners today, with plans to open-source future versions.

I really hope they do go back to open-sourcing their models. Llama 3.1/3.2/3.3 were excellent laptop-scale model families, and the introductory blog post for Muse Spark had this to say about efficiency:

[...] we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick. This improvement also makes Muse Spark significantly more efficient than the leading base models available for comparison.

So are Meta back in the frontier model game? Artificial Analysis think so - they scored Muse Spark at 52, "behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6". Last year's Llama 4 Maverick and Scout scored 18 and 13 respectively.

I'm waiting for API access - while the tool collection on meta.ai is quite strong the real test of a model like this is still what we can build on top of it.
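The bbox numbers reported by visual_grounding are easy to sanity-check. Below is a minimal Python sketch that treats each pair as (x1, y1)-(x2, y2) corners in the tool's 0-1000 normalized space. That reading is an assumption based on the "0-1000 space" remark in the response. The Box class, the to_pixels helper and the 2000x1000 image size are illustrative, not part of Meta's tool.

```python
from typing import Dict, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box as (x1, y1)-(x2, y2) in an assumed 0-1000 normalized space."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1

    def contains(self, other: "Box") -> bool:
        """True if other lies entirely inside this box."""
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and other.x2 <= self.x2 and other.y2 <= self.y2)

    def to_pixels(self, img_w: int, img_h: int) -> "Box":
        """Scale normalized coordinates to pixels for an img_w x img_h image."""
        return Box(round(self.x1 * img_w / 1000), round(self.y1 * img_h / 1000),
                   round(self.x2 * img_w / 1000), round(self.y2 * img_h / 1000))


# bbox values as reported in the Meta AI response.
BOXES: Dict[str, Box] = {
    "raccoon": Box(270, 240, 892, 845),
    "coffee cup": Box(424, 115, 610, 273),
    "banana peel": Box(472, 215, 667, 291),
    "newspaper": Box(305, 223, 548, 315),
    "trash can lid": Box(107, 800, 1000, 998),
    "face": Box(305, 283, 750, 537),
}

raccoon, face, cup = BOXES["raccoon"], BOXES["face"], BOXES["coffee cup"]
print(f"raccoon width: {raccoon.width / 1000:.0%}")           # raccoon width: 62%
print("coffee cup height:", cup.height)                       # coffee cup height: 158
print("face inside raccoon:", raccoon.contains(face))         # face inside raccoon: True
hat = ["coffee cup", "banana peel", "newspaper"]
print("above face:", [n for n in hat if BOXES[n].y2 <= face.y1])  # above face: ['coffee cup']
print(cup.to_pixels(2000, 1000))  # Box(x1=848, y1=115, x2=1220, y2=273)
```

The 62% and 158 figures match the response. The same check on the other two hat pieces shows that the banana peel (bottom edge 291) and the newspaper (bottom edge 315) dip slightly below the top of the face box (283). Of the three, only the coffee cup sits entirely above it.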

Machine Learning

HTML


Martin Alderson 2 months ago

Telnyx, LiteLLM and Axios: the supply chain crisis

While the world's been watching physical supply chains, a different kind of supply chain attack has been escalating in the open source ecosystem.

Over the past week a group of bad actors have been compromising various open source projects, pushing malicious versions of libraries which inject a trojan that collects sensitive data from systems that install the malicious version. Ironically, the first attack started with an open source package for finding security vulnerabilities.

The scale of the issue is growing and is alarming. This wave of attacks started with some smaller libraries, then started to hit more popular packages in the supply chain with Telnyx, a popular package for voice and SMS integration. This had ~150k/week downloads on the affected package. LiteLLM was next - a much more popular package for calling various APIs. This had ~22M/week downloads. Finally, and most concerning, the npm package for Axios - an incredibly widely used library for calling APIs - was attacked on March 31st. This has at least 100M downloads a week and is a very core piece of software that is used in millions of apps.

There was a rapid reaction to each of these attacks to remove the malicious versions, but even in the hours they were up, tens of thousands of machines (and potentially far more) were likely compromised.

The attackers are leveraging stolen credentials from the previous attack(s) to then infect more packages in the supply chain. This creates a vicious cycle of compromises that continues to grow. Other systems are at risk too: for every compromised machine that happens to belong to the developer of another software library, there are probably thousands of other developers who have unfortunately leaked very sensitive data to the attackers.

This is not a new issue: last year we saw two waves of attacks against the npm ecosystem which backdoored over 1,000 packages.
The aim of those attacks appears to have been to steal crypto - with reports suggesting $8.5m was stolen.

The infrastructure providers behind this supply chain did respond by putting various mitigations in place. There were two primary ones. The first was requiring published packages to use short-lived tokens, which reduces the impact of "old" credentials being able to publish new packages. It appears this has not solved the issue, given these packages seem to have been published regardless.

The more invasive one is to allow developers to not install "brand new" packages. Instead, they get held for a time period - say 24 hours - with the idea being the community will (hopefully) detect malicious versions in those 24 hours and revoke them before they are installed. This is a double-edged sword though, as often you need a rapid response to a vulnerable package to avoid security issues. This can be overridden manually, but it does introduce some overhead when responding to urgent security flaws.

Finally, npm are rolling out staged publishing. This requires a separate step when publishing new versions of packages, in which a "trusted" human performs a check on the platform with two-step verification to avoid automated attacks. However, given it seems developers' computers are being compromised, it is not implausible to suggest that the attacker could also perform this step.

I'm extremely concerned about the cybersecurity risk LLMs pose, which I don't think is sufficiently priced in outside of niche parts of the tech community. While it's hard to know for sure how the initial attacks were discovered, I strongly suspect LLMs helped find the exploit(s) in the first place and develop subsequent attacks. While this is conjecture, the number of exploits being found by non-malicious actors is exploding. I found one myself - which I wrote up in a recent post, still unpatched - in less than half an hour. There are endless other examples online.
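The 24-hour hold mitigation amounts to a minimum-age filter applied when a package manager resolves versions, plus a manual override for urgent fixes. Below is a minimal sketch under those assumptions. The function name, the urgent override set and the sample versions and dates are hypothetical. Versions are ordered by publish time rather than by semantic versioning to keep it short.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set


def pick_version(published: Dict[str, datetime], now: datetime,
                 min_age: timedelta = timedelta(hours=24),
                 urgent: Optional[Set[str]] = None) -> Optional[str]:
    """Return the newest version that has aged at least min_age, unless overridden.

    Versions listed in `urgent` skip the hold, modelling the manual override
    needed when a security fix has to go out immediately.
    """
    urgent = urgent or set()
    eligible = [v for v, when in published.items()
                if now - when >= min_age or v in urgent]
    if not eligible:
        return None
    # Simplification: "newest" means most recently published, not highest semver.
    return max(eligible, key=lambda v: published[v])


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
published = {
    "1.4.0": now - timedelta(days=30),
    "1.4.1": now - timedelta(hours=3),  # brand new: held back by default
}
print(pick_version(published, now))                    # 1.4.0
print(pick_version(published, now, urgent={"1.4.1"}))  # 1.4.1
```

The second call shows the trade-off described in the post: a genuinely urgent fix only gets through if someone adds it to the override by hand.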
So it seems to me that LLMs are acting as an accelerant: Firstly, they make finding security vulnerabilities far easier, which is what starts the whole supply chain attack cycle. And the leaked rumours that Anthropic's new Mythos model is a step change better than Opus 4.6 (which is already exceptionally good at finding security issues) suggest the direction of travel is only going one way. Secondly, they allow attackers to build far more sophisticated attacks far more quickly than before; for example, one of the attacks in this recent wave hid an exploit in an audio file. Next, this is all happening while the infrastructure providers of the software supply chain are on the back foot with their mitigations. Finally, so much of the software ecosystem's critical security infrastructure is maintained by volunteers who are often unpaid. As always, the above image illustrates the point far better than words can. To reiterate, it may be that this is just a well-resourced group that could have done all this without LLMs. But given how widely coding agents have been adopted in the broader developer community, it seems far-fetched to say they wouldn't be used for nefarious purposes. Fundamentally, these attacks are possible because OSes are (by default) far too permissive, having been designed for a world where software is either trusted or not. The attempts to secure this by trusting certain publishers fall down for both agents and supply chain attacks: agents can use trusted software in unexpected ways, and if the trusted authors of the software are compromised, it bypasses everything. Thinking a few steps ahead here, it seems to me that the core mitigations are (mostly) insufficient:
- Any delay to publishing packages can backfire and introduce delays in responding to real security incidents.
- There is too much software, maintained or unmaintained, which is likely to be vulnerable.
- Much of this software, if it is maintained, is poorly resourced, and volunteers are likely to burn out trying to resolve a flood of security issues in the near term.
There are some things, however, that would help with the supply chain in particular:
- Frontier labs donating compute and tokens to automatically scan every package update for potential signs of compromise before publishing. This would be an excellent use of their leading models.
To me though, I keep coming back to the realisation that sandboxing agents faces very similar challenges to mitigating the impact of this security issue.
iOS and Android were designed with this approach in mind: each app has very limited access to other apps and to the OS as a whole. I think we need to move desktop and server operating systems to a similar model for this new world. While this won't resolve all issues, it will dramatically reduce the "blast impact" of each attack and stop many exploits from gaining viral traction. The OS should know that a package install should only write package files to a certain set of folders, and reject everything else. The OS should know the baseline of services a CI/CD run uses and what network calls it makes, so it can block connections to random command and control servers. And as on mobile OSes, one program shouldn't be able to read another program's files and data without an explicit opt-in. If you've used sandbox mode in a coding agent, you will be familiar with this approach; all the pieces are there already. Qubes OS is probably the closest thing outside of mobile OSes to what I think we need to move to: a security-focused Linux operating system which isolates apps in self-contained VMs. It's an enormous undertaking to migrate the world's software to run like this, and perhaps governments should be allocating significant resources to open source projects to help them adopt it.
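As a rough illustration of the per-program policy described above, here is a toy Python model of a write-path and network allowlist. `SandboxPolicy`, `may_write` and `may_connect` are hypothetical names, and the paths and host in the usage are examples only. Real enforcement (mobile OSes, Qubes OS, coding-agent sandboxes) happens in the kernel or hypervisor, not in a userspace check like this one.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SandboxPolicy:
    """Hypothetical policy: where a program may write and which hosts it may reach."""
    writable_roots: list[str] = field(default_factory=list)
    allowed_hosts: set[str] = field(default_factory=set)

    def may_write(self, path: str) -> bool:
        p = PurePosixPath(path)
        # Reject any '..' component outright instead of trying to resolve it.
        if ".." in p.parts:
            return False
        return any(p == PurePosixPath(root) or PurePosixPath(root) in p.parents
                   for root in self.writable_roots)

    def may_connect(self, host: str) -> bool:
        # Anything outside the known baseline (e.g. a command and control server) is refused.
        return host in self.allowed_hosts
```

A package install would then be allowed to write under its own dependency folder and talk to its registry, and nothing else:

```python
policy = SandboxPolicy(writable_roots=["/home/dev/project/node_modules"],
                       allowed_hosts={"registry.npmjs.org"})
policy.may_write("/home/dev/.ssh/authorized_keys")   # False
```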

Security


iDiallo 2 months ago

How Do We Get Developers to Read the Docs

When I reviewed this PR, I had tears in my eyes. We had done it. We had finally created the perfect API. To top it off, the senior developer who worked on it had written documentation to match. No stone was left unturned. I had the code open in one window and the doc in the other. The moment I felt hesitation in the code, the documentation reassured me. Why do we make two calls to get the... "We are fetching two types of orders to support legacy subscribers..." the documentation answered before I completed my question. This was standard number 15. The one to rule them all. But I still had one question. As the owner of the API, I read the documentation. Will other developers ever think to read it? How do I get people to want to read the documentation before they use this API? Because in my experience, nobody reads the documentation. That's not to say documentation is useless, but my mistake was thinking that the people who want to implement the API are interested in documentation at all. For every API ever built, there are two audiences to cater to, and confusing them is where most documentation goes wrong. The first group is the consumers of the API. The only thing they want to know is: do the endpoints do what I need, and what parameters do they take? They are not reading your documentation like a book. They are scanning it like a menu. They want to find the thing they need, copy the example, and move on. The second group is the maintainers of the API: the people who need to understand the why behind every decision. Why are there two calls? Why does this endpoint behave differently for legacy users? Why is this field nullable? These are the people who will be debugging at 2am, and they need the full picture. The worst thing you can do is write one document that tries to serve both audiences equally. You end up with something that's too deep for the first group to skim, and not structured enough for the second group to find useful.
For the first audience, the API should speak for itself. The best documentation you can provide is not text to read through, but a well-designed API. Follow clear, repeatable patterns where the user can anticipate, or even assume, the available features. If you have an endpoint called , the assumption should be that returns a specific order. If you add , there should probably be a too. When the pattern is consistent, the consumer doesn't need to read anything; they just guess correctly. When you do write documentation for this audience, resist the urge to explain your internals. They don't need to know that you're fetching from two different database tables to support legacy subscribers. What they need to know is: . One sentence. Done. I like this idiom: "Too much information and no information, accomplish the same goal." This is the mistake I see most often. It's a painful one because it comes from a good place. The writer of the documentation, usually the person who built the thing, feels a sense of responsibility. They want to be thorough. They want no one to be confused. So they write everything down. The result is a documentation page that looks like this: This endpoint retrieves orders for a given user. It was introduced in v2.3 of the API following the migration from the legacy order management system (OMS) in Q3 2021. Internally, the resolver makes two sequential calls (one to the new orders table and one to the legacy_orders table) and merges the results using the order ID as a deduplication key. Note that legacy orders may not contain a field, which was not captured before 2019. If you are building a UI, you should account for this possibility. The endpoint also supports cursor-based pagination, though offset-based pagination is available for backward compatibility with clients built before v2.1. Additionally, orders in a state may not appear immediately...
A developer scanning this page will read the first sentence, close the tab, and think about designing API standard number 16. They'll go look at the codebase instead, or ping a teammate, or just guess. The documentation existed; it just didn't get read. Which means it accomplished exactly the same thing as having no documentation at all. The same way you don't write a comment to explain every line of code, documentation doesn't benefit from too much information. My go-to solution isn't to omit information, but to write it in layers. Collapsible sections are one of the most underrated tools in documentation design. They let the consumer skim the surface: endpoint name, what it returns, a working example. And they let the maintainer dive deeper into the implementation notes, the edge cases, and the historical context. The same principle applies to how you order information. Lead with what the API does. Follow with how to use it. Bury the why at the bottom, behind a toggle or a "Details" section, available to those who need it, invisible to those who don't. Think of it like a well-designed error message. A good error message tells you what went wrong in plain language. A great error message also includes an expandable stack trace, but it doesn't show you the stack trace first. Your documentation has the same job. Give people the answer they're looking for, and then offer the depth to those willing to dig. The second audience, the maintainers, do need the full picture. The two database calls, the deduplication logic, the historical reason the field is sometimes null. This is the documentation that prevents a future developer from "fixing" something that wasn't broken, or removing what looks like redundant code. But this documentation doesn't have to live on the same page as the quick-start guide. Deep implementation notes belong in inline code comments or a separate internal wiki. The public-facing API reference should stay clean.
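One way to get the layering described above is to generate the reference page from structured data, so the summary and example always come first and the history sits behind a toggle. A minimal Python sketch, assuming a Markdown renderer that supports the HTML `<details>` element (GitHub's does). `EndpointDoc` and `render` are hypothetical names, and the endpoint in the usage is made up.

```python
from dataclasses import dataclass


@dataclass
class EndpointDoc:
    name: str
    summary: str   # What it does, in one sentence.
    example: str   # A working call the consumer can copy.
    details: str   # The "why": history, internals, edge cases.


def render(doc: EndpointDoc) -> str:
    """Render consumer-facing info first, maintainer notes in a collapsed section."""
    return "\n".join([
        f"## {doc.name}",
        "",
        doc.summary,
        "",
        "    " + doc.example,   # an indented line renders as a code block in Markdown
        "",
        "<details>",
        "<summary>Implementation notes</summary>",
        "",
        doc.details,
        "",
        "</details>",
    ])
```

Because the order is fixed by the renderer rather than by whoever writes the page, the "why" can never crowd out the one sentence a consumer came for.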
When you separate operational documentation (for consumers) from institutional documentation (for maintainers), both documents get better. The consumer doc gets shorter and clearer. The maintainer doc gets deeper because it's no longer trying to also be beginner-friendly. The goal of documentation isn't completeness. Completeness is what you write for yourself, to feel like you've done your job. The goal of documentation is to transfer the right information into the right person's head at the right moment. That senior developer who wrote the documentation I cried over understood this. She didn't write everything she knew. She wrote exactly what someone reading the code would need to know, at the exact moment they'd need it. And the API design allowed anyone consuming it to make correct assumptions (intuitive design) on how it works. Both groups are happy.
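Going back to the first audience: the "guess correctly" property comes from generating every resource's routes from the same template. The post's actual endpoint names did not survive, so the `orders` resource, the route shapes and the `routes_for` helper below are hypothetical. This is a minimal Python sketch of the idea, not the author's API.

```python
# One template shared by every resource: (HTTP method, path template, action).
ROUTE_TEMPLATE = [
    ("GET", "/{r}", "list"),
    ("POST", "/{r}", "create"),
    ("GET", "/{r}/{{id}}", "get one"),
    ("PUT", "/{r}/{{id}}", "replace"),
    ("DELETE", "/{r}/{{id}}", "delete"),
]


def routes_for(resource: str) -> dict[tuple[str, str], str]:
    """Expand the shared template for one resource; '{{id}}' becomes a literal '{id}'."""
    return {(method, path.format(r=resource)): action
            for method, path, action in ROUTE_TEMPLATE}
```

Someone who has used `GET /orders/{id}` in this scheme can correctly guess `GET /invoices/{id}` without opening the docs.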


Maister's Graphics Adventures 2 months ago

Walking backwards into the future – A look at descriptor heap in Granite

It seems like I can never quite escape the allure of fiddling with bits more efficiently every passing year. I recently went through the process of porting over Granite’s Vulkan backend to use VK_EXT_descriptor_heap. There wasn’t exactly a burning need to do this work, but science demands I sacrifice my limited free time for these experiments. My name may or may not be on the extension summary, and it’s important to eat your own dog food. In this post, I want to explore ways in which we can port an old-school binding model over to newer APIs should the need arise. Granite’s binding model is designed for really old Vulkan. The project started in January 2017 after all, at which point Vulkan was in its infancy. Bindless was not really a thing yet, and I had to contend with really old mobile hardware. Slot-based bindings have been with us since OpenGL and early D3D. I still think it’s a fine model from a user’s perspective. I have no problem writing code like: It’s very friendly to tooling and validation and I just find it easy to use overall. GPU performance is great too since vendors have maximal flexibility in how to implement the API. The major downside is the relatively heavy CPU cost, since there are many API calls to make. In my projects, it’s rarely a concern, but when doing heavy CPU-bound workloads like PS2 GS emulation, it did start to matter quite a bit. When SPIR-V shaders are consumed in Granite, they are automatically reflected. E.g., with GLSL: I automatically generate a VkDescriptorSetLayout for each unique set, and combine these into a VkPipelineLayout as one does. VkDescriptorSetLayouts are hash’n’cached into a DescriptorSetAllocator. The implicit assumption in the shaders I write is that low-frequency updates have lower set values. This matches Vulkan’s pipeline layout compatibility rules too. Given the hardcore descriptor churn this old model can incur, UBOs originally used VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC.
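The reflection step described above (one set layout per unique set, hash’n’cached, then combined into a pipeline layout) can be modelled in a few lines. This is a toy Python sketch, not Granite’s C++ code: `Binding`, `SetLayoutCache` and `pipeline_layout` are hypothetical, and strings stand in for real VkDescriptorSetLayout handles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    binding: int
    kind: str       # e.g. "UNIFORM_BUFFER", "SAMPLED_IMAGE"
    count: int = 1


class SetLayoutCache:
    """Identical reflected sets map to one shared layout object."""

    def __init__(self):
        self._layouts = {}
        self.creations = 0   # stands in for vkCreateDescriptorSetLayout calls

    def get(self, bindings) -> str:
        # Sorting by binding index gives a hashable, order-independent key.
        key = tuple(sorted(bindings, key=lambda b: b.binding))
        if key not in self._layouts:
            self.creations += 1
            self._layouts[key] = f"set_layout#{self.creations}"
        return self._layouts[key]


def pipeline_layout(cache: SetLayoutCache, reflected_sets: dict) -> tuple:
    """Combine per-set layouts (set index -> list of Binding) into one pipeline layout.
    Unused set indices below the highest one get an empty layout."""
    highest = max(reflected_sets, default=-1)
    return tuple(cache.get(reflected_sets.get(i, [])) for i in range(highest + 1))
```

Two shaders that declare the same set 0 end up sharing one layout object, which is also what keeps their lower-numbered sets compatible across pipeline layouts.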
Since linearly allocating new UBOs per draw is a hot path, I wanted to avoid having to allocate and write new descriptor sets all the time. This is precisely what the dynamic buffer types were designed for. I did not use it for SSBOs, since DYNAMIC interacts badly with descriptor size: you can only change the offset, not the size. The size of UBOs is somewhat irrelevant, and I just hardcoded a 64K window. There are two main strategies for allocating sets from a VkDescriptorPool, both of which are kinda bad. The typical model, which I believe most people use, is the “jumbo” allocator, where you create a big pool with many sets and many descriptors of different descriptor types and pray for the best. When the pool is OOM-ed, allocate another. One unfortunate thing about the jumbo pool is that you can’t really know up front exactly how to balance the descriptor types properly. It will always be a shaky heuristic. In raw Vulkan 1.0, it was straight up illegal to allocate any further once a limit had been reached, causing even more headaches. The very first maintenance extension to Vulkan fixed this and added OUT_OF_POOL_MEMORY, which allows applications to just keep going until the pool is exhausted. Fun fact: some vendors would never exhaust the pool and just straight up ignored what you passed into vkCreateDescriptorPool, so that’s fun. Granite went the route of a slab allocator per VkDescriptorSetLayout instead, one allocator per thread. Allocate a group of like 64 VkDescriptorSets in one go and parcel them out as needed. The main advantage here was no need to keep calling vkAllocateDescriptorSets over and over, and in the early years, I even hash’n’cached the descriptor sets. The primary reason for doing that was that some early mobile drivers were extreeeeeeeemely slow at vkUpdateDescriptorSets for some reason. Not a great time. This slab approach led to memory bloat though.
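The slab strategy, and why it bloats memory, shows up clearly in a toy model. This Python sketch is not Granite’s implementation: `DescriptorSetSlabAllocator` is hypothetical, strings stand in for VkDescriptorSet handles, and the per-thread aspect is left out (you would keep one instance per thread and per set layout).

```python
class DescriptorSetSlabAllocator:
    """One instance per (set layout, thread): sets are allocated in groups and handed out singly."""

    def __init__(self, layout: str, sets_per_slab: int = 64):
        self.layout = layout
        self.sets_per_slab = sets_per_slab
        self.total = 0        # sets allocated so far, in use or idle
        self.api_calls = 0    # stands in for vkAllocateDescriptorSets calls
        self.free = []

    def _grow(self) -> None:
        # One "API call" allocates a whole slab of sets at once.
        self.api_calls += 1
        start = self.total
        self.total += self.sets_per_slab
        self.free.extend(f"{self.layout}:set{i}" for i in range(start, self.total))

    def allocate(self) -> str:
        if not self.free:
            self._grow()
        return self.free.pop()

    def release(self, descriptor_set: str) -> None:
        # Sets are recycled into this slab, never returned to a pool, so idle sets pile up.
        self.free.append(descriptor_set)
```

Needing 65 sets costs only two allocation calls, but leaves 63 sets sitting idle for that layout, and the same happens for every layout on every thread.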
At some point VK_KHR_descriptor_update_template was added, which aims to accelerate vkUpdateDescriptorSets. Instead of having the driver parse the structs and switch on the descriptorType to write descriptors, the update template in theory allows drivers to “precompile” a highly optimized function that updates descriptors based on the template provided in vkCreateDescriptorUpdateTemplate. This was a nice incremental thing to add to Granite. I don’t think the promise of update templates really worked out in the end though. I think most drivers just resorted to parsing the original template instead, leading to no speedup. Push descriptors were designed quite early on in Vulkan’s life, but their adoption was … spotty at best. They didn’t make it into core until Vulkan 1.4! Push descriptors solved some issues for us slot and binding troglodytes, since there was simply no need to mess around with allocating sets and pools when we could just push descriptors and let the driver deal with it. The major downside is that only one descriptor set can be a push set, but in Granite’s case, I could design for that limitation when writing shaders. The last set index in a VkPipelineLayout would get assigned as a push set. After moving to push descriptors, I dropped the old UBO_DYNAMIC path, since push descriptors are not compatible with it, and the UBO_DYNAMIC wins were … questionable at best anyway. It took a while to move to this model though. The AMD Windows driver infamously dragged its feet for years before finally accepting reality, and at that point I was ready to move over. It’s still not a hard requirement in Granite due to mobile concerns, but then the driver hits the slow path, and I don’t really care anymore. At some point, any modern renderer has to deal with bindless, and Granite hit this wall with clustered shading, where an array of shadow maps became a hard necessity.
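Going back to update templates: the idea is that the copy plan is fixed once, at template creation, so each update is just a sequence of copies with no per-call parsing of descriptor types. A toy Python model under that assumption. `TemplateEntry` and `compile_template` are hypothetical, and raw byte payloads stand in for the opaque descriptors a driver would actually write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateEntry:
    binding: int
    dst_offset: int   # byte offset of this binding inside the set's memory
    size: int         # descriptor size in bytes


def compile_template(entries):
    """Freeze the copy plan once; the returned function only performs copies."""
    plan = tuple((e.binding, e.dst_offset, e.size) for e in entries)

    def update(set_memory: bytearray, payloads: dict) -> None:
        for binding, offset, size in plan:
            data = payloads[binding]
            if len(data) != size or offset + size > len(set_memory):
                raise ValueError(f"bad write for binding {binding}")
            set_memory[offset:offset + size] = data

    return update
```

As the post notes, this is the promise; a driver that re-parses the template on every update gets none of the benefit.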
I’m not a big fan of “everything is bindless” myself, since I think it makes debugging way more annoying and stresses tooling and validation more than it should, but sometimes the scissor juggling is necessary. When Granite reflects a shader looking like this: The set layout is converted into an UPDATE_AFTER_BIND set with VARIABLE_COUNT array length. There is also a special helper function to aid in allocating these bindless sets where the API mostly turns into: The CPU overhead of this isn’t quite trivial either, but with the set and pool model, it’s not easy to escape this reality without a lot of rewrites. For now, I only support sampled images with bindless and I never really had any need or desire to add more. For bindless buffers, there is the glorious buffer_device_address instead. This model has served and keeps serving Granite well. Once this model is in place, the only real reason to go beyond it for my use cases is performance (and curiosity). VK_EXT_descriptor_buffer asks the question of what happens when we just remove the worst parts of the descriptor API: Sets are now backed by a slice of memory, and pools are replaced by a big descriptor buffer that is bound to a command buffer. Some warts remain however, as VkDescriptorSetLayout and VkPipelineLayout persist. If you’re porting from the legacy model like I was, this poses no issues at all, and actually reduces the friction. Descriptor buffers are a perfectly sound middle-ground alternative for those who aren’t complete bindless junkies yet, but want some CPU gains along the way. In the ideal use case for descriptor buffers, we have one big descriptor buffer that is always bound. This is allocated with PCI-e BAR on dGPUs, so DEVICE_LOCAL | HOST_VISIBLE. Instead of allocating descriptor sets, the command buffer performs a linear allocation which is backed by slices allocated from the global descriptor buffer. No API calls needed.
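The linear allocation described above amounts to a bump pointer into the one global descriptor buffer. A minimal Python sketch, assuming each set’s size and the required offset alignment are already known (in the real extension the set size comes from vkGetDescriptorSetLayoutSizeEXT). `LinearDescriptorArena` is hypothetical, and a real renderer would recycle regions per frame once the GPU is done with them rather than calling a single `reset`.

```python
class LinearDescriptorArena:
    """Bump allocator over one big, always-bound descriptor buffer."""

    def __init__(self, size_bytes: int):
        self.size = size_bytes
        self.head = 0

    def allocate(self, size: int, alignment: int) -> int:
        """Return the byte offset of a new set; `alignment` must be a power of two."""
        offset = (self.head + alignment - 1) & ~(alignment - 1)   # round up
        if offset + size > self.size:
            raise MemoryError("descriptor buffer exhausted")
        self.head = offset + size
        return offset

    def reset(self) -> None:
        """Only valid once the GPU no longer reads anything allocated so far."""
        self.head = 0
```

Each "set allocation" is a couple of integer operations, which is where the CPU win over vkAllocateDescriptorSets comes from; the cost is that running out of space is a hard failure.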
The size to allocate for a VkDescriptorSet is queried from the set layout itself, and each descriptor is assigned an offset that the driver controls. There is a wart in the spec where the min-spec for sampler descriptor buffers is very small (4K samplers). In this case, there is a risk that just linearly allocating out of the heap will trivially OOM the entire thing, and we would have to allocate new sampler descriptor buffers all the time. In practice, this limitation is completely moot. Granite only opts into descriptor buffers if the limits are reasonable. There is supposed to be a performance hit to rebinding descriptor buffers, but in practice, no vendor actually ended up implementing descriptor buffers like that. However, since VK_EXT_descriptor_heap will be way more strict about these kinds of limitations, I designed the descriptor_buffer implementation around the single global heap model to avoid rewrites later. There is certainly a risk of going OOM when linearly allocating like this, but I’ve never come close to the limits. It’s not hard to write an app that would break Granite in half though, but I consider that a “doctor, my GPU hurts when I allocate like this” kind of situation. This is where we should have a major win, but it’s not all that clear. For each descriptor type, I have different strategies on how to deal with them. The basic idea of descriptor buffers is that we can call vkGetDescriptorEXT to build a descriptor in raw bytes. This descriptor can now be copied around freely by the CPU with e.g. memcpy, or even on the GPU in shaders (but that’s a level of scissor juggling I am not brave enough for). These are the simplest ones to contend with. Descriptor buffers still retain the VkImageView and VkSampler objects. The main addition I made was to allocate a small payload up front and write the descriptor once. E.g.: We can now replace vkUpdateDescriptorSets with a trivial memcpy.
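The "payload up front, memcpy later" idea can be sketched as follows. In the real path the bytes would come from vkGetDescriptorEXT when the view is created; here `ImageViewStub` fabricates deterministic placeholder bytes instead, and both it and `write_image_descriptor` are hypothetical names in a toy Python model.

```python
class ImageViewStub:
    """Hypothetical stand-in for an image view that builds its descriptor bytes once."""

    def __init__(self, name: str, descriptor_size: int):
        # Placeholder for the bytes vkGetDescriptorEXT would write at creation time.
        self.payload = name.encode().ljust(descriptor_size, b"\0")[:descriptor_size]


def write_image_descriptor(set_memory: bytearray, offset: int, view: ImageViewStub) -> None:
    """What replaces vkUpdateDescriptorSets: a plain copy of the cached bytes."""
    end = offset + len(view.payload)
    if end > len(set_memory):
        raise IndexError("descriptor write out of bounds")
    set_memory[offset:end] = view.payload
```

The cost of building the descriptor is paid once per view, not once per bind.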
The memcpy functions are function pointers that resolve the byte count. This is a nice optimization, since a memcpy with a fixed byte count can compile down to perfectly unrolled SIMD loads and stores. Allocating bindless sets of sampled images with this method becomes super efficient, since it boils down to a special function that does: I rarely use these, but they are also quite neat in descriptor buffers. VkBufferView is gone now, so we just need to create a descriptor payload once from a VkDeviceAddress, and it’s otherwise the same as above. This descriptor type is somewhat of a relic these days, but anyone coming from a GL/GLES background instead of D3D will likely use this descriptor type out of old habit, me included. The API here is slightly more unfortunate, since there is no obvious way to create these descriptors up front. We don’t necessarily know all the samplers an image will be combined with, so we have to do it last minute, calling vkGetDescriptorEXT to create the combined descriptor. We cannot meaningfully pre-create descriptors for UBOs and SSBOs either, so we’re in a similar situation where we have to call vkGetDescriptorEXT for each buffer last minute. Unfortunately, there is no array version of vkGetDescriptorEXT, so in the extreme cases, descriptor buffers can actually have worse CPU overhead than the legacy model. DXVK, going through the winevulkan .dll <-> .so translation overhead, has been known to hit this, but for everyone else I’d expect the difference to be moot. Since descriptor buffer is an incremental improvement over the legacy model, we retain optional support for push descriptors. This can be useful in some use cases (it’s critical for vkd3d-proton), but Granite doesn’t need it. Once we’re in descriptor buffer land, we’re locked in. Descriptor buffers are battle tested and very well supported at this point. Perhaps not on very old mobile drivers, but slightly newer devices tend to have it, so there’s that! RenderDoc has solid support these days as well.
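The size-specialised copy functions mentioned above, and the bindless fill built on them, can be sketched as follows. In C or C++ the point is that a memcpy with a compile-time size can be unrolled; Python gains nothing in speed here, so this only shows the structure. `make_copier`, `COPIERS` and `fill_bindless_set` are hypothetical, the size list is made up, and array elements are assumed to be tightly packed at the descriptor size.

```python
def make_copier(size: int):
    """Return a copy routine fixed to one descriptor size (the C/C++ analogue is a
    function pointer to a memcpy whose byte count is a compile-time constant)."""
    def copy(dst: bytearray, offset: int, payload: bytes) -> None:
        if len(payload) != size or offset + size > len(dst):
            raise ValueError("bad descriptor copy")
        dst[offset:offset + size] = payload
    return copy


# Resolved once from the device's descriptor sizes; this size list is made up.
COPIERS = {size: make_copier(size) for size in (4, 8, 16, 32, 64)}


def fill_bindless_set(dst: bytearray, base: int, payloads: list, size: int) -> None:
    """Fill a bindless array slice: one fixed-size copy per element, no driver calls."""
    copy = COPIERS[size]
    for i, payload in enumerate(payloads):
        copy(dst, base + i * size, payload)
```

The lookup happens once per fill, not once per descriptor, so the inner loop is nothing but copies.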
At a quick glance, descriptor heap looks very similar to D3D12 (and it is), but there are various additions on top to make it more compatible with the various binding models that exist out there in the wild, especially for people who come from a GL/Vulkan 1.0 kind of engine design. The normal D3D12 model has some flaws if you’re not fully committed to bindless all day every day, mainly that: This is to match how some hardware works, nothing too complicated. I allocate for the supported ~1 million resource descriptors and 4096 samplers. There is a reserved region for descriptors as well, which is new to this extension. In D3D12 this is all abstracted away since applications don’t have direct access to the descriptor heap memory. For the resource heap, we have a 512K descriptor area which can be freely allocated from, like we did with descriptor buffer. Unlike descriptor buffer, where we hammer this arena allocator all the time, we will only rarely need to touch it with descriptor heap. The next ~500k or so descriptors are dedicated to holding the descriptor payload for VkImageView, VkSampler and VkBufferView. All of these objects are now obsolete. When Granite creates a Vulkan::ImageView, it internally allocates a free slab index from this upper region, writes the descriptor there and stores the heap index instead. This enables “true” bindless in a performant way. We could have done this before if we wanted to, but with descriptor buffer we would have eaten a painful indirection on a lot of hardware, which is not great. Some Vulkan drivers actually work just like this internally. You can easily tell, because some drivers report that an image descriptor is just sizeof(uint32_t). We’d have our index into the “heap”, which gets translated into yet another index into the “true” (hidden) heap. Chasing pointers is bad for perf as we all know.
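The resource heap layout described above (an arena at the bottom, persistent view slots above it, and a driver-reserved tail) can be modelled with simple index bookkeeping. A toy Python sketch: `ResourceHeapLayout` and its methods are hypothetical, and the sizes in the test are small made-up numbers rather than the real split from the post.

```python
class ResourceHeapLayout:
    """Partition of a resource heap into three index ranges."""

    def __init__(self, total: int, arena: int, reserved: int):
        if arena + reserved > total:
            raise ValueError("regions do not fit in the heap")
        self.arena_range = range(0, arena)                     # transient, linearly allocated
        self.slab_range = range(arena, total - reserved)       # persistent view descriptors
        self.reserved_range = range(total - reserved, total)   # bound but never touched
        self._next = self.slab_range.start
        self._free = []                                        # recycled slab indices

    def alloc_view_index(self) -> int:
        """Called when a view is created: its descriptor lives at this index for its lifetime."""
        if self._free:
            return self._free.pop()
        if self._next >= self.slab_range.stop:
            raise MemoryError("persistent descriptor region exhausted")
        index = self._next
        self._next += 1
        return index

    def free_view_index(self, index: int) -> None:
        if index not in self.slab_range:
            raise ValueError("index is not in the persistent region")
        self._free.append(index)
```

Once a view holds a stable index like this, binding it is just writing that integer somewhere the shader can read it.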
We keep a copy of the descriptor payload in CPU memory too, in case we have to write it to the arena-allocated portion of the heap later. The upper region of ~10k descriptors or so (depending on the driver) is just a reserved region we bind and never touch. It’s there so that drivers can deal with CmdResolveImage, CmdBlitImage and other such special APIs that internally require descriptors. For samplers, there is no arena allocator. The sampler heap is so tiny. Instead, when creating a sampler, we allocate a slab index and return a dummy handle by just pointer casting the index instead. We’ll make good use of the mapping APIs later to deal with this lack of arena allocation. In fact, we will never have to copy sampler descriptor payloads around, and we don’t have to mess around with static samplers either, neat! For the static sampler crowd, there is full support for embedded samplers, which function just like D3D12 static samplers, but Granite doesn’t use them. It was a non-trivial amount of code to get to this point, but hey, that’s what happens when you try to support 3 descriptor models at once I guess … Core Vulkan 1.0 settled on 128 bytes of push constants being the limit. This was raised in Vulkan 1.4, but Granite keeps the old limit (I could probably live with 32 or 64 bytes to be fair). Push data raises this to a minimum of 256 bytes, and the main idea behind descriptor heap is that pipeline layouts are completely gone, and we get to decide how the driver should interpret the push data space. This is similar to D3D12 root parameters, except it’s not abstracted behind a SetRootParameter() kind of interface that is called one at a time. In Vulkan, we can call CmdPushDataEXT once. VkPipelineLayout and VkDescriptorSetLayout are just gone now, poof, they do not exist at all. This is huge for usability. Effectively, we can pretend that the VkPipelineLayout is now just a push constant range of 256 bytes, and that’s it.
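Returning to samplers: because the sampler heap is small and never arena-allocated, the slot index itself can serve as the handle. A toy Python model of that idea. `SamplerHeap` is hypothetical, a dict stands in for the sampler descriptor written into the heap, and in C++ the post describes casting the index to the handle type rather than returning a plain int.

```python
class SamplerHeap:
    """Fixed-size sampler heap; a sampler's slot index doubles as its handle."""

    def __init__(self, capacity: int = 4096):
        self.slots = [None] * capacity
        self._free = list(range(capacity - 1, -1, -1))   # pop() yields the lowest index first

    def create_sampler(self, desc: dict) -> int:
        if not self._free:
            raise MemoryError("sampler heap exhausted")
        index = self._free.pop()
        self.slots[index] = desc   # stand-in for writing the sampler descriptor into the heap
        return index               # no separate object: the index is the handle

    def destroy_sampler(self, handle: int) -> None:
        self.slots[handle] = None
        self._free.append(handle)
```

Since a sampler’s descriptor is written exactly once, at its fixed slot, there is never a payload to copy around later.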
If you’re fully committed to going bindless, we could just do the equivalent of SM 6.6 ResourceDescriptorHeap and SamplerDescriptorHeap plus buffer_device_address to get everything done. However, Granite is still a good old slot-based system, so I need to use the mapping features to tell the driver how to translate set/binding into actual descriptors. This mapping can be different per shader too, which fixes a lot of really annoying problems with EXT_graphics_pipeline_library and EXT_shader_object if I feel like going down that path in the future. The natural thing to do for me was to split up the space into at most 128 bytes of push constants, then 32 bytes per descriptor set (I support 4 sets, the Vulkan 1.0 min-spec). It’s certainly possible to parcel out the data more intelligently, but that causes some issues with set compatibility which I don’t want to deal with. For every set, I split it up into buffers and images and decide on a strategy for each. Buffers are decided first since they have the largest impact on performance in my experience. This is very simple. If there are 3 or fewer buffers in a set (24 bytes), we can just stuff the raw pointers into push data and tell the driver to use that pointer. This is D3D12 root descriptors in a nutshell. Especially for UBOs, this is very handy for performance. We lose robustness here, but I never rely on buffer robustness anyway. The push data layout looks something like this: This is a new Vulkan speciality. Without modifying the shaders, we can tell the driver to load a buffer device address from a pointer in push data instead. This way we don’t have to allocate from the descriptor heap itself; we can just do a normal linear UBO allocation, write some VkDeviceAddresses in there and have fun. Given the single indirection to load the “descriptor” here, this looks a lot like Vulkan 1.0 descriptor sets, except there’s no API necessary to write them. This isn’t the ideal path, but sometimes we’re forced to allocate from the heap.
This can happen if we have one of these cases: This is pretty much D3D12’s root tables, but in Vulkan we can be a bit more optimal with memory, since buffer descriptors tend to be smaller than image descriptors and we can pack them tightly. D3D12 has one global stride for any resource descriptor, while Vulkan exposes separate sizes that applications can take advantage of. vkWriteResourceDescriptorsEXT is required here to write the SSBO descriptors. After buffers are parceled out for a descriptor set, we have some space left for images. At minimum, we have 8 bytes left (32 - 3 * sizeof(VkDeviceAddress)). This is the common and ideal case. If we don’t have any arrays of images, we can just have a bunch of uint32_t indices directly into the heap. At image view and buffer view creation time, we already allocated a persistent index into the heap that we can refer to. No API calls required when emitting commands. Combined image samplers work quite well in this model, because Vulkan adds a special mapping mode that packs both the sampler index and the image index together. This fixes one of the annoying issues in EXT_descriptor_buffer. If we cannot use the simple inline indices, we have two options. The preferred one right now is to just allocate space in the descriptor heap just like the descriptor buffer path, because I’m quite concerned with avoiding unnecessary indirections when possible. At least we get to copy the payloads around without API commands. This path is also used for bindless sets. Unlike the descriptor buffer path, there is a major problem: linearly allocating from the sampler heap is not viable. The sampler heap is really small now, just like in D3D12. In this case, Vulkan has an answer. This is a special Vulkan feature that functions like an indirect root table. This one is similar to INDIRECT_ADDRESS in that we don’t have to allocate anything from the heap directly and we can just stuff heap indices straight into a UBO.
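The push data split described in this section (up to 128 bytes of push constants, then a 32-byte block for each of the 4 sets, holding up to three raw buffer addresses followed by 32-bit heap indices) can be sketched at the byte level. This is a hypothetical Python model. The exact layout, the `pack_*` and `build_push_data` helpers, and in particular the 20/12-bit split for combined image-sampler indices are assumptions for illustration, not VK_EXT_descriptor_heap’s actual encoding. The driver-side mapping structures are not modelled at all.

```python
import struct

# Hypothetical constants mirroring the split described in the post.
PUSH_CONSTANT_BYTES = 128
SET_BLOCK_BYTES = 32
NUM_SETS = 4
PUSH_DATA_BYTES = PUSH_CONSTANT_BYTES + NUM_SETS * SET_BLOCK_BYTES  # 256
MAX_INLINE_BUFFERS = 3  # 3 * sizeof(VkDeviceAddress) = 24 bytes

# Assumed bit split for a combined image+sampler index (not the real encoding).
IMAGE_BITS, SAMPLER_BITS = 20, 12


def pack_combined(image_index: int, sampler_index: int) -> int:
    """Pack a heap image index and a sampler index into one 32-bit word."""
    if not 0 <= image_index < (1 << IMAGE_BITS):
        raise ValueError("image index out of range")
    if not 0 <= sampler_index < (1 << SAMPLER_BITS):
        raise ValueError("sampler index out of range")
    return (sampler_index << IMAGE_BITS) | image_index


def pack_set_block(buffer_addresses, image_words):
    """Pack one set's 32-byte block: 64-bit buffer addresses first, then 32-bit
    heap indices. Returns None when the set does not fit and would need the
    heap-allocated fallback instead."""
    if len(buffer_addresses) > MAX_INLINE_BUFFERS:
        return None
    if 8 * len(buffer_addresses) + 4 * len(image_words) > SET_BLOCK_BYTES:
        return None
    fmt = "<" + "Q" * len(buffer_addresses) + "I" * len(image_words)
    return struct.pack(fmt, *buffer_addresses, *image_words).ljust(SET_BLOCK_BYTES, b"\0")


def build_push_data(push_constants: bytes, set_blocks) -> bytes:
    """Concatenate push constants and per-set blocks into one 256-byte payload,
    i.e. what a single push-data call would upload."""
    if len(push_constants) > PUSH_CONSTANT_BYTES or len(set_blocks) > NUM_SETS:
        raise ValueError("push data overflow")
    data = bytearray(PUSH_DATA_BYTES)
    data[:len(push_constants)] = push_constants
    for i, block in enumerate(set_blocks):
        offset = PUSH_CONSTANT_BYTES + i * SET_BLOCK_BYTES
        data[offset:offset + SET_BLOCK_BYTES] = block
    return bytes(data)
```

Three buffers leave exactly 8 bytes, room for two 32-bit image indices, which matches the "at minimum, we have 8 bytes left" arithmetic above. Anything larger falls back to one of the heap-backed paths.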
Overall, I think these new mapping types allow us to reuse old shaders quite effectively, and it’s possible to start slowly rewriting shaders to take full advantage of descriptor_heap once this machinery is in place. For GPU performance, it seemed to be on par with the other descriptor models on NVIDIA and AMD, which was expected. Granite does not really hit the cases where descriptor_heap should meaningfully improve GPU performance over descriptor_buffer, but I only took a rough glance. For CPU performance, things were a bit more interesting, and I learned that Granite has quite significant overhead on its own, which is hardly surprising. That’s the cost of an old-school slot and binding model after all, and I never did a serious optimization pass over it. A more forward-looking rendering abstraction can eliminate most, if not all, of this overhead. The numbers here are for RADV, using the pending merge request for descriptor_heap support. ~27 us to write 4096 image descriptors on a Ryzen 3950x with an RX 6800. This is basically exactly the same. ~13 us. This is really just a push_back and memcpy bench at this point. This case hits the optimal inline BDA case for heap. ~279 ns per dispatch. Doesn’t feel very impressive. Basically the same perf, but lots of overhead has now shifted over to Granite. Certainly things can be optimized further. GetDescriptorEXT is somehow much faster than UpdateDescriptorSetWithTemplate though. ~157 ns / dispatch now, and most of the overhead is now in Granite itself, which is ideal. I added an extra buffer descriptor per set, which hits the INDIRECT_ADDRESS path. Heap regressed significantly, but it’s all in Granite code at least. Likely related to having to page in new UBO blocks, but I didn’t look too closely. ~375 ns / dispatch, hnnnnnng. The other paths don’t change much, as expected. About ~310 ns / dispatch for legacy and descriptor buffer models. This is the happy path for descriptor heap. ~161 ns / dispatch ~166 ns.
Quite interesting that it got slower. The slab allocator for legacy sets seems to be doing its job very well. The actual descriptor copying vanished from the top of the profile at least. ~145 ns. A very modest gain, and most of the overhead is now just Granite jank. All the paths look very similar now. ~170 ns or so. On RTX 4070 with 595 drivers. The improvements, especially for buffers, are quite large on NV, interestingly enough. For the legacy buffer tests, it’s heavily biased towards driver overhead: For the image tests the gains are modest, which is somewhat expected given how NV implemented image descriptors before descriptor heap. They’re just some trivial u32 indices. Overall, it’s interesting how well the legacy Vulkan 1.0 model holds up here, at least on RADV with my implementation. Descriptor buffer and heap cannot truly shine unless the abstraction using them is written with performance in mind. This sentiment is hardly new. Just porting OpenGL-style code over to Vulkan doesn’t give amazing gains, just like porting old and crusty binding models won’t magically perform with newer APIs either. Either way, this level of performance is good enough for my needs, and the days of spamming out 100k draw calls are kinda over anyway, since it’s all GPU driven with large bindless data sets these days. Adding descriptor buffer and heap support to Granite was generally motivated by curiosity rather than a desperate need for perf, but I hope this post serves as an example of what can be done. There’s a lot of descriptor heap that hasn’t been explored here. GPU performance for heavily bindless workloads is another topic entirely, and I also haven’t really touched on how it would be more practical to start writing code like: which would side-step almost all Granite overhead. Overall I quite like what we’ve got now with descriptor heap as an API, a bastard child of descriptor buffer and D3D12 that gets the job done.
As tooling and driver support matures, I will likely just delete the descriptor buffer path, keeping the legacy stuff around for compatibility.

VkDescriptorSet, VkDescriptorPool, vkUpdateDescriptorSets (kinda)

VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT, VK_DESCRIPTOR_TYPE_SAMPLER, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR

You very quickly end up having to call CopyDescriptorsSimple a LOT to shuffle descriptors into the heap. Since this is a call into the driver just to copy a few bytes around, it can quickly be a source of performance issues. In vkd3d-proton, we went to hell and back to optimize this case because in many titles, it was the number 1 performance overhead.

Dealing with samplers is a major pain. The 2K sampler heap limit can be rather limiting since there is no good way to linearly allocate on such a small heap. Static samplers are quite common as a result, but they have other problems. Recompiling shaders because you change Aniso 4x to 8x in the settings menu is kinda a hilarious situation to be in, but some games have been known to do just that …

The shader is using OpArrayLength on an SSBO. We need real descriptors in this case. The current implementation just scans the SPIR-V shader module for this instruction, but could be improved in theory.

The shader is using an array of descriptors. For buffers, this should be very rare, but the PUSH_ADDRESS and INDIRECT_ADDRESS interfaces do not support this.

Robustness is enabled.

Test #1: Write 4096 image descriptors: 17.6 us (copies u32 indices); Test #2: 693 ns; Test #3: 726 ns; Test #4: 377 ns; Test #5: 408 ns

Test #1: 10.2 us (copies u32 indices); Test #2: 434 ns; Test #3: 479 ns; Test #4: 307 ns; Test #5: 315 ns

Test #1: 11 us (copies real 32 byte descriptors); Test #2: 389 ns; Test #3: 405 ns; Test #4: 321 ns; Test #5: 365 ns
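For scale, dividing the "write 4096 image descriptors" timings by the descriptor count gives a rough per-descriptor cost. This is simple arithmetic on the numbers above (the ~27 us RADV figure and the three Test #1 results), not an additional measurement:

$$
\frac{27\ \mu\mathrm{s}}{4096} \approx 6.6\ \mathrm{ns}, \qquad
\frac{17.6\ \mu\mathrm{s}}{4096} \approx 4.3\ \mathrm{ns}, \qquad
\frac{10.2\ \mu\mathrm{s}}{4096} \approx 2.5\ \mathrm{ns}, \qquad
\frac{11\ \mu\mathrm{s}}{4096} \approx 2.7\ \mathrm{ns}
$$

These are averages, so any fixed per-call cost is folded into the per-descriptor figure.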

DHH 2 months ago

Basecamp becomes agent accessible

In the past 18 months, we've experimented with a ton of AI-infused features at 37signals. Fizzy had all sorts of attempts. As did Basecamp. But as Microsoft and many others have realized, it's not that easy to make something that's actually good and would be welcomed by users. So we didn't ship.

In the meantime, agents have emerged as the killer app for AI. Not only are LLMs much smarter when they can check their thinking using tools, but the file system also gives them the memory implant they needed to learn between prompts. And now they can actually do stuff!

So while we keep cooking on actually-useful native AI features in Basecamp, we're launching a fully agent-accessible version today. We've revamped our API, created a brand-new CLI, and wrapped it all in a skill to teach agents how best to use it all. It works remarkably well, and it's really fast too. Not only can you have your agent look through everything in Basecamp and summarize whatever you need, but it can also set up to-do lists, post message updates, chat with humans and clankers alike, upload reference files, and arrange a project schedule. Anything you can do in Basecamp, agents can now do too.

This becomes extra powerful when you combine Basecamp with all the other tools you might be using that are also agent accessible. For software development, you can use the MCP from Sentry to trawl through major sources of bugs, then have the agent summarize that in a message for Basecamp. Or you can have it download, analyze, and highlight key customer complaints by giving it access to your help desk system.

All this was possible in the past with APIs, hand-written integrations, and human data scientists. But it was cumbersome, slow, and expensive, so most people just didn't. A vanishingly small portion of Basecamp customers have ever directly interacted with our API. But agents? I think adoption is going to be swift. Not because everyone is going to run OpenCode, Claude Code, or Gemini CLI. But because agents are going to be incorporated very quickly into ChatGPT, Gemini, Grok, and all the other mainstream interfaces, which were collectively embarrassed by OpenClaw's meteoric ascent and popularity. There's a huge demand out there for a personal agent that can act as your private executive assistant.

This is where the puck is going, and we're skating to meet it with agent accessibility across the board. Basecamp is first, Fizzy is next, and we'll hit HEY before long too. Revamped APIs, comprehensive CLIs, and the skills to use them, whatever your harness or claws look like.

Dominik Weber 3 months ago

Lighthouse update February 23rd

During the past week a couple of nice improvements happened.

**Finally implemented a 2 week trial without requiring a credit card**

Every user now gets the trial by default. This is a nice improvement because, from what I can observe, in B2C most people want to test the product before entering their credit card. It was also a good step toward a better first product experience.

**Finished the website to feed feature**

The last remaining task was automated finding of items. When you enter a website, it automatically checks it and tries to find relevant items. If items are found, they are highlighted and the selectors added, without users having to do anything.

**Updated blogroll editor**

This is a small free tool on the Lighthouse website. It's for creating collections of feeds, websites, and newsletters. For a long time I wanted to create collections for specific areas, for example company engineering blogs, AI labs, JavaScript ecosystem, and so on. The reworked blogroll editor makes that much simpler to do.

## Next steps

An issue that became important is feed URLs being behind bot protection. It doesn't really make sense for them to be configured that way, because feed URLs are designed to be accessed by bots, but in some cases it may be difficult to configure properly. This affects only a small number of feeds, but it's enough to be noticeable. It prevents people from moving to Lighthouse from other services. Consequently, one of the next tasks is to fix this.

Besides that, the first user experience continues to be an ongoing area of improvement. I have a couple of ideas on how to make it better, and will continuously work on it.

Simon Willison 3 months ago

Two new Showboat tools: Chartroom and datasette-showboat

I introduced Showboat a week ago - my CLI tool that helps coding agents create Markdown documents that demonstrate the code that they have created. I've been finding new ways to use it on a daily basis, and I've just released two new tools to help get the best out of the Showboat pattern. Chartroom is a CLI charting tool that works well with Showboat, and datasette-showboat lets Showboat's new remote publishing feature incrementally push documents to a Datasette instance.

I normally use Showboat in Claude Code for web (see note from this morning). I've used it in several different projects in the past few days, each of them with a prompt that looks something like this: Here's the resulting document. Just telling Claude Code to run is enough for it to learn how to use the tool - the help text is designed to work as a sort of ad-hoc Skill document.

The one catch with this approach is that I can't see the new Showboat document until it's finished. I have to wait for Claude to commit the document plus embedded screenshots and push that to a branch in my GitHub repo - then I can view it through the GitHub interface. For a while I've been thinking it would be neat to have a remote web server of my own which Claude instances can submit updates to while they are working. Then this morning I realized Showboat might be the ideal mechanism to set that up...

Showboat v0.6.0 adds a new "remote" feature. It's almost invisible to users of the tool itself, instead being configured by an environment variable. Set a variable like this: And every time you run a or or or command the resulting document fragments will be POSTed to that API endpoint, in addition to the Showboat Markdown file itself being updated. There are full details in the Showboat README - it's a very simple API format, using regular POST form variables or a multipart form upload for the image attached to .
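To make the remote feature concrete, here is a minimal Python sketch of the sending side: if an environment variable holds a URL, each document fragment is POSTed as regular form variables; otherwise nothing happens. The variable name (`EXAMPLE_SHOWBOAT_REMOTE`) and the form field names (`uuid`, `command`, `text`) are hypothetical placeholders, not Showboat's real ones (those are in the Showboat README), and the multipart image upload is not shown.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical variable name; the real one is documented in the Showboat README.
REMOTE_ENV_VAR = "EXAMPLE_SHOWBOAT_REMOTE"


def build_fragment_request(doc_id, command, text, env=None):
    """Return a POST request for one document fragment, or None if no remote is configured."""
    env = os.environ if env is None else env
    url = env.get(REMOTE_ENV_VAR)
    if not url:
        return None  # remote publishing disabled: only the local Markdown file is updated
    # Regular URL-encoded form variables (field names are illustrative assumptions).
    body = urllib.parse.urlencode({"uuid": doc_id, "command": command, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def publish_fragment(doc_id, command, text, env=None, timeout=10):
    """Send the fragment if a remote is configured; return the HTTP status, or None if disabled."""
    request = build_fragment_request(doc_id, command, text, env)
    if request is None:
        return None
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Because the request is only built when the variable is set, the tool behaves exactly as before when it isn't, which is what makes this kind of feature almost invisible to regular users.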
It's simple enough to build a webapp to receive these updates from Showboat, but I needed one that I could easily deploy and would work well with the rest of my personal ecosystem. So I had Claude Code write me a Datasette plugin that could act as a Showboat remote endpoint. I actually had this building at the same time as the Showboat remote feature, a neat example of running parallel agents.

datasette-showboat is a Datasette plugin that adds a endpoint to Datasette for viewing documents and a endpoint for receiving updates from Showboat. Here's a very quick way to try it out: Click on the sign in as root link that shows up in the console, then navigate to http://127.0.0.1:8001/-/showboat to see the interface. Now set your environment variable to point to this instance: And run Showboat like this: Refresh that page and you should see this: Click through to the document, then start Claude Code or Codex or your agent of choice and prompt: The command assigns a UUID and title and sends those up to Datasette.

The best part of this is that it works in Claude Code for web. Run the plugin on a server somewhere (an exercise left up to the reader - I use Fly.io to host mine) and set that environment variable in your Claude environment, then any time you tell it to use Showboat the document it creates will be transmitted to your server and viewable in real time.

I built Rodney, a CLI browser automation tool, specifically to work with Showboat. It makes it easy to have a Showboat document load up web pages, interact with them via clicks or injected JavaScript, and capture screenshots to embed in the Showboat document and show the effects. This is wildly useful for hacking on web interfaces using Claude Code for web, especially when coupled with the new remote publishing feature.
I only got this stuff working this morning and I've already had several sessions where Claude Code has published screenshots of its work in progress, which I've then been able to provide feedback on directly in the Claude session while it's still working.

A few days ago I had another idea for a way to extend the Showboat ecosystem: what if Showboat documents could easily include charts? I sometimes fire up Claude Code for data analysis tasks, often telling it to download a SQLite database and then run queries against it to figure out interesting things from the data. With a simple CLI tool that produced PNG images I could have Claude use Showboat to build a document with embedded charts to help illustrate its findings. Chartroom is exactly that. It's effectively a thin wrapper around the excellent matplotlib Python library, designed to be used by coding agents to create charts that can be embedded in Showboat documents. Here's how to render a simple bar chart: It can also do line charts, bar charts, scatter charts, and histograms - as seen in this demo document that was built using Showboat. Chartroom can also generate alt text. If you add to the above it will output the alt text for the chart instead of the image: Or you can use or to get the image tag with alt text directly: I added support for Markdown images with alt text to Showboat in v0.5.0, to complement this feature of Chartroom. Finally, Chartroom has support for different matplotlib styles. I had Claude build a Showboat document to demonstrate these all in one place - you can see that at demo/styles.md.

I started the Chartroom repository with my click-app cookiecutter template, then told a fresh Claude Code for web session: We are building a Python CLI tool which uses matplotlib to generate a PNG image containing a chart. It will have multiple sub commands for different chart types, controlled by command line options.
Everything you need to know to use it will be available in the single "chartroom --help" output. It will accept data from files or standard input as CSV or TSV or JSON, similar to how sqlite-utils accepts data - clone simonw/sqlite-utils to /tmp for reference there. Clone matplotlib/matplotlib for reference as well It will also accept data from --sql path/to/sqlite.db "select ..." which runs in read-only mode Start by asking clarifying questions - do not use the ask user tool though it is broken - and generate a spec for me to approve Once approved proceed using red/green TDD running tests with "uv run pytest" Also while building maintain a demo/README.md document using the "uvx showboat --help" tool - each time you get a new chart type working commit the tests, implementation, root level README update and a new version of that demo/README.md document with an inline image demo of the new chart type (which should be a UUID image filename managed by the showboat image command and should be stored in the demo/ folder Make sure "uv build" runs cleanly without complaining about extra directories but also ensure dist/ and uv.lock are in gitignore This got most of the work done. You can see the rest in the PRs that followed. The Showboat family of tools now consists of Showboat itself, Rodney for browser automation, Chartroom for charting and datasette-showboat for streaming remote Showboat documents to Datasette. I'm enjoying how these tools can operate together based on a very loose set of conventions. If a tool can output a path to an image Showboat can include that image in a document. Any tool that can output text can be used with Showboat. I'll almost certainly be building more tools that fit this pattern. They're very quick to knock out! 
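Two pieces of the Chartroom behaviour described above can be sketched without matplotlib: loading rows from CSV, TSV or JSON text or from a read-only SQLite query, and turning the plotted data into alt text wrapped in a Markdown image tag. This is a hypothetical Python sketch, not Chartroom's code: the function names, the tab-in-header TSV heuristic and the alt-text wording are all assumptions. The read-only part relies on SQLite's URI filenames with `mode=ro`, which the standard `sqlite3` module supports via `uri=True`.

```python
import csv
import io
import json
import sqlite3
from pathlib import Path


def rows_from_text(text):
    """Parse JSON (a list of objects or one object), TSV or CSV into a list of dicts."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped[0] in "[{":
        data = json.loads(stripped)
        return data if isinstance(data, list) else [data]
    # Heuristic: treat the input as TSV if the header line contains a tab, otherwise CSV.
    delimiter = "\t" if "\t" in stripped.splitlines()[0] else ","
    return list(csv.DictReader(io.StringIO(stripped), delimiter=delimiter))


def rows_from_sql(db_path, sql):
    """Run a query against a SQLite file opened read-only via a URI filename."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()


def bar_chart_alt_text(labels, values, title=None):
    """Describe a bar chart in plain language, derived from the same data that is plotted."""
    if len(labels) != len(values):
        raise ValueError("labels and values must be the same length")
    prefix = f'Bar chart titled "{title}"' if title else "Bar chart"
    if not labels:
        return f"{prefix} with no data."
    pairs = ", ".join(f"{label}: {value:g}" for label, value in zip(labels, values))
    highest = labels[values.index(max(values))]
    return f"{prefix} with {len(labels)} bars. {pairs}. Highest: {highest}."


def markdown_image(path, alt):
    """Wrap an image path in a Markdown image tag, escaping brackets in the alt text."""
    safe_alt = alt.replace("[", "\\[").replace("]", "\\]")
    return f"![{safe_alt}]({path})"
```

Values read from CSV or TSV arrive as strings, so they would need converting (for example with `float`) before being plotted or passed to `bar_chart_alt_text`. Any write attempted through `rows_from_sql` fails with `sqlite3.OperationalError`, because the connection itself is read-only.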
The environment variable mechanism for Showboat's remote streaming is a fun hack too - so far I'm just using it to stream documents somewhere else, but it's effectively a webhook extension mechanism that could likely be used for all sorts of things I haven't thought of yet.
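Because the remote feature is effectively a webhook, almost any small web server can sit on the receiving end. Here is a minimal standard-library Python sketch of such a receiver; it is not datasette-showboat. It accepts URL-encoded form posts, groups fragments by a hypothetical `uuid` field, and keeps them in memory. The field name, port and `serve` helper are illustrative assumptions, and multipart image uploads are deliberately not handled.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Document id -> fragments in arrival order (in memory only, lost on restart).
DOCUMENTS = {}


def handle_form_post(body, store):
    """Record one URL-encoded fragment in store; return (status, message)."""
    fields = {key: values[0] for key, values in parse_qs(body.decode("utf-8")).items()}
    doc_id = fields.get("uuid")  # hypothetical field name
    if not doc_id:
        return 400, "missing uuid"
    store.setdefault(doc_id, []).append(fields)
    return 200, "ok"


class FragmentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        content_type = self.headers.get("Content-Type", "")
        if not content_type.startswith("application/x-www-form-urlencoded"):
            # Multipart image uploads are out of scope for this sketch.
            self.send_error(415, "only URL-encoded forms are supported")
            return
        length = int(self.headers.get("Content-Length", "0"))
        status, message = handle_form_post(self.rfile.read(length), DOCUMENTS)
        payload = message.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8765):
    """Run the receiver until interrupted."""
    HTTPServer((host, port), FragmentHandler).serve_forever()
```

Calling `serve()` starts it on port 8765. A real deployment would also want authentication, since anything that can reach the endpoint can write to it.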

Rob Zolkos 4 months ago

So where can we use our Claude subscription then?

There’s been confusion about where we can actually use a Claude subscription. This comes after Anthropic took action to prevent third-party applications from spoofing the Claude Code harness to use Claude subscriptions.

The information in this post is based on my understanding from reading various tweets, official GitHub repos and documentation (some of which may or may not be up to date). I will endeavour to keep it up to date as new information becomes available. I would love to see Anthropic themselves maintain an easily parsable page like this that shows what is and is not permitted with a Claude subscription.

We've taken action to prevent third-party clients from spoofing the Claude Code agent harness to use consumer subscriptions. Consumer subscriptions and their benefits should only be used in the Anthropic experiences they support (Claude Code CLI, Claude Code web, and via sessionKey in the Agent SDK). Third-party apps can use the API.

From what I can gather, consumer subscriptions work with official Anthropic tools, not third-party applications. If you want third-party integrations, you need the API.

The consumer applications (desktop and mobile) are the most straightforward way to use your Claude subscription. Available at claude.com/download, these apps give you direct access to Claude for conversation, file uploads, and Projects.

The official command-line interface for Claude Code is fully supported with Claude subscriptions. This is the tool Anthropic built and maintains specifically for developers who want to use Claude in their development workflow. You get the full power of Claude integrated into your terminal, with access to your entire codebase, the ability to execute commands, read and write files, and use all the specialized agents that come with Claude Code.

The web version of Claude Code (accessible through your browser at claude.ai/code) provides the same capabilities as the CLI but through a browser interface.
Upload your project files, or point it at a repository, and you can work with Claude on your codebase directly.

Want to experiment with building custom agents? The Claude Agent SDK lets you develop and test specialized agents powered by your Claude subscription for personal development work. The SDK is available in both Python and TypeScript, with documentation here. This is for personal experiments and development. For production deployments of agents, use the API instead of your subscription.

You can use your Claude subscription to run automated agents in GitHub Actions. The Claude Code Action lets you set up workflows that leverage Claude for code review, documentation generation, or automated testing analysis. Documentation is here.

Any other uses of Claude would require the use of API keys. Your Claude subscription gives you:

Claude desktop and mobile apps for general use
Claude Code CLI for terminal-based development
Claude Code on the web for browser-based work
The ability to build custom agents through the official SDK (for personal development)
Claude Code GitHub Action for CI/CD integration

Let me know if you have any corrections.

corsix.org 4 months ago

Thoughts on No Graphics API

Rob Zolkos 4 months ago

A Month Exploring Fizzy

In their book Getting Real, 37signals talk about Open Doors — the idea that you should give customers access to their data through RSS feeds and APIs. Let them get their information when they want it, how they want it. Open up and good things happen.

Fizzy takes that seriously. When 37signals released Fizzy with its full git history available, they didn’t just open-source the code — they shipped a complete API and webhook system too. The doors were wide open baby! So I dove in — reading the source, building tools, and sharing what I found. Every time curiosity kicked in, there was a direct path from “I wonder if…” to something I could actually try and execute. This post is a catch-all for my very bubbly month of December.

Fizzy Webhooks: What You Need to Know — I set up a local webhook receiver to capture and document every event type Fizzy sends. The post covers the payload structures, signature verification, and ideas for what you could build on top of the webhook system.

The Making of Fizzy, Told by Git — I prompted Claude Code to analyze the entire git history and write a documentary about the development.

Vanilla CSS is all you need — Diving into the no-build CSS architecture across Campfire, Writebook, and Fizzy.

Fizzy Design Evolution: A Flipbook from Git — I went through each day of commits, got the application to a bootable state, seeded the database, and took a screenshot. Then I stitched those screenshots into a flipbook video with a soundtrack made from Fizzy’s own audio files.

Fizzy’s Pull Requests: Who Built What and How — An analysis of who owned which domains in the Fizzy codebase. The post maps contributors to their expertise areas and curates learning paths through the PRs for topics like Turbo/Hotwire, caching, AI integration, multi-tenancy, and webhooks.

The open API invited experimentation. I spotted gaps that would make integration easier for other developers, so I filled them:

fizzy-api-client — Ruby client for the Fizzy API.
fizzy-client-python — Python client for the Fizzy API.
fizzy-cli — Command-line interface for the Fizzy API, built first in Ruby and then migrated to Go for portability.
fizzy-skill — An AI agent skill for interacting with Fizzy.
n8n-nodes-fizzy — An n8n community node that brings Fizzy into your automation workflows. Create cards, manage assignments, and react to real-time events through webhook triggers.
Migration tools — I built these to make it easier to try Fizzy without starting from scratch. Migrating your existing issues and boards gives you an immediate sense of how it could work for you, without having to manually create test cards. You can see your real data running in Fizzy from day one, which I think makes it easier to evaluate and decide if it’s useful for you.
  linear2fizzy — Migrate Linear issues
  jira2fizzy — Migrate JIRA issues
  asana2fizzy — Migrate Asana tasks
  gh2fizzy — Migrate GitHub Issues
  prd2fizzy — Convert PRDs to Fizzy cards

I also contributed a few small fixes back to the main repository:

#2114 — Remove unused install.svg and its CSS class
#2111 — Remove unpaired view-transition-name
#2095 — Fix typo: minues → minutes
#2094 — Fix duplicate word: use use → use
#2093 — Add QrCodesController test
#2088 — Fix view-transition-name typo in public card show

Fizzy is released under the O’Saasy License, which is similar in spirit to MIT but includes a restriction on offering the software as a competing hosted or SaaS product. You can modify and self-host it, but you can’t repackage it and sell it as your own hosted service. I built O’Saasy Directory to make it easy to find applications released under this license. Beyond Fizzy, the directory includes other submitted projects where the source is available to read and modify. If you have built something under the O’Saasy License, visit the submission page to add yours.

Having built the Fizzy CLI and fizzy-api-client Rubygem, I saw some fun opportunities to build little lab experiments to show how Fizzy could be integrated with, both to power up some functionality that isn’t there yet and to create boards in some interesting ways (e.g. Movie Quiz). I got the idea for this on a flight to Australia with no internet. Just a pad of paper and a pen. I should probably do that more often as a bunch of ideas for all sorts of products came out.

CarbonationLabs is not a product per se. It’s an open source Rails application designed to be run locally where you can interact with the hosted or self-hosted versions of Fizzy. If anything, I hope it inspires the creation of little problem-solving workflows for Fizzy that wouldn’t be built into the main product (the problem is too niche). The API and webhook system is really flexible and most of your bespoke problems could be solved with some creative thinking.

Introducing Carbonation Labs - fun ways to add experiments to and extend Fizzy (repo link and demo videos below)🧵

I built carbonation.dev to bring together all the tools, libraries, and integrations that I and others in the community have created for Fizzy. It’s a directory covering API clients (Ruby, Python, JavaScript), CLI tools with packages for macOS, Arch Linux, Debian, Fedora, and Windows, integrations for Claude Code and other AI agents, n8n, Raycast, Telegram, and MCP servers, plus migration tools for GitHub, Linear, Asana, and Jira. If you’ve built something for Fizzy, I’d love to feature it. You can submit a pull request to add your tool to the directory.

Building the Fizzy CLI pushed me into some new territory. I created an AUR package for Arch Linux users, set up a Homebrew tap for macOS, published my first Python package to PyPI, and made an n8n plugin — all firsts for me. While I already knew Go, rewriting the CLI in it was a fun exercise, and building TUIs for the setup and skill commands introduced me to terminal UI libraries I hadn’t used before. Gosh it was fun!

If you want to get better at Rails, Fizzy is a great place to study real-world code. And in my view if you want to work at 37signals as a Rails programmer, digging into Fizzy — along with Campfire and Writebook — is a solid way to learn how they approach Rails architecture and design decisions. Submitting PRs is also a good way to contribute back while learning — just be respectful of the contribution policy. The review discussions give you a window into how to reason about problems, spot opportunities, and make trade-offs.

This month pushed parts of my creative thinking that weren’t gone, but definitely weren’t being stressed. Like any muscle, use it or lose it. The direction of what to explore came from my own curiosity and a habit of poking around under the hood, and AI helped me move a lot faster once I knew where I wanted to go. Most of this information already exists somewhere — Google, Stack Overflow, documentation — but having AI right there alongside me as a partner was thrilling.

All of this was made possible because a team left the doors open. No one asked me to step inside; I decided to invest the time and do the work to see what I could build, learn and share. I do this at work too—when I can—looking for opportunities I can shape, experiment with, and get genuinely excited about. Most importantly I had fun and I hope you enjoyed following along.
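The webhooks post mentions signature verification, but this page doesn't spell out Fizzy's signing scheme. As a generic sketch, assuming the sender signs the raw request body with HMAC-SHA256 using a shared secret and sends the hex digest in a header, verification in Python looks like this. The header name, digest encoding and secret handling are all assumptions to check against Fizzy's webhook documentation.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: Optional[str]) -> bool:
    """Compare the expected signature with the received one in constant time."""
    if not received:
        return False
    expected = sign(secret, body).encode("ascii")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received.strip().lower().encode("utf-8"))
```

Compute the digest over the raw bytes before any JSON parsing: re-serializing the payload can change whitespace or key order and break the comparison.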

Tenderlove Making 4 months ago

Pixoo64 Ruby Client

I bought a Pixoo64 LED Display to play around with, and I love it! It connects to WiFi and has an on-board HTTP API so you can program it. I made a Ruby client for it that even includes code to convert PNG files to the binary format the sign wants. One cool thing is that the display can be configured to fetch data from a remote server, so I configured mine to fetch PM2.5 and CO2 data for my office. Here’s what it’s looking like so far: Yes, this is how I discovered I need to open a window 😂
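The post doesn't show the client (it's written in Ruby) or describe the display's format. My understanding is that the Pixoo64 HTTP API accepts a frame as base64-encoded raw RGB bytes: three bytes per pixel, row by row, for the 64x64 grid. Treat that as an assumption to check against the device documentation. Under that assumption, once a PNG has been decoded into pixels (decoding is not shown here), the conversion is a flatten-and-encode step. A minimal Python sketch:

```python
import base64


def encode_frame(pixels, size=64):
    """Flatten a size x size grid of (r, g, b) tuples into base64 of raw RGB bytes, row-major."""
    if len(pixels) != size or any(len(row) != size for row in pixels):
        raise ValueError(f"expected a {size}x{size} grid of pixels")
    raw = bytearray()
    for row in pixels:
        for r, g, b in row:
            # bytearray.extend raises ValueError if a channel is outside 0-255.
            raw.extend((r, g, b))
    return base64.b64encode(bytes(raw)).decode("ascii")
```

A full 64x64 frame is 12,288 bytes of raw RGB, or 16,384 characters once base64-encoded.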

Massively Parallel Procrastination 5 months ago

Streamlinear, a new MCP for Linear

I've been using Linear as the project and issue tracking tool on a new project. No wait, that's not quite right. My AI coding agents have been using Linear as the project and issue tracking tool on a new project. I've opened Linear's web interface...twice? And I'm pretty sure I've logged into the mobile client. But Claude and friends? They use Linear every day.

To date, I've been using the first-party Linear MCP and a third-party one that I'd found before Anthropic started publishing an "official" Linear plugin in partnership with Linear. It works great. There's just one problem. The official Linear MCP has 25 tools, using a total of 19,659 tokens of context on every single session. The third-party MCP is a little slimmer at 17k and change. But that's still nearly 10% of the full context window. For every context window.

This morning, after breakfast, I sat down and started chatting with Claude about what a better Linear tool might look like. We discussed just using a unix commandline tool. We discussed using a unix commandline tool + a skill. We discussed a Skill + a single-tool MCP client that was just a pure GraphQL client. I asked Claude to read my blog post on MCP design. We ended up with something nice and streamlined. It totals out at 975 tokens, including instructions for how to learn more about how to use the tool. I ended up talking Claude into making the MCP fully self-documenting by including a 'help' action. We ended up compromising on tool design. Claude really thought that it would be fine always reading the instructions and just using raw GraphQL for everything. I overruled it and decided that the most common operations (working with tickets) merited first-class actions. Everything else is GraphQL backed up by the 'help' action.

It's called Streamlinear. Ultimately, I'm responsible for the name. I didn't say no. I asked Claude to come up with a list of punny names. Everything else it suggested was being used for a Linear client already.
I asked Claude to talk about the new tool and what it's like: This is the tool loadout for the 'official' MCP: And this is what Streamlinear looks like: Give it a spin and let me know how it goes.
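The post doesn't include Streamlinear's code, so here is a hypothetical Python sketch of the design it describes. A single tool takes an `action` argument that selects between a `help` action, a couple of first-class ticket operations, and a raw GraphQL fallback. The action names and GraphQL strings are illustrative, are not Streamlinear's actual interface, and are not guaranteed to match Linear's schema. The GraphQL transport is injected as a function, so the sketch runs without network access. For scale, 975 tokens is roughly 5% of the official MCP's 19,659.

```python
HELP_TEXT = """Actions (illustrative):
  help                             show this text
  get_issue     id                 fetch one issue
  update_issue  id, state_id       move an issue to another workflow state
  graphql       query, variables   run any other GraphQL query
"""


def make_tool(run_graphql):
    """Build a single dispatching tool around an injected run_graphql(query, variables)."""

    def get_issue(args):
        query = "query($id: String!) { issue(id: $id) { id title state { name } } }"
        return run_graphql(query, {"id": args["id"]})

    def update_issue(args):
        mutation = (
            "mutation($id: String!, $stateId: String!) "
            "{ issueUpdate(id: $id, input: { stateId: $stateId }) { success } }"
        )
        return run_graphql(mutation, {"id": args["id"], "stateId": args["state_id"]})

    def raw_graphql(args):
        # Escape hatch: anything without a first-class action goes through here.
        return run_graphql(args["query"], args.get("variables", {}))

    actions = {"get_issue": get_issue, "update_issue": update_issue, "graphql": raw_graphql}

    def tool(action, **args):
        if action == "help":
            return HELP_TEXT
        handler = actions.get(action)
        if handler is None:
            return f"Unknown action {action!r}. Call action='help' for usage."
        return handler(args)

    return tool
```

The point of this shape is that the tool's description stays short. Detailed usage lives behind `help` and only enters the context when the agent asks for it.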