Latest Posts (20 found)

rose ▪ bud ▪ thorn - march 2026

Published 31 Mar, 2026 · Reply via email

- I was featured as a Country Reporter on noyb's channels! My summaries made it into their newsletter 4 times this month.
- I reached Gold Status in my volunteering (20+ summarized and translated decisions for GDPRhub). Next up is Magenta Status at 35+ :)
- I've written 4 exams this month; if I pass all of them, that's 30 ECTS! I think I'll pass 3.
- Switched away from Discord. I have no issue with being classified as a teen on the platform, because it doesn't stop me from doing anything, but the move fit in with living my actual values, like I do with other tech/media things (preferring open source, EU, etc.). I'm on both Matrix and Fluxer.
- Did some spring cleaning: clearing out the fridge, wiping the inside, and rearranging the contents, together with throwing away expired toiletries, putting about 2 years of used batteries in the battery collection bin, decluttering a drawer, and vacuuming under and behind the sofa and bed.
- I've really felt like pouring extra energy into my looks lately. Got back into oil massages for my scalp, hair treatments, sheet masks, and teeth bleaching, got my nails done again (after going natural since December), and got a pedicure, too.
- I bought new dress pants that are so insanely comfortable, good-looking and flattering, it's ridiculous!
- My yearly gyn checkup came back fine, and I finally caved and got proper treatment for my PCOS and endometriosis.
- I went out for some runs in the late evening :) I haven't run outside in ages; I usually limit it to the treadmill.
- I went out to parks and forests, enjoying the weather and my free time after the exams. It was super healing and relaxing.
- Journaled more.
- Went to a vegan food fair.
- I applied to a job opening sent to me by a fellow blogger (James) and got an interview!!! I think I did well :)

Upcoming:

- More decluttering and selling, and tidying up the basement.
- Planning to go to two museum exhibitions soon before they close.
- Gonna go on vacation with two friends for 8 days next month!
- Booked tickets for an upcoming data protection event. Working on business cards (and maybe stickers?) for it.

I've had some issues with my illnesses. :( The stress of intense studying through most of February and March, weird weather changes, straining work stuff, eating a little too much sugar, the family situation, and starting two new medications this month sent my body over the edge. That made my fitness goals and studying a bit harder. I also unfortunately didn't taper off a bigger dose of an anti-anxiety med I occasionally take as needed, and accidentally caused agonizing withdrawal symptoms without realizing in time 🥴

I cut contact with the last family member I was still talking to. It's stressful to withstand all the attempts to reach out to me, and to stick with the decision without guilt.

My wardrobe is stressing me a little. I preferred not to own much. Unfortunately, the less you have, the more you wear the same things, and the more they get washed and worn out. At some point, you want to replace a lot of it at the same time. That's not only financially painful, but also annoying when you have the goal of sewing most of your clothes yourself, and you currently have neither the time nor the energy to buy fabric and sew the things you need. I am annoyed at walking into these fast fashion places, seeing nothing I like, then forcing myself to look at stuff more closely, and everything is XS, feels like a trash bag, and costs too much for how flimsy and unethical it is. I'll have to try my luck with thrifting more, but even that has been overrun with Shein trash.

If I make it to the second interview round, I might have to decline. I like the company, they're a great and respected employer, generous, and the interview was fun… but there are some dealbreakers for me, which hurts. I sat with it after, and have slept on it now, and I just don't think I'll be happy in those circumstances. :( I wish it weren't so, because they were in the top 3 of places I'd wanna work at, and I want a job in data protection badly. But it doesn't feel right, and I can't justify moving forward with it, all things considered. It feels like the wrong time for me. Maybe another open position in a couple of years?


Bic Runga At Te Paepae Theatre

What’s going on, Internet? Friday night my wife and I enjoyed a couple of hours out in the evening to catch Bic Runga perform the second show of her Red Sunset tour at the recently opened Te Paepae Theatre.

We got into town 30 minutes before doors opened, but rather than stress about catching the warm-up act we grabbed dinner at an old favourite, Depot. We enjoyed clams, snapper sliders, skirt steak and potato skins. Comforting to know this is the same food we’d get here when we last visited a decade ago.

We arrived at Te Paepae around 8 pm, headed upstairs and found our seats. The warm-up band, which turned out to be Bic’s husband’s band, were just finishing up. We had to double check, but yes, that was Bic on the drums. After a short 15-20 minute interval the show was back on, with Bic taking the mic and her husband returning the favour on drums.

Bic wove old favourites in among new songs from the latest album, Red Sunset. Red Sunset is her first album in 15 years (if we don’t include 2016’s covers album). I hadn’t managed to listen to Red Sunset yet, so the new songs were a first listen live. Pretty sure she opened with Drive. The new songs sounded great and I can’t wait to dive into the album on an upcoming roadtrip.

As the show drew to an end, Bic let us know it was wrapping up and, rather than piss around with leaving the stage and coming back for an encore, she got straight into her biggest and favourite track, Sway. What a tune to end the show with.

After the show Bic headed straight to the merch tent, signing vinyls and CDs and posing for selfies with fans. I grabbed a copy of her 1997 album Drive and got it signed. The vinyl itself wasn’t anything special: a single cardboard sleeve with a standard black vinyl. Sony obviously didn’t put a lot of effort into the production of this classic kiwi album.

Drive will always be a favourite of mine; it is one of my earliest memories of really getting into kiwi music. During third form (Year 9), for a music class project we had to find a local artist to do a report on, and I wasn’t clued up on local music back then like I am now. Dad shared a newspaper or magazine article on this new album from a 17-year-old, Bic Runga. That, I would say, was my awakening to local music. It wasn’t too long until the fifth Labour government under Helen Clark would invest heavily in the arts and we’d all be exposed to kiwi music for a solid few years.

Leaving the venue with my wife and the signed copy of Drive in my hand was a nice way to wrap up the evening, and a good reminder of why I love kiwi music.

Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP, or send a webmention. Check out the posts archive on the website.


I’m returning my Studio Display XDR and buying another one

Sooo… I did a thing.

I couldn’t help but be slightly dissatisfied by the clarity of my Studio Display XDR’s nano-texture display. It just made everything look a little less than Retina-quality. And for this price, I don’t want to have lingering regrets each time I use it. So, I ordered a second, non-nano-texture version, banking on Apple’s generous return policy.

It came in today. I set it up about 30 minutes ago. I put the two displays side by side and… it’s no question. The nano-texture is going back. Showing the same content on each display, at the same brightness level, I can absolutely see the fuzziness introduced by the “matte” display.

It’s not that nano-texture is all bad. I love how it looks when the display is dark — there are zero reflections.[1] But the point is to enjoy it while the display is on. Without nano-texture, everything is as crisp as I had hoped. I tend to lean toward the display when I’m concentrating, and even close up, the display is razor sharp.

I technically have until April 9th to send back the nano-texture XDR, but, honestly, I think I’m going to package it up tonight. Well… maybe tomorrow. I might as well enjoy having 10k pixels of display at my disposal while I can.

If I hold onto the original display until the last day that I can send it back, I will have had it for 24 days. That’s a full 10 extra days beyond the stated 14-day return period. It’s possible that I could have squeezed in even a few more days by initiating the return today, the 14th day after it was delivered, instead of on the 11th. With that in mind, one could get nearly a month of use for testing and comparing Apple’s products, with the ability to return them (free shipping both ways) for a full refund. That’s serious commitment to customer satisfaction, and one area where Apple’s standards haven’t slipped.

To boot, by paying with Apple Card Monthly Installments (which let you pay for an item over 12 months at 0% interest), I’ve only been charged $287.92 for the nano-texture display, and $263.92 for the regular one. I think that was just the taxes on each one. To be sure, it’s a privileged position I’m in to be able to do these shenanigans, but there’s a lot to be said for how easy Apple has made it to purchase even its most expensive products with very little risk.

[1] If I were in an environment with light sources behind me, my decision might be very different. I think there’s definitely a place for this non-reflective display — it’s just not in my home office. ↩︎

HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see — the blog posts, shortcuts, wallpapers, scripts, or anything — please consider leaving a tip, checking out my store, or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social, or by good ol' email.


Continuous, Continuous, Continuous

Jason Gorman writes about the word “continuous” and its place in making software.

We think of making software in stages (and we often assign roles to ourselves and other people based on these stages): the design phase, the coding phase, the testing phase, the integration phase, the release phase, and so on. However, this approach to building and distributing software isn’t necessarily well-suited to an age where everything moves at breakneck speed and changes constantly.

The moment we start writing code, we see how the design needs to change. The moment we start testing, we see how the code needs to change. The moment we integrate our changes, we see how our code, or other people’s, needs to change. The moment we release working software into the world, we learn how the software needs to change.

Making software is a continuous cycle of these interconnected stages: designing, coding, testing, integrating, releasing. But the lines between these stages are very blurry, and therefore the responsibilities of the people on our teams will be too. The question is: are our cycles for these stages — and the collaborative work of the people involved in them — measured in hours or weeks? Do we complete each of these stages multiple times a day, or once every few weeks?

As Gorman puts it: “if we work backwards from the goal of having working software that can be shipped at any time, we inevitably arrive at the need for continuous integration, and that doesn’t work without continuous testing, and that doesn’t work if we try to design and write all the code before we do any testing.”

Instead, we work in micro feedback loops, progressing one small step at a time, gathering feedback throughout so we can iterate towards a good result. Feedback on the process, through the process, must be evolutionary. You can’t save it all up for a post-mortem or a 1-on-1. It has to happen in the moment, evolving our understanding one piece of feedback at a time (see Gall’s law: a complex system evolves from a simpler one).

And again: “if code craft could be crystallised in one word, that word would be ‘continuous’.”

Your advantage in software will be your ability to evolve and change as your customers’ expectations evolve and change (because the world evolves and changes), which means you must be prepared to respond to, address, and deliver on changes in expectations at any given moment in time.

Reply via: Email · Mastodon · Bluesky

ava's blog Yesterday

some silly art

Made some silly art of my online friends and myself today, redrawing memes or other images I saw online.

I love mango. (this is referencing this meme)

These are Suliman as purple Keroppi and Mono as a mix of his Jiji icon and Googie. >:) (original art)

This is Kami :3 (original art from an anime called House of the Sun)

Reply via email · Published 30 Mar, 2026


Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here's how he describes it:

"Mr. Chatterbox is a language model trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library. The model has absolutely no training inputs from after 1899 — the vocabulary and ideas are formed exclusively from nineteenth-century literature."

Mr. Chatterbox's training corpus was 28,035 books, with an estimated 2.93 billion input tokens after filtering. The model has roughly 340 million parameters, roughly the same size as GPT-2-Medium. The difference is, of course, that unlike GPT-2, Mr. Chatterbox is trained entirely on historical data.

Given how hard it is to train a useful LLM without using vast amounts of scraped, unlicensed data, I've been dreaming of a model like this for a couple of years now. What would a model trained on out-of-copyright text be like to chat with? Thanks to Trip we can now find out for ourselves!

The model itself is tiny, at least by Large Language Model standards: just 2.05GB on disk. You can try it out using Trip's Hugging Face Spaces demo.

Honestly, it's pretty terrible. Talking with it feels more like chatting with a Markov chain than an LLM: the responses may have a delightfully Victorian flavor to them, but it's hard to get a response that usefully answers a question.

The 2022 Chinchilla paper suggests a ratio of 20x the parameter count in training tokens. For a 340M model that would suggest around 7 billion tokens, more than twice the British Library corpus used here. The smallest Qwen 3.5 model is 600M parameters, and that model family starts to get interesting at 2B, so my hunch is we would need 4x or more the training data to get something that starts to feel like a useful conversational partner.

But what a fun project! I decided to see if I could run the model on my own machine using my LLM framework. I got Claude Code to do most of the work; here's the transcript. Trip trained the model using Andrej Karpathy's nanochat, so I cloned that project, pulled the model weights, and told Claude to build a Python script to run the model. Once we had that working (which ended up needing some extra details from the Space demo's source code), I had Claude read the LLM plugin tutorial and build the rest of the plugin. llm-mrchatterbox is the result.

Install the plugin, then run a first prompt; that first run will fetch the 2.05GB model file from Hugging Face. You can also start an ongoing chat session, and if you don't have LLM installed you can still get a chat session going from scratch using uvx. When you are finished with the model, you can delete the cached file.
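A sketch of those steps, assuming the plugin registers the model under a mrchatterbox alias (the alias and the example prompt are my guesses, not commands confirmed by the post):

```sh
# Install the plugin into LLM (https://llm.datasette.io/)
llm install llm-mrchatterbox

# The first prompt fetches the 2.05GB model file from Hugging Face
llm -m mrchatterbox 'Pray, what news of the electric telegraph?'

# Start an ongoing chat session
llm chat -m mrchatterbox

# Without LLM installed, spin up a one-off chat session via uvx
uvx --with llm-mrchatterbox llm chat -m mrchatterbox
```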

This is the first time I've had Claude Code build a full LLM model plugin from scratch, and it worked really well. I expect I'll be using this method again in the future. I continue to hope we can get a useful model from entirely public domain data. The fact that Trip was able to get this far using nanochat and 2.93 billion training tokens is a promising start.

You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.
iDiallo Yesterday

How Do We Get Developers to Read the Docs?

When I reviewed this PR, I had tears in my eyes. We had done it. We had finally created the perfect API. To top it off, the senior developer who worked on it had written documentation to match. No stone was left unturned. I had the code open in one window and the doc in the other. The moment I felt hesitation in the code, the documentation reassured me. Why do we make two calls to get the... "We are fetching two types of orders to support legacy subscribers..." the documentation answered before I completed my question. This was standard number 15. The one to rule them all.

But I still had one question. As the owner of the API, I read the documentation. Will other developers ever think to read it? How do I get people to want to read the documentation before they use this API? Because in my experience, nobody reads the documentation. Not to say that documentation is useless, but my mistake was thinking that the people who want to implement the API are interested in documentation at all.

For every API ever built, there are two audiences to cater to, and confusing them is where most documentation goes wrong.

The first group is the consumers of the API. The only thing they want to know is: do the endpoints do what I need, and what parameters do they take? They are not reading your documentation like a book. They are scanning it like a menu. They want to find the thing they need, copy the example, and move on.

The second group is the maintainers of the API. The people who need to understand the why behind every decision. Why are there two calls? Why does this endpoint behave differently for legacy users? Why is this field nullable? These are the people who will be debugging at 2am, and they need the full picture.

The worst thing you can do is write one document that tries to serve both audiences equally. You end up with something that's too deep for the first group to skim, and not structured enough for the second group to find useful.

For the first audience, the API should speak for itself. The best documentation you can provide is not text to read through, but a well-designed API. Follow clear, repeatable patterns where the user can anticipate, or even assume, the available features. If you have an endpoint called /orders, the assumption should be that /orders/{id} returns a specific order. If you add a way to create orders, there should probably be a way to cancel them too. When the pattern is consistent, the consumer doesn't need to read anything; they just guess correctly.

When you do write documentation for this audience, resist the urge to explain your internals. They don't need to know that you're fetching from two different database tables to support legacy subscribers. What they need to know is: "Returns all of a user's orders, including legacy ones." One sentence. Done.

I like this idiom: "Too much information and no information accomplish the same goal." This is the mistake I see most often. It's a painful one because it comes from a good place. The writer of the documentation, usually the person who built the thing, feels a sense of responsibility. They want to be thorough. They want no one to be confused. So they write everything down. The result is a documentation page that looks like this:

"This endpoint retrieves orders for a given user. It was introduced in v2.3 of the API following the migration from the legacy order management system (OMS) in Q3 2021. Internally, the resolver makes two sequential calls (one to the new orders table and one to the legacy_orders table) and merges the results using the order ID as a deduplication key. Note that legacy orders may be missing a field that was not captured before 2019. If you are building a UI, you should account for this possibility. The endpoint also supports cursor-based pagination, though offset-based pagination is available for backward compatibility with clients built before v2.1. Additionally, orders in certain states may not appear immediately..."

A developer scanning this page will read the first sentence, close the tab, and think about designing API standard number 16. They'll go look at the codebase instead, or ping a teammate, or just guess. The documentation existed; it just didn't get read. Which means it accomplished exactly the same thing as having no documentation at all. The same way you don't write a comment to explain every line of code, documentation doesn't benefit from too much information.

My go-to solution isn't to omit information, but to write it in layers. Collapsible sections are one of the most underrated tools in documentation design (a sketch follows at the end of this post). They let the consumer skim the surface: endpoint name, what it returns, a working example. And they let the maintainer dive deeper into the implementation notes, the edge cases, and the historical context.

The same principle applies to how you order information. Lead with what the API does. Follow with how to use it. Bury the why at the bottom, behind a toggle or a "Details" section, available to those who need it, invisible to those who don't.

Think of it like a well-designed error message. A good error message tells you what went wrong in plain language. A great error message also includes an expandable stack trace, but it doesn't show you the stack trace first. Your documentation has the same job. Give people the answer they're looking for, and then offer the depth to those willing to dig.

The second audience, the maintainers, do need the full picture. The two database calls, the deduplication logic, the historical reason the field is sometimes null. This is the documentation that prevents a future developer from "fixing" something that wasn't broken, or removing what looks like redundant code. But this documentation doesn't have to live on the same page as the quick-start guide. Deep implementation notes belong in inline code comments or a separate internal wiki. The public-facing API reference should stay clean.

When you separate operational documentation (for consumers) from institutional documentation (for maintainers), both documents get better. The consumer doc gets shorter and clearer. The maintainer doc gets deeper because it's no longer trying to also be beginner-friendly.

The goal of documentation isn't completeness. Completeness is what you write for yourself, to feel like you've done your job. The goal of documentation is to transfer the right information into the right person's head at the right moment. That senior developer who wrote the documentation I cried over understood this. She didn't write everything she knew. She wrote exactly what someone reading the code would need to know, at the exact moment they'd need it. And the API design allowed anyone consuming it to make correct assumptions (intuitive design) about how it works. Both groups are happy.
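As a rough illustration of that layered approach, here is a hypothetical reference entry in plain HTML (the endpoint, example host, and wording are illustrative, not from any real API):

```html
<!-- The one-sentence answer consumers scan for comes first. -->
<h3>GET /orders</h3>
<p>Returns all of a user's orders, including legacy ones.</p>
<pre><code>curl https://api.example.com/v2/users/42/orders</code></pre>

<!-- Depth on demand: maintainers can expand it, consumers never see it. -->
<details>
  <summary>Implementation notes</summary>
  <p>The resolver merges the new and legacy order tables,
     deduplicating by order ID. Legacy orders may be missing
     fields that were not captured before 2019.</p>
</details>
```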

neilzone Yesterday

Three months of not reading the news

Three months ago, I stopped reading the news. I made a note to force myself to reflect on it after three months, and this is that reflection.

I still read lots of RSS feeds of people’s blogs. I love this. I still read industry-specific news sites (mainly law-related stuff), and other sources of information which are often the basis of news coverage (e.g. government or regulator press releases and updates).

I still read local news, but wow, is that a rubbish experience. I get that local news needs funding to survive, but making the product so unappetising makes selling me a subscription a very hard sell indeed. Frankly, I could probably just not read the local news and keep an eye on the local council’s roadworks website instead.

I still have my 404 Media subscription although, to be honest, I am a bit on the fence about it. I am not sure if I will renew it or not at this point. No slight to the quality of their journalism.

What I have basically stopped doing is reading the BBC, the FT, the Guardian etc. I had not appreciated just how conditioned I was to reading the news whenever I had a spare moment. It took me quite a while to get used to the idea of not opening the BBC website, in particular. I did not go to the extent of blocking news sites, so this was just based on self-control / choosing not to do it. Curiously, what I found hard was that almost instinctive “fingers move to open a news site” behaviour, rather than actually missing reading the news. I had to train myself out of it, and now it doesn’t cross my mind.

I have not managed to avoid general news entirely, nor did I really intend to. This was about lessening my exposure, rather than doing all that I can to avoid it. I still see people posting news-related stories in the fediverse, and I just scroll on by. In some cases, I can filter by keywords, so I do not see them at all. If someone posts news too much (or, in particular, posts party political stuff), I either unfollow them or mute them. I’ve no temptation to click the links.

Am I less informed? Yes, and that is by design! Before, I was informed about a whole load of things in a way, and to an extent, that I didn’t find helpful or healthy. Now, I am aware, in broad terms, of major stuff going on around the world, but I am far less familiar with the minutiae, or the endless “up to the minute” reporting. That feels like a good level of awareness for me. I am also far less exposed to stuff that I never cared about in the first place, especially “celebrity” news, of which I remain blissfully ignorant, sport, and so on. To each, their own.

For now, anyway, I don’t miss reading the news. I’ve overcome that reflex of opening a news site. I have not - as far as I know, anyway, which I appreciate is quite a caveat - missed anything which, had I known about it, would have made a significant difference to anything important. I read far more books (and buying the tiny, pocketable X4 ereader was an attempt to distract me from my phone more often, letting me read even more).

So I am going to carry on with this experiment for now, and see how I get on. I can’t prove that this experiment has been good for my mental health, but it certainly feels that way.

Even though I do not want to read the news, I wonder if a monthly, edited, one-or-two-page kind of approach to key / important news stories might be welcome. Of course, there would be complexity in determining what is “key” or “important”, as that is subjective.

Rik Huijzer Yesterday

Biblical Earth

This is an image by ChatGPT showing roughly how the Bible describes the earth:

![biblical-earth.png](/files/435339c4aa439f2a)

Unfortunately, it's not showing the four corners. The image is based on:

- Isaiah 40:22: “It is he that sitteth upon the circle of the earth, and the inhabitants thereof are as grasshoppers…”
- Job 26:10: “He hath compassed the waters with bounds, until the day and night come to an end.”
- Job 26:7: “He stretcheth out the north over the empty place, and hangeth the earth upon nothing.”
- Proverbs 8:27: “When he prepared the heavens, I was there: when he set a...

HeyDingus Yesterday

7 Things This Week [#184]

A weekly list of interesting things I found on the internet, posted on Sundays. Sometimes themed, often not.

1️⃣ iWeb lives! Sort of. If you have an old Mac. But Corbin Davenport made an iWeb website just a few months ago and, honestly, it looks pretty awesome. [🔗 iweb.corbin.io]

2️⃣ Manton Reece shared a letter that his mother wrote to her mother many years ago while she (and Manton) lived in Greece. A lovely snapshot in time. [🔗 manton.org]

3️⃣ Alpinesavvy shared a story about drinking water that has me thinking about the counterproductive choices I make. [🔗 alpinesavvy.com]

4️⃣ The first use of “Wendy” as a first name was in Peter Pan. We still don’t know what it was short for! [🔗 wikipedia.org]

5️⃣ Rands did the work and made incredible data tables with all the good charging/charger brick information for modern Apple devices. [🔗 randsinrepose.com]

6️⃣ Todd Vaziri shows why having only two dots instead of three to represent outs in a baseball score graphic is just wrong. Looking at you, Netflix. [🦣 mastodon.social]

7️⃣ The Midleton Mule was a featured drink at our St. Patrick’s Day meal and I can’t stop thinking about it. It was so fresh and delightful. Gonna have to make it at home! [🔗 gelsons.com]

Thanks for reading 7 Things. If you enjoyed these links or have something neat to share, please let me know. And remember that you can get more links to internet nuggets that I’m finding every day by following me @jarrod on the social web.

HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see — the blog posts, shortcuts, wallpapers, scripts, or anything — please consider leaving a tip, checking out my store, or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social, or by good ol' email.


Resolving Dependabot Issues with Claude Code

I created a Claude skill creatively called dependabot which, once installed, you can invoke from a Claude Code session. It will use the GitHub CLI to retrieve open Dependabot alerts and upgrade the relevant dependencies. If you have multiple GitHub accounts logged in via the CLI, it will ask which one it should use if it can't figure that out based on how the skill was invoked or on the repository settings.

You can find the skill here: https://github.com/wjgilmore/dependabot-skill

To install it globally, open a terminal and go to your home directory, then into your Claude skills directory, and clone the repository there. Then restart Claude Code and you should be able to invoke it like any other skill.
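Something like the following (the skills path and the invocation phrasing are my assumptions, not taken from the skill's README):

```sh
# Assumes Claude Code reads personal skills from ~/.claude/skills
cd ~/.claude/skills
git clone https://github.com/wjgilmore/dependabot-skill dependabot

# Restart Claude Code, then invoke the skill in a session, e.g.:
#   "Use the dependabot skill to resolve this repo's open alerts"
```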

Here is some example output of it running on one of my projects:

Xe Iaso Yesterday

Small note about AI 'GPUs'

I've been seeing talk going around about wanting to capitalize on the AI bubble popping by picking up server GPUs for pennies on the dollar, so people can play games in higher fidelity thanks to server GPUs having more video RAM. I hate to be the bearer of bad news here, but most of those enterprise GPUs don't have the ability to process graphics. Yeah, that's right: in order to pack in as much compute as possible per chip, they removed video output and graphics processing from devices we are calling graphics processing units. The only thing those cards will be good for is CUDA operations for AI inference, AI training, or other things that do not involve gaming.

On a separate note, I'm reaching the point in recovery where I am getting very bored and am so completely ready to just head home. At least the diet restrictions end this week, so that's something to look forward to. God, I want a burrito.


My rainbow sweater

My sister got me a rainbow cardigan sweater a couple years ago for Christmas that is very fluffy and floppy. It doesn’t have pockets, it doesn’t have buttons, it just kind of drapes on me and is like a small blanket with arms. It’s not a practical sweater, but it’s cozy.

Because it’s not practical, I always have to remember to, for example, wear only pants that have pockets with it, so that I can put my stuff (phone, lip balm, etc) somewhere. I always have to wear certain shirts that don’t bunch up in a certain way when the sweater is feeling extra floppy. It’s just… not the most convenient sweater.

But hoo boy, my babies love my rainbow sweater. My oldest loves to sit on my lap and have me envelop her in it in a hug. My youngest loves to bury his face in it when he’s sleepy. Both of them love to pet it because it’s so soft. They admire the colors. They tangle their fingers in it and hold on tight to the loops. They flop with the sweater, and with me, and it’s the coziest thing in the world.

I love, love, love putting on this sweater. I get a little giddy thinking about how the babies will gravitate towards it as soon as they see the thick loops plopped across my shoulders. It’s impractical, and it’s weird, but it brings me the best warm cuddles ever.

Den Odell Yesterday

You're Looking at the Wrong Pretext Demo

Pretext, a new JavaScript library from Cheng Lou, crossed 7,000 GitHub stars in its first three days. If you've been anywhere near frontend engineering circles in that time, you've seen the demos: a dragon that parts text like water, fluid smoke rendered as typographic ASCII, a wireframe torus drawn through a character grid, multi-column editorial layouts with animated orbs displacing text at 60fps. These are visually stunning and they're why the library went viral. But they aren't the reason this library matters.

The important thing Pretext does is predict the height of a block of text without ever reading from the DOM. This means you can position text nodes without triggering a single layout recalculation. The text stays in the DOM, so screen readers can read it and users can select it, copy it, and translate it. The accessibility tree remains intact, the performance gain is real, and the user experience is preserved for everyone. This is the feature that will change how production web applications handle text, and it's the feature almost nobody is demonstrating.

The community has spent three days building dragons. It should be building chat interfaces. And the fact that the dragons went viral while the measurement engine went unnoticed tells us something important about how the frontend community evaluates tools: we optimize for what we can see, not for what matters most to the people using what we build.

The problem is forced layout recalculation, where the browser has to pause and re-measure the page layout before it can continue. When a UI component needs to know the height of a block of text, the standard approach is to measure it from the DOM. You call getBoundingClientRect() or read offsetHeight, and the browser synchronously calculates layout to give you an answer. Do this for 500 text blocks in a virtual list and you've forced 500 of these pauses. This pattern, called layout thrashing, remains a leading cause of visual stuttering in complex web applications.

Pretext's insight is that canvas text measurement uses the same font engine as DOM rendering but operates outside the browser's layout process entirely. Measure a word via canvas, cache the width, and from that point forward layout becomes pure arithmetic: walk cached widths, track running line width, and insert breaks when you exceed the container's maximum. No slow measurement reads, and no synchronous pauses.

The architecture separates this into two phases. The preparation phase does the expensive work once: normalize whitespace, segment the text using Intl.Segmenter for locale-aware word boundaries, handle bidirectional text (such as mixing English and Arabic), measure segments with canvas, and return a reusable reference. The layout phase is then pure calculation over cached widths, taking about 0.09ms for a 500-text batch against roughly 19ms for the DOM-measurement equivalent. Cheng Lou himself calls the 500x comparison "unfair" since it excludes the one-time cost, but that cost is only paid once and spread across every subsequent call. It runs once when the text appears, and every subsequent resize takes the fast path, where the performance boost is real and substantial.

The core idea traces back to Sebastian Markbage's research at Meta, where Cheng Lou implemented the earlier prototype that proved canvas font metrics could substitute for DOM measurement. Pretext builds on that foundation with production-grade internationalization, bidirectional text support, and the two-phase architecture that makes the fast path so fast. Lou has a track record here: react-motion and ReasonML both followed the same pattern of identifying a constraint everyone accepted as given and removing it with a better abstraction.

The first use case Pretext serves, and the one I want to make the case for, is measuring text height so you can render DOM text nodes in exactly the right position without ever asking the browser how tall they are. This isn't a compromise path; it's the most capable thing the library does.

Consider a virtual scrolling list of 500 chat messages. To render only the visible ones, you need to know each message's height before it enters the viewport. The traditional approach is to insert the text into the DOM, measure it, and then position it, paying the layout cost for every message. Pretext lets you predict the height mathematically and then render the text node at the right position. The text itself still lives in the DOM, so the accessibility model, selection behavior, and find-in-page all work exactly as they would with any other text node. Here's what that looks like in practice:
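(A minimal sketch: prepare and layout below are hypothetical stand-ins for Pretext's actual two-phase API, whose real names and option shapes may differ.)

```js
// Hypothetical sketch of Pretext's two-phase flow for a virtual list.
import { prepare, layout } from "pretext"; // assumed entry points

const listEl = document.querySelector("#messages");
const messages = [{ text: "Hey! Are we still on for Thursday?" }];
let offsetY = 0;

for (const message of messages) {
  // Phase 1 (expensive, once per text + font): segment the string and
  // cache canvas-measured word widths.
  const prepared = prepare(message.text, {
    font: "16px system-ui",
    locale: "en",
  });

  // Phase 2 (cheap, every layout pass): pure arithmetic over cached widths.
  const { height } = layout(prepared, { maxWidth: 320 });

  // Position a real DOM text node using the predicted height.
  // No getBoundingClientRect, no forced layout, full accessibility.
  const bubble = document.createElement("div");
  bubble.textContent = message.text;
  bubble.style.cssText = `position:absolute; top:${offsetY}px; width:320px;`;
  listEl.appendChild(bubble);
  offsetY += height;
}
```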
Two function calls: the first measures and caches, the second predicts height through calculation. No layout cost, yet the text you render afterward is a standard DOM node with full accessibility.

The shrinkwrap demo is the clearest example of why this path matters. CSS sizes a container to the widest wrapped line, which wastes space when the last line is short. There's no CSS property that says "find the narrowest width that still wraps to exactly N lines." Pretext's shrinkwrap calculation finds the optimal width mathematically, and the result is a tighter chat bubble rendered as a standard DOM text node. The performance gain comes from smarter measurement, not from abandoning the DOM. Nothing about the text changes for the end user.

Accordion sections whose heights are calculated by Pretext, and masonry layouts with height prediction instead of DOM reads: these both follow the same model of fast measurement feeding into standard DOM rendering.

There are edge cases worth knowing about, starting with the fact that the prediction is only as accurate as the font metrics available at measurement time, so fonts need to be loaded before measurement runs or results will drift. Ligatures (where two characters merge into one glyph, like "fi"), advanced font features, and certain CJK composition rules can introduce tiny differences between canvas measurement and DOM rendering. These are solvable problems and the library handles many of them already, but acknowledging them is part of taking the approach seriously rather than treating it as magic.

Pretext also supports manual line layout for rendering to Canvas, SVG, or WebGL. These APIs give you exact line coordinates so you can paint text yourself rather than letting the DOM handle it. This is the path that went viral, and the one that dominates every community showcase. The canvas demos are impressive and they're doing things the DOM genuinely can't do at 60fps. But they're also painting pixels, and when you paint text as canvas pixels, the browser has no idea those pixels represent language. Screen readers like VoiceOver, NVDA, and JAWS derive their understanding of a page from the accessibility tree, which is itself built from the DOM, so canvas content is invisible to them. Browser find-in-page and translation tools both skip canvas pixels entirely. Native text selection is tied to DOM text nodes and canvas has no equivalent, so users can't select, copy, or navigate the content by keyboard. A canvas element is also a single tab stop, meaning keyboard users can't move between individual words or paragraphs within it, even if it contains thousands of words. In short, everything that makes text behave as text rather than an image of text disappears.

None of this means the canvas path is automatically wrong. There are legitimate contexts where canvas text rendering is the right choice: games, data visualizations, creative installations, and design tools that have invested years in building their own accessibility layer on top of canvas. For SVG rendering, the trade-offs are different again, since SVG text elements do participate in the accessibility tree, making it a middle ground between DOM and canvas. But the canvas path is not the breakthrough, because canvas text rendering has existed for fifteen or more years across dozens of libraries. What none of them offered was a way to predict DOM text layout without paying the layout cost. Pretext's measurement and prediction APIs do exactly that, and it's genuinely new.

This pattern often repeats across the frontend ecosystem, and I understand why. A dragon parting text like water is something you can record as a GIF, post to your socials, and collect thousands of impressions. A virtual scrolling list that pre-calculates text heights looks identical to one that doesn't. The performance difference is substantial but invisible to the eye. Nobody makes a showcase called "works flawlessly with VoiceOver" or "scrolls 10,000 messages without a single forced layout" because these things look like nothing. They look like a web page working the way web pages are supposed to work.

This is Goodhart's Law applied to web performance: once a metric becomes a target, it ceases to be a good measure. Frame rate and layout cost are proxies for "does this work well for users." GitHub stars are a proxy for "is this useful." When the proxy gets optimized instead, in this case by visually impressive demos that happen to use the path with the steepest accessibility trade-offs, the actual signal about what makes the library important gets lost. The library's identity gets set by its most visually impressive feature in the first 72 hours, and the framing becomes "I am drawing things" rather than "I am measuring things faster than anyone has before." Once that framing is set, it's hard to shift.

The best text-editing libraries on the web, CodeMirror, Monaco, and ProseMirror, all made the deliberate choice to stay in the DOM even when leaving it would have been faster, because the accessibility model isn't optional. Pretext's DOM measurement path belongs in that tradition but goes further: those editors still read from the DOM when they need to know how tall something is. Pretext eliminates that step entirely, predicting height through arithmetic before the node is ever rendered. It's the next logical step in the same philosophy: keep text where it belongs, but stop paying the measurement cost to do so.

I've been thinking about performance engineering as a discipline for most of my career, and what strikes me about Pretext is that the real innovation is the one that is hardest to see. Predicting how text will lay out before it reaches the page, while keeping the text in the DOM and preserving everything that makes it accessible, is a genuinely new capability on the web platform. It's the kind of foundational improvement that every complex text-heavy application can adopt immediately. If you're reaching for Pretext this week, reach for the measurement and prediction path first. Build something that keeps text in the DOM and predicts its height without asking the browser. Ship an interface that every user can read, select, search, and navigate. Nobody else has done this yet, and it deserves building.

Performance engineering is at its best when it serves everyone without asking anyone to give something up. Faster frame rates that don't make someone nauseous. Fewer layout pauses that mean a page responds when someone with motor difficulties needs it to. Text that is fast and readable and selectable and translatable and navigable by keyboard and comprehensible to a screen reader. The dragons are fun. The measurement engine is important. Let's try not to confuse the two.


The Transposed Organization

One of the most common mistakes when building multi-agent systems is giving each sub-agent a specialized role: the “dev ops” agent, the “tester” agent, the “senior architect” agent. It mirrors how we organize humans, but in my experience it doesn’t work well.[1] I’m starting to think the same thing is true for the humans who make up an organization. Most companies organize by what people know how to do, but as AI pushes towards generalists (“Product Engineers”) and faster iteration cycles (“software factories”), I think they should instead organize by what problems people close.

Cartoon via Nano Banana.

This post proposes Company T (“company-transpose”[2]), an operation that fundamentally flips how software companies organize talent and operate: why I think this will happen, what will go wrong, and how an AI-native software company probably doesn’t even really look like a ‘software company’.

The transposed organization organizes around what I’ll call “loops” rather than traditional specialties. A loop is the full chain of decisions between a problem and a deployed solution, owned by one person. It’s recurring, not one-shot, which is what makes it an org design unit rather than a project. Customer bugs → fixes in prod is a loop. Revenue pipeline → closed deals is a loop. The strongest loops close back to an external source (the customer who reported the bug gets the fix), but internal loops exist too.

Here’s today’s org matrix. Rows are problems to solve, columns are specialist roles. Each cell is the person who handles that step. A single problem touches many people:

A toy example of the org matrix, where rows are problems and columns are specialists.

Traditional organizations organize around specialists. Read down any column: the same set of specialists handle that function across every problem. Bob writes the code, Carol does the design, Dave handles ops. A customer reports a bug and it touches five people across four handoffs. Frank, the person closest to the customer, has zero ability to fix the problem. Bob, the person who can fix it, has no direct relationship with the customer.

Now transpose it. Same grid, but read across any row: the same generalist closes every step of that problem.

A toy example of the “transposed” org matrix, where rows are loops and columns are specialties. Transposed organizations organize around loops. Bold means native skill; regular means AI-augmented.

Alice was a PM; now she owns the full product loop, with agents handling the code, design, and ops she couldn’t do before. Eve was in sales; now she closes the full revenue loop. Frank was in support; now he diagnoses and fixes issues rather than just logging them.

The loop owner’s role is closer to an architect than a traditional individual contributor. They’re not writing code the way an engineer writes code. They’re directing agent execution, making architectural and judgment calls across the full chain, deciding what to build and what “good” looks like before delivery. The primary output is decisions, not artifacts.

Bob, Carol, and Dave didn’t disappear. Their roles shifted from direct execution to platform: encoding their specialist taste into the systems, guardrails, and context that every loop owner’s agents rely on. While this has always been the promise of platform teams, AI makes it more tractable because the encoding target is now a context layer that agents can interpret. It’s still hard, but it’s now the dedicated role rather than a side-effect.
Bob doesn’t write the fix for Alice’s customer bug anymore, but he built the testing framework and code quality standards that Alice’s agents run against. Carol encoded the design system that ensures Eve’s agent-generated demos meet the brand bar. Dave built the deployment pipeline and observability stack that Frank’s agents use to ship and monitor fixes. Their expertise became infrastructure. They run their own loops now (platform reliability, design systems, etc.), and their output compounds across every other loop without requiring direct coordination with any of them. The relay chain between them collapsed into a shared context layer, made possible by agents that can absorb and apply specialist judgment at the point of execution rather than requiring the specialist to be in the chain.

Today this works better for some loops than others. The gap between agent-assisted and specialist-quality output is real, and it narrows unevenly. A PM shipping production code via agents is not yet the same as a staff engineer shipping it. But the trajectory is directional, and the organizational question is whether you begin to restructure around where the capability is heading, wait until it arrives, or continue to assume things will stay where they are today.

Every handoff in a relay chain has (at least) three costs: delay (waiting for the next person to pick it up), context loss (the next person only gets a summary of what the last person understood), and coordination overhead (syncs, updates, ticket management). These costs can be so deeply embedded in how companies operate that we’ve often stopped seeing them as costs.

In a loop, there’s nobody to hand off to. The same person who heard the customer describe the problem is the one deploying the fix. There’s no “let me loop in engineering.” The context is native because the same human carried it the entire way. For external-facing loops, the output ships directly back to the source of the problem: a customer reports a bug, the customer gets a fix in prod. A prospect asks a question, the prospect gets a live demo built for their use case. For internal loops, the closure has the same structure: the system hits a scaling wall, the infra loop owner ships the fix; a new hire joins, the people-development loop owner ramps them into a functioning loop owner. The topology is the same whether the source is a customer or the organization itself.

This also changes accountability in ways that are hard to overstate. When six people touch a customer issue, responsibility diffuses. “X is working on it” is a phrase that de facto implies nobody specific is responsible for the outcome. When one person owns the full loop, they own the outcome.

Innovation in a transposed org is encouraged and diffuses rapidly. It’s each loop owner getting better at their end-to-end vertical and being creative about how they close it. The product loop owner who finds a novel way to diagnose customer problems, or the revenue loop owner who invents a new demo format that converts better: these emerge from depth within a loop, not from a separate R&D department. That doesn’t mean loop owners innovate in isolation. Shared skills and harnesses propagate ideas between loops through agents. Demo hours, brownbags, and on-the-loop peer review across loop owners propagate them between humans. The innovation surface is the combination of depth within a loop and breadth across the shared context layer.
The codified institutional knowledge (standards, conventions, judgment artifacts) compounds across every loop without requiring direct coordination between them. This doesn’t happen automatically. Skill and context governance (deciding which artifacts propagate, resolving conflicts between competing approaches, maintaining quality as the context layer grows) becomes a critical organizational capacity in an AI-native company.

When you reorganize around loops, the company compresses. Not necessarily fewer people, but fewer coordination surfaces. Each loop operates with a degree of independence that looks a lot like a startup within a company:[3] shared infrastructure, shared values, shared context, but autonomous execution. The weekly meeting where twelve teams give status updates starts to feel unnecessary when each loop is self-contained enough not to need the other eleven to unblock it.

This also changes what “management” means. In a function-based org, managers exist partly to coordinate across the relay chain: to make sure the handoff from Product to Engineering actually happens, that priorities align, that nothing falls through the cracks. In a loop-based org, that coordination job mostly evaporates. What remains is setting direction (which loops should exist, what problems they should target) and developing people (training the taste, agency, and judgment that make loops work).

The claim is “one person closes a much wider loop,” not “one person does everything.” An Account Owner who closes the full revenue loop is not also the person who takes a customer bug to a fix in production. The loops are distinct; they share infrastructure and context, not responsibilities.

The common binding constraint is cross-domain taste. A person can only hold good judgment across so many domains simultaneously. Every artifact on the loop that doesn’t meet the bar is slop, and there’s no (in the loop) downstream review to catch it. This is what determines how wide a loop can get: not execution capacity (agents handle that), but the breadth of judgment one person can maintain with quality. The pool of people who can close a full product loop, from customer problem to deployed fix, with taste at every step, is smaller than the pool of people who can do any one step well. Training taste is hard; you can teach someone a new framework in a week, but teaching them to recognize when an architecture decision will cause problems six months from now takes years of pattern-matching.

When a loop is too wide for one person’s taste to cover, you pair loops. A revenue loop that requires both deep technical credibility and relationship selling splits into a technical loop co-owner and a commercial one, working the same customer from complementary angles.

This mirrors what happens in agent systems. A single agent with tools has zero internal coordination cost. But the moment you have two agents sharing a database or a file system, you need orchestration. You’ve reduced coordination from O(n) handoffs per problem to O(k) dependencies between loops, where k is much smaller than n. Two loops sharing a dependency (a demo environment, a roadmap, a brand promise, a production database) still create coordination costs between them. Between loops, you still need shared context, shared infrastructure, and someone who holds the picture of how the loops interact. That “someone” is probably the exec team or an uber-architect.

Some steps in a loop are irreducibly human.
Enterprise sales requires being in a room with a business leader. People management requires the kind of trust that doesn’t transfer through an API. These touchpoints are an additional hard floor on how much a loop can compress. A revenue loop can collapse AE, SE, and Deal Desk into one person, but that person still needs to show up to the dinner, still needs to have the relationship, still needs to read the room. The agent handles the demo, the quote, the technical deep-dive. It doesn’t handle the handshake. This also means that the loops with the most human touchpoints can be the ones that compress the least. Support for a self-serve product can compress dramatically. Enterprise sales to Fortune 500 accounts, less so. The transpose is not uniform across the matrix.

A human can make a fixed number of high-quality decisions per day.[4] Agents handle execution, but every loop still requires judgment calls: what to prioritize, when to ship, whether the output meets the bar. The number of loops a person can own is also bounded by their decision capacity. And not all decisions are equal. Thirty deployment calls are different from five calls that each blend technical judgment, customer empathy, and business risk. The weight and variety of decisions matter as much as the count. A loop that requires constant high-stakes calls across multiple domains drains capacity faster than one with routine decisions in a familiar domain. This means you can’t just keep adding loops to a person until they’re doing everything. At some point, decision fatigue degrades the quality of every loop they touch. The right load is the number of loops at which taste stays high, and that number is probably lower than most executives think.

Even if you buy the model, most organizations are actively incentivizing against it. Performance frameworks often reward functional depth: you get promoted for being a better engineer, not for closing a wider loop. Compensation structures may assume specialization. Career ladders push for a specific function to climb. The latent sense of what people think their job is (“I’m a designer,” “I’m in sales”) cements as identity-level, not just structural.

Transposing the org requires transposing the reward functions. Performance reviews need to measure loop ownership and outcomes, not functional output. Compensation needs to reward breadth of judgment, not depth of specialization. And the bar needs to move continuously: what counted as “full loop ownership” three months ago is the baseline today, because the agents keep getting better and the definition of what one person can drive execution on keeps expanding. Expectations ratchet every N months. Organizations that don’t explicitly reset this bar will find their people settling into a comfortable local maximum that’s already behind.

If one person owns an end-to-end loop and they leave, you lose the entire capability. Specialist orgs have redundancy baked in: three engineers all know the billing system, so losing one is survivable. In a transposed org, the product loop owner carries the full context of their vertical. The mitigation is structural, and it introduces two roles that survive the transpose: loop managers and trainees. A loop manager owns the people-development loop: new hire → ramped loop owner is their recurring end-to-end responsibility. They set the targets for their loops (what problems to attack, what the bar looks like), develop the people running them, and step in when someone is out or ramping.
They don’t coordinate handoffs, because there are no handoffs. They develop loop owners. Training someone into a loop is the harder problem. In a specialist org, onboarding is narrow: learn one tool, one codebase, one function. In a loop, the new person needs to develop judgment across the full chain. The ramp could look something like: shadow the current loop owner, run the loop with training wheels (the loop manager reviews output and flags where taste is off), then take full ownership as the manager steps back. The shared organizational context, the system guardrails, skill files, and encoded judgment that every loop inherits, means the trainee doesn’t start from zero. They start from the accumulated taste of the institution. The more the organization invests in making its context explicit and portable, the lower the risk on any individual loop and the faster new people ramp. If the transpose compresses loops and agents handle execution, why have a company at all? Why not a single “make money” loop run by one person with a swarm of agents? You can. And for many problems, a solo operator with agents will outperform a team. But companies still exist for the things that don’t fit inside a single loop: pooled trust, legal liability, shared infrastructure, the cross-org context layer that makes every loop better. A cluster of individuals with taste, sharing context and compounding each other’s judgment, outperforms the same individuals operating independently. The size of a transposed company is something like: Size = sum of all loops, where each loop is bounded by: The generalist bound. Can you hire someone capable of closing the full loop with taste? The wider the loop, the rarer the person. The human touchpoint floor. How many steps in the loop require a human talking to another human? These are the execution steps agents can’t absorb. The decision capacity ceiling. How many high-judgment calls does the loop require per day, and how heavy are they? Weight and variety matter as much as count. The volume threshold. A support loop for 10 customers and one for 10,000 are different loops entirely. At volume, loops split into sub-verticals, each still owned end-to-end. The authority surface. Can the person actually close the loop? Deploy access, customer access, spending authority. Without these, wider ownership is just more steps in the same relay chain. The shared surface. How many dependencies does this loop share with other loops? A production database, a brand promise, a deployment pipeline: each creates a coordination edge. More loops sharing more surfaces means more governance overhead, even with zero handoffs within any single loop. The company ends up feeling smaller than its pre-transpose version, not because it necessarily has fewer people, but because each person is more autonomous and the coordination overhead between them drops. The “startup within a startup” cliche becomes more structurally real rather than aspirationally fake. The hiring question follows directly from the formula. In the old model, you hire to fill columns: another engineer, another designer, another sales rep. In Company T , you hire to widen rows: people who can span more of the loop with judgment, not just execution. The value of a hire is roughly how many columns they can cover with taste, multiplied by how many loops they can carry at once. 
The profile that thrives in a transposed org is the person who has built things end-to-end before, someone who has shipped a product, closed a deal, debugged a system, and talked to the customer 5 . Builders, founders, people who’ve run their own thing. You’ll see companies increasingly marketing to exactly this profile , and it won’t just be for engineering roles. The revenue loop owner who can build their own demos, the support loop owner who can ship their own fixes. If you’re currently in a specialist role reading this, the question isn’t whether your function disappears; it’s whether you’re the person whose judgment becomes a loop, or the person whose execution becomes an agent. The path from specialist to loop owner is: widen. What infrastructure does Company T need? The loop only works if agents can actually provide the specialist capabilities at every step. That means the right agent tooling, the right context systems, and the right permissions model that gives loop owners the authority to actually close their loops. Just how generalist can a person be? The system creates incentives for maximal generalists, people who can close the widest possible loop with taste. But there’s presumably some bound on how many domains a person can hold enough judgment in simultaneously. Or maybe post-ai-neo-SaaS does just look like a group of micro-CEOs. Who owns organizational taste? A company of micro-CEOs each closing their own loops will develop their own judgment about what good looks like. Some of that divergence is healthy: the support loop owner knows what good support feels like better than the exec does. But the product a company sells still needs to be cohesive (at least I think so). The customer shouldn’t experience three different philosophies depending on which loop they touch. What gets decided at the loop level versus the org level, and how you maintain a shared thesis across independent loop owners without re-introducing the coordination overhead you just eliminated, is a core architectural challenge of the transposed org. Thanks for reading Shrivu’s Substack! Subscribe for free to receive new posts and support my work. I discuss this a bit in my more technical posts on Building Multi-Agent Systems . People building on AI consistently underestimate the cross-domain expertise of LLMs while underestimating the coordination and confusion cost of co-execution. This post in some ways is my realization that this is true for AI-augmented human organizational design as well. “But technically a transpose is …” Ok fine then I’ll use the wider, less linear algebra specific, definition: https://www.merriam-webster.com/dictionary/transpose This isn’t a new aspiration. “Pizza teams”, “squads”, “microenterprises” all promised autonomous units within a larger org. Most delivered partial results with significant coordination tradeoffs. What’s structurally different now is that agents collapse the execution gap that previously required those units to either stay small and limited or grow and re-specialize. The loop owner has access to specialist execution without needing specialist headcount. See Decision Fatigue . On top of generalization, I think adaptable is another key characteristic. Ideally what I call ‘ derivative thinkers ’. Cartoon via Nano Banana. This post proposes Company T (“company-transpose” 2 ), an operation that fundamentally flips how software companies organize talent and operate. 
Why I think this will happen, what will go wrong, and how an AI-native software company probably doesn’t even really look like a ‘software company’.

Company T

The transposed organization organizes around what I’ll call “loops” rather than traditional specialties. A loop is the full chain of decisions between a problem and a deployed solution, owned by one person. It’s recurring, not one-shot, which is what makes it an org design unit rather than a project. Customer bugs → fixes in prod is a loop. Revenue pipeline → closed deals is a loop. The strongest loops close back to an external source (the customer who reported the bug gets the fix), but internal loops exist too.

Here’s today’s org matrix. Rows are problems to solve, columns are specialist roles. Each cell is the person who handles that step. A single problem touches many people:

A toy example of the org matrix, where rows are problems and columns are specialists.

Traditional organizations organize around specialists. Read down any column: the same set of specialists handle that function across every problem. Bob writes the code, Carol does the design, Dave handles ops. A customer reports a bug and it touches five people across four handoffs. Frank, the person closest to the customer, has zero ability to fix the problem. Bob, the person who can fix it, has no direct relationship with the customer.

Now transpose it. Same grid, but read across any row: the same generalist closes every step of that problem.

A toy example of the “transposed” org matrix, where rows are loops and columns are specialties. Bold means native skill; regular means AI-augmented.

Transposed organizations organize around loops. Alice was a PM; now she owns the full product loop, with agents handling the code, design, and ops she couldn’t do before. Eve was in sales; now she closes the full revenue loop. Frank was in support; now he diagnoses and fixes issues rather than just logging them.

The loop owner’s role is closer to an architect than a traditional individual contributor. They’re not writing code the way an engineer writes code. They’re directing agent execution, making architectural and judgment calls across the full chain, deciding what to build and what “good” looks like before delivery. The primary output is decisions, not artifacts.

Bob, Carol, and Dave didn’t disappear. Their roles shifted from direct execution to platform: encoding their specialist taste into the systems, guardrails, and context that every loop owner’s agents rely on. While this has always been the promise of platform teams, AI makes it more tractable because the encoding target is now a context layer that agents can interpret. It’s still hard, but it’s now the dedicated role rather than a side-effect. Bob doesn’t write the fix for Alice’s customer bug anymore, but he built the testing framework and code quality standards that Alice’s agents run against. Carol encoded the design system that ensures Eve’s agent-generated demos meet the brand bar. Dave built the deployment pipeline and observability stack that Frank’s agents use to ship and monitor fixes. Their expertise became infrastructure. They run their own loops now (platform reliability, design systems, etc.), and their output compounds across every other loop without requiring direct coordination with any of them.
The relay chain between them collapsed into a shared context layer, made possible by agents that can absorb and apply specialist judgment at the point of execution rather than requiring the specialist to be in the chain.

Today this works for some loops better than others. The gap between agent-assisted and specialist-quality output is real, and it narrows unevenly. A PM shipping production code via agents is not yet the same as a staff engineer shipping it. But the trajectory is directional, and the organizational question is whether you begin to restructure around where the capability is heading, wait until it arrives, or continue to assume things will stay where they are today.

With loops…

The relay chain collapses

Every handoff in a relay chain has (at least) three costs: delay (waiting for the next person to pick it up), context loss (the next person only gets a summary of what the last person understood), and coordination overhead (syncs, updates, ticket management). These costs can be so deeply embedded in how companies operate that we’ve often stopped seeing them as costs.

In a loop, there’s nobody to hand off to. The same person who heard the customer describe the problem is the one deploying the fix. There’s no “let me loop in engineering.” The context is native because the same human carried it the entire way. For external-facing loops, the output ships directly back to the source of the problem: customer reports a bug, customer gets a fix in prod. Prospect asks a question, prospect gets a live demo built for their use case. For internal loops, the closure is the same structure: system hits a scaling wall, the infra loop owner ships the fix; new hire joins, the people-development loop owner ramps them into a functioning loop owner. The topology is the same whether the source is a customer or the organization itself.

This also changes accountability in ways that are hard to overstate. When six people touch a customer issue, responsibility diffuses. “X is working on it” is a phrase that de facto implies nobody specific is responsible for the outcome. When one person owns the full loop, they own the outcome.

The best is the default

Innovation in a transposed org is encouraged and diffuses rapidly. It’s each loop owner getting better at their end-to-end vertical and being creative about how they close it. The product loop owner who finds a novel way to diagnose customer problems, the revenue loop owner who invents a new demo format that converts better: these emerge from depth within a loop, not from a separate R&D department. That doesn’t mean loop owners innovate in isolation. Shared skills and harnesses propagate ideas between loops through agents. Demo hours, brownbags, and on-the-loop peer review across loop owners propagate them between humans. The innovation surface is the combination of depth within a loop and breadth across the shared context layer. The codified institutional knowledge (standards, conventions, judgment artifacts) compounds across every loop without requiring direct coordination between them. This doesn’t happen automatically. Skill and context governance (deciding which artifacts propagate, resolving conflicts between competing approaches, maintaining quality as the context layer grows) becomes a critical organizational capacity in an AI-native company.

The company feels smaller

When you reorganize around loops, the company compresses.
Not necessarily fewer people, but fewer coordination surfaces. Each loop operates with a degree of independence that looks a lot like a startup within a company [3]: shared infrastructure, shared values, shared context, but autonomous execution. The weekly meeting where twelve teams give status updates starts to feel unnecessary when each loop is self-contained enough to not need the other eleven to unblock it.

This also changes what “management” means. In a function-based org, managers exist partly to coordinate across the relay chain, to make sure the handoff from Product to Engineering actually happens, that priorities align, that nothing falls through the cracks. In a loop-based org, that coordination job mostly evaporates. What remains is setting direction (which loops should exist, what problems they should target) and developing people (training the taste, agency, and judgment that make loops work).

The hard realities…

Wider loops, not infinite loops

The claim is “one person closes a much wider loop,” not “one person does everything.” An Account Owner who closes the full revenue loop is not also the person who takes a customer bug to a fix in production. The loops are distinct; they share infrastructure and context, not responsibilities.

The common binding constraint is cross-domain taste. A person can only hold good judgment across so many domains simultaneously. Every artifact on the loop that doesn’t meet the bar is slop, and there’s no (in the loop) downstream review to catch it. This is what determines how wide a loop can get: not execution capacity (agents handle that), but the breadth of judgment one person can maintain with quality. The pool of people who can close a full product loop, from customer problem to deployed fix, with taste at every step, is smaller than the pool of people who can do any one step well. Training taste is hard; you can teach someone a new framework in a week, but teaching them to recognize when an architecture decision will cause problems six months from now takes years of pattern-matching. When a loop is too wide for one person’s taste to cover, you pair loops. A revenue loop that requires both deep technical credibility and relationship selling splits into a technical loop co-owner and a commercial one working the same customer from complementary angles.

Coordination compresses, it doesn’t vanish

This mirrors what happens in agent systems. A single agent with tools has zero internal coordination cost. But the moment you have two agents sharing a database or a file system, you need orchestration. You’ve reduced coordination from O(n) handoffs per problem to O(k) dependencies between loops, where k is much smaller than n. Two loops sharing a dependency (a demo environment, a roadmap, a brand promise, a production database) still create coordination costs between them. Between loops, you still need shared context, shared infrastructure, and someone who holds the picture of how the loops interact. That “someone” is probably the exec team or an uber-architect.

Humans like working with humans

Some steps in a loop are irreducibly human. Enterprise sales requires being in a room with a business leader. People management requires the kind of trust that doesn’t transfer through an API. These touchpoints are an additional hard floor on how much a loop can compress. A revenue loop can collapse AE, SE, and Deal Desk into one person, but that person still needs to show up to the dinner, still needs to have the relationship, still needs to read the room.
The agent handles the demo, the quote, the technical deep-dive. It doesn’t handle the handshake. This also means that the loops with the most human touchpoints can be the ones that compress the least. Support for a self-serve product can compress dramatically. Enterprise sales to Fortune 500 accounts, less so. The transpose is not uniform across the matrix.

Decisions per day are finite

A human can make a fixed number of high-quality decisions per day [4]. Agents handle execution, but every loop still requires judgment calls: what to prioritize, when to ship, whether the output meets the bar. The number of loops a person can own is also bounded by their decision capacity. And not all decisions are equal. Thirty deployment calls are different from five calls that each blend technical judgment, customer empathy, and business risk. The weight and variety of decisions matter as much as the count. A loop that requires constant high-stakes calls across multiple domains drains capacity faster than one with routine decisions in a familiar domain. This means you can’t just keep adding loops to a person until they’re doing everything. At some point, decision fatigue degrades the quality of every loop they touch. The right load is the number of loops where taste stays high, and that number is probably lower than most executives think.

The incentives aren’t set up for this

Even if you buy the model, most organizations are actively incentivizing against it. Performance frameworks often reward functional depth: you get promoted for being a better engineer, not for closing a wider loop. Compensation structures may assume specialization. Career ladders push for a specific function to climb. And people’s latent sense of what their job is (“I’m a designer,” “I’m in sales”) cements at the identity level, not just the structural one. Transposing the org requires transposing the reward functions. Performance reviews need to measure loop ownership and outcomes, not functional output. Compensation needs to reward breadth of judgment, not depth of specialization. And the bar needs to move continuously: what counted as “full loop ownership” three months ago is the baseline today, because the agents keep getting better and the definition of what one person can drive execution on keeps expanding. Expectations ratchet every N months. Organizations that don’t explicitly reset this bar will find their people settling into a comfortable local maximum that’s already behind.

One person, one loop, one bus factor

If one person owns an end-to-end loop and they leave, you lose the entire capability. Specialist orgs have redundancy baked in: three engineers all know the billing system, so losing one is survivable. In a transposed org, the product loop owner carries the full context of their vertical. The mitigation is structural, and it introduces two roles that survive the transpose: loop managers and trainees. A loop manager owns the people-development loop: new hire → ramped loop owner is their recurring end-to-end responsibility. They set the targets for their loops (what problems to attack, what the bar looks like), develop the people running them, and step in when someone is out or ramping. They don’t coordinate handoffs, because there are no handoffs. They develop loop owners. Training someone into a loop is the harder problem. In a specialist org, onboarding is narrow: learn one tool, one codebase, one function. In a loop, the new person needs to develop judgment across the full chain.
The ramp could look something like: shadow the current loop owner, run the loop with training wheels (the loop manager reviews output and flags where taste is off), then take full ownership as the manager steps back. The shared organizational context (the system guardrails, skill files, and encoded judgment that every loop inherits) means the trainee doesn’t start from zero. They start from the accumulated taste of the institution. The more the organization invests in making its context explicit and portable, the lower the risk on any individual loop and the faster new people ramp.

How big is Company T, and who works there?

If the transpose compresses loops and agents handle execution, why have a company at all? Why not a single “make money” loop run by one person with a swarm of agents? You can. And for many problems, a solo operator with agents will outperform a team. But companies still exist for the things that don’t fit inside a single loop: pooled trust, legal liability, shared infrastructure, the cross-org context layer that makes every loop better. A cluster of individuals with taste, sharing context and compounding each other’s judgment, outperforms the same individuals operating independently.

The size of a transposed company is something like: Size = sum of all loops, where each loop is bounded by:

- The generalist bound. Can you hire someone capable of closing the full loop with taste? The wider the loop, the rarer the person.
- The human touchpoint floor. How many steps in the loop require a human talking to another human? These are the execution steps agents can’t absorb.
- The decision capacity ceiling. How many high-judgment calls does the loop require per day, and how heavy are they? Weight and variety matter as much as count.
- The volume threshold. A support loop for 10 customers and one for 10,000 are different loops entirely. At volume, loops split into sub-verticals, each still owned end-to-end.
- The authority surface. Can the person actually close the loop? Deploy access, customer access, spending authority. Without these, wider ownership is just more steps in the same relay chain.
- The shared surface. How many dependencies does this loop share with other loops? A production database, a brand promise, a deployment pipeline: each creates a coordination edge. More loops sharing more surfaces means more governance overhead, even with zero handoffs within any single loop.

The company ends up feeling smaller than its pre-transpose version, not because it necessarily has fewer people, but because each person is more autonomous and the coordination overhead between them drops. The “startup within a startup” cliche becomes more structurally real rather than aspirationally fake.

The hiring question follows directly from the formula. In the old model, you hire to fill columns: another engineer, another designer, another sales rep. In Company T, you hire to widen rows: people who can span more of the loop with judgment, not just execution. The value of a hire is roughly how many columns they can cover with taste, multiplied by how many loops they can carry at once. The profile that thrives in a transposed org is the person who has built things end-to-end before, someone who has shipped a product, closed a deal, debugged a system, and talked to the customer [5]. Builders, founders, people who’ve run their own thing. You’ll see companies increasingly marketing to exactly this profile, and it won’t just be for engineering roles: the revenue loop owner who can build their own demos, the support loop owner who can ship their own fixes. If you’re currently in a specialist role reading this, the question isn’t whether your function disappears; it’s whether you’re the person whose judgment becomes a loop, or the person whose execution becomes an agent. The path from specialist to loop owner is: widen.

What infrastructure does Company T need?

The loop only works if agents can actually provide the specialist capabilities at every step. That means the right agent tooling, the right context systems, and the right permissions model that gives loop owners the authority to actually close their loops.

Just how generalist can a person be?

The system creates incentives for maximal generalists, people who can close the widest possible loop with taste. But there’s presumably some bound on how many domains a person can hold enough judgment in simultaneously. Or maybe post-AI neo-SaaS does just look like a group of micro-CEOs.

Who owns organizational taste?

A company of micro-CEOs each closing their own loops will develop their own judgment about what good looks like. Some of that divergence is healthy: the support loop owner knows what good support feels like better than the exec does. But the product a company sells still needs to be cohesive (at least I think so). The customer shouldn’t experience three different philosophies depending on which loop they touch.
What gets decided at the loop level versus the org level, and how you maintain a shared thesis across independent loop owners without re-introducing the coordination overhead you just eliminated, is a core architectural challenge of the transposed org.

[1] I discuss this a bit in my more technical posts on Building Multi-Agent Systems. People building on AI consistently underestimate the cross-domain expertise of LLMs while underestimating the coordination and confusion cost of co-execution. This post is in some ways my realization that the same holds for AI-augmented human organizational design.
[2] “But technically a transpose is …” Ok fine, then I’ll use the wider, less linear-algebra-specific definition: https://www.merriam-webster.com/dictionary/transpose
[3] This isn’t a new aspiration. “Pizza teams”, “squads”, and “microenterprises” all promised autonomous units within a larger org. Most delivered partial results with significant coordination tradeoffs. What’s structurally different now is that agents collapse the execution gap that previously required those units to either stay small and limited or grow and re-specialize. The loop owner has access to specialist execution without needing specialist headcount.
[4] See Decision Fatigue.
[5] On top of generalization, I think adaptability is another key characteristic. Ideally what I call ‘derivative thinkers’.

Cartoon via Nano Banana.

Brain Baking 2 days ago

App Defaults In March 2026

It’s been almost three years since sharing my toolkit defaults (2023). High time to report an update. There’s a second reason to post this now: I’ve been trying to get back into the Linux groove (more on that later), so I’m hoping to either change the defaults below in the near future or streamline them across macOS & Linux. Where the default changed, I’ll provide more information; otherwise see the previous post as linked above.

Backup system: Still Restic, but I added Syncthing into the loop to get that 1-2-3 backup number higher. I still have to buy a fire-proof safe (or sync it off-site).
Bookmarks and Read It Later systems: Still Alfred & Obsidian. Experimenting with Org-mode and org-capture; hoping to migrate this category to Emacs as well.
Browser: Still Firefox.
Calendar and contacts: Still self-hosted Radicale.
Chat: Mainly Signal now, thanks to bullying friends into using it.
Cloud File Storage: lol, good one.
Coding environment: For light and quick scripting, Emacs (previously Sublime Text)! Otherwise, any of the dedicated tools from the JetBrains folks: Emacs can only do so much, and it’s dreadful in Java.
Image editor: Still ImageMagick + GIMP.
Mail: Mu4e in Emacs for brainbaking (previously Apple Mail on macOS), and Apple Mail for the work Exchange server (previously Microsoft Outlook). I didn’t want to mix, but since Mu cleared up Mail, that’s much better than Outlook.
Music: Still Navidrome.
Notes: Still pen & paper, but I need to remind myself to take up that pen more often.
Password Management: Still KeePassXC.
Photo Management: Still PhotoPrism. I considered replacing it, but I barely use it; it’s just a photo dump place for now.
Podcasts: I find myself using the Apple Podcasts app more often than in 2023. I don’t know if that’s a bad thing; it will be if I want to migrate to Linux.
Presentations: Haven’t found the need for one.
RSS: Still NetNewsWire, but since last year it’s backed by a FreshRSS server, making cross-platform reading much better. The Android client app is Randrop now, so that’s new.
Spreadsheets: For student grading, Google Sheets, or Excel if I have to share it with colleagues. My new institution is pro Teams & Office 365. Yay.
Text Editor: I’m typing this Markdown post in Emacs (previously Sublime Text).
Word Processing: Still Pandoc if needed.
Terminal: Emulator: Ghostty (previously iTerm2; I hated how the iTerm2 devs shoved AI shit in there), but evaluating Kitty as well. Shell: migrated from Zsh to Fish two days ago! The built-in command line option autocomplete capabilities are amazing. Guess what: more and more I’m using eshell and Emacs.

Some more tools that have been adapted that don’t belong in one of the above categories:

Karabiner Elements to remap some keys (see the explanation).
I tried out Martha as a Finder alternative. It’s OK, but I’d rather dig into Dired (Emacs), especially when I see the popularity of tools that just steal Dired features.
I replaced quite a few coreutils CLI commands with their modern counterparts; one of those can also be used to enhance shell search history, but Fish eliminated that need.
AltTab for macOS replaces the default window switcher. The default didn’t play nice with new Emacs frames, and I like the mini screenshot.

A prediction for this post in 2027: all tools have been replaced with Emacs. All silliness aside: Emacs is the best thing that happened to me in the last couple of months.

Related topics: lists / app defaults. By Wouter Groeneveld on 29 March 2026. Reply via email.


Walking backwards into the future – A look at descriptor heap in Granite

It seems like I can never quite escape the allure of fiddling with bits more efficiently every passing year. I recently went through the process of porting over Granite’s Vulkan backend to use VK_EXT_descriptor_heap. There wasn’t exactly a burning need to do this work, but science demands I sacrifice my limited free time for these experiments. My name may or may not be on the extension summary, and it’s important to eat your own dog food. In this post, I want to explore ways in which we can port over an old school binding model to newer APIs should the need arise.

Granite’s binding model is designed for really old Vulkan. The project started in January 2017 after all, at which point Vulkan was in its infancy. Bindless was not really a thing yet, and I had to contend with really old mobile hardware. Slot-based bindings have been with us since OpenGL and early D3D. I still think it’s a fine model from a user’s perspective, and I have no problem writing code in that style. It’s very friendly to tooling and validation and I just find it easy to use overall. GPU performance is great too since vendors have maximal flexibility in how to implement the API. The major downside is the relatively heavy CPU cost associated with it since there are many API calls to make. In my projects, it’s rarely a concern, but when doing heavy CPU-bound workloads like PS2 GS emulation, it did start to matter quite a bit.

When SPIR-V shaders are consumed in Granite, they are automatically reflected. From GLSL declarations like the snippet sketched below, I automatically generate a VkDescriptorSetLayout for each unique set, and combine these into a VkPipelineLayout as one does. VkDescriptorSetLayouts are hash’n’cached into a DescriptorSetAllocator. The implicit assumption by shaders I write is that low-frequency updates have lower set values. This matches Vulkan’s pipeline layout compatibility rules too.

Given the hardcore descriptor churn this old model can incur, UBOs originally used VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC. Since linearly allocating new UBOs per draw is a hot path, I wanted to avoid having to allocate and write new descriptor sets all the time. This is precisely what the dynamic buffer types were designed for. I did not use it for SSBOs since DYNAMIC has some unfortunate interactions with descriptor size, since you cannot change the size, only the offset. The size of UBOs is somewhat irrelevant, and I just hardcoded in a 64K window.

There are two main strategies for allocating sets from a VkDescriptorPool, both of which are kinda bad. The typical model I believe most engines use is the “jumbo” allocator where you create a big pool with many sets and many descriptors with different descriptor types and pray for the best. When the pool is OOM-ed, allocate another. One unfortunate thing about the jumbo pool is that you can’t really know up front exactly how to balance the descriptor types properly. It will always be a shaky heuristic. In raw Vulkan 1.0, it was straight up illegal to allocate any further once a limit had been reached, causing even more headaches. The very first maintenance extension to Vulkan fixed this and added OUT_OF_POOL_MEMORY, which allows applications to just keep going until the pool is exhausted. Fun fact: some vendors would never exhaust the pool and just straight up ignore what you pass into vkCreateDescriptorPool, so that’s fun.

Granite went the route of a slab allocator per VkDescriptorSetLayout instead, one allocator per thread. Allocate a group of like 64 VkDescriptorSets in one go and parcel them out as needed.
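Roughly the kind of declaration being reflected (an illustrative GLSL snippet, not Granite’s actual shaders):

    #version 450
    // Low-frequency data in the lower set index, per-material data above it.
    layout(set = 0, binding = 0) uniform SceneUBO { mat4 view_projection; };
    layout(set = 1, binding = 0) uniform sampler2D uAlbedo;

And a minimal sketch of the slab allocation using only core Vulkan calls (the helper name, kSlabSize, and the pool sizing are this sketch’s assumptions):

    #include <vulkan/vulkan.h>
    #include <array>

    constexpr uint32_t kSlabSize = 64;

    // Grab a batch of identical descriptor sets in one driver call instead of
    // 64, then parcel them out per draw. One slab per layout, per thread.
    void allocate_descriptor_slab(VkDevice device, VkDescriptorPool pool,
                                  VkDescriptorSetLayout set_layout,
                                  VkDescriptorSet (&out_sets)[kSlabSize])
    {
        std::array<VkDescriptorSetLayout, kSlabSize> layouts;
        layouts.fill(set_layout); // the same layout, repeated

        VkDescriptorSetAllocateInfo info = {};
        info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
        info.descriptorPool = pool; // assumed sized for kSlabSize sets of this layout
        info.descriptorSetCount = kSlabSize;
        info.pSetLayouts = layouts.data();

        vkAllocateDescriptorSets(device, &info, out_sets);
    }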
Main advantage here was no need to keep calling vkAllocateDescriptorSets over and over, and in the early years, I even hash’n’cached the descriptor sets. The primary reason for doing that was that some early mobile drivers were extreeeeeeeemely slow at vkUpdateDescriptorSets for some reason. Not a great time. This slab approach led to memory bloat though.

At some point VK_KHR_descriptor_update_template was added, which aims to accelerate vkUpdateDescriptorSets. Instead of having the driver parse the structs and switch on the descriptorType to write descriptors, the update template allows drivers in theory to “precompile” a highly optimized function that updates descriptors based on the template that is provided in vkCreateDescriptorUpdateTemplate. This was a nice incremental thing to add to Granite. I don’t think the promise of update templates really worked out in the end though. Most drivers, I think, just resorted to parsing the template at update time instead, leading to no speedup.

Push descriptors were designed quite early on in Vulkan’s life, but their adoption was … spotty at best. They didn’t make it into core until Vulkan 1.4! Push descriptors solved some issues for us slot and binding troglodytes since there was simply no need to mess around with allocating sets and pools when we could just push descriptors and the driver would deal with it. The major downside is that only one descriptor set can be a push set, but in Granite’s case, I could design for that limitation when writing shaders. The last set index in a VkPipelineLayout would get assigned as a push set. After going push descriptors, I dropped the old UBO_DYNAMIC path, since push descriptors are not compatible with it, and the UBO_DYNAMIC wins were … questionable at best anyway. It took a while to move to this model though. The AMD Windows driver was infamously dragging its feet for years before finally accepting reality, and at that point I was ready to move over. It’s still not a hard requirement in Granite due to mobile concerns, but then the driver hits the slow path, and I don’t really care anymore.

At some point, any modern renderer has to deal with bindless, and Granite hit this wall with clustered shading, where an array of shadow maps became a hard necessity. I’m not a big fan of “everything is bindless” myself, since I think it makes debugging way more annoying and stresses tooling and validation more than it should, but sometimes the scissor juggling is necessary. When Granite reflects a shader declaring a runtime-sized array of sampled images, the set layout is converted into an UPDATE_AFTER_BIND set with VARIABLE_COUNT array length. There is also a special helper function to aid in allocating these bindless sets, where the API mostly turns into a single allocate-and-write call. The CPU overhead of this isn’t quite trivial either, but with the set and pool model, it’s not easy to escape this reality without a lot of rewrites. For now, I only support sampled images with bindless and I never really had any need or desire to add more. For bindless buffers, there is the glorious buffer_device_address instead.

This model has served and keeps serving Granite well. Once this model is in place, the only real reason to go beyond it for my use cases is performance (and curiosity).

VK_EXT_descriptor_buffer asks the question of what happens when we just remove the worst parts of the descriptor API: VkDescriptorSet, VkDescriptorPool, and vkUpdateDescriptorSets (kinda). Sets are now backed by a slice of memory, and pools are replaced by a big descriptor buffer that is bound to a command buffer. Some warts remain however, as VkDescriptorSetLayout and VkPipelineLayout persist.
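A minimal sketch of that flow under EXT_descriptor_buffer (the entry points are the extension’s real ones; the heap pointer and offset plumbing is assumed for illustration):

    #include <vulkan/vulkan.h>
    #include <cstdint>

    // Write a sampled image descriptor straight into a slice of one big
    // HOST_VISIBLE | DEVICE_LOCAL descriptor buffer. No set allocation, no
    // vkUpdateDescriptorSets; the "set" is just bytes we own.
    void write_image_descriptor(VkDevice device, VkDescriptorSetLayout set_layout,
                                const VkPhysicalDeviceDescriptorBufferPropertiesEXT &props,
                                VkImageView view, uint8_t *heap_cpu, VkDeviceSize set_offset)
    {
        // Size of one set of this layout (what a linear allocator bumps by),
        // and where binding 0 lives inside it. Both are driver-controlled.
        VkDeviceSize set_size = 0, binding_offset = 0;
        vkGetDescriptorSetLayoutSizeEXT(device, set_layout, &set_size);
        vkGetDescriptorSetLayoutBindingOffsetEXT(device, set_layout, 0, &binding_offset);
        (void)set_size;

        VkDescriptorImageInfo image_info = {};
        image_info.imageView = view;
        image_info.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

        VkDescriptorGetInfoEXT get_info = {};
        get_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT;
        get_info.type = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
        get_info.data.pSampledImage = &image_info;

        // Build the raw descriptor bytes in place.
        vkGetDescriptorEXT(device, &get_info, props.sampledImageDescriptorSize,
                           heap_cpu + set_offset + binding_offset);
    }

(On most loaders these EXT entry points still need to be fetched with vkGetDeviceProcAddr; that boilerplate is omitted.)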
If you’re porting from the legacy model like I was, this poses no issues at all, and actually reduces the friction. Descriptor buffers are a perfectly sound middle-ground alternative for those who aren’t complete bindless junkies yet, but want some CPU gains along the way.

In the ideal use case for descriptor buffers, we have one big descriptor buffer that is always bound. This is allocated with PCI-e BAR on dGPUs, so DEVICE_LOCAL | HOST_VISIBLE. Instead of allocating descriptor sets, the command buffer performs a linear allocation which is backed by slices allocated from the global descriptor buffer. No API calls needed. The size to allocate for a VkDescriptorSet is queried from the set layout itself, and each descriptor is assigned an offset that the driver controls.

There is a wart in the spec where the min-spec for sampler descriptor buffers is very small (4K samplers). In this case, there is a risk that just linearly allocating out of the heap will trivially OOM the entire thing and we have to allocate new sampler descriptor buffers all the time. In practice, this limitation is completely moot. Granite only opts into descriptor buffers if the limits are reasonable. There is supposed to be a performance hit to rebinding descriptor buffers, but in practice, no vendor actually ended up implementing descriptor buffers like that. However, since VK_EXT_descriptor_heap will be way more strict about these kinds of limitations, I designed the descriptor_buffer implementation around the single global heap model to avoid rewrites later. There is certainly a risk of going OOM when linearly allocating like this, but I’ve never hit close to the limits. It’s not hard to write an app that would break Granite in half though, but I consider that a “doctor, my GPU hurts when I allocate like this” kind of situation.

This is where we should have a major win, but it’s not all that clear. For each descriptor type (SAMPLED_IMAGE, STORAGE_IMAGE, INPUT_ATTACHMENT, SAMPLER, UNIFORM_BUFFER, STORAGE_BUFFER, ACCELERATION_STRUCTURE_KHR), I have different strategies on how to deal with them. The basic idea of descriptor buffers is that we can call vkGetDescriptorEXT to build a descriptor in raw bytes. This descriptor can now be copied around freely by the CPU with e.g. memcpy, or even on the GPU in shaders (but that’s a level of scissor juggling I am not brave enough for).

Image and sampler descriptors are the simplest ones to contend with. Descriptor buffers still retain the VkImageView and VkSampler objects. The main addition I made was to allocate a small payload up front and write the descriptor once; instead of vkUpdateDescriptorSets, per-use updates then become a trivial memcpy (see the sketch below). The memcpy functions are function pointers that resolve the byte count. This is a nice optimization since the memcpy functions can unroll to a perfectly unrolled SIMD load-store. Allocating bindless sets of sampled images with this method becomes super efficient, since it boils down to a special function that just copies the pre-built payloads straight into the set’s memory.

Texel buffers I rarely use, but they are also quite neat in descriptor buffers. VkBufferView is gone now, so we just need to create a descriptor payload once from a VkDeviceAddress and it’s otherwise the same as above.

The combined image sampler is somewhat of a relic these days, but anyone coming from a GL/GLES background instead of D3D will likely use this descriptor type out of old habit, me included. The API here is slightly more unfortunate, since there is no obvious way to create these descriptors up-front. We don’t necessarily know all the samplers an image will be combined with, so we have to do it last minute, calling vkGetDescriptorEXT to create the combined descriptor.
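A sketch of the payload-plus-memcpy pattern (names hypothetical; the 64-byte capacity is an assumed worst case, not a spec guarantee):

    #include <vulkan/vulkan.h>
    #include <cstring>
    #include <cstdint>

    // Descriptor payload baked once at image view creation time with
    // vkGetDescriptorEXT, then reused forever.
    struct ImageViewPayload
    {
        uint8_t sampled_image[64]; // assumed worst-case descriptor size
        size_t size;               // the driver's actual sampledImageDescriptorSize
    };

    // Per-use "descriptor update": a plain memcpy into the set's slice of the
    // descriptor buffer. With the byte count resolved up front, this unrolls
    // into straight SIMD loads and stores.
    inline void bind_sampled_image(uint8_t *set_slice, VkDeviceSize binding_offset,
                                   const ImageViewPayload &payload)
    {
        std::memcpy(set_slice + binding_offset, payload.sampled_image, payload.size);
    }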
We cannot meaningfully pre-create descriptors for UBOs and SSBOs, so we’re in a similar situation where we have to call vkGetDescriptorEXT for each buffer last-minute. Unfortunately, there is no array-of-descriptors version of vkGetDescriptorEXT, so in extreme cases, descriptor buffers can actually have worse CPU overhead than the legacy model. DXVK, going through winevulkan’s .dll <-> .so translation overhead, has been known to hit this, but for everyone else I’d expect the difference to be moot.

Since descriptor buffer is an incremental improvement over the legacy model, we retain optional support for push descriptors. This can be useful in some use cases (it’s critical for vkd3d-proton), but Granite doesn’t need it. Once we’re in descriptor buffer land, we’re locked in.

Descriptor buffers are battle tested and very well supported at this point. Perhaps not on very old mobile drivers, but slightly newer devices tend to have it, so there’s that! RenderDoc has solid support these days as well.

At a quick glance, descriptor heap looks very similar to D3D12 (and it is), but there are various additions on top to make it more compatible with the various binding models that exist out there in the wild, especially for people who come from a GL/Vulkan 1.0 kind of engine design. The normal D3D12 model has some flaws if you’re not fully committed to bindless all day every day, mainly that:

- You very quickly end up having to call CopyDescriptorsSimple a LOT to shuffle descriptors into the heap. Since this is a call into the driver just to copy a few bytes around, it can quickly be a source of performance issues. In vkd3d-proton, we went to hell and back to optimize this case because in many titles, it was the number 1 performance overhead.
- Dealing with samplers is a major pain. The 2K sampler heap limit can be rather limiting since there is no good way to linearly allocate on such a small heap. Static samplers are quite common as a result, but they have other problems. Recompiling shaders because you change Aniso 4x to 8x in the settings menu is kinda a hilarious situation to be in, but some games have been known to do just that …

Like D3D12, the heap is split into resource and sampler heaps. This is to match how some hardware works, nothing too complicated. I allocate for the supported ~1 million resource descriptors and 4096 samplers. There is a reserved region for descriptors as well, which is new to this extension. In D3D12 this is all abstracted away since applications don’t have direct access to the descriptor heap memory.

For the resource heap, we have a 512 K descriptor area which can be freely allocated from, like we did with descriptor buffer. Unlike descriptor buffer, where we hammer this arena allocator all the time, we will only rarely need to touch it with descriptor heap.

The next ~500k or so descriptors are dedicated to holding the descriptor payload for VkImageView, VkSampler and VkBufferView. All of these objects are now obsolete. When Granite creates a Vulkan::ImageView, it internally allocates a free slab index from this upper region, writes the descriptor there and stores the heap index instead. This enables “true” bindless in a performant way. We could have done this before if we wanted to, but with descriptor buffer we would have eaten a painful indirection on a lot of hardware, which is not great. Some Vulkan drivers actually work just like this internally. You can easily tell, because some drivers report that an image descriptor is just sizeof(uint32_t). We’d have our index into the “heap”, which gets translated into yet another index into the “true” (hidden) heap. Chasing pointers is bad for perf as we all know. We keep a copy of the descriptor payload in CPU memory too, in case we have to write to the arena-allocated portion of the heap later.

The upper region of ~10k descriptors or so (depends on the driver) is just a reserved region we bind and never touch. It’s there so that drivers can deal with CmdResolveImage, CmdBlitImage and other such special APIs that internally require descriptors.

For samplers, there is no arena allocator. It’s so tiny. Instead, when creating a sampler, we allocate a slab index and return a dummy handle by just pointer-casting the index instead. We’ll make good use of the mapping APIs later to deal with this lack of arena allocation.
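A sketch of that index-as-handle trick (names hypothetical; assumes 64-bit builds, where Vulkan’s non-dispatchable handles are pointer-sized):

    #include <vulkan/vulkan.h>
    #include <cstdint>

    // The sampler heap is tiny, so each unique sampler gets a persistent slot
    // instead of arena allocation. The descriptor is written into that slot
    // once; afterwards only the slot index matters.
    struct SamplerSlab
    {
        uint32_t next = 0;
        uint32_t allocate() { return next++; } // real code would recycle freed slots
    };

    // Smuggle the heap index through the handle type. Nothing ever dereferences
    // this: the push-data mapping resolves the index straight into the heap.
    inline VkSampler make_dummy_sampler_handle(uint32_t heap_index)
    {
        return reinterpret_cast<VkSampler>(static_cast<uintptr_t>(heap_index));
    }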
In fact, we will never have to copy sampler descriptor payloads around, and we don’t have to mess around with static samplers either, neat! For the static sampler crowd, there is full support for embedded samplers, which work just like D3D12 static samplers, so there’s that, but Granite doesn’t use it.

It was a non-trivial amount of code to get to this point, but hey, that’s what happens when you try to support 3 descriptor models at once I guess …

Core Vulkan 1.0 settled on 128 bytes of push constants being the limit. This was raised in Vulkan 1.4, but Granite keeps the old limit (I could probably live with 32 or 64 bytes to be fair). Push data expands to 256 bytes as a minimum, and the main idea behind descriptor heap is that pipeline layouts are completely gone, and we get to decide how the driver should interpret the push data space. This is similar to D3D12 root parameters, except it’s not abstracted behind a SetRootParameter() kind of interface that is called one at a time. In Vulkan, we can call CmdPushDataEXT once. VkPipelineLayout and VkDescriptorSetLayout are just gone now, poof, do not exist at all. This is huge for usability. Effectively, we can pretend that the VkPipelineLayout is now just a push constant range of 256 bytes, and that’s it.

If you’re fully committed to going bindless, you could just do the equivalent of SM 6.6 ResourceDescriptorHeap and SamplerDescriptorHeap and buffer_device_address to get everything done. However, Granite is still a good old slot based system, so I need to use the mapping features to tell the driver how to translate set/binding into actual descriptors. This mapping can be different per-shader too, which fixes a lot of really annoying problems with EXT_graphics_pipeline_library and EXT_shader_object if I feel like going down that path in the future.

The natural thing to do for me was to split up the space into a maximum of 128 bytes of push constants, then 32 bytes per descriptor set (I support 4 sets, the Vulkan 1.0 min-spec); the resulting push data layout is sketched below. It’s certainly possible to parcel out the data more intelligently, but that causes some issues with set compatibility which I don’t want to deal with. For every set, I split it up into buffers and images and decide on a strategy for each. Buffers are decided first since they have the largest impact on performance in my experience.

PUSH_ADDRESS is very simple. If there are 3 or fewer buffers in a set (24 bytes), we can just stuff the raw pointers into push data and tell the driver to use that pointer. This is D3D12 root descriptors in a nutshell. Especially for UBOs, this is very handy for performance. We lose robustness here, but I never rely on buffer robustness anyway.

INDIRECT_ADDRESS is a new Vulkan speciality. Without modifying the shaders, we can tell the driver to load a buffer device address from a pointer in push data instead. This way we don’t have to allocate from the descriptor heap itself; we can just do a normal linear UBO allocation, write some VkDeviceAddresses in there and have fun. Given the single indirection to load the “descriptor” here, this looks a lot like Vulkan 1.0 descriptor sets, except there’s no API necessary to write them.

This isn’t the ideal path, but sometimes we’re forced to allocate from the heap. This can happen if we have one of these cases:

- The shader is using OpArrayLength on an SSBO. We need real descriptors in this case. The current implementation just scans the SPIR-V shader module for this instruction, but could be improved in theory.
- The shader is using an array of descriptors. For buffers, this should be very rare, but the PUSH_ADDRESS and INDIRECT_ADDRESS interfaces do not support this.
- Robustness is enabled.

This is pretty much D3D12’s root tables, but in Vulkan we can be a bit more optimal with memory, since buffer descriptors tend to be smaller than image descriptors and we can pack them tightly.
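A toy C++ sketch of that 256-byte push data layout (struct names and the exact field split are assumptions; the per-set region shows the minimum-images case of 3 buffer pointers plus 8 bytes of image indices):

    #include <vulkan/vulkan.h>
    #include <cstdint>

    // 32 bytes per descriptor set.
    struct SetRegion
    {
        VkDeviceAddress buffers[3];  // PUSH_ADDRESS: up to 3 raw buffer pointers (24 bytes)
        uint32_t image_indices[2];   // u32 indices straight into the resource heap
    };
    static_assert(sizeof(SetRegion) == 32, "32 bytes per set");

    struct PushData
    {
        uint8_t push_constants[128]; // the Vulkan 1.0 limit Granite keeps
        SetRegion sets[4];           // 4 sets, the Vulkan 1.0 min-spec
    };
    static_assert(sizeof(PushData) == 256, "the 256-byte push data window");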
D3D12 has one global stride for any resource descriptor, while Vulkan exposes separate sizes that applications can take advantage of. vkWriteResourceDescriptorsEXT is required here to write the SSBO descriptors.

After buffers are parceled out for a descriptor set, we have some space left for images. At minimum, we have 8 bytes left (32 – 3 * sizeof(VkDeviceAddress)).

Inline indices are the common and ideal case. If we don’t have any arrays of images, we can just have a bunch of uint32_t indices directly into the heap. At image view and buffer view creation time, we already allocated a persistent index into the heap that we can refer to. No API calls required when emitting commands. Combined image samplers work quite well in this model, because Vulkan adds a special mapping mode that packs both the sampler index and the image index together. This fixes one of the annoying issues in EXT_descriptor_buffer.

If we cannot use the simple inline indices, we have two options. The preferred one right now is to just allocate space in the descriptor heap like the descriptor buffer path, because I’m quite concerned with unnecessary indirections when possible. At least we get to copy the payloads around without API commands. This path is also used for bindless sets. Unlike the descriptor buffer path, there is a major problem, which is that linearly allocating from the sampler heap is not viable. The sampler heap is really small now, just like in D3D12. In this case, Vulkan has an answer: a special feature that functions like an indirect root table. It is similar to INDIRECT_ADDRESS in that we don’t have to allocate anything from the heap directly and we can just stuff heap indices straight into a UBO.

Overall, I think these new mapping types allow us to reuse old shaders quite effectively, and it’s possible to start slowly rewriting shaders to take full advantage of descriptor_heap once this machinery is in place.

For GPU performance, it seemed to be on par with the other descriptor models on NVIDIA and AMD, which was expected. Granite does not really hit the cases where descriptor_heap should meaningfully improve GPU performance over descriptor_buffer, but I only did a rough glance. For CPU performance, things were a bit more interesting, and I learned that Granite has quite significant overhead on its own, which is hardly surprising. That’s the cost of an old school slot and binding model after all, and I never did a serious optimization pass over it. A more forward looking rendering abstraction can eliminate most, if not all, of this overhead. The numbers here are for RADV, but it’s using the pending merge request for descriptor_heap support.

– ~27 us to write 4096 image descriptors on a Ryzen 3950x with a RX 6800. This is basically exactly the same. ~13 us. This is really just a push_back and memcpy bench at this point.
– This case hits the optimal inline BDA case for heap. ~279 ns per dispatch. Doesn’t feel very impressive. Basically same perf, but lots of overhead has now shifted over to Granite. Certainly things can be optimized further. GetDescriptorEXT is somehow much faster than UpdateDescriptorSetWithTemplate though. ~157 ns / dispatch now, and most of the overhead is now in Granite itself, which is ideal.
– I added an extra buffer descriptor per set, which hits the INDIRECT_ADDRESS path. Heap regressed significantly, but it’s all in Granite code at least. Likely related to having to page in new UBO blocks, but I didn’t look too closely. ~375 ns / dispatch, hnnnnnng.
The other paths don’t change much, as is expected. About ~310 ns / dispatch for the legacy and descriptor buffer models.
– This is the happy path for descriptor heap. ~161 ns / dispatch. ~166 ns. Quite interesting that it got slower. The slab allocator for legacy sets seems to be doing its job very well. The actual descriptor copying vanished from the top list at least. ~145 ns. A very modest gain, and most of the overhead is now just Granite jank. All the paths look very similar now. ~170 ns or so.

On RTX 4070 with 595 drivers:

Legacy: Test #1: Write 4096 image descriptors: 17.6 us (copies u32 indices); Test #2: 693 ns; Test #3: 726 ns; Test #4: 377 ns; Test #5: 408 ns.
Descriptor buffer: Test #1: 10.2 us (copies u32 indices); Test #2: 434 ns; Test #3: 479 ns; Test #4: 307 ns; Test #5: 315 ns.
Descriptor heap: Test #1: 11 us (copies real 32 byte descriptors); Test #2: 389 ns; Test #3: 405 ns; Test #4: 321 ns; Test #5: 365 ns.

The improvements, especially for buffers, are quite large on NV, interestingly enough. For the legacy buffer tests, it’s heavily biased towards driver overhead. For the image tests the gains are modest, which is somewhat expected given how NV implements image descriptors before descriptor heap: it’s just some trivial u32 indices.

Overall, it’s interesting how well the legacy Vulkan 1.0 model holds up here, at least on RADV with my implementation. Descriptor buffer and heap cannot truly shine unless the abstraction using them is written with performance in mind. This sentiment is hardly new. Just porting OpenGL-style code over to Vulkan doesn’t give amazing gains, just like porting old and crusty binding models won’t magically perform with newer APIs either. Either way, this level of performance is good enough for my needs, and the days of spamming out 100k draw calls are kinda over anyway, since it’s all GPU driven with large bindless data sets these days.

Adding descriptor buffer and heap support to Granite was generally motivated by curiosity rather than a desperate need for perf, but I hope this post serves as an example of what can be done. There’s a lot of descriptor heap that hasn’t been explored here. GPU performance for heavily bindless workloads is another topic entirely, and I also haven’t really touched on how it would be more practical to start writing shaders that index the descriptor heap directly, SM 6.6 ResourceDescriptorHeap-style, which would side-step almost all Granite overhead. Overall I quite like what we’ve got now with descriptor heap as an API, a bastard child of descriptor buffer and D3D12 that gets the job done. As tooling and driver support matures, I will likely just delete the descriptor buffer path, keeping the legacy stuff around for compatibility.

Kev Quirk 2 days ago

I Think I've Been Scammed On eBay!

Back in November I pre-ordered an Ollee Watch, which was delivered in February. After playing with the watch, I decided I didn't want it, so I posted it up on eBay - never worn, so brand new. A week or so later it sold and I posted it off to its new owner.

A day or 2 later, the buyer messaged me saying the backlight wasn't working. This immediately raised my suspicions, as the watch was brand new and I had packaged it up well. Anyway, I gave them the benefit of the doubt after they had sent a video of the apparent problem, accepted the return, and paid for them to return the watch to me.

I took delivery of the watch yesterday and opened the package; all of the Ollee packaging had been removed, as well as the original Casio module. It came back with only half of a Casio box.

I tested the backlight; lo and behold, it's working fine! So now I have a new Ollee watch, with no packaging, and no Casio module. So it's worth a lot less than it was previously. Fucking brilliant.

I've asked eBay to step in and help resolve the situation, so we will see what happens. There's a lot of buyer protection on eBay (and rightly so), but there's very little in the way of seller protection, even though I'm not a business. So I have a feeling they will find in favour of the buyer and I'll be out a few quid. Double fucking brilliant.

I messaged the buyer once I'd received the watch back, politely asking WTF? and they replied with:

I'm sorry I thought the original packaging was all there, there definitely was a problem with the backlight and I think the original Casio thing is in the little compartment on the stand.

The module is 100% not there. Now, the buyer may be legit. The backlight may not have been functioning properly while they had it. They may have binned all the Ollee packaging [1], and the Casio module, but I find it hard to believe. The backlight works flawlessly. It's not like it works occasionally or anything like that. You tap the light button and it lights up every single time. It's in perfect working condition.

My guess is that they've done this to get the Ollee packaging, then they're going to scam some other poor bastard by selling them a standard F-91W (which costs around £15) dressed up as an Ollee watch for around £100. Anyway, we will see what happens as eBay get involved.

If you're in the UK and interested in getting yourself a fully working, brand new, Ollee watch (albeit with no Ollee packaging) for cheap, get in touch.

[1] Why would you keep the other half of the packaging though?

Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

(think) 2 days ago

Batppuccin: My Take on Catppuccin for Emacs

I promised I’d take a break from building Tree-sitter major modes, and I meant it. So what better way to relax than to build… color themes? Yeah, I know. My idea of chilling is weird, but I genuinely enjoy working on random Emacs packages. Most of the time at least…

For a very long time my go-to Emacs themes were Zenburn and Solarized – both of which I maintain popular Emacs ports for. Zenburn was actually one of my very first open source projects (created way back in 2010, when Emacs 24 was brand new). It served me well for years. But at some point I got bored. You know the feeling – you’ve been staring at the same color palette for so long that you stop seeing it. My experiments with other editors (Helix, Zed, VS Code) introduced me to Tokyo Night and Catppuccin, and they’ve been my daily drivers since then. Eventually, I ended up creating my own Emacs ports of both. I’ve already published emacs-tokyo-themes, and I’ll write more about that one down the road. Today is all about Catppuccin. (and by this I totally mean Batppuccin!)

There’s already an official Catppuccin theme for Emacs, and it works. So why build another one? A few reasons.

The official port registers a single theme and switches between flavors (Mocha, Macchiato, Frappe, Latte) via a global variable and a reload function. This is unusual by Emacs standards and breaks the normal workflow – theme-switching packages need custom glue code to work with it. It also loads color definitions from an external file in a way that fails when Emacs hasn’t marked the theme as safe yet, which means some users can’t load the theme at all.

Beyond the architecture, there are style guide issues. The variable-name face is set to the default text color, making variables invisible. All heading levels use the same blue, so org-mode headings are flat. Another face forces green on all unstyled code. Several faces still ship with magenta placeholder colors. And there’s no support for popular packages like vertico, marginalia, transient, flycheck, or cider. I think some of this comes from the official port trying to match the structure of the Neovim version, which makes sense for their cross-editor tooling but doesn’t sit well with how Emacs does things. [1]

Batppuccin is my opinionated take on Catppuccin for Emacs. The name is a play on my last name (Batsov) + Catppuccin. [2] I guess you can think of this as bbatsov’s Catppuccin… or perhaps Batman’s Catppuccin?

The key differences from the official port:

- Four proper themes. The Mocha, Macchiato, Frappe, and Latte flavors are all separate themes that work with the standard theme machinery out of the box. No special reload dance needed.
- Faithful to the style guide. Mauve for keywords, green for strings, blue for functions, peach for constants, sky for operators, yellow for types, overlay2 for comments, rosewater for the cursor. The rainbow heading cycle (red, peach, yellow, green, sapphire, lavender) makes org-mode and outline headings actually distinguishable.
- Broad face coverage. Built-in Emacs faces plus magit, vertico, corfu, marginalia, embark, orderless, consult, transient, flycheck, cider, company, doom-modeline, treemacs, web-mode, and more. No placeholder colors.
- Clean architecture. Shared infrastructure in a common core file, thin wrapper files for each flavor, a color override mechanism, configurable heading scaling. The same pattern I use in zenburn-emacs and emacs-tokyo-night-theme.

I didn’t really re-invent anything here - I just created a theme in a way I’m comfortable with. I’m not going to bother with screenshots here – it looks like Catppuccin, because it is Catppuccin.
There are small visual differences if you know where to look (headings, variables, a few face tweaks), but most people wouldn’t notice them side by side. If you’ve seen Catppuccin, you know what to expect.

The easiest way to install it right now is to grab the package and load the flavor you want; swap in one of the other three flavor names for the other flavors. You can also switch interactively with load-theme.

I remember when Solarized was the hot new thing and there were something like five competing Emacs ports of it. People had strong opinions about which one got the colors right, which one had better org-mode support, which one worked with their favorite completion framework. And that was fine! Different ports serve different needs and different tastes. The same applies here. The official Catppuccin port is perfectly usable for a lot of people. Batppuccin is for people who want something more idiomatic to Emacs, with broader face coverage and stricter adherence to the upstream style guide. Both can coexist happily.

I’ve said many times that for me the best aspect of Emacs is that you can tweak it infinitely to make it your own, so as far as I’m concerned having a theme that you’re the only user of is perfectly fine. That being said, I hope a few of you will appreciate my take on Catppuccin as well. This is an early release and there’s plenty of room for improvement. I’m sure there are faces I’ve missed, colors that could be tweaked, and packages that deserve better support. If you try it out and something looks off, please open an issue or send a PR.

I’m also curious – what are your favorite Emacs themes these days? Still rocking Zenburn? Converted to modus-themes? Something else entirely? I’d love to hear about it. That’s all from me, folks! Keep hacking!

[1] The official port uses Catppuccin’s Whiskers template tool to generate the Elisp from a template, which is cool for keeping ports in sync across editors but means the generated code doesn’t follow Emacs conventions.
[2] Naming is hard, but it should also be fun! Also – I’m a huge fan of Batman.
