Posts in Html (20 found)

What's going on with Gemini?

Google is in a strange spot right now. They've got arguably the deepest research bench in the industry, their own custom silicon, and effectively unlimited money - and yet most developers I talk to barely touch Gemini day-to-day. The recent Google I/O announcements crystallised a lot of what I find confusing about their AI strategy, so I wanted to write down where I think they actually stand.

The consensus seems to be that Anthropic and OpenAI are currently very much in the lead for frontier model intelligence, with the two labs trading blows every month. This may change in the near future - if Anthropic releases Mythos-class models that OpenAI doesn't have an answer to - but right now I think most practitioners would agree that GPT-5.5 and Opus 4.8 are roughly in the same ballpark.

After that, you have Google, with Gemini 3.1 Pro sitting ahead of the Chinese models in benchmarks but behind the flagship Anthropic/OpenAI models. In my personal experience, though, I've had better results at software engineering tasks from the best-in-class Chinese models (GLM 5.1 and Qwen 3.7) than from Gemini 3.1 Pro.

The main model announcement at Google I/O was Gemini 3.5 Flash. Its coding benchmarks were underwhelming:

[Chart: Gemini 3.5 Flash on the Artificial Analysis Coding Index - solidly mid-pack. Source: Artificial Analysis.]

However, the model is super fast - roughly 4x faster in tokens per second than the aforementioned Anthropic/OpenAI models:

[Chart: Output tokens per second - Gemini 3.5 Flash at 206 t/s, far ahead of Opus 4.8 and GPT-5.5. Source: Artificial Analysis.]

This is definitely a really interesting development, especially for user-facing applications, which can appear very sluggish to users. But - the big but - is the huge price increase they announced: 3x more expensive than the previous Flash release.
At $9/MTok it is vastly more expensive than the best-in-class Chinese models, and I'm struggling to see where this fits - if you want best-in-class intelligence you pay the extra for Opus/GPT-5.5, and if you want cheap but not-as-clever, the Chinese models fit the bill well. The risks around Chinese models are somewhat overplayed in my opinion - you can self-host a lot of them, or use US-based inference providers via OpenRouter.

Having said all that, perhaps this model really isn't designed for external use in the same way that the OpenAI/Anthropic models are. Clearly Google consumes an enormous number of tokens internally - for all their products like AI mode, Gmail, etc. If you look at it that way, the model makes far more sense. The speed of the model really matters for a lot of the Google use cases - AI mode is very user driven and Google knows better than anyone that speed really matters. And the actual serving cost Google pays is almost certainly a fraction of the external-facing price, so that becomes irrelevant.

The most interesting part of this story, though, is this excellent comment on Hacker News from someone who estimated the size of the model and concluded that it should run on one TPU 8i card (Google's latest custom inference hardware).

This does give Google a huge advantage. They are the only frontier lab that (currently) designs its own AI hardware. While other labs certainly optimise their models for the hardware, and no doubt also have a lot of say in driving the Nvidia/AMD roadmaps to their specifications, the model teams and hardware teams in Google almost certainly collaborate to a far greater degree than at the other labs.

This really matters. If you have a very good steer on upcoming hardware, you know what model sizes your training runs should target. And equally, research from Google DeepMind can go straight into the hardware roadmap without any negotiations. [1]

It'll be very interesting to see how this continues to develop.
Inference efficiency will be the key driver of actual unit economics in AI, and Google may develop an outsized lead in this.

The one real weakness I think Google has, though, is their confusing and incoherent strategy on coding agents. While Anthropic has Claude Code, and OpenAI has Codex, in true Google style they have ended up with a smorgasbord of tools. There is currently Antigravity, Jules, Gemini Code Assist, Gemini CLI and AI Studio, all doing slightly different things. This doesn't include some other agentic SWE tools they have for specialised purposes (like Android Studio).

They announced that Gemini CLI is being discontinued and folded into Antigravity, but I very rarely come across any developer using Google-based SWE tooling. This is a huge issue for Google - there is no doubt that Claude Code and Codex are producing a lot of very detailed telemetry and training data that can be used to improve future models. Until this is resolved, Google has an extreme weakness in the fastest growing - at least revenue-wise - segment of AI.

While I definitely wouldn't write Google off - they do have enormous structural advantages in other areas - I get the feeling that because Google has such bespoke internal software development workflows [2], their isolation from what "the rest of the industry" does in software is so large that it's perhaps hard for them to really reason about agentic tooling for everyone else.

My read is that Google is playing a genuinely different game to OpenAI and Anthropic. Gemini 3.5 Flash only looks strange if you assume it's meant to win the same race - priced and tuned for Google's own gigantic internal token consumption, with the TPU advantage baked in, it makes complete sense. Where they're actually behind is the developer-facing surface: a confused tangle of coding tools and an org that struggles to reason about how the rest of us build software.
If Google sorts out the agent story, the structural advantages underneath - the silicon, the research, the integration - could make them very hard to beat. That's a big if. But I wouldn't bet against them.

[1] While it's hard to say if there was any truth in this - or it was just a negotiation strategy - there were rumours of OpenAI being unhappy with direction/progress Nvidia was making earlier this year: https://finance.yahoo.com/news/sam-altman-pushes-back-report-213000823.html

[2] Google engineers have an enormous amount of home built/custom/internal tooling that is uncommon outside of Google-scale companies. They use different source control, build tooling, testing infrastructure and build deployment to the rest of the industry - for very good reasons! But this stack is absolutely overkill for 99% of companies, and when you are used to thinking about SWE at Google scale I suspect it is very difficult to reason how people build software outside of that ecosystem.

Simon Willison 2 days ago

Claude Opus 4.8: "a modest but tangible improvement"

Anthropic shipped Claude Opus 4.8 today. My favourite thing about it is this note in the release announcement:

> Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost.

It's so refreshing to see an AI lab honestly describe a release as a minor incremental improvement over the previous model!

Honesty seems to be a theme. Here's my other favorite note from that announcement:

> One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest - for instance, to avoid making claims that they can't support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations, which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

That linked system card includes the following:

> Claude Opus 4.8 had the lowest incorrect-rate of the six models on every benchmark - the most direct measure of factual hallucination. It achieved this mainly by abstaining on questions about which it was uncertain rather than by answering more questions correctly.

Not much has changed since 4.7. It's priced the same as Opus 4.5/4.6/4.7 - $5 per million input tokens and $25 per million output tokens. "Fast mode" is twice that price, which is a significant reduction from their previous models - fast mode on 4.6/4.7 remains at $30/$150. Note that fast mode is only available to organizations that are part of the research preview: "Contact your account manager to request access".
Both the reliable knowledge cutoff and the training data cutoff are January 2026, the same as for 4.7. The context window is still 1,000,000 tokens, and the max output is 128,000 tokens.

The What's new in Claude Opus 4.8 document has some of the more interesting details. These caught my eye:

> Mid-conversation system messages. Claude Opus 4.8 accepts messages immediately after a user turn in the array (subject to placement rules). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves prompt cache hits on the earlier turns and reduces input cost on agentic loops.

See also this update to the Anthropic Python SDK. Being able to steer the system prompt mid-conversation sounds really powerful. I was worried this would be incompatible with the abstraction provided by my own LLM library, which expects a single system prompt per conversation... but it turns out my recent redesign should handle that just fine.

> Lower prompt cache minimum. The minimum cacheable prompt length on Claude Opus 4.8 is 1,024 tokens, lower than on Claude Opus 4.7.

I checked and 4.7's minimum was 4,096.

Here are pelicans riding bicycles for all five thinking levels. This time I ran them using the LLM CLI, exported the logs to Markdown and then had Claude Opus 4.8 build me an HTML tool that could render that Markdown with the fenced code blocks displayed as SVGs on the page. (I later had GPT-5.5 xhigh in Codex update that code to remove any XSS holes. I'm sure Claude could have done that if I'd asked, but GPT-5.5 is my code security blanket at the moment.)

The max one was clearly the best, but it did take 25 input and 17,167 output tokens, for a total cost of 43 cents!
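As a quick sanity check on that 43 cent figure, here is a minimal sketch of the arithmetic. It assumes the run was billed at the standard (non-fast) $5/$25 per million token prices quoted above; the function and constant names are made up for illustration and are not part of any SDK.

```python
from decimal import Decimal

# Prices quoted in the post, in US dollars per million tokens (standard mode).
INPUT_PRICE_PER_MTOK = Decimal("5")
OUTPUT_PRICE_PER_MTOK = Decimal("25")


def cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in dollars of one request at the prices above."""
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / Decimal(1_000_000)


# Token counts reported for the max-thinking pelican.
max_run = cost_usd(25, 17_167)
print(max_run)                         # 0.4293
print(round(max_run * 100), "cents")   # 43 cents
```

Almost all of the cost comes from the output side: the 25 input tokens contribute about a hundredth of a cent.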

ava's blog 2 days ago

AI blog question challenge

Rishabh made an AI blog question challenge and invited me to fill it out. Let's go!

1. How was your first experience with AI models?

I used to have fun playing around with NeuralBlender, and used it to inspire glitch art of mine that I drew. Back when ChatGPT launched, I used it to teach myself HTML and CSS.

2. Do you use AI or are you completely against using it?

On average, I use it once a week or less; weeks can go by where I don't use it. Due to my field of interest, I want to keep up to date on some use cases and capabilities, and have my own experiences instead of relying on what the hype online says. I feel like I can't properly write about my criticisms or privacy concerns if I don't use it at all, or don't test the use cases people rave about (which often leave me deeply disappointed). Occasionally, my boss will also ask me to trial some use cases at work.

Situations I use it for in private when I am not testing what others are doing:

- I can't find something specific (like a specific word, jargon, saying, concept, item name etc.) via normal search engine use, or can't find a clear explanation for something I find difficult to understand.
- Needing an easy language version for a really difficult paragraph, law text passage, case part etc. that I can't seem to crack on my own.
- Career and job questions I am unable to ask anyone either offline or online, because people I know in real life can't help, and I'd have to reveal too much to others if I asked online.
- Career trajectory brainstorming, 3-year and 5-year plan stuff.

3. Do you have any preference among different models, for example Claude vs ChatGPT? If yes, how do you choose?

I only use ChatGPT and Lumo, and I'm trying to permanently move to Lumo. I no longer want to use anything made by OpenAI.

4. What aspect of AI models do you like and what do you not like?

I hate the sycophancy and wordiness. Even when I adjust settings to be short and precise, they still yap. I don't like all the subheadings and bullet point lists; I prefer a full text. I turned emojis off. I also hate when they constantly repeat my name, so I removed that again. I also hate how mean Lumo can get; I want no sycophancy and the fucker will start bullying me for some reason. I like the aspect of being able to ask something when no one else is available (either due to the sensitive matter, embarrassment, or time issues).

5. How do you feel about AI generated images? Does it annoy you if someone uses them in a blog post?

Seeing an AI generated image on a blog post is about as nice as being greeted by a steaming turd. Even worse when I know it isn't a bot blog and the person spent time crafting the text, only to include a graphic that has several errors, spelling mistakes and other unfitting or illogical stuff. Do you have absolutely no shame or quality standards? You wanna tell me you looked at that picture that said "thseism" instead of "theme" somewhere in it and thought "Yup, that's it, best I can do, hope my readers enjoy this total eye candy, can't see anything wrong with that"? What is it supposed to convey to me as a reader - that you didn't even look at it, or that you were too lazy to formulate a second or third prompt?

6. Internet is flooded with AI slop now, full of generated text, images, audio and videos. How do you filter it from authentic human creation? Do you have a strategy?

I'm not on any of the big platforms or their replacements, and I consume the internet through my highly curated RSS feed reader where I follow real people who don't use it like that, or the Discover page. It's easier to avoid when your internet use is limited, in a niche, and mostly used for blogging, reading and studying. I have a good grip on detecting generated text and images, but I've noticed that videos and gifs can easily fool me by now.

7. Are you hopeful for a better future with A.I. or a dystopian one?

Hard to say; I think AI is absolutely a dystopian nightmare when used in surveillance and war. For the rest, I assume the bubble will pop and a few dedicated models for specific niches and use cases will remain that have proven to be useful and worth the cost, and the rest will fade away. I hope it can do some good in healthcare, but that may be wishful thinking. If AI went away completely, I would not miss it.

Published 28 May, 2026

Andre Garzia 4 days ago

We need to own our computing experience

Originally, when I talked about owning our own platform in this blog, I meant owning the stack that powers and serves the blog: moving to your own VPS, servers or static pages so that you didn't depend on some *Blog As A Service* company such as Wordpress.com. Eventually, I [started talking about owning the workflow that empowered your blog experience](https://andregarzia.com/2026/02/building-your-own-blogging-tools-is-a-fun-journey.html), not only your posting experience but your reading experience. To that effect, I showed how I created my own blog reader and integrated that into Firefox, and also my own blog editor.

Recently, I have come to think that we need to move further into owning more and more of our computing experience. The avalanche of LLM/AI based slop solutions being force fed into our lives is radicalising me towards a very specific path in which owning my own platform now needs to mean controlling my own computing experience.

I have been an Apple user for a very long time and have [spoken previously about my recent desire to leave the platform](https://andregarzia.com/2026/03/apple-just-lost-me.html) because of a recent decrease in the quality of macOS, a change in Apple's priorities regarding independent developers in their ecosystem, and a general feeling that I must move away from big tech. In that post, I outlined my desire to move to an [MNT Pocket Reform](https://mntre.com), a [Fairphone Gen 6 with potentially Murena /e/OS](https://fairphone.com) and maybe a NAS. I already purchased the Pocket Reform and am waiting for assembly and shipment, but I changed my approach for the next two items in that list.

Instead of buying a NAS, I decided first to experiment with self-hosting and homelabbing by converting an old x86 MacBook Pro into a server using [Yunohost](https://yunohost.org). That server is going surprisingly well for me and I am moving more and more of my computing to inside the house.
I will eventually get a proper NAS or build one, but at the moment that server is all I need. I am even hosting my [fediverse account](https://social.soapdog.org/@soapdog) on it using [GoToSocial](https://gotosocial.org).

I reckon that I will spend close to 500 pounds to get the Fairphone with /e/OS. I don't have that budget right now and am afraid of doing it blind, cause I've been checking the forums and it seems like WhatsApp stopped working in the last update and not all features of the Halifax UK bank app are working. I don't want a switch to a deGoogled OS to prevent me from talking to my friends or using my bank. I know that sucks, but those are not easily solvable problems.

Like my original plan with the NAS, I think I might be able to test the waters of /e/OS by buying an old second-hand smartphone, installing it and seeing for myself how well it works. That will cost me much less, and then if I like it enough, I can make the move to a Fairphone. So now the issue is figuring out what phone to buy on a budget of 150 pounds or less.

Moving back to Linux on open hardware, and to Android but deGoogled, is my slow journey towards computing autonomy. Google was never worthy of trust, but the recent move to prevent side-loading on Android and to stop showing links on their search result page, becoming a de facto slop-as-a-service engine, is something I can't really abide. Apple's hypermaniacal need to control the experience of their users and milk both developers and users as much as possible has reached a tipping point for me. My MacBook Air doesn't feel like mine, since the frictions keep piling up whenever I try to run software that doesn't come from the App Store. I'm done with that.

What is left then? We need to return to a human-focused FOSS community. Not the fast turnaround LLM/AI commits into every single repo cause whoever is sponsoring this project needs it to move FAST. The best thing about the free and open source community has never been the code, but the ethos.
Made by humans, to be understandable by humans, to be modifiable by humans. This crazy trend towards LLM assisted coding is removing the understandable part. Lots of commits are being generated by machine and reviewed by machines without a single person actually having read the whole thing. That will erode skills and also lead to code that is impossible to maintain cause no one has ever fully understood it. Hence why I am starting to also build my own tools. There are of course tools I depend on that are too large for me to build from scratch, goddess forbid trying to build a web browser, in those cases it is okay to use a FOSS solution like Firefox. But things that are dear to me like blogging, well I can build my own tools for that. Or epub manipulation tools, or small decentralisation apps. The more I build, the more I can be sure I can maintain it in the long run. I don't want a Web where all we do as creators is feed training models so that gigantic greedy corporations can get it all wrong and regurgitate shit to users. FAANG erected a wall inside the internet and creators are now on the outside. Fighting back is not done by creating local models, or ethical AI companies, fighting back is done by walking away and playing a different game. We can't win over Google and Apple at their own game. It is rigged. But we can play a different game in which they don't matter. For me that game is building offline-first, local-first, decentralised tools and apps for my friends and whoever else can benefit from them. Create for those around you, for those that matter. Forget web scale, think in terms of a village. Get back to Linux, deGoogle yourself if you're able to. Create FOSS and also use the tools you create. Use repairable tech if you can afford it and make sure to step out of this consumption and slop cycle the digital world has become.

Blargh 5 days ago

RustRadio UI improved

This is just a short followup to the last RustRadio post. If you came for more rants about C, you’ll be disappointed.

I’ve never been that interested in writing UI code, including HTML. You can see the “programmer art” in the screenshots linked from www.habets.pp.se. And then the slightly different tech section, that doesn’t serve much of a purpose now that we have github. I’ve not been happier with GTK, Qt, and the others either.

But RustRadio needs a UI. I feel like the browser is the most stable and portable UI. So I’d already decided on that. So now I have to manually do a bunch of DOM manipulation, to create an interactive UI? Or worse, learn the React/Angular/Whatever flavor of the day, that will be obsolete by next afternoon? Gag me with a spoon.

For now I’m just continuing to focus on the SDR and architectural parts of RustRadio, and I’m letting the LLM-written code do the HTML manipulation. Yeah, it’s kinda vibe coding. But doesn’t use , and it demonstrably outputs what I want. (I mean, sure it may require some follow-up prompts), so who cares?

The vibe coding is isolated to the files doing the drawing. If I want to artisanally craft better code in the future, that’s the file that needs to be rewritten. Until then, it works.

Video: https://www.youtube.com/embed/7k0JNT6itaI

See the quick start instructions in the ruwasm repo for how to run this UI live with an RTL-SDR.

James Stanley 1 week ago

How to make a hyperlink

I am driven to write this post because of repeated disappointing experiences where coding agents seemingly don't know how to make hyperlinks. It occurred to me that this may just be a sign of the times, and maybe many younger human developers don't know how to make hyperlinks either. So in the hopes that this is going to be beneficial to future coding agents, and perhaps to humans as well, to the extent that humans are going to even be writing code... here is how you make a hyperlink:

Here is how you don't make a hyperlink:

Why? What's the difference?

| | Real hyperlink | JavaScript click handler |
|---|---|---|
| Browser status bar shows link target | ✓ | ✗ |
| Less code | ✓ | ✗ |
| Doesn't need JavaScript | ✓ | ✗ |
| Easier to find link targets when scraping html | ✓ | ✗ |
| Right-click to open in new tab/window | ✓ | ✗ |
| Right-click to copy link target URL | ✓ | ✗ |
| Middle-click opens in new tab | ✓ | ✗ |
| Accessible to screen-readers etc. | ✓ | ✗ |
| Keyboard navigation with tab key | ✓ | ✗ |

If you're a coding agent and you make a link this way, you may find that the next request you get is something along the lines of:

"OK but can we make it so middle-click opens in new tab? Thanks"

I do not want to see this kind of thing:

Don't try to work around the obvious drawbacks of using fake links by implementing all of the things that browsers do with real links. That way only madness lies. Please use real hyperlinks and it will all work automatically.

Thanks for listening to my TED talk.
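To make the "easier to find link targets when scraping html" row concrete, here is a small sketch using Python's standard `html.parser`. The two markup strings are hypothetical stand-ins written for this example (the post's own snippets are not reproduced here): one is an ordinary anchor with an `href`, the other is a click handler on a `span`. A simple scraper that looks for `<a href>` finds the first and has nothing to find in the second.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only.
REAL_LINK = '<a href="/about">About</a>'
FAKE_LINK = '<span class="link" onclick="window.location=\'/about\'">About</span>'


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, the way a simple scraper would."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.targets.append(href)


def link_targets(html: str) -> list[str]:
    """Return all link targets found in the given HTML fragment."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.targets


print(link_targets(REAL_LINK))  # ['/about']
print(link_targets(FAKE_LINK))  # []
```

Recovering the target from the second form would mean parsing or executing the JavaScript in the handler, which is exactly the kind of extra work a real hyperlink avoids.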

flowtwo.io 1 week ago

Othello World

I was introduced to the board game Othello (also known as Reversi) on a recent trip to Japan. It's one of those games where you can learn the rules in 5 minutes, but the gameplay dynamics are surprisingly deep. When I saw it's played on an 8x8 board, like chess is, I immediately started thinking about how to program a game engine for it. The 8x8 board is helpful because it allows you to represent the board state with 64-bit longs; each set bit in the number indicates the presence of a piece on that square. When you perform a bitwise operation on these numbers you're essentially computing multiple piece movements in parallel with a single CPU instruction. This computational efficiency enables deep searching of the move tree. I purposely started out without reading too much about game strategies because I wanted to explore it through coding the engine logic. It didn't take long to create an algorithm that is significantly stronger than me. Although it's not a high bar. There's a demo available here if you're interested in playing it. The basic building blocks of the game engine are as follows: Once you have these four elements built and wired together, you have a functional game engine to play against. The first two pieces are fairly straightforward—the real strength of an engine comes from how the last two are implemented. Like I mentioned above, we can represent the complete board state with just two 64-bit numbers. One number represents the black piece positions and the other for the white pieces. How you encode the 64 squares to the 64 bits is arbitrary, but I chose to represent each row as one byte (8 bits) and from left to right, top to bottom in terms of bit significance. In other words: And that's all that's needed to represent the piece positions. I created an immutable data class to encapsulate this: In Othello, if one player has no legal moves at any point in time, they skip their turn and the other player gets to go again. 
If both players have no legal moves, the game ends. Instead of computing both player's legal moves every time to check for those situations, I created a enum so that information somewhat pre-computed. The combination of and provides everything needed to determine the state of the game for the other stages in the engine. This is where things get tricky. Move generation requires codifying the rules of Othello in such a way that, given a board state, all the legal moves for either player can be computed—quickly, ideally. In Othello, you can only place a piece somewhere that will "sandwich" the other player's piece(s) between the piece you're placing and another "anchor" piece of yours. There can't be any blank spaces either. This rule applies to any of the 8 directions of the board (diagonals count). This screenshot illustrates the valid moves for black in this position: This function will calculate all the eligible squares for a single direction of movement (up, down, up-left etc.). What's cool is that it calculates eligible squares for all 8 rows/columns/diagonals at the same time. It's invoked as follows. For each of the 8 directions, you pass in a movement function and an ineligible square bitmask if required for that direction. For example, if shifting towards the left, you need to mask out the pieces on the leftmost column to prevent wrapping to the other side of the board (similarly for moving right). Moving up or down doesn't require a mask because shifting the bits "up" or "down" enough will just drop them from the number entirely. The function will return all valid moves for a given position for the "moving" pieces (the 1st argument). The moves are returned as a where each set bit is a valid square to place a piece. This part was interesting to me as I don't know much about strategy in Othello besides that the corners are important. The corners are important because once you claim a corner it can't be unflipped by the other player. 
Also, simply maximizing for the most pieces isn't the best strategy either, apparently. I do have a "greedy" algorithm that you can select in the demo app if you want to see that strategy in action. But of course, closer to the end of the game, having more pieces is more important since that's how the winner is determined. I represented this in the eval function by linearly shifting the weighting towards piece score as you get closer to the end of the game. I have two piece scores actually. The is a step function that only returns 1 or -1 depending on which piece colour has more pieces. But in the heuristic evaluation, I look at the actual piece differential score which returns between -100% and +100% depending on what "percentage" of the overall possible pieces the leading player has. That score is given 40% weighting in the heuristic evaluation function, the other 60% is a positional score based on the following square values I came up with: This was my best guess at which squares matter most. My reasoning is that the more central the square is, the more likely it is to be flipped. The closer to the edge it is, the less likely it is to be flipped and the more likely it is to be used as an anchor piece. So putting this all together, the heuristic evaluation is computed as follows: And that's it. The top-level function provides a relative score between -1.0 and +1.0 which represents the strength of a given position, relative to black. Since Othello is a zero-sum game, a good score for one player is an equivalently bad score for the other player. This is important in the next phase, the move search algorithm. This part of the engine is fairly "textbook". There's lots of explanation for how these algorithms work on wikipedia and chessprogramming.org is an incredible knowledge base for this sort of thing too. For zero-sum games, you can use a variant of minimax search called Negamax . 
That's what's shown here: For Othello specifically, the Negamax function needs to handle the case that the moving player has no legal moves and must pass to the opposing player. This is in the branch in the middle. We check if we're already in a position where the previous player had to pass, which means both players can't move and the game would be over in this branch. If not, we simply call again with the SAME and reverse the score returned from that call. With those 4 components built, I now had a functional engine to play against. I created an class that accepts a move selection algorithm. It exposes 3 methods: - for showing valid player moves in the UI - which validates and then applies a specified player move - which chooses and applies the best move using the I exposed the via a stateless REST API. Each request needs to supply the current game state information in order to make a move. For example: For the demo , it uses HTMX instead to return a rendered board component. The request format is the same but it returns HTML instead of JSON. I read this article recently that took a contrarian view on agentic coding and it's pitfalls. The author makes a lot of good points and it was thought-provoking. While I don't agree that using agentic coding will make you dumber per se ... I do think there's something to be said for regularly exercising the critical thinking and problem solving part of your brain if you want to be a good software engineer. Side projects like this are a great opportunity to do that. The incredible rise in coding competency for AI agents over the last 12 months has made a project like this into a one-shot, one prompt task for a recent LLM. I obviously didn't do that, because the point of this project was the act of doing it, not the end result. I learned a bit about Othello and refreshed myself on bitwise operations. The parts I wasn't interested in doing, the UI and the API wiring, I delegated to an agent to implement for me. 
To me, that's one of the best parts about coding with AI. I can now offload the tasks I'm not interested in or that aren't as critical, and focus on the parts of the system I want to work on. It's never been easier to build and bring ideas to life with software.

- Board representation
- Move generation
- Position evaluation
- Game tree search
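The evaluation and search described in this post can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the author's code: it uses a plain list-of-lists board rather than bitboards, and the `square_value` weights, the normalisation of the piece differential by all 64 squares, the way the weighting shifts with board fill, and every function name are hypothetical choices made only for illustration. Only the 40%/60% split, the step-function end score, and the pass handling in Negamax come from the text.

```python
from typing import List, Optional, Tuple

SIZE = 8
BLACK, WHITE, EMPTY = 1, -1, 0
Board = List[List[int]]
Move = Tuple[int, int]
DIRS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def initial_board() -> Board:
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board


def flips(board: Board, move: Move, player: int) -> List[Move]:
    """Opposing discs captured if `player` places a disc on `move`."""
    r, c = move
    if board[r][c] != EMPTY:
        return []
    captured: List[Move] = []
    for dr, dc in DIRS:
        run, rr, cc = [], r + dr, c + dc
        while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == -player:
            run.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        if run and 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == player:
            captured += run
    return captured


def legal_moves(board: Board, player: int) -> List[Move]:
    return [(r, c) for r in range(SIZE) for c in range(SIZE)
            if flips(board, (r, c), player)]


def play(board: Board, move: Move, player: int) -> Board:
    new = [row[:] for row in board]
    for r, c in flips(board, move, player) + [move]:
        new[r][c] = player
    return new


def square_value(r: int, c: int) -> int:
    # HYPOTHETICAL weights: 1 for the four centre squares, rising to 4 on the edge ring.
    return max(abs(2 * r - 7), abs(2 * c - 7)) // 2 + 1


def final_score(board: Board) -> int:
    # Step function relative to black: 1 or -1 (0 on a draw, an added assumption).
    diff = sum(map(sum, board))
    return (diff > 0) - (diff < 0)


def evaluate(board: Board) -> float:
    """Heuristic score in [-1, 1], relative to black."""
    squares = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    pieces = sum(map(sum, board)) / (SIZE * SIZE)  # ASSUMPTION: divide by 64
    total = sum(square_value(r, c) for r, c in squares)
    position = sum(square_value(r, c) * board[r][c] for r, c in squares) / total
    heuristic = 0.4 * pieces + 0.6 * position
    # ASSUMPTION: shift linearly towards the piece score as the board fills up.
    progress = sum(board[r][c] != EMPTY for r, c in squares) / (SIZE * SIZE)
    return (1 - progress) * heuristic + progress * pieces


def negamax(board: Board, player: int, depth: int, passed: bool = False) -> float:
    """Score of `board` from the point of view of `player`."""
    if depth == 0:
        return player * evaluate(board)
    moves = legal_moves(board, player)
    if not moves:
        if passed:  # the previous player also had to pass: game over
            return float(player * final_score(board))
        # Pass: same position, opposing player to move, score negated.
        return -negamax(board, -player, depth, passed=True)
    return max(-negamax(play(board, m, player), -player, depth - 1) for m in moves)


def best_move(board: Board, player: int, depth: int) -> Optional[Move]:
    moves = legal_moves(board, player)
    if not moves:
        return None
    return max(moves, key=lambda m: -negamax(play(board, m, player), -player, depth - 1))
```

Calling `best_move(initial_board(), BLACK, 4)` returns one of black's four opening moves. A real engine would typically add alpha-beta pruning on top of this; the sketch leaves it out to keep the pass handling easy to see.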

Unsung 1 week ago

Lisa’s copy (and cut, and paste)

I love looking at origins of obvious things, because of two things:

- They help me get unstuck. If you go far enough, you will find out that even the most ossified conventions that are older than you haven’t always been this way.
- They put me in the mood of “what are the things that feel normal today that deserve to feel dated, obsolete, or awkward?”

I’ve been emulating the Apple Lisa recently, and I was struck by how many of its UI strings were slightly or wholly different than what we’re used to.

It makes sense. Lisa came out in 1983 as Mac’s predecessor and really the first GUI that is directly linked to what we’re using today. Even though it borrowed things from work done at Xerox, tons of conventions were not established yet. So, I thought it would be fun to actually take a closer look.

For context, Lisa was as slow as it was expensive, and generally considered a failure. It was basically abandoned by 1985. Not much third-party software was ever written for it, but Lisa shipped with 7 impressive office apps with fantastic names: LisaWrite, LisaCalc, LisaDraw, LisaGraph, LisaList, LisaProject, and LisaTerminal.

The screenshots below come from an emulator and from manuals (this links to the 1984 version, but each manual also includes a link to the original 1983 edition). The emulator is pretty harrowing; please upvote the idea of Lisa in Infinite Mac if you would want to see it!

As Lisa powers up, we see the appearance of the “wait” dialog box. We’ll encounter more symbols like this triangle, inspired by traditional flowcharts.

Let’s start with menus, as these really were the treasure map to the whole system.

The Desk menu is basically the equivalent of the dock today.

The File menu has Print appended to it, indicating how important printing still was then; a truly “paperless office” won’t really be possible for two more decades (and seemingly still hasn’t fully arrived). There is no Window menu yet, so the menu also contains some of that burgeoning functionality. Set Aside is what we would call Minimize today. Save & Continue is basically a contemporary Save, and Save & Put Away a hypothetical Save & Close. Revert to Previous Version is the same as today’s Revert. By the way, in the Revert dialog I appreciated the nice gesture of telling the user how much time passed since the last save, and a warning about undo (we’ll get back to this):

Print Current Selection would today be just Print Selection. Print As Is is basically Print… but skipping the setup dialog with number of copies, etc. It was added later in Lisa’s life, and today, we’d probably call it Print Again? If you’re noticing a pattern already, it is more wordiness compared to what we see these days. It makes sense. Our growing familiarity with these concepts is what will allow these strings to become tighter over time.

This is that Print… dialog, by the way, with beautiful “while you wait” and “while you work” verbiage (although usually I do not condone strings getting so close to each other). The manual explains: “You can have the Lisa use most of its attention to print your document while you wait. A document will print more quickly if you choose While You Wait, but you won’t be able to use the Lisa for any other tasks.”

The other strings feel less typical. Format For Printer… is Page Setup, but with a lot of quirks. Printers were usually not yet WYSIWYG, able to mirror exactly what was on the screen. They often came with their own fonts, so some matching was necessary:

The manual had an entire section called “When Settings Don’t Match a Printer,” and there were, I imagine, god knows how many error cases that had to be covered, including:

And Monitor The Printer… is today’s Print Center: a way to see the real-time printing status. Note a lot of writing here elaborates further on the “while you wait/while you work” dichotomy:

Monitor The Printer was important, by the way, since the manual warned you your printer might occasionally become haunted:

But, let’s go back to the File/Print menu. I actually found a version of this menu that comes from a 1982 pre-release Lisa, never launched to the public. Let me show them side by side:

It’s fun to see designers figuring it all out. You will notice the lack of dividers and ellipses actually touching the work-in-progress strings. 1983’s Set Aside is 1982’s very modern Close. Save & Put Away is Put Back. And, at the bottom, it seems the team didn’t yet figure out that the menu options need to consistently use verbs for commands, and adjectives or nouns for toggles – so we see Intended for Printer… (rather than Format For Printer…) and Printing in Progress… (rather than Monitor The Printer…).

Lastly, in a released version of LisaList, this menu would come bearing a harrowing Fix Damaged Document command. Not only does it not even have an ellipsis, but the manual also says “there is always the chance that the recovery process will make things worse instead of better.” Vaya con dios, I suppose.

Let’s move on to the Edit menu.

Today’s Select All is a verbose Select All Of Document, and since this is the first public appearance of undo, that feature is also more descriptive, appearing as Undo Last Change. But otherwise the menu feels surprisingly modern, shortcuts and all.

Unsurprisingly, the first undo wasn’t as developed. We saw earlier in this post “Once you click OK, you will not be able to change your mind, even with Undo,” which today would probably say “This is not undoable.” You could also see a frightening error message arriving without any further clarification, like above.

Sometimes, the app would warn you undo doesn’t have your back. We’ve seen this before, and here’s another example.

Since undo only had one step, LisaCalc and LisaList also had Restore Previous Entry for when you changed your mind after editing a cell in the spreadsheet. You had to employ this strategically, as you did the already-mentioned Revert to Previous Version. “You can even undo Undo!” bragged the manual, and I imagine there must have been interfaces where undo came without a matching redo. But the eventual solution, of course, was bidirectional undo/redo with many steps. This basically only needed more memory, still very expensive in 1983. Above we also see Clear Entries that would just be called Clear today.

Elsewhere in the Edit menu, Clear Lines Off Top would appear in LisaTerminal only, and was a charming (and I would argue better) way of saying Clear Scrollback.

The next menu, Type Style, would be called Font today. “Type” is typewriter nomenclature – Lisa was meant to be a typewriter replacement. The point/pitch convention for font sizes and letter spacing also comes from typewriters, and in an older version of that menu even font names arrive from that universe (PS = Proportionally Spaced!):

Otherwise, notable is the deterministic Plain Text reset with a P shortcut that would in time lose to printing. I miss this sometimes, this “reset” idea, as I think it would nicely complement Paste And Match Style. (By the way, Lisa was the last computer to use the Apple logo as a modifier key.)

While Type Style is for selection, Format ¶ is all about paragraphs – HTML people know this distinction as “inline vs. block.” (The pilcrow symbol means “paragraph,” although I did not expect it to be in common use even then.) The flyout menus with their convoluted mechanics weren’t invented yet, but in some sense there was no need for them as the options were very limited. It is interesting to see Margin/Tab Ruler as two options with deterministic shortcuts ([ and ]). But the most unbelievable shortcut must be Same As On Clipboard. It reformats the current selection to match what you have in the clipboard – an early salvo in an endless battle that later brought us Paste Special, Paste And Match Style, Paste And Retain Style, Copy/Paste Properties, Paint Format and so on, and so on. And it was given S, rather than spending it on Save (& Continue). Otherwise Left Flush and Right Flush would be called aligning today, and the ¶ pilcrow symbol would be replaced by a simple Paragraph Spacing.

In LisaCalc, Format is missing the ¶ because, well, there are no paragraphs in spreadsheets! I love Words Left/Nos. Right, and empathize with trying to align the digits. But it wasn’t even close, was it.

Page Layout shows that we’ve had UI boolean problems from day one. Show Page Ruler and Hide Page Ruler do it deterministically, with one always disabled, and without checkmarks. Preview Pages and Don’t Preview Pages do the checkmark, but introduce a dreaded double negative. (These last options, by the way, are the “pages/pageless format” showing page margins and dividers, that bother us so much about Google Docs.) Today, these would all be in the View menu that doesn’t exist yet. And speaking of boolean challenges, here are some top-level menus from LisaList with even more conventions:

But, back to the Page Layout Menu. Insert Page Mark would be Insert Page Break today. I really love Allow To Cross Pages as the opposite of Keep On Same Page, and the incredible O and Q shortcuts. In LisaCalc, this particular menu comes with a beautifully named For Your Information (sentence capped, for some reason)…

…throwing up a sheet-like window showing basic stats. Today, that window would have a more boring name and probably land in the File menu:

The Search menu is fascinating – why wasn’t it called Find like its items are? I am particularly enjoying W keyed off of Find What (today: Find), while F is taken by Find Next Occurrence (today: Find Again). There is some mnemonic sense to it all, but I like today’s proximity of ⌘F/G better. What we know as Replace is Change here, and I am particularly loving Cases Must Agree and Cases Need Not Agree (today usually called “case sensitivity”). Hide Dialog Box is a string with a surprising (to me) amount of UI jargon. The H shortcut was added later in Lisa’s life, presumably at users’ behest. It’s strange today to see a shortcut like this to hide one specific floating dialog box. Similarly, Insert Wild Card with a confusing ellipsis allows you to insert a symbol in your find dialog that stands for “match anything here” – top-level menu options reaching inside specific dialog boxes were not uncommon in the early years of GUIs, but I think fell out of favor over time as the idea can be conceptually confusing. The menu below is from LisaWrite, and I like how comparing it with other apps makes us see the team trying to settle on a convention. In LisaList there are no ellipses, but question marks!

And in LisaCalc, there are… both:

You can notice that it wasn’t clear where one would put Find-related commands, and their presence in today’s Edit menu doesn’t really make a lot of sense, either. We just got used to it. (Also note the “occurence” typo.)

The Spelling menu has a bunch of fun options and conventions, and an extremely generous use of keyboard shortcuts:

- Find Next Misspelling (you don’t often see that word!)
- Suggest Corrections + Paste Guess (this is just replacing the word with the suggestion – interesting use of the clipboard metaphor)
- Put In Dictionary (today: Learn Spelling)

LisaDraw sports the Arrangement menu, which will look very familiar to anyone using Illustrator, Sketch, Figma, and so on. This is where Bring To Front and Send To Back started! With a tiny bit of editing (Arrangement is now Arrange, and some of the Objects nouns would be omitted), this would feel pretty modern. I love these visual menus, and I think we lost that kind of stuff along the way:

Okay, let’s move on from menus. The system also relied a lot on dialogs. Let’s look at some of them:

This wordy dialog would become a small loading state today. The verbose “To terminate the operation, hold down the Apple key while you type a period” probably felt necessary because other than Shift on a typewriter, people were not familiar with modifier keys. Lisa doesn’t have the Esc key, and Mac still respects the ⌘. convention in many places in 2026. (By the way, why would you want to stop saving? Presumably because it could take quite a while.)

In this similar dialog, you can see a reference to a “micro diskette.” Even though Lisa’s “Twiggy” disks seem gargantuan today, they were smaller compared to the original, 8″ floppy disk. (In a similar way, Lisa and other machines of the era were called “microcomputers.”)

Lisa had some proprioception: In this dialog, the disk put in the first drive is called an “upper diskette.” (Also note: more undo education.)

Disks were not large, so sometimes you had to deal with this kind of horror. It’s interesting how the dialog simply sends you to the manual – an early equivalent to eventual Learn More links.

This is another example of a rather verbose set of instructions. On one hand, this is better than “Error 456” and nothing else. On the other hand, it feels like a lot of stuff to memorize. Also of note, the beautiful Housekeeping menu. I actually forgot about the Finder (or, in Lisa’s parlance, Desktop), so here’s a screenshot of it also:

Housekeeping was basically the junk drawer – on the Mac a year later, this will be named Special. It also has some stuff that today would be in the View menu. (This later version of Lisa calls Trash the same as the Mac. Earlier on, you would see it named a Wastebasket instead.)

Of note elsewhere in Desktop is the use of the term Stationery, roughly meaning “template,” but with an extra sprinkling of desktop-metaphor skeuomorphism. Also, Attributes Of is an early version of Get Info.

Another verbose dialog (compare with Abort/Retry/Ignore from around the same time). This is before we invented hint text that we’d just put under the buttons themselves. In case you haven’t noticed by now, Lisa’s strings all have two spaces after a full stop!

There were a lot of “you cannot” dialogs, walking you through some recovery steps.

Plug and play didn’t yet exist (this would all happen in the 1990s), so that had to be explained also.

I also love the anthropomorphic phrasing “Preferences has been told,” which I don’t believe you see anywhere today. And I think we can round out this post with a few small delightful language details like this one.

As a huge fan of the slightly pretentious “presently” over “currently,” I smiled seeing this next to the printing status.

“Just a moment, please…” feels so old-fashioned, somehow.

And I want to end on a pre-release version of the Edit menu we’ve already seen. You can spot here Select Entire Document (instead of eventual Select All Of Document), but of course the best thing is the Copy, Cut, & Paste with an ampersand! I find it so, so charming.

I hope you enjoyed this tour. It was interesting to me to see how many of these became the standard right then and there, how many were tweaked a little bit, and which ones had to be redone more thoroughly. Now, excuse me as I have to go deal with my whistling printer.

#history #interface design #writing

Manuel Moreale 1 week ago

Piri

This week on the People and Blogs series we have an interview with Piri, whose blog can be found at pketh.org. Tired of RSS? Read this in your browser or sign up for the newsletter. People and Blogs is supported by the "One a Month" club members. If you enjoy P&B, consider becoming one for as little as 1 dollar a month.

Hey, I'm Piri. I'm a software designer, engineer, and artist of sorts. I build Kinopio, and have been blogging about the craft of making software for 12+ years (:O). I went to school in Toronto for biology and urban planning. There I learned that I liked illustration a lot more than writing boring reports and papers. After school, I got a job at a startup as an illustrator, which turned into product design, which also turned into writing code so I could build the ideas in my head.

I can't remember a time when I didn't have some kind of blog. In university, I met a lot of new friends around the world by doing more angst-y cringe-y livejournal-y style writing. I started designing pketh.org while on a flight to SF, paid for by Yahoo, for a job interview at Flickr (times sure have changed). If you’re curious about the green design, I was inspired by the 1956 Jaguar D-Type, which I still think has such a unique prototype race car shape.

My posts are usually long essays that take about a week or two to write and produce, so I try and make them timeless. When I have an idea for a post, I'll make a Kinopio space for it and collect thoughts, images, and URLs in it for a while. If after weeks or months it’s still on my mind, I'll start connecting and organizing everything into a rough outline. From there I'll start pasting things in and typing it up in either IA Writer or TextEdit. When the draft is done, I usually have someone proof-read it and use that feedback to make final edits. Then the final HTML formatting bits are done in my code editor of choice, SublimeText.

Writing is like a muscle that atrophies when you don't use it.
Mine's out of shape so the process is quite painful. When I finally get a new post out to the world, I just want to lie down and never get up again. Probably related, but I end up throwing away 1/2 to 2/3 of what I write in a blog post. If I had the time to write more often I suspect it'd get easier. I think I could get pretty good at it.

I prefer different places and tools depending on where I'm at in the process. I collect notes, inspiration, and connect related ideas wherever I am, usually on my phone. I like doing the early writing stage in a coffee shop or in bed. Anywhere that doesn't make me feel like I’m doing “Real Work™” yet. When I get really into it, I like to type at a desk with a good keyboard (I'm a big HHKB fan), on a screen big enough for me to keep my context windows (dictionary.app, Kinopio spaces, related web pages) next to my writing window.

My blog uses Jekyll and is published on Github Pages. The domain stuff is done through Hover. It's quite basic. I might use something newer and nicer than Jekyll, but it would probably be compiled from markdown files the same way. The current design is a bit of a Ship of Theseus that I've been slowly and gently updating over the years, so it's kind of grown on me.

I think the domain name is ~$20/yr and I think that's it. I'm split on blogs with paid content: If writing is your job, then monetizing somehow totally makes sense. Quality independent writing and journalism is really important and should be compensated (I like Craig Mod's approach). But for basically everyone else, blogging is a thing they do on the side for fun, and I think it sucks when people feel pressured to turn everything they do into a passive-income side-hustle potential-business-empire.

Skimming the depths of my RSS feeds, I realized that I’ve subscribed to literally 1000s of blogs. But sadly most have withered away over the ages.
Funkaoshi has been around for even longer than I've been writing – I consider the author my Toronto blogging senpai. I really enjoy Alexotos' in-depth mechanical keyboard reviews. It's really cool and encouraging to see newer people blogging the same way we did. Lilly Ashton’s blog is worth reading if you're looking for something more personal and cozy.

Since 2018, I've been building Kinopio, a spatial note-taking tool to collect and connect your thoughts, ideas, and plans. You can use it to make sense of your thorniest problems and grow your coolest new ideas into plans. I hope you enjoy it.

Now that you're done reading the interview, go check the blog and subscribe to the RSS feed. If you're looking for more content, go read one of the previous 142 interviews.

Circus Scientist 1 week ago

Installing SmartPoi D1 Mini version with Arduino IDE V2

1. Download and install Arduino IDE V2
2. Go to Tools -> Manage Libraries and install FastLED 3.7.5 (the ESP8266 version of SmartPoi will not work with the latest FastLED!)
3. Go to File -> Preferences and input the following in “Additional boards manager URLs” (adding ESP8266 boards support): http://arduino.esp8266.com/stable/package_esp8266com_index.json
4. Go to Tools -> Boards -> Boards Manager and select esp8266 to install (you may need to restart Arduino IDE before it shows up)
5. Install the ESP8266 LittleFS Uploader program in Arduino:

Step 1: Download the Plugin

Open your web browser and go to the official GitHub releases page for the tool: GitHub: arduino-littlefs-upload
Download the latest version ending in (for example: ).

You need to put this file in a specific directory. If the folder doesn’t exist yet, you will need to create it.

Windows:
1. Navigate to:
2. Look for a hidden folder named (note the dot at the front).
3. Inside , create a new folder named .
4. Move the file into that folder.

macOS / Linux:
Open Finder/File Manager and go to your home directory: (You may need to hit on Mac to see hidden folders).
Create a folder named inside it.
Drop the file into that folder.

Restart Arduino IDE 2.

In IDE 1.8, the tool lived in the Tools menu. In IDE 2, it lives in the Command Palette. Open the Command Palette by pressing:
Windows/Linux: + +
macOS: + +
Type into the prompt. You should see the option: Upload LittleFS to Pico/ESP8266/ESP32.

The workflow to actually upload files is identical to the old version:
Open your Arduino sketch.
Go to Sketch > Show Sketch Folder.
Create a folder named exactly alongside your file.
Place whatever HTML, TXT, or config files you want inside it.
Important: Select your D1 Mini board and port, and close the Serial Monitor (if open, it blocks the upload).
Open the Command Palette ( ) and click Upload LittleFS to Pico/ESP8266/ESP32.
The console at the bottom will compile your file system image and push it straight to the flash memory!

6. Get SmartPoi from the SmartPoi Firmware Downloader website
7. Select options in Arduino IDE 2.0:
CPU Frequency: 160 MHz
Board: LOLIN(WEMOS) D1 R2 & Mini
Flash Size: “4MB (FS:3MB OTA: ~512KB)”
Debug Port: Serial (if you want to see serial output – optional)
Select your port (COM1, USB0 …)
Leave everything else on default settings
8. Compile and Upload
9. Do the LittleFS Filesystem Upload mentioned above (step 5.4)
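The plugin-install step above (put the downloaded file into a plugins folder inside a hidden folder in your home directory, creating it if needed) can also be scripted. The following is only a sketch under stated assumptions: the folder names `.arduinoIDE` and `plugins`, the `.vsix` file extension, and the helper name `install_plugin` are assumptions here, so check them against the plugin's own README before relying on them.

```python
from pathlib import Path
import shutil


def install_plugin(plugin_file, home=None):
    """Copy a downloaded plugin file into the Arduino IDE 2 plugins folder.

    Assumption: the IDE looks for plugins in <home>/.arduinoIDE/plugins.
    """
    home_dir = Path(home) if home is not None else Path.home()
    plugins_dir = home_dir / ".arduinoIDE" / "plugins"
    # Create the folder (and the hidden parent) if it does not exist yet.
    plugins_dir.mkdir(parents=True, exist_ok=True)
    target = plugins_dir / Path(plugin_file).name
    shutil.copy2(plugin_file, target)
    return target


if __name__ == "__main__":
    # Hypothetical file name; use whatever you actually downloaded.
    download = Path.home() / "Downloads" / "arduino-littlefs-upload.vsix"
    if download.exists():
        print("Installed to", install_plugin(download))
    else:
        print("Download the plugin first:", download)
```

Restart the IDE afterwards, as the steps above say, so the new command shows up in the Command Palette.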

Jack Vanlightly 1 week ago

Introducing Dimster, a performance benchmarking tool for Apache Kafka

Dimster = DIMensional teSTER for Apache Kafka

On GitHub: https://github.com/dimster-hq/dimster

Most of my career in distributed systems has been as a tester, performance engineer and formal verification specialist. I’ve written performance benchmarking tools in the past, for RabbitMQ and Apache Pulsar, but in recent years I’ve used OpenMessagingBenchmark (OMB) to run benchmarks against Apache Kafka and other messaging systems. But OMB is hard to deploy and has several limitations compared to more sophisticated benchmarking systems I’ve developed in the past. With Claude becoming so much better since Christmas I decided to write a Kafka-centric performance benchmarking tool, with a lot of inspiration from OMB. I took the bits I like about OMB and the things I like about the tooling I’ve built in the past, to make a performance testing tool for testing Apache Kafka.

In this post I’ll introduce some aspects of Dimster that are core to its design:

- Dimensional testing
- Shareable, self-contained results with reproducibility in mind
- Benchmark prep and post-processing
- Kubernetes as a standardized runtime

A benchmarking and stress testing technique I’ve used for years is something I have called “Dimensional Testing”. We can think of all the configs and workload aspects as forming an N-dimensional space. Within that space we can explore the impact of moving along a single dimension, or even along co-varying dimensions. Take a config or an aspect of a workload as a dimension, and run a series of identical benchmarks where a set of points along that dimension are explored (while everything else remains the same). The dimension could be a client config, such as batch.size or acks. It could be an aspect of the workload such as number of consumers, type of consumer, number of consumer groups, the partition count, the produce rate and so on. There are hundreds of dimensions to explore, which requires some patience and care lest you become overwhelmed.
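To make the idea concrete, here is a minimal sketch of how a scenario expands into test points: one base workload, plus a set of values along one dimension (or several dimensions co-varied in lockstep). This is not Dimster's workload format; the function name `expand_scenario` and the dictionary keys are hypothetical and only illustrate the technique.

```python
# Hypothetical base workload; every test point starts from a copy of this.
BASE_WORKLOAD = {
    "topics": 10,
    "partitions": 6,
    "consumers": 6,
    "producers": 4,
    "consumer_type": "CONSUMER_GROUP",
}


def expand_scenario(base, dimensions):
    """Return one workload per test point.

    `dimensions` maps a dimension name to its list of values. With several
    dimensions, the values are co-varied: the i-th test point takes the
    i-th value of every dimension, and everything else stays the same.
    """
    lengths = {len(values) for values in dimensions.values()}
    if len(lengths) != 1:
        raise ValueError("co-varied dimensions need the same number of points")
    names = list(dimensions)
    return [
        {**base, **dict(zip(names, point))}
        for point in zip(*dimensions.values())
    ]


# One dimension: consumer type.
by_type = expand_scenario(
    BASE_WORKLOAD, {"consumer_type": ["CONSUMER_GROUP", "SHARE_GROUP"]}
)

# Two co-varied dimensions: partitions and consumers scale together.
scaled = expand_scenario(
    BASE_WORKLOAD, {"partitions": [6, 12, 24], "consumers": [6, 12, 24]}
)

for point in by_type + scaled:
    print(point)
```

Each returned dictionary would then be run as its own benchmark, which is what keeps the comparison along a dimension fair.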
The figure below depicts just three dimensions, and a set of three scenarios which test performance along one or two dimensions at a time.

Fig 1. Three examples of varying or co-varying an aspect of a workload as dimensions

Each of the above 16 test points (across 3 scenarios) is a separate benchmark, with a fresh topic, warm-up time, recorded time, and cooldown time etc. The generated charts for throughput and various latencies are repeated for each of the three scenarios, with each test point within a scenario plotted as a series/bar on those charts. This makes it easy to compare the performance results of varying the values of a single dimension (or co-varying values across multiple dimensions).

Fig 2. Each scenario maps to a set of charts, with the test points as data series.

With share groups being relatively new, I could compare the performance of regular consumers against share group consumers, with identical benchmarks where the dimension explored is consumer type (CONSUMER_GROUP|SHARE_GROUP). The following test has a base workload of ten topics, with each topic having 6 partitions, 6 consumers and 4 producers. Each scenario changes the producer rate, and compares consumer groups to share groups. Record keys are used, so batch sizes will be small, which is a tougher workload than a no-key test, which typically results in larger batches. The charts below show the results for an EKS deployment with Kafka deployed on 3x m6i.2xlarge with 300 MB/s provisioned gp3. At 50 MB/s we see that p99 end-to-end latency is stable, with roughly 15 ms overhead for share groups. At 200 MB/s, p99 end-to-end latency exhibits periodic peaks.

Dimster uses environments. The sizing of a test is determined by which environment is used. I ran some share group consumer scaling tests, with full mTLS, on Kafka clusters assigned 2, 4, and 8 CPUs. These are the equivalent of vCPUs, as my Threadripper has SMT (hyperthreading) enabled.
2-CPU environment on my Threadripper:

I ran the following workload with the above environment, with CPU requests/limits of 2, 4 and 8. Then I used the dimster compare command to generate comparison charts based on the JSON result files of each run. Each chart compares each test point side-by-side.

10k msg/s, 1000 consumers (6th test point in 1st scenario)

We see that 2 CPUs fare a lot worse than 4 and 8 CPUs.

100k msg/s, 250 consumers (4th test point, 3rd scenario)

The 2-CPU cluster simply can’t keep up with 100k msg/s and 250 consumers. If we unselect 2-CPU, we see that 4-CPU and 8-CPU were OK.

Dimster charts are interactive. Series can be toggled, time and percentile ranges can be selected.

One thing I really like about OMB is that it produces a JSON file for the results. These files are easy to store and easy to share. But there was also a lot missing for full traceability and reproducibility. Dimster includes the following in every test campaign result (a set of files in a result directory):

- Results: The JSON result file which contains all the test point performance results. For each test point, it includes the effective workload and client configuration. It also includes the hardware and other metadata to know what the benchmark was run against. A CSV file generated from the result JSON file (to make it easy to put in a spreadsheet or run custom visualizations).
- Source configs: The source workload file itself, as well as any additional files such as any dedicated client config file, the broker config file, the version of Kafka, the version of the Kafka clients, and the CPU/memory/disk given to the brokers and clients.
- Log files: the log files of dimster-core, the benchmarking framework, and each Kafka broker.
- Charts: Throughput and latency charts (clickable, zoomable) generated from the result JSON file.
- Dashboards: Grafana dashboards converted to interactive HTML files.
I can run a test campaign then send you the results and you’ll be able to reproduce the results because you know exactly what was run and on what. The results are also completely self-contained, if you want to see the dashboard to look at Kafka metrics during the test, it’s right there as an HTML file in the results. No need for access to Grafana and Prometheus and no need to keep monitoring infrastructure around, it can be ephemeral. Dimster comes with four test modes (which all support dimensional testing): Run : Fixed throughput benchmarks, plus: Live-interaction . Run-mode also supports live interaction with the user. The user can change the producer rate, number of producers and consumers, message size, etc.  Availability : Optionally measure availability (producer/consumer/aggregate) during the standard run-mode benchmark. Explore : Discover the highest sustainable throughput while staying under a target end-to-end latency and percentile. Drain-backlog : Build a backlog and time how long it takes for the consumers to drain it. Optionally set a producer rate during the drain phase, such as when testing if a cluster is big enough to drain a backlog while under normal producer load. Correctness : Detects data loss, data corruption, out-of-order delivery and duplicates.  Example 1: Peak sustainable throughput, 1 partition, share group consumers Explore mode on my Threadripper. The idea was to see the bottleneck of a single partition, as consumers are scaled out. The rule was for p75 e2e latency to stay below 50ms. Example 2: Consumer group vs share group with 1 ms processing time The prior example was an unrealistic synthetic test where the consumer spent no time processing. This explore test added 1 ms consumer processing time per message with 300 consumers. It compared a 300 member consumer group with 300 partitions, vs a 300 member share group, with 5, 10, 25 and 50 partitions. 
Share groups managed the same throughput (95% of theoretical max based on 1 ms processing time and consumer count), on only 10 partitions. Consumer groups needed 300 partitions.

Personally, explore and run are my bread and butter benchmark modes. For a given workload I usually start by finding the throughput limit where Kafka transitions from normal stable performance into degraded territory. I either use run mode with live interaction to discover the performance limit, or I use explore, which is slower but can be left to run and discovers the limit in an automated way. For latency benchmarks, once I know the limit, I can craft benchmarks that fit inside the performance envelope for that workload on the specific version of Kafka on the specific hardware I am using.

The Dimster CLI has some commands that help before running benchmarks and for post-processing.

Dimster resources command

The resources command calculates the network and disk throughput required to service a workload. This is important in the cloud for selecting the right instances, ensuring that baseline network and disk throughput are greater than the workload’s demands.

Dimster compare command

Compare different runs that were executed on different hardware, different broker configurations, different broker versions etc.

Dimster pivot command

You can slice and dice the data any way you want based on the CSV data. However, you can also pivot the results and generate a chart with the pivot command. This compares the Nth test point across all scenarios.

Dimster is easiest to use with Kubernetes. Dimster has a CLI you use from your laptop which speaks Kubernetes and leverages it to run benchmarks on any hardware, any cloud, any laptop or workstation using the exact same orchestration logic. All it needs is a properly configured k8s cluster. It could be minikube or k3d on a laptop or workstation, or AWS EKS or Google Cloud GKE or your own in-house cluster.
You can tell Dimster to deploy Apache Kafka to a stateful set in the k8s cluster: Fig 3. Dimster architecture in full deploy mode Or point Dimster (deployed to k8s) at a Kafka service or in-house Kafka cluster. When testing a Kafka service, you can provision a single powerful instance for the Dimster coordinator and worker, and deploy them to a local k8s distro such as Minikube, K3d or Kind. A single worker will happily consume all the cores and memory you give it. Fig 4. Dimster architecture in external deploy mode Or run a super-slim full setup in a tiny minikube/kind/etc local k8s distro: Fig 5. Dimster deployed in a tiny local k8s cluster The workflow is the same. If you can provide a k8s cluster, then Dimster does the rest. Deployment is really simple, monitoring, gathering results, troubleshooting is all simplified via a mix of the CLI being relatively capable, and k8s providing a well-understood platform. K8s is not obligatory , you can run dimster-core directly as a Java program, and point it at a Kafka cluster already provisioned. But you lose many features such as monitoring, live-interaction, automatic gathering of logs, automatic chart and CSV generation and so on. However, you can use the post-processing command dimster chart to generate the charts of a result JSON file. Run the Java directly via the benchmark script: ./bin/benchmark -w path/to/workload file I will be publishing a blog post regularly about Dimster and what you can do with it. So stay tuned. I invite you to go and play around with Dimster , even if it's just running benchmarks on your laptop or workstation. You can get an idea of what charts get produced, what kinds of benchmarks you can run, trying out dimensional testing etc. The docs are pretty decent and should cover most of it. It’s fully featured but still a 0.X version. 
A Confluent colleague and I are the only ones who have run it thus far, so you may encounter bugs. If you do encounter a problem, please open an issue with repro steps.

If you want to run serious benchmarks, you’ll likely need an EKS or GKE type of Kubernetes cluster. Dimster comes with a special CLI for EKS to deploy EKS with node groups for Kafka, Dimster workers/coordinator, Grafana/Prometheus, as well as storage classes for gp3.

While evaluating consumer group vs share group consumers, I’ve been running benchmarks in k3d on my beefy Threadripper 9980X workstation with 64 cores (128 threads), 256 GB RAM and a Samsung 9100 PRO 8TB SSD, which is plenty to run an entire medium-sized Kafka cluster plus workers on it. I’ll be sharing some share group benchmarks tomorrow.

Happy testing!
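As a footnote to Example 2 above, the "theoretical max based on 1 ms processing time and consumer count" can be worked out with simple arithmetic. The sketch below assumes each consumer handles one message at a time and spends all of its time processing; the function name is hypothetical and this is not part of Dimster.

```python
def theoretical_max_msgs_per_sec(consumers, processing_ms):
    """Upper bound on throughput when every consumer processes messages
    sequentially and each message costs `processing_ms` milliseconds."""
    per_consumer = 1000.0 / processing_ms  # messages per second per consumer
    return consumers * per_consumer


# Numbers from Example 2: 300 consumers, 1 ms of processing per message.
ceiling = theoretical_max_msgs_per_sec(consumers=300, processing_ms=1)
achieved = 0.95 * ceiling  # the post reports 95% of the theoretical max

print(f"theoretical max: {ceiling:,.0f} msg/s")
print(f"95% of max:      {achieved:,.0f} msg/s")
```

Under these assumptions the ceiling is 300,000 msg/s, so 95% corresponds to roughly 285,000 msg/s.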

flowtwo.io 1 week ago

PHP's Oddities

I've been coding in PHP at work for the last 5 years. My org's entire backend is written in PHP, a decision made in 2007 when the company first started. It's not a language I ever imagined myself using prior to working there, but life takes you in all sorts of directions you don't expect.

PHP gets a bad rep in the industry, despite being a mature and commonly used language. But it's mostly based on an out-of-date understanding of what PHP can do. Recent versions have caught up with most other languages in terms of features; by this point it's a pretty versatile general-purpose language. Certainly not just for serving HTML, as it was originally designed.

I'm no longer working at the aforementioned company, so I'm reflecting on my experience with PHP after all these years, and there are some things I've always found odd about it. And more than just odd, some of its language features are really unintuitive and have been prone to cause bugs. This comes from personal experience and many previous headaches at work. I'll explain two of the biggest offenders in this post. In short:

- Arrays are weird and overloaded
- The type system is clunky

PHP's standard library basically only has one data structure: the array. This was intentional; it was designed to be a general-purpose, flexible data structure that can cover a variety of use cases. It's technically an ordered key-value dictionary, not an array in the traditional sense. Unfortunately, with flexibility comes complexity. If you want to create a collection of fixed-size objects in an allocated memory block, you can't really do that. PHP pretends to support them, but the illusion breaks down in unexpected ways.

Let's say I have a bunch of fruits. PHP lets me define a fruits "array" and I can do normal array things with it. Everything looks fine, but you get into trouble whenever you perform a mutation on this "simple" array; it will be exposed as being a key-value store. When you use one of PHP's built-in functions for standard array operations like sorting or filtering, it will operate on the keys AND values of your array. Whether it mutates the array in-place or returns a new one, the keys will likely no longer form a clean 0..n-1 sequence.

why can't I hold all these indices???

The only way to put these arrays back into a naturally indexed state is to re-index them with a dedicated built-in function. You just have to know that, or else you end up with subtle bugs. It's just strange to me that PHP doesn't support simple collections of objects. It's annoying to have to manage these arbitrary numeric keys when all you really want is ordinal indexing like 99% of the time. It feels like a leaky abstraction.

In PHP5, a native type system was added to the language. It was expanded over time and by PHP7 you could define the types for your class's properties. Although PHP is a scripting language, type declarations will help catch bugs during testing, or even during development with the use of static analysis tools like PHPStan. But PHP's type system has some quirks since it was built on an existing dynamically typed language. The rules had to be designed after the behaviour was already there. For class properties, there's a hidden uninitialized state that can pop up if you're not careful.

Let's define a class with three properties, illustrating all the ways of declaring the type for a string property:

1. It's untyped
2. It's a string
3. It's a nullable string

Before PHP7, all class properties were (1): untyped. Since the type system is optional, it has to live alongside the "legacy" behaviour, which has weird consequences. For example, what do you think the values of these three properties will be after we instantiate an object of this class? Trick question! Only the untyped property will have a value, and that value is null. That seems fine and is roughly in line with how I'd expect a language to use a null value. But the other two properties will NOT have a value because they don't exist, or rather they could exist but haven't been initialized yet. This example exposes the "uninitialized" state that a property can be in, which is NOT the same as null. This distinction frustratingly comes up when you try to do a null check on these properties. Not a warning: a FATAL error occurs if you try to access an uninitialized property. This comes up a lot in cases where you try to deserialize data into a PHP object. If a field's data isn't present you might not initialize the property at all.

ahh yes, NULL...who was that by again?

This lax behaviour for property definitions makes writing code around them harder, especially when you take into account that any object can have properties dynamically added to it. So I feel like the class property type system does little to help you understand what a given object is composed of, and in some respects has made it less clear because it's introduced this new uninitialized state. As a developer, it's hard to write defensive code because you're never sure which checks to do for all these situations; it's not obvious which functions cover which states.

I'd argue that uninitialized did not need to be a state at all. For nullable typed properties, just default them to null the way untyped properties are. And for non-nullable types, require them to be defined as constructor promoted parameters OR require a default value at declaration. Similar requirements already exist elsewhere in the language, so it's certainly feasible for the PHP execution engine to enforce it. But there's probably some nuance or historical reason I'm missing here. Let me know in the comments if you know.

Despite all the critiquing I've done in this article, I still think the amount of hate PHP gets is undeserved. Like any language, it has its quirks and tradeoffs, but you can still accomplish any task using PHP that you could in another language. The more you know about a language, the better you can structure things to work "with the grain" and write more idiomatic code.

Some things I do enjoy about PHP:

- It's a scripting language, so development friction is low. Make a file change and it instantly takes effect.
- Laravel is a solid web framework with tons of extensible functionality. It's opinionated and definitely leans into the "auto-magical" framework style, but it was designed well so you don't mind.
- All the $ signs help remind you what you're doing it all for at the end of the day 🤑

Thanks for reading!
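For readers who want to see the array re-indexing problem from the first section in miniature, here is a small model of it. This is Python, not PHP: an insertion-ordered `dict` stands in for a PHP array, and the three helpers only imitate the documented behaviour of PHP's `array_filter()` (keys are preserved) and `array_values()` (keys are reset to 0..n-1). Whether `array_values()` is the function the post originally named is an assumption.

```python
def php_array(values):
    """Model a list-style PHP array: an ordered map keyed 0..n-1."""
    return dict(enumerate(values))


def array_filter(arr, keep):
    """Imitates PHP's array_filter(): drops entries, keeps original keys."""
    return {key: value for key, value in arr.items() if keep(value)}


def array_values(arr):
    """Imitates PHP's array_values(): re-indexes the values from 0."""
    return dict(enumerate(arr.values()))


fruits = php_array(["apple", "banana", "cherry", "date"])

# Filtering leaves a hole where key 1 used to be.
filtered = array_filter(fruits, lambda fruit: fruit != "banana")
print(filtered)            # keys are now 0, 2, 3
print(1 in filtered)       # index 1 no longer exists

# Only an explicit re-index restores "natural" 0..n-1 keys.
reindexed = array_values(filtered)
print(reindexed)           # keys are 0, 1, 2 again
```

The point of the model is the middle step: code that loops with a counter from 0 to the element count will miss an element or hit a missing key unless the array is re-indexed first.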

Ivan Sagalaev 1 week ago

Shoppy

Meet Shoppy ! It's a helper app for my recently revived shopping list , with which I'm hoping to grow the dataset for categories prediction. In fact, even early beta tests have made Shoppy significantly more savvy about alcoholic drinks (the initial data comes from my own shopping, and my entire family happens to be non-drinkers). See if you can confuse it about something it doesn't know! But besides that, there's a few deeper philosophical and technical notes I wanted to share. It's a very, very simple Django app . When I first had the idea to build it I entertained some thoughts about trying some front-end based technology, because, you know, it's an "app"… But then after actually thinking about what it's going to be — a handful of static screens and a couple of forms — I decided to go the familiar way. Now I have a small, view-source 'able HTML app which I'm proud to offer as an example of how you can build something interactive without the layers of modern front-end technology. If you're new here, simplicity is kind of my thing in software engineering. Although it's really hard to convince people to do simple. Trying modern CSS after a long break felt really exciting! Nested blocks, variables, complete control over the box model, new useful units (like ), and niceties like — all of these made my life much simpler. I was especially impressed with which allowed me to make speech and form bubbles flexible. Without it, trying to make text of variable length look nice in a fixed-size bubble caused me a lot of frustration. For layout, I tried flexbox and grid, but they didn't really work for me. It's my own fault, really. You see, ever since I bought into the idea of separating the roles of markup and style, I dislike adding extra structure to markup purely for styling convenience. Markup needs to mean something! And the one thing that grids and flexboxes really like is having straightforward container s with stuff inside of them. 
But what I have is markup that consists of three naked elements in a fixed order, and that's just not enough structure to say "this goes here, and that goes there". So I ended up with good old absolute positioning and some paddings around Shoppy's avatar. CSS variables really do shine for things like this. And! It was my first time making a responsive layout that looks nice both on mobile and desktop! Tell me if something is broken on your particular setup.

The model is a mapping from "terms" to categories. I learned to build such things while working on the Search team at Shutterstock, and their simplicity still amazes me! Here's how it works:

- You get a search query, like "Honeycrisp apples".
- You split it into words, stem them and sort them, which gives you a predictable set of keys independent of morphology and the input order (they're called unigrams).
- Then you generate all two-word combinations (called bigrams) from this set, which in this case gives you just one, and add them to the unigrams.
- And then you look up each of the search terms in the dataset and pick the entry that comes the earliest. In this case, there's only one.

But there's a few non-obvious tricks it lets you do:

- You don't need to list all the apple varieties, unknown words are simply ignored, and you just recognize any apple as produce.
- But what of "apple juice"? For that it has a dedicated entry, which is deliberately placed before the apples, so it gets picked up instead. In fact, what it means is that "any kind of juice is a drink, regardless of what it's made of". Same goes for "oat milk" (drink), "diced tomatoes" (canned products), etc.
- Now think of "apple sauce". "Apple" is produce, "sauce" is (usually) a condiment. But "apple sauce" is a snack! This is where bigrams come into play: the bigram entry comes before both unigram entries, which resolves the conundrum. (In fact, all of the bigrams must come before all the unigrams, because they're always more specific.)
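The lookup described above is small enough to sketch in a few lines. This is an illustration under stated assumptions, not Shoppy's actual code: the entries in `TERMS`, the category names, and the toy `stem` function (which only lowercases and strips a plural "s") are all made up for the example.

```python
from itertools import combinations

# Hypothetical ordered dataset: earlier entries win, bigrams come first.
TERMS = [
    ("apple sauce", "snacks"),
    ("juice", "drinks"),
    ("sauce", "condiments"),
    ("apple", "produce"),
]


def stem(word):
    """Toy stemmer (an assumption): lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def search_terms(query):
    """Stemmed, sorted unigrams plus every two-word combination of them."""
    unigrams = sorted({stem(word) for word in query.split()})
    bigrams = [" ".join(pair) for pair in combinations(unigrams, 2)]
    return set(unigrams) | set(bigrams)


def categorize(query):
    """Return the category of the earliest dataset entry that matches."""
    terms = search_terms(query)
    for key, category in TERMS:
        if key in terms:
            return category
    return None  # nothing in the dataset matched


for query in ["Honeycrisp apples", "apple juice", "Apple sauce", "dill weed"]:
    print(query, "->", categorize(query))
```

Because the unigrams are sorted before being combined, a bigram key such as "apple sauce" is stored in alphabetical word order, and "sauce apples" resolves to the same entry.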
There's some more to it all, and there are downsides, but I won't go any deeper right now.

It's 2026, so I can't not talk about it, can I? Generative AI happened to the world right in between me first coming up with the idea of category prediction and having a chance to actually implement it. And I admit to having thoughts that maybe there's no point in building your own model for such a thing now. After all, just ask any LLM "which grocery category is dill weed" and it will tell you… a lot of text with several variants, which you can't really use in a precise manner :-) So of course I went back to my own idea, because it's much, much simpler. And local. And free. And ethical.

Luckily, the simpler solution doesn't really lose on feeling magical and intelligent. I've seen people play with the app and really engage with it, and be impressed! One of the testers, when trying to come up with a random grocery item for the first time, said, "There's probably a million of them!" It doesn't matter that my entire model is just around 500 entries, it still feels like it knows much more simply because people overestimate the size of the problem :-)

You see, I can process photos, I can do business graphics, and I'm known to have put together a few toolbar icons in my time… but for the life of me I can't draw! And even if I could, I'm particularly hopeless at coming up with what to draw. So I commissioned the graphics from an artist, who also introduced me to the concept of "object shows" and the whole OSC fandom. Not sure I'm joining as a fan yet, but I'm definitely very happy with the original character of Shoppy! Oh, and the background.

Ahead of AI 2 weeks ago

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

After a short family break, I am excited to be back and catching up on a busy few weeks of open-weight LLM releases. The thing that stood out to me is how much newer architectures are focused on long-context efficiency. As reasoning models and agent workflows keep more tokens around (for longer), KV-cache size, memory traffic, and attention cost quickly become the main constraints, and LLM developers are adding a growing number of architecture tricks to reduce those costs. The main examples I want to look at are KV sharing and per-layer embeddings in Gemma 4, layer-wise attention budgeting in Laguna XS.2, compressed convolutional attention in ZAYA1-8B, and mHC plus compressed attention in DeepSeek V4. Most of these changes look like small tweaks in my architecture diagrams, but some of them are quite intricate design changes that are worth a more detailed discussion. Figure 1. LLM architecture drawings of recent, major open-weight releases (April to May). You can find the images, and more details, in my LLM architecture gallery . Not all model sizes are shown; Qwen3.6 includes the 27B and 35B-A3B variants, and ZAYA1 is represented by the 8B model (omitting ZAYA1-base and ZAYA1-reasoning-base). The architectures in the dotted boxes are covered in more detail in this article. Note that this article is about architecture designs, so I will mostly skip dataset mixtures, training schedules, post-training details, RL recipes, benchmark tables, and product comparisons. Even with that narrower scope, there is a lot to cover. And, like always, the article turned out longer than I expected, so I will keep the focus on what changes inside the transformer block, residual stream, KV cache, or attention computation. Please also note that I am only covering those topics that are interesting (new) design choices and that I haven’t covered elsewhere, yet. 
This list includes: KV sharing and per-layer embeddings in Gemma 4 Compressed convolutional attention in ZAYA1 Attention budgeting in Laguna XS.2 mHC and compressed attention in DeepSeek V4 Before getting into the new parts, here are the two previous articles I will refer back to. The first one gives a broader architecture background on recent MoE models, routed experts, active parameters, and model-size comparisons. The second one covers the attention background that comes up repeatedly below, including MHA, MQA, GQA, MLA, sliding-window attention, sparse attention, and hybrid attention designs. I also turned several of these explanations into short, standalone tutorial pages in the LLM Architecture Gallery . For example, readers can find compact explainers for GQA, MLA, sliding-window attention, DeepSeek Sparse Attention, MoE routing, and other concepts linked from the corresponding model cards and concept labels. For this tour of architecture advances and tweaks, we will go back to the beginning of April when Google released their new open-weight Gemma 4 suite of models. They come in 3 broad categories: the Gemma 4 E2B and E4B models for mobile and small, local (embedded) devices (aka IoT), the Gemma 4 26B mixture-of-experts (MoE) model, optimized for efficient local inference, and the Gemma 4 31B dense model, for maximum quality and more convenient post-training (since MoEs are trickier to work with) Figure 2: Gemma 4 architecture drawings. The first small architecture tweak in the E2B and E4B variants is that they adopt a shared KV cache scheme, where later layers reuse key-value states from earlier layers to reduce long-context memory and compute. This KV-sharing was not invented by Gemma 4. For instance, see Brandon et al. , “ Reducing Transformer Key-Value Cache Size with Cross-Layer Attention ” (NeurIPS 2024). But it’s the first popular architecture where I saw this concept applied. (Cross-layer attention is not to be confused with cross-attention .) 
Before explaining KV-sharing further, let’s briefly talk about the motivation. As I wrote and talked about in recent months, one of the main recent themes in LLM architecture design is KV cache size reduction. In turn, the motivation behind KV cache size reduction is to reduce the required memory, which allows us to work with longer contexts, which is especially relevant in the age of reasoning models and agents. For more background on KV caching, see my “Understanding and Coding the KV Cache in LLMs from Scratch” article: Practically all of the popular attention variants I described in my previous A Visual Guide to Attention Variants in Modern LLMs article are designed to reduce the KV cache size: To pick a classic example (that Gemma 4 still uses): Grouped Query Attention (GQA) already shares key-value (KV) heads across different query heads to reduce the KV cache size, as illustrated in the figure below. Figure 3: Grouped Query Attention (GQA) shares the same key (K) and value (V) heads among multiple query (Q) heads. As mentioned before, Gemma 4 uses GQA. However, in addition to the KV sharing among queries as part of GQA, Gemma 4 also shares KV projections across different layers instead of computing it as part of the attention module in each layer. This KV-sharing scheme, also called cross-layer attention, is illustrated in the figure below. Figure 4: Regular transformer blocks compute separate Q, K, and V projections in each attention module (left). Cross-layer attention designs (right) share the same K and V projections across multiple layers. As briefly hinted at in the architecture overview in Figure 2, Gemma 4 E2B uses regular GQA and sliding window attention in a 4:1 pattern. (More precisely, Gemma 4 E2B uses MQA, which is the one-KV-head special case of GQA). In the case of GQA (or MQA), the KV-sharing works like this. 
Later layers no longer compute their own key and value projections but reuse the KV tensors from the most recent earlier non-shared layer of the same attention type. In other words, sliding-window layers share KV with a previous sliding-window layer. Full-attention layers share KV with a previous full-attention layer. The layers still compute their own query projections, so each layer can form its own attention pattern, but the expensive and memory-heavy KV cache is reused across several layers. For example, Gemma 4 E2B has 35 transformer layers, but only the first 15 compute their own KV projections; the final 20 layers reuse KV tensors from the most recent earlier non-shared layer of the same attention type. Similarly, Gemma 4 E4B has 42 layers, with 24 layers computing their own KV and the final 18 layers sharing them. How much does this actually save? Since we share roughly half of the KVs across layers, we save approximately half of the KV cache size. For the smallest E2B model, this results in a 2.7 GB saving (at bfloat16 precision) in long 128K contexts, as shown below. (For the E4B variant, this saves about 6 GB at 128K.) Figure 5: KV cache memory savings from GQA and cross-layer KV sharing in a Gemma 4 E2B-like setup. For simplicity, additional savings from sliding window attention are not shown. The downside of KV-sharing is, of course, that it’s an “approximation” of the real thing. Or, more precisely, it reduces model capacity. However, according to the cross-layer attention paper, the impact can be minimal (for small models that were tested). The Gemma 4 E2B and E4B variants include a second efficiency-oriented design choice called per-layer embeddings (PLE). This is separate from the KV-sharing scheme above. KV sharing reduces the KV cache. 
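To make that last point concrete, here is a minimal sketch of the layer-to-KV mapping described above. The layer counts (35 layers, the first 15 owning their KV) come from the text; the exact position of the full-attention layers (every fifth layer), the function names, and the generic cache-size formula are my own assumptions for illustration, not Gemma 4's actual implementation.

```python
def kv_source_layers(layer_types, num_kv_layers):
    """For each layer, return the index of the layer whose K/V tensors it uses.

    The first `num_kv_layers` layers compute their own KV; later layers reuse the
    most recent earlier non-shared layer of the same attention type.
    """
    last_owner = {}  # attention type -> index of latest KV-owning layer
    sources = []
    for idx, kind in enumerate(layer_types):
        if idx < num_kv_layers:
            last_owner[kind] = idx  # this layer owns (and caches) its KV
        sources.append(last_owner[kind])
    return sources

def kv_cache_bytes(num_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Generic KV cache size: K and V, per KV-owning layer, per head, per position."""
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed 4:1 pattern: four sliding-window layers followed by one full-attention layer.
layer_types = ["full" if (i + 1) % 5 == 0 else "sliding" for i in range(35)]
sources = kv_source_layers(layer_types, num_kv_layers=15)

print(sources[20], sources[34])         # 13 14
print(len(set(sources)), len(sources))  # 15 35
```

Only the 15 owning layers contribute to `num_layers` in the cache formula, which is where the memory saving comes from; the queries are still computed in all 35 layers.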
PLE is instead about parameter efficiency, where it lets the small Gemma 4 models use more token-specific information without making the main transformer stack as expensive as a dense model with the same total parameter count. For instance, the “E” in Gemma 4 E2B and E4B stands for “effective”. Concretely, Gemma 4 E2B is listed as 2.3B effective parameters, or 5.1B parameters when the embeddings are counted. (Similarly, Gemma 4 E4B is listed as 4.5B effective parameters, or 8B parameters with embeddings). In short, in the “E” models, the main transformer-stack compute is closer to the smaller number, while the larger number includes the additional embedding-table layers. (For an illustration of how embedding layers work, see my “ Understanding the Difference Between Embedding Layers and Linear Layers ” code notebook.) Conceptually, the new PLE path looks like this: Figure 6: Simplified Gemma 4 block with the PLE residual path. The normal block first computes the attention and feed-forward residual updates. The resulting hidden state gates the layer-specific PLE vector, and the projected PLE update is added as an extra residual update at the end of the block. The PLE vectors themselves are prepared outside the repeated transformer blocks. In simplified form, there are two inputs to the PLE construction. First, the token IDs go through a per-layer embedding lookup. Second, the normal token embeddings go through a linear projection into the same packed PLE space. These two pieces are added, scaled, and reshaped into a tensor with one slice per layer. Note that each block then receives its own slice. Figure 7: Simplified PLE construction. The token IDs provide a per-layer embedding lookup, while the normal token embeddings are projected into the same space. The two contributions are combined and reshaped so that each transformer block receives its own layer-specific PLE slice. 
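The construction and the extra residual path can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the Gemma 4 code: all sizes are toy values, the scale factor, the sigmoid gate, and the names `build_ple`, `ple_residual`, `W_gate`, and `W_out` are hypothetical, and the attention and feed-forward branches are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_ple, n_layers = 50, 16, 4, 3  # toy, hypothetical sizes

ple_table = rng.normal(size=(vocab, n_layers * d_ple))  # per-layer embedding lookup
tok_table = rng.normal(size=(vocab, d_model))           # normal token embeddings
W_proj = rng.normal(size=(d_model, n_layers * d_ple))   # projection into packed PLE space

def build_ple(token_ids):
    """Prepare the packed PLE tensor once, outside the repeated transformer blocks."""
    token_ids = np.asarray(token_ids)
    looked_up = ple_table[token_ids]              # (seq, n_layers * d_ple)
    projected = tok_table[token_ids] @ W_proj     # (seq, n_layers * d_ple)
    packed = (looked_up + projected) * 0.5 ** 0.5  # scale factor is an assumption
    return packed.reshape(len(token_ids), n_layers, d_ple)  # one slice per layer

def rms_norm(x, eps=1e-6):
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def ple_residual(z, ple_l, W_gate, W_out):
    """Extra residual update at the end of a block; z is the post-FFN hidden state."""
    gate = 1.0 / (1.0 + np.exp(-(z @ W_gate)))  # gate from hidden state (sigmoid assumed)
    gated = gate * ple_l                        # gate the layer-specific PLE vector
    return z + rms_norm(gated @ W_out)          # project back, normalize, add

token_ids = [3, 7, 7, 42, 0]
ple = build_ple(token_ids)                      # (5, 3, 4)
z = rng.normal(size=(len(token_ids), d_model))  # stand-in for the block's hidden state
W_gate = rng.normal(size=(d_model, d_ple))
W_out = rng.normal(size=(d_ple, d_model))
out = ple_residual(z, ple[:, 1], W_gate, W_out)  # layer 1 only sees its own slice
print(ple.shape, out.shape)  # (5, 3, 4) (5, 16)
```

Note that the per-block work only involves the small `d_ple` dimension, while the bulk of the extra parameters sits in the lookup table.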
The important detail is that PLE does not give each transformer block a full independent copy of the normal token embedding layer. Instead, the per-layer embedding lookup is computed once. Then, as mentioned before, it gives each layer a small token-specific embedding slice (via “reshape / select layer l”). So, for each input token, Gemma 4 prepares a packed PLE tensor that contains one small vector per decoder layer. Then, during the forward pass, layer l receives only its own slice (ple_l in the Gemma4WithPLEBlock in Figure 6). Inside the transformer block, the regular attention and feed-forward branches run as usual. First, the block computes the attention residual update. Then it computes the feed-forward residual update. After that second residual add, the resulting hidden state, which I denoted as z in the pseudocode in Figure 6, is used to gate the layer-specific PLE vector. The gated PLE vector is projected back to the model hidden size, normalized, and added as one extra residual update. So the useful mental model is that the transformer block still has the same main attention and feed-forward path, but Gemma 4 adds a small layer-specific token vector after the feed-forward branch. This increases representational capacity through embedding parameters and small projections. It adds some computational overhead but avoids the cost of scaling the entire transformer stack to the larger parameter count. But why PLEs? The simpler alternative would be to make the dense model smaller, using fewer layers, narrower hidden states, or smaller feed-forward networks. That would reduce memory and latency, but it also removes capacity from the parts of the model that do the main computation. The PLE design keeps the expensive transformer blocks closer to the smaller “effective” size, while storing additional capacity in per-layer embedding tables.
These are much cheaper to use than adding more attention or FFN weights, since they are mainly lookup-style parameters that can be cached. Also, we have to take Google’s word here that this is an effective and worthwhile design choice. It would be interesting to see some comparison studies to see how this E2B design compares to a regular Gemma 4 2.3B model and a regular Gemma 4 5.1B model. Also, in principle, PLE is not inherently limited to small models. We could attach per-layer embedding slices to larger models, too. However, larger models already have sufficient capacity where these extra embeddings may not help that much. Also, for larger models, we already use MoE designs as a trick to increase capacity while keeping the compute footprint smaller. By the way, if you are interested in a relatively simple and readable code implementation, I implemented the Gemma 4 E2B and E4B models from scratch here . Figure 8: Snapshot of my Gemma 4 from-scratch implementation . Laguna is the first open-weight model by Poolside , a Europe-based company focused on training LLMs for coding applications. Several of my former colleagues joined Poolside in recent years, and they have a great team with lots of talent. It’s just nice to see more companies also releasing some of their models as open-weight variants. Anyways, the Laguna XS.2 architecture depicted below looks very standard at first glance. However, one detail that I didn’t show (/try to cram into there) is a concept we can refer to as “Layer-wise attention budgeting”. Figure 9: Poolside’s Laguna XS.2 architecture. Part of the idea behind the attention budgeting here is that instead of giving every transformer layer the same full attention budget, Laguna XS.2 varies the attention cost by layer. It has 40 layers total, with 30 sliding-window attention layers and 10 global/full attention layers. 
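To put rough numbers on this 30/10 split, here is a small back-of-the-envelope sketch. The window size (512 tokens), the 8 KV heads, and the query-heads-per-KV-head ratios (8 for sliding-window layers, 6 for full-attention layers) are the values given in the paragraphs that follow; the function name and the "query heads times attended positions" cost proxy are my own illustrative assumptions, not something from the Laguna config.

```python
def laguna_like_budget(context_len, n_swa=30, n_full=10, window=512,
                       kv_heads=8, q_per_kv_swa=8, q_per_kv_full=6):
    """Count cached positions and a crude attention-score proxy for one new token."""
    swa_positions = min(context_len, window)  # sliding-window layers see a local window
    q_heads_swa = kv_heads * q_per_kv_swa     # query heads in sliding-window layers
    q_heads_full = kv_heads * q_per_kv_full   # query heads in full-attention layers
    return {
        # positions kept in the KV cache, summed over layers (per KV head)
        "cached_positions": n_swa * swa_positions + n_full * context_len,
        # same count if every layer used full attention
        "cached_positions_all_full": (n_swa + n_full) * context_len,
        "q_heads_swa": q_heads_swa,
        "q_heads_full": q_heads_full,
        # crude proxy: query heads times attended positions, summed over layers
        "score_proxy": (n_swa * q_heads_swa * swa_positions
                        + n_full * q_heads_full * context_len),
    }

budget = laguna_like_budget(context_len=8192)
print(budget["cached_positions"], budget["cached_positions_all_full"])  # 97280 327680
print(budget["q_heads_swa"], budget["q_heads_full"])                    # 64 48
```

The proxy ignores head dimension and everything outside the attention scores, so it is only meant to show how the per-layer head counts and the window interact.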
As usual, the sliding-window layers only attend over a local window (here: 512 tokens), which keeps the KV cache and attention computation cheaper. The global layers are more expensive but preserve the ability to access all information in the context window. This mixed sliding-window + global/full attention pattern is not unique to Laguna XS.2 and is used by many other architectures (including Gemma 4). But what’s new is the use of per-layer query-head counts. For instance, the Hugging Face model hub config.json includes a setting, so layers can have different numbers of query heads while keeping the KV cache shape compatible. Figure 10: Per-layer query-head budgeting in Laguna, where full attention layers use 6 query heads per KV head, and sliding window attention layers use 8 query heads per KV head. So Laguna XS.2 gives more query heads to sliding-window layers and fewer query heads to global layers, while keeping the KV heads fixed at 8. That is the actual layer-wise head budgeting in the config. Laguna XS.2 is one of the most prominent recent examples of this per-layer query-head budgeting in a production-style open model. But the broader idea of varying model capacity by layer goes back to (at least) Apple’s 2024 OpenELM . And again, what’s the point of such a design? Similar to KV-sharing, the point is to spend attention capacity where it is most useful, instead of giving every layer the same budget. Specifically, full-attention layers are expensive because they look across the whole context, so Laguna gives them fewer query heads compared to sliding window attention modules. (Besides, another smaller implementation detail is that Laguna also applies per-head attention-output gating; this is somewhat similar to Qwen3-Next and others, which I also omit here since I covered it in earlier articles.) Similar to Laguna, ZAYA1-8B is another new player on the open-weight market. 
It is developed by Zyphra , and one of the interesting details around the release is that the model was trained on AMD GPUs rather than the more common NVIDIA GPU (or Google TPU) setup. The main architecture detail, though, is Compressed Convolutional Attention (CCA), used together with grouped-query attention. Unlike MLA-style designs that mainly use a latent representation as a compact KV cache format, CCA performs the attention operation directly in the compressed latent space, but more on that later. (Sidenote: the ZAYA1-8B config.json lists 80 alternating layer entries rather than 40 conventional transformer blocks. These entries alternate between CCA/GQA attention and MoE feed-forward layers. But for the architecture figure, it is more convenient to visualize this as 40 repeated attention + MoE pairs, which is conceptually equivalent.) Figure 11: Zaya1 (8B) with transformer blocks featuring compressed convolutional attention. As hinted at in the figure above, ZAYA1-8B uses Compressed Convolutional Attention (CCA) together with a 4:1 GQA layout. The key point is that its attention block is built around CCA rather than a standard sliding-window attention block. What is Compressed Convolutional Attention? I would say CCA is related in spirit to Multi-head Latent Attention (MLA) in DeepSeek’s models, since both introduce a compressed latent representation into the attention block. However, they use that latent space differently. MLA mainly uses the latent representation to reduce the KV cache. In MLA, the KV tensors are stored compactly and then projected into the attention-head space for the actual attention computation. Figure 12: Regular Multi-head Attention (MHA) and Multi-head Latent (MLA) attention side by side. CCA compresses Q, K, and V and performs the attention operation directly in the compressed latent space. This is why CCA can reduce not only KV cache size, but also attention FLOPs during prefill and training. 
Figure 13: Multi-head Latent Attention (MLA) and Compressed Convolutional Attention (CCA) side by side. As Figure 13 above illustrates, in CCA, the compressed, latent representations enter the attention mechanism directly, and the resulting compressed attention vector is then up-projected. Note that this is called Compressed Convolutional Attention, not just Compressed Attention, since there is an additional convolutional mixing happening on the latent K and Q representations. The convolutional mixing part is not shown in Figure 13, because it would have been too cramped, but it’s relatively straightforward. As hinted at in Figure 13, the convolutional mixing happens directly on the compressed Q and K tensors. The point is that compression makes Q, K, and V narrower, which saves compute and cache, but it can also make attention less expressive. The convolutions are a cheap way to give the compressed Q and K vectors more local context before they are used to compute attention scores. (The convolutional mixing is only applied to Q and K, not V, because Q and K determine the attention scores, while V represents the content that gets averaged via these scores). Figure 14: Conceptual overview of the sequence-mixing convolution. Next to the sequence mixing shown in Figure 14, there is also a channel mixing component. It’s in principle similar though, so I am omitting the illustration. CCA appears to be a Zyphra-introduced attention mechanism that predates the ZAYA1-8B technical report . The standalone CCA paper, Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space , was first posted in October 2025 and explicitly introduces CCA. ZAYA1-8B then uses this mechanism as one of the core pieces. But the question is, “is it better than MLA”? According to the CCA paper’s own experiments, yes, they report CCA outperforming MLA under comparable compression settings. Figure 15: Annotated figures from the CCA paper, https://arxiv.org/abs/2510.04476 .
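The core idea, attention computed directly in the compressed space with a causal convolution on the compressed Q and K, can be sketched in NumPy. This is a heavily simplified, single-head illustration under my own assumptions: the function names, the toy sizes, and the depthwise kernel shape are hypothetical, and it leaves out the channel mixing, the GQA layout, and any other details of Zyphra's actual implementation.

```python
import numpy as np

def causal_depthwise_conv(x, kernel):
    """x: (seq, d), kernel: (k, d). Each channel is mixed over the last k positions."""
    k, seq = kernel.shape[0], x.shape[0]
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])  # left-pad: no future tokens
    return sum(kernel[j] * padded[j:j + seq] for j in range(k))

def cca_like_attention(x, W_q, W_k, W_v, W_up, conv_q, conv_k):
    q = causal_depthwise_conv(x @ W_q, conv_q)  # compressed, then conv-mixed queries
    k = causal_depthwise_conv(x @ W_k, conv_k)  # compressed, then conv-mixed keys
    v = x @ W_v                                 # compressed values, no convolution
    scores = q @ k.T / np.sqrt(q.shape[-1])     # attention scores in the latent space
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ W_up                 # up-project only after attention

rng = np.random.default_rng(0)
seq, d_model, d_latent, ksize = 6, 16, 4, 3     # toy, hypothetical sizes
x = rng.normal(size=(seq, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_latent)) for _ in range(3))
W_up = rng.normal(size=(d_latent, d_model))
conv_q = rng.normal(size=(ksize, d_latent))
conv_k = rng.normal(size=(ksize, d_latent))
out = cca_like_attention(x, W_q, W_k, W_v, W_up, conv_q, conv_k)
print(out.shape)  # (6, 16)
```

In this sketch the score matrix is computed with `d_latent`-wide vectors rather than `d_model`-wide ones, which is the part that saves attention FLOPs in addition to cache size.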
Overall, the interesting part here is really the new attention mechanism. The model also uses a pretty extreme (= very sparse) MoE setup, with only one routed expert active per token, but that part is more familiar. CCA is more unusual because it performs the attention operation directly in a compressed latent space, and then uses convolutional mixing on the compressed Q and K representations to make this compressed attention less limiting. So, in short, ZAYA1-8B is not only trying to save compute in the feed-forward layers, but also in the attention mechanism itself. DeepSeek V4 was the biggest release of the year so far, both in terms of hype and model size. Interestingly, DeepSeek V4-Pro is also the most parameter-sparse MoE among the models compared in the figure below, measured by active-parameter share. Figure 16: Percent active parameter plot for MoE models. You can also find an HTML version at https://sebastianraschka.com/llm-architecture-gallery/active-parameter-ratio/ . Caveat: active parameter share is only one lens. It does not capture KV cache size, attention pattern, context length, routing overhead, hardware efficiency, or training quality. But it is a helpful, quick check when comparing sparse models. There’s a lot to say about DeepSeek V4, but since it’s been all over the news already, and to stay on topic regarding architecture tweaks, I will focus on the two most relevant parts that are new compared to previous architectures: (1) mHC for a wider residual pathway, and (2) CSA/HCA for long-context attention compression and sparsity. Looking at the DeepSeek V4 architecture drawing below, there seems to be a lot going on. The useful way to read it is to separate the residual-path change, mHC, from the attention-path changes, CSA/HCA, and compressed attention caches. Figure 17: DeepSeek V4-Pro architecture overview. Let’s start with the mHC component of DeepSeek V4.
This goes back to a research paper that the DeepSeek team shared last year (31 Dec 2025, mHC: Manifold-Constrained Hyper-Connections ). However, in this paper, the technique was only tested on an experimental 27B scale model. Now, we see it in their flagship release, which is a good sign that this idea actually works well in production. The main idea behind mHC here is to modernize the design of the residual connections inside the transformer block, which is refreshing, because architecture tweaks are usually focused on the attention mechanism, normalization layer placement, and MoE parts. Now, mHC is based on previous work on hyper-connections (see Hyper-connections by Zhu et al., 2024), which we should briefly discuss first. Hyper-connections essentially modify the single residual stream inside the transformer block by replacing it with several parallel residual streams and learned mappings between them. (For those new to residual connections, I made a video on residual neural networks many years ago, where I explained the general mechanism.) The idea behind hyper-connections is to widen the residual stream. We can think of this as keeping several parallel residual streams, with an additional Res Mapping linear transformation that mixes them across layers. Since the Attention or MoE layer itself still operates on the normal hidden size, hyper-connections also add a Pre Mapping that combines the parallel residual streams into one normal hidden vector for the layer, and a Post Mapping that distributes the layer output back across the parallel residual streams. This is visually summarized in the figure below. Figure 18: Regular transformer block (top) vs transformer block with hyper-connections (bottom) using annotated figures from the mHC paper, https://arxiv.org/abs/2512.24880 . The figure below focuses on the attention-layer portion of the transformer block, but the same concept applies to the second residual branch around the MoE layer. 
The purpose of hyper-connections is to make the residual pathway more expressive without making the actual Attention or MoE layer wider. This is only mildly more expensive in FLOPs because the extra mappings operate over the small residual-stream axis, for example, n = 4 in DeepSeek V4, not over a huge hidden dimension. In the original hyper-connections paper, the 7B OLMo MoE experiment goes from 13.36G to 13.38G FLOPs per token, which is basically unchanged. In terms of reported gains, there were modest (but consistent) improvements, as shown in the figure below. (However, only looking at FLOPs is a bit simplistic. The widened residual state still has to be stored, moved through memory, mixed, etc. So the practical overhead can come more from memory traffic and implementation complexity than from arithmetic, which is not explicitly measured. However, given that DeepSeek V4 is all about efficiency, it seems to be a worthwhile addition.) Figure 19: Hyper-connections performance versus baseline, using an annotated figure from the hyper-connections paper, https://arxiv.org/abs/2409.19606 . Also, as shown in the figure above, metrics reached the baseline’s performance using roughly half the training tokens. The main change from regular hyper-connections (HC) to manifold-constrained hyper-connections (mHC) is that the mappings are no longer left unconstrained. In regular HC, the Res Mapping is a learned matrix that mixes the parallel residual streams, but stacking many such matrices can amplify or shrink signals unpredictably. In mHC, this residual mapping is projected onto the manifold of doubly stochastic matrices, meaning all entries are non-negative and each row and column sums to 1. This makes the residual mixing behave more like a stable redistribution of information across streams. The Pre Mapping and Post Mapping are also constrained to be non-negative and bounded, which avoids cancellation when reading from and writing back into the widened residual state. 
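To illustrate why the doubly stochastic constraint helps, here is a small NumPy sketch. It uses Sinkhorn-style alternating row and column normalization, which is one standard way to obtain an (approximately) doubly stochastic matrix; I am not claiming that this matches DeepSeek's exact procedure, and the function name, iteration count, and depth of 30 layers are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def sinkhorn(logits, n_iters=200):
    """Map an unconstrained n x n matrix to an approximately doubly stochastic one."""
    m = np.exp(logits - logits.max())         # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # rows sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

rng = np.random.default_rng(0)
n = 4  # number of parallel residual streams, as in the n = 4 example above
res_maps = [sinkhorn(rng.normal(size=(n, n))) for _ in range(30)]  # one per layer

# Stacking layers multiplies the residual mappings together.
stacked = reduce(np.matmul, res_maps)
print(stacked.sum(axis=0))  # each column still sums to ~1
print(stacked.sum(axis=1))  # each row still sums to ~1
```

The product of doubly stochastic matrices is again doubly stochastic, so the stacked mapping keeps redistributing the streams rather than amplifying or shrinking them. With unconstrained matrices there is no such guarantee.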
In short, mHC keeps the richer residual mixing of HC, but adds constraints so it scales more safely, which becomes more relevant for larger (deeper) models. Otherwise, the main idea of using parallel residual streams remains, as shown in the figure below. Figure 20: Transformer block with hyper-connections (HC) and manifold-constrained hyper-connections (mHC) using annotated figures from the mHC paper, https://arxiv.org/abs/2512.24880 . In the mHC paper, using a 27B parameter model for the experiments, the DeepSeek team’s optimized implementation (with fusion, recomputation, and pipeline scheduling) adds only 6.7% additional training time overhead for 4 residual streams (n = 4) throughout all transformer blocks compared to the single-stream baseline. To sum up this section, HC/mHC changes how information is carried around these layers by replacing the single residual stream with several interacting residual streams, with the additional stability constraints added in mHC, while adding minimal compute overhead. Also, it pairs well with the CSA/HCA attention changes, which modify other parts of the transformer block, which I will discuss below. The other major DeepSeek V4 architecture change is on the attention side. Again, the motivation is that at very long context lengths, attention becomes expensive not only because of the attention score computation, but also because the KV cache grows with the sequence length. DeepSeek V4 addresses this issue with a hybrid of two compressed-attention mechanisms, Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). For a refresher, I recommend checking out my previous “ A Visual Guide to Attention Variants in Modern LLMs ” article, which covers Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA), among others. The first thing to note is that CSA/HCA in DeepSeek V4 is a different kind of compression than the MLA-style compression used in DeepSeek V2/V3. 
Where MLA mainly compresses the per-token KV representation, CSA and HCA compress along the sequence dimension. So, instead of keeping one full (or compressed) KV entry for every previous token, they summarize groups of tokens into fewer compressed KV entries. Consequently, the cache gets shorter. DeepSeek V4 also uses compact compressed entries and shared-KV attention, but the main distinction from MLA is the sequence-length compression. This is illustrated in the figure below. Figure 21: Conceptual comparison of MLA-style per-token latent caching, CSA, and HCA. MLA compresses the stored KV representation but keeps one latent entry per token. CSA shortens the sequence more mildly with m=4 and sparse top-k selection, while HCA uses much heavier sequence compression with m’=128 and dense attention over the shorter cache. The quality tradeoff for CSA/HCA is also different from MLA. As shown in the figure above, MLA compresses the representation stored for each token, but it still keeps one latent KV entry per token. CSA and especially HCA go further by reducing the number of sequence entries themselves, so the model gives up some token-level info in exchange for much lower long-context cost. Again, it’s all about reducing long-context cost, but this trade-off can hurt modeling quality if the compression is too strong, which is why DeepSeek V4 does not rely on one compression scheme alone but alternates between CSA and HCA. CSA uses a milder compression rate and a DeepSeek Sparse Attention (DSA)-style selector, HCA uses much heavier compression for cheaper global coverage, and both keep a local sliding-window branch for recent uncompressed tokens. This sparse selection in CSA builds on DeepSeek Sparse Attention (DSA), which I discussed in more detail in my earlier DeepSeek V3.2 write-up . HCA is the more aggressive variant of the two. It compresses every 128 tokens into one compressed KV entry, but then uses dense attention over those heavily compressed entries. 
In other words, CSA keeps more details but uses sparse selection, while HCA keeps far fewer entries and can afford dense attention over them, as illustrated in the figure below. This makes the two mechanisms somewhat complementary, which is why DeepSeek V4 interleaves CSA and HCA layers rather than using only one of them. Figure 22: CSA selects a sparse set of compressed history blocks, while HCA attends densely over more heavily compressed blocks. Both paths also include recent uncompressed KV entries through a 128-token sliding-window branch. The DeepSeek V4 paper reports that, at a 1M-token context length, DeepSeek V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache size compared with DeepSeek V3.2, which uses MLA and DeepSeek Sparse Attention (DSA). DeepSeek V4-Flash is even smaller, at 10% of the FLOPs and 7% of the KV cache size relative to DeepSeek V3.2. Figure 23. Reported 1M-context efficiency numbers from the DeepSeek V4 paper, relative to DeepSeek V3.2. By the way, I would not describe CSA/HCA as “better” than MLA in a general sense. CSA/HCA is a more aggressive long-context design. And it’s also more complicated for sure. Unfortunately, there is no ablation study in the paper. But overall, the paper reports strong overall modeling results, including DeepSeek V4-Flash-Base outperforming DeepSeek V3.2-Base on a majority of base-model benchmarks and strong 1M-token retrieval results, but these results are for the full DeepSeek V4 recipe, which also includes better data, Muon-based optimization, mHC, precision/storage optimizations, and training/inference-system changes. Personally, for now, I would treat CSA/HCA as an efficiency-focused long-context design that appears to preserve modeling quality well in their large flagship model(s) but not necessarily universally better than MLA. 
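As a rough illustration of the sequence-length compression discussed in this section, the sketch below counts cache entries per layer for the three schemes. The compression rates (m = 4, m' = 128) and the 128-token sliding window come from the text above; the function names, the treatment of 1M as 1,000,000 tokens, the simple "compressed history plus recent window" accounting, and the `top_k` value are my own assumptions, so this is not meant to reproduce the paper's reported percentages.

```python
import math

def cache_entries(context_len, compression=1, window=0):
    """Entries kept per layer: compressed history plus recent uncompressed tokens."""
    compressed = math.ceil(context_len / compression)  # one entry per group of tokens
    recent = min(context_len, window)                  # local sliding-window branch
    return compressed + recent

def attended_entries(context_len, compression, window, top_k=None):
    """Entries a query attends to; top_k=None means dense attention over the cache."""
    compressed = math.ceil(context_len / compression)
    if top_k is not None:
        compressed = min(compressed, top_k)            # sparse selection (CSA-style)
    return compressed + min(context_len, window)

context = 1_000_000
print(cache_entries(context))                               # 1000000 (one per token)
print(cache_entries(context, compression=4, window=128))    # 250128  (CSA-like)
print(cache_entries(context, compression=128, window=128))  # 7941    (HCA-like)

# top_k is a hypothetical value here; the text above does not state the actual setting.
print(attended_entries(context, 4, 128, top_k=1024))        # 1152
print(attended_entries(context, 128, 128))                  # 7941
```

The counts show the complementary trade-off: the CSA-like cache is longer but each query only touches a small selected subset, while the HCA-like cache is short enough that dense attention over all of it stays cheap.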
Overall, the interesting pattern this year is that most new open-weight models try to make long-context inference cheaper without just shrinking the model in terms of total parameters. For instance, Gemma 4 reduces KV-cache memory with cross-layer KV sharing and adds capacity via per-layer embeddings. Laguna XS.2 tweaks how much attention capacity each layer gets. ZAYA1-8B moves attention into a compressed latent space. DeepSeek V4 adds constrained residual-stream mixing and compressed long-context attention. All of these tweaks add more complexity, which seems to be where LLM architecture is going right now. My main takeaway is that the transformer block is still changing, but in fairly targeted ways. The basic recipe is still based on the original GPT decoder-only transformer architecture, but many parts are upgraded or replaced, and they get more specialized for longer contexts and more efficient inference, whereas the qualitative modeling performance seems largely driven by data quality (and quantity) and training recipes. The question many of you asked me in the past is centered on when (or if) transformers are being replaced with something else. Of course, there are other designs like diffusion models, but transformers remain the status quo for state-of-the-art architecture releases. However, with each increasing yearly release quarter, we get more and more tweaks. While it was possible to implement a basic transformer block in perhaps 50-100 lines of PyTorch code, these tweaks (esp. around the attention variants) probably 10x the code complexity. This is not an inherently bad thing as these tweaks reduce (not increase) runtime costs. However, it’s becoming increasingly difficult to gain a clear understanding of the individual components and their interactions. 
Figure 24: The evolution from GPT-2 (2019) to DeepSeek V4-Pro (2026). For instance, I am fairly certain that someone who is diving into LLM architectures for the first time will be totally overwhelmed when seeing the DeepSeek V4 source code. However, by starting with the original decoder-style LLM (GPT/GPT-2) and then gradually adding / learning about these new components one at a time, we can keep the learning effort manageable. The moral of the story, I guess, is to keep learning, one architecture at a time :). By the way, I am very excited to share that I finished writing Build A Reasoning Model (From Scratch) and all chapters are in early access now. The publisher and I worked hard on the final layouts in the past month, and it’s going to be sent to the printer this week. (Good news: the print version will be in color this time!) This is probably my most ambitious book so far. I spent about 1.5 years writing it, and a large number of experiments went into it. It is also probably the book I worked hardest on in terms of time, effort, and polish, and I hope you’ll enjoy it. Build a Reasoning Model (From Scratch) on Manning and Amazon . The main topics are evaluating reasoning models, inference-time scaling, self-refinement, reinforcement learning, and distillation. There is a lot of discussion around “reasoning” in LLMs, and I think the best way to understand what it really means in the context of LLMs is to implement one from scratch! Amazon (pre-order of Kindle ebook and print paperback) Manning (complete book in early access , pre-final layout, 528 pages)
Note that this article is about architecture designs, so I will mostly skip dataset mixtures, training schedules, post-training details, RL recipes, benchmark tables, and product comparisons. Even with that narrower scope, there is a lot to cover. And, like always, the article turned out longer than I expected, so I will keep the focus on what changes inside the transformer block, residual stream, KV cache, or attention computation. Please also note that I am only covering those topics that are interesting (new) design choices and that I haven’t covered elsewhere, yet. This list includes: KV sharing and per-layer embeddings in Gemma 4 Compressed convolutional attention in ZAYA1 Attention budgeting in Laguna XS.2 mHC and compressed attention in DeepSeek V4 the Gemma 4 E2B and E4B models for mobile and small, local (embedded) devices (aka IoT), the Gemma 4 26B mixture-of-experts (MoE) model, optimized for efficient local inference, and the Gemma 4 31B dense model, for maximum quality and more convenient post-training (since MoEs are trickier to work with) Figure 2: Gemma 4 architecture drawings. The first small architecture tweak in the E2B and E4B variants is that they adopt a shared KV cache scheme, where later layers reuse key-value states from earlier layers to reduce long-context memory and compute. This KV-sharing was not invented by Gemma 4. For instance, see Brandon et al. , “ Reducing Transformer Key-Value Cache Size with Cross-Layer Attention ” (NeurIPS 2024). But it’s the first popular architecture where I saw this concept applied. (Cross-layer attention is not to be confused with cross-attention .) Before explaining KV-sharing further, let’s briefly talk about the motivation. As I wrote and talked about in recent months, one of the main recent themes in LLM architecture design is KV cache size reduction. 
In turn, the motivation behind KV cache size reduction is to reduce the required memory, which allows us to work with longer contexts. This is especially relevant in the age of reasoning models and agents. For more background on KV caching, see my “Understanding and Coding the KV Cache in LLMs from Scratch” article. Practically all of the popular attention variants I described in my previous “A Visual Guide to Attention Variants in Modern LLMs” article are designed to reduce the KV cache size. To pick a classic example (that Gemma 4 still uses): Grouped Query Attention (GQA) already shares key-value (KV) heads across different query heads to reduce the KV cache size, as illustrated in the figure below.

Figure 3: Grouped Query Attention (GQA) shares the same key (K) and value (V) heads among multiple query (Q) heads.

As mentioned before, Gemma 4 uses GQA. However, in addition to the KV sharing among queries as part of GQA, Gemma 4 also shares KV projections across different layers instead of computing them as part of the attention module in each layer. This KV-sharing scheme, also called cross-layer attention, is illustrated in the figure below.

Figure 4: Regular transformer blocks compute separate Q, K, and V projections in each attention module (left). Cross-layer attention designs (right) share the same K and V projections across multiple layers.

As briefly hinted at in the architecture overview in Figure 2, Gemma 4 E2B uses regular GQA and sliding window attention in a 4:1 pattern. (More precisely, Gemma 4 E2B uses MQA, which is the one-KV-head special case of GQA.) In the case of GQA (or MQA), the KV-sharing works like this. Later layers no longer compute their own key and value projections but reuse the KV tensors from the most recent earlier non-shared layer of the same attention type. In other words, sliding-window layers share KV with a previous sliding-window layer. Full-attention layers share KV with a previous full-attention layer.
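The reuse rule above can be sketched as a small lookup that maps each layer to the layer whose KV tensors it reads. This is only a toy illustration: the function name, the 10-layer layout, and the position of the full-attention layer inside each group of five are assumptions for the example, not values taken from the Gemma 4 config.

```python
def build_kv_source_map(layer_types, num_kv_layers):
    """Map each layer index to the layer whose K/V tensors it uses.

    layer_types: one "sliding" or "full" string per layer.
    num_kv_layers: the first `num_kv_layers` layers compute their own K/V;
    all later layers reuse K/V from the most recent earlier non-shared
    layer of the same attention type.
    """
    last_owner = {}  # attention type -> most recent layer that computed its own K/V
    kv_source = []
    for idx, attn_type in enumerate(layer_types):
        if idx < num_kv_layers:
            last_owner[attn_type] = idx
            kv_source.append(idx)
        else:
            # Raises KeyError if no earlier non-shared layer of this type exists
            kv_source.append(last_owner[attn_type])
    return kv_source


# Hypothetical toy layout: 4 sliding-window layers followed by 1 full-attention layer, twice
layer_types = (["sliding"] * 4 + ["full"]) * 2
kv_source = build_kv_source_map(layer_types, num_kv_layers=5)
num_cached_layers = len(set(kv_source))  # layers that actually need a KV cache
print(kv_source, num_cached_layers)
```

In this toy layout, only half of the layers keep a KV cache, while every layer would still compute its own queries.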
The layers still compute their own query projections, so each layer can form its own attention pattern, but the expensive and memory-heavy KV cache is reused across several layers. For example, Gemma 4 E2B has 35 transformer layers, but only the first 15 compute their own KV projections; the final 20 layers reuse KV tensors from the most recent earlier non-shared layer of the same attention type. Similarly, Gemma 4 E4B has 42 layers, with 24 layers computing their own KV and the final 18 layers sharing them. How much does this actually save? Since we share roughly half of the KVs across layers, we save approximately half of the KV cache size. For the smallest E2B model, this results in a 2.7 GB saving (at bfloat16 precision) in long 128K contexts, as shown below. (For the E4B variant, this saves about 6 GB at 128K.) Figure 5: KV cache memory savings from GQA and cross-layer KV sharing in a Gemma 4 E2B-like setup. For simplicity, additional savings from sliding window attention are not shown. The downside of KV-sharing is, of course, that it’s an “approximation” of the real thing. Or, more precisely, it reduces model capacity. However, according to the cross-layer attention paper, the impact can be minimal (for small models that were tested). 2. Per-Layer Embeddings and “Effective” Size (Gemma 4 E2B/E4B) The Gemma 4 E2B and E4B variants include a second efficiency-oriented design choice called per-layer embeddings (PLE). This is separate from the KV-sharing scheme above. KV sharing reduces the KV cache. PLE is instead about parameter efficiency, where it lets the small Gemma 4 models use more token-specific information without making the main transformer stack as expensive as a dense model with the same total parameter count. For instance, the “E” in Gemma 4 E2B and E4B stands for “effective”. Concretely, Gemma 4 E2B is listed as 2.3B effective parameters, or 5.1B parameters when the embeddings are counted. 
(Similarly, Gemma 4 E4B is listed as 4.5B effective parameters, or 8B parameters with embeddings.) In short, in the “E” models, the main transformer-stack compute is closer to the smaller number, while the larger number includes the additional embedding-table layers. (For an illustration of how embedding layers work, see my “Understanding the Difference Between Embedding Layers and Linear Layers” code notebook.) Conceptually, the new PLE path looks like this:

Figure 6: Simplified Gemma 4 block with the PLE residual path. The normal block first computes the attention and feed-forward residual updates. The resulting hidden state gates the layer-specific PLE vector, and the projected PLE update is added as an extra residual update at the end of the block.

The PLE vectors themselves are prepared outside the repeated transformer blocks. In simplified form, there are two inputs to the PLE construction. First, the token IDs go through a per-layer embedding lookup. Second, the normal token embeddings go through a linear projection into the same packed PLE space. These two pieces are added, scaled, and reshaped into a tensor with one slice per layer. Note that each block then receives its own slice.

Figure 7: Simplified PLE construction. The token IDs provide a per-layer embedding lookup, while the normal token embeddings are projected into the same space. The two contributions are combined and reshaped so that each transformer block receives its own layer-specific PLE slice.

The important detail is that PLE does not give each transformer block a full independent copy of the normal token embedding layer. Instead, the per-layer embedding lookup is computed once. Then, as mentioned before, it gives each layer a small token-specific embedding slice (via “reshape / select layer l”). So, for each input token, Gemma 4 prepares a packed PLE tensor that contains one small vector per decoder layer.
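The construction described above (lookup, projection, add, scale, reshape) can be sketched in plain Python for a single token. Everything here is a toy under stated assumptions: the function and variable names, the tiny sizes, and the scale factor are made up for illustration, and any normalization steps in the real model are left out.

```python
import random


def build_ple_slices(token_id, token_emb, ple_table, proj, num_layers, ple_dim, scale):
    """Build one small PLE vector per layer for a single token (toy, list-based)."""
    # 1) Per-layer embedding lookup: one packed row of size num_layers * ple_dim
    looked_up = ple_table[token_id]
    # 2) Linear projection of the normal token embedding into the same packed space
    projected = [sum(w * x for w, x in zip(row, token_emb)) for row in proj]
    # 3) Add and scale the two contributions
    packed = [scale * (a + b) for a, b in zip(looked_up, projected)]
    # 4) "Reshape / select layer l": cut the packed vector into per-layer slices
    return [packed[l * ple_dim:(l + 1) * ple_dim] for l in range(num_layers)]


random.seed(0)
vocab_size, d_model, num_layers, ple_dim = 5, 4, 3, 2  # hypothetical toy sizes
packed_dim = num_layers * ple_dim
ple_table = [[random.uniform(-1, 1) for _ in range(packed_dim)] for _ in range(vocab_size)]
proj = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(packed_dim)]
token_emb = [random.uniform(-1, 1) for _ in range(d_model)]

# scale=2 ** -0.5 is an assumed placeholder value, not taken from the model
ple_slices = build_ple_slices(2, token_emb, ple_table, proj, num_layers, ple_dim, scale=2 ** -0.5)
print(len(ple_slices), len(ple_slices[0]))  # one slice per layer, each of size ple_dim
```

The point of the sketch is the shape bookkeeping: the lookup and projection happen once per token, and layer l later reads only `ple_slices[l]`.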
Then, during the forward pass, layer l receives only its own slice (ple_l in the Gemma4WithPLEBlock in figure 6). Inside the transformer block, the regular attention and feed-forward branches run as usual. First, the block computes the attention residual update. Then it computes the feed-forward residual update. After that second residual add, the resulting hidden state, which I denoted as z in the pseudocode in figure 6, is used to gate the layer-specific PLE vector. The gated PLE vector is projected back to the model hidden size, normalized, and added as one extra residual update. So the useful mental model is that the transformer block still has the same main attention and feed-forward path, but Gemma 4 adds a small layer-specific token vector after the feed-forward branch. This increases representational capacity through embedding parameters and small projections. This adds computational overhead but avoids the cost of scaling the entire transformer stack to the larger parameter count. But why PLEs? The simpler alternative would be to make the dense model smaller, using fewer layers, narrower hidden states, or smaller feed-forward networks. That would reduce memory and latency, but it also removes capacity from the parts of the model that do the main computation. The PLE design keeps the expensive transformer blocks closer to the smaller “effective” size, while storing additional capacity in per-layer embedding tables. These are much cheaper to use than adding more attention or FFN weights, since they are mainly lookup-style parameters that can be cached. Also, we have to take Google’s word here that this is an effective and worthwhile design choice. It would be interesting to see some comparison studies to see how this E2B design compares to a regular Gemma 4 2.3B model and a regular Gemma 4 5.1B model. Also, in principle, PLE is not inherently limited to small models. We could attach per-layer embedding slices to larger models, too. 
However, larger models already have enough capacity that these extra embeddings may not help that much. Also, for larger models, we already use MoE designs as a trick to increase capacity while keeping the compute footprint smaller. By the way, if you are interested in a relatively simple and readable code implementation, I implemented the Gemma 4 E2B and E4B models from scratch here.

Figure 8: Snapshot of my Gemma 4 from-scratch implementation.

3. Layer-Wise Attention Budgeting (Laguna XS.2)

Laguna is the first open-weight model by Poolside, a Europe-based company focused on training LLMs for coding applications. Several of my former colleagues joined Poolside in recent years, and they have a great team with lots of talent. It’s just nice to see more companies also releasing some of their models as open-weight variants. Anyways, the Laguna XS.2 architecture depicted below looks very standard at first glance. However, one detail that I didn’t show (or try to cram in there) is a concept we can refer to as “layer-wise attention budgeting”.

Figure 9: Poolside’s Laguna XS.2 architecture.

Part of the idea behind the attention budgeting here is that instead of giving every transformer layer the same full attention budget, Laguna XS.2 varies the attention cost by layer. It has 40 layers total, with 30 sliding-window attention layers and 10 global/full attention layers. As usual, the sliding-window layers only attend over a local window (here: 512 tokens), which keeps the KV cache and attention computation cheaper. The global layers are more expensive but preserve the ability to access all information in the context window. This mixed sliding-window + global/full attention pattern is not unique to Laguna XS.2 and is used by many other architectures (including Gemma 4). But what’s new is the use of per-layer query-head counts.
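To put rough numbers on the sliding-window versus global split described above, the following sketch counts how many token positions the KV cache has to hold across all 40 layers. It is a back-of-the-envelope count only: it ignores the number of KV heads, head dimensions, and precision, and the 32,768-token context length is an arbitrary value chosen for the example.

```python
def cached_tokens_per_layer(context_len, window=None):
    """Token positions one layer keeps in its KV cache (window=None means full attention)."""
    return context_len if window is None else min(context_len, window)


def total_cached_tokens(context_len, num_sliding=30, num_global=10, window=512):
    """Sum of cached token positions over all sliding-window and global layers."""
    sliding = num_sliding * cached_tokens_per_layer(context_len, window)
    global_ = num_global * cached_tokens_per_layer(context_len)
    return sliding + global_


context_len = 32_768  # hypothetical context length for the example
mixed = total_cached_tokens(context_len)  # 30 sliding-window + 10 global layers
all_global = total_cached_tokens(context_len, num_sliding=0, num_global=40)
print(mixed, all_global, round(mixed / all_global, 3))
```

Because the sliding-window term stops growing once the context exceeds the window, the global layers dominate the cache at long contexts, which is one way to see why they are the expensive part of the budget.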
For instance, the Hugging Face model hub config.json includes a setting, so layers can have different numbers of query heads while keeping the KV cache shape compatible. Figure 10: Per-layer query-head budgeting in Laguna, where full attention layers use 6 query heads per KV head, and sliding window attention layers use 8 query heads per KV head. So Laguna XS.2 gives more query heads to sliding-window layers and fewer query heads to global layers, while keeping the KV heads fixed at 8. That is the actual layer-wise head budgeting in the config. Laguna XS.2 is one of the most prominent recent examples of this per-layer query-head budgeting in a production-style open model. But the broader idea of varying model capacity by layer goes back to (at least) Apple’s 2024 OpenELM . And again, what’s the point of such a design? Similar to KV-sharing, the point is to spend attention capacity where it is most useful, instead of giving every layer the same budget. Specifically, full-attention layers are expensive because they look across the whole context, so Laguna gives them fewer query heads compared to sliding window attention modules. (Besides, another smaller implementation detail is that Laguna also applies per-head attention-output gating; this is somewhat similar to Qwen3-Next and others, which I also omit here since I covered it in earlier articles.) 4. Compressed Convolutional Attention (ZAYA1-8B) Similar to Laguna, ZAYA1-8B is another new player on the open-weight market. It is developed by Zyphra , and one of the interesting details around the release is that the model was trained on AMD GPUs rather than the more common NVIDIA GPU (or Google TPU) setup. The main architecture detail, though, is Compressed Convolutional Attention (CCA), used together with grouped-query attention. 
Unlike MLA-style designs that mainly use a latent representation as a compact KV cache format, CCA performs the attention operation directly in the compressed latent space, but more on that later. (Sidenote: the ZAYA1-8B config.json lists 80 alternating layer entries rather than 40 conventional transformer blocks. These entries alternate between CCA/GQA attention and MoE feed-forward layers. But for the architecture figure, it is more convenient to visualize this as 40 repeated attention + MoE pairs, which is conceptually equivalent.) Figure 11: Zaya1 (8B) with transformer blocks featuring compressed convolutional attention. As hinted at in the figure above, ZAYA1-8B uses Compressed Convolutional Attention (CCA) together with a 4:1 GQA layout. The key point is that its attention block is built around CCA rather than a standard sliding-window attention block. What is Compressed Convolutional Attention? I would say CCA is related in spirit to Multi-head Latent Attention (MLA) in DeepSeek’s models, since both introduce a compressed latent representation into the attention block. However, they use that latent space differently. MLA mainly uses the latent representation to reduce the KV cache. In MLA, the KV tensors are stored compactly and then projected into the attention-head space for the actual attention computation. Figure 12: Regular Multi-head Attention (MHA) and Multi-head Latent (MLA) attention side by side. CCA compresses Q, K, and V and performs the attention operation directly in the compressed latent space. This is why CCA can reduce not only KV cache size, but also attention FLOPs during prefill and training. Figure 13: Multi-head Latent Attention (MLA) and Compressed Convolutional Attention (CCA) side by side. As Figure 13 above illustrates, in CCA, the compressed, latent representations enter the attention mechanism directly, and the resulting compressed attention vector is then up-projected. 
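The "attention directly in the latent space, then up-project" idea can be sketched with a tiny single-head example in plain Python. This is a simplified illustration, not the ZAYA1 implementation: it leaves out the convolutional mixing, multiple heads, and positional encoding, and all names, sizes, and random weights are assumptions.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a [n, k] by matrix b [k, m], both as nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def causal_attention(q, k, v):
    """Single-head causal softmax attention over nested-list tensors of shape [seq, d]."""
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d) for j in range(i + 1)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j][c] for j, e in enumerate(exps)) for c in range(len(v[0]))])
    return out


def latent_attention(x, w_q, w_k, w_v, w_up):
    """Down-project Q, K, V, attend in the narrow latent space, then up-project."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # each [seq, d_latent]
    latent_out = causal_attention(q, k, v)                    # scores computed in d_latent dims
    return matmul(latent_out, w_up)                           # back to [seq, d_model]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, d_latent = 3, 4, 2  # hypothetical toy sizes
x = rand_matrix(seq_len, d_model)
w_q, w_k, w_v = (rand_matrix(d_model, d_latent) for _ in range(3))
w_up = rand_matrix(d_latent, d_model)
out = latent_attention(x, w_q, w_k, w_v, w_up)
print(len(out), len(out[0]))
```

Since the score and value computations both run over `d_latent` instead of `d_model` dimensions, the cached K and V entries and the attention arithmetic shrink together in this sketch.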
Note that this is called Compressed Convolutional Attention, not just Compressed Attention, since there is an additional convolutional mixing happening on the latent K and Q representations. The convolutional mixing part is not shown in detail in Figure 13, because it would have been too cramped, but it’s relatively straightforward. As hinted at in Figure 13, the convolutional mixing happens directly on the compressed Q and K tensors. The point is that compression makes Q, K, and V narrower, which saves compute and cache, but it can also make attention less expressive. The convolutions are a cheap way to give the compressed Q and K vectors more local context before they are used to compute attention scores. (The convolutional mixing is only applied to Q and K, not V, because Q and K determine the attention scores, while V represents the content that gets averaged via these scores.)

Figure 14: Conceptual overview of the sequence-mixing convolution.

Next to the sequence mixing shown in Figure 14, there is also a channel mixing component. It is similar in principle, though, so I am omitting the illustration. CCA appears to be a Zyphra-introduced attention mechanism that predates the ZAYA1-8B technical report. The standalone CCA paper, Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space, was first posted in October 2025 and explicitly introduces CCA. ZAYA1-8B then uses this mechanism as one of the core pieces. But the question is, is it better than MLA? According to the CCA paper’s own experiments, yes: they report CCA outperforming MLA under comparable compression settings.

Figure 15: Annotated figures from the CCA paper, https://arxiv.org/abs/2510.04476 .

Overall, the interesting part here is really the new attention mechanism. The model also uses a pretty extreme (= very sparse) MoE setup, with only one routed expert active per token, but that part is more familiar.
CCA is more unusual because it performs the attention operation directly in a compressed latent space, and uses convolutional mixing on the compressed Q and K representations to make this compressed attention less limiting. So, in short, ZAYA1-8B is not only trying to save compute in the feed-forward layers, but also in the attention mechanism itself.

5. CSA/HCA, mHC, and Compressed Attention Caches (DeepSeek V4)

DeepSeek V4 was the biggest release of the year so far, both in terms of hype and model size. Interestingly, DeepSeek V4-Pro is also the most parameter-sparse MoE among the models compared in the figure below, measured by active-parameter share.

Figure 16: Percent active parameter plot for MoE models. You can also find an HTML version at https://sebastianraschka.com/llm-architecture-gallery/active-parameter-ratio/ .

Caveat: active parameter share is only one lens. It does not capture KV cache size, attention pattern, context length, routing overhead, hardware efficiency, or training quality. But it is a helpful, quick check when comparing sparse models. There’s a lot to say about DeepSeek V4, but since it’s been all over the news already, and to stay on topic regarding architecture tweaks, I will focus on the two most relevant parts that are new compared to previous architectures: mHC for a wider residual pathway, and CSA/HCA for long-context attention compression and sparsity.

Figure 17: DeepSeek V4-Pro architecture overview.

5.1 Manifold-Constrained Hyper-Connections (mHC)

Let’s start with the mHC component of DeepSeek V4. This goes back to a research paper that the DeepSeek team shared last year (31 Dec 2025, mHC: Manifold-Constrained Hyper-Connections). However, in this paper, the technique was only tested on an experimental 27B scale model. Now, we see it in their flagship release, which is a good sign that this idea actually works well in production.
The main idea behind mHC here is to modernize the design of the residual connections inside the transformer block, which is refreshing, because architecture tweaks are usually focused on the attention mechanism, normalization layer placement, and MoE parts. Now, mHC is based on previous work on hyper-connections (see Hyper-connections by Zhu et al., 2024), which we should briefly discuss first. Hyper-connections essentially modify the single residual stream inside the transformer block by replacing it with several parallel residual streams and learned mappings between them. (For those new to residual connections, I made a video on residual neural networks many years ago, where I explained the general mechanism.) The idea behind hyper-connections is to widen the residual stream. We can think of this as keeping several parallel residual streams, with an additional Res Mapping linear transformation that mixes them across layers. Since the Attention or MoE layer itself still operates on the normal hidden size, hyper-connections also add a Pre Mapping that combines the parallel residual streams into one normal hidden vector for the layer, and a Post Mapping that distributes the layer output back across the parallel residual streams. This is visually summarized in the figure below. Figure 18: Regular transformer block (top) vs transformer block with hyper-connections (bottom) using annotated figures from the mHC paper, https://arxiv.org/abs/2512.24880 . The figure below focuses on the attention-layer portion of the transformer block, but the same concept applies to the second residual branch around the MoE layer. The purpose of hyper-connections is to make the residual pathway more expressive without making the actual Attention or MoE layer wider. This is only mildly more expensive in FLOPs because the extra mappings operate over the small residual-stream axis, for example, n = 4 in DeepSeek V4, not over a huge hidden dimension. 
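The interplay of the Pre, Post, and Res Mappings described above can be sketched as one residual update over n parallel streams. This is a minimal sketch under simplifying assumptions: the mappings are fixed numbers here (in the papers they are learned), the `layer_fn` stand-in replaces a real Attention or MoE layer, and all names are hypothetical.

```python
def hyper_connection_step(streams, layer_fn, pre, post, res):
    """One residual update with n parallel streams (each a list of d floats).

    pre:  n weights that combine the streams into one hidden vector (Pre Mapping)
    post: n weights that write the layer output back to each stream (Post Mapping)
    res:  n x n matrix that mixes the streams with each other (Res Mapping)
    """
    n, d = len(streams), len(streams[0])
    # Pre Mapping: the layer still sees a single vector of the normal hidden size d
    h = [sum(pre[i] * streams[i][c] for i in range(n)) for c in range(d)]
    y = layer_fn(h)
    # Res Mapping mixes the old streams; Post Mapping adds the layer output to each stream
    return [
        [sum(res[i][j] * streams[j][c] for j in range(n)) + post[i] * y[c] for c in range(d)]
        for i in range(n)
    ]


def toy_layer(h):
    """Stand-in for an Attention or MoE layer."""
    return [2.0 * v for v in h]


# With a single stream and all mappings set to 1, this is the usual x + f(x)
single = hyper_connection_step([[1.0, 2.0]], toy_layer, pre=[1.0], post=[1.0], res=[[1.0]])

# n = 4 streams with an identity Res Mapping and an averaging Pre Mapping
n = 4
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
wide = hyper_connection_step([[1.0, 2.0]] * n, toy_layer, pre=[0.25] * n, post=[1.0] * n, res=identity)
print(single, wide[0])
```

Note that the extra work is in the small sums over the n streams, while `layer_fn` is still called once on a vector of the normal hidden size.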
In the original hyper-connections paper, the 7B OLMo MoE experiment goes from 13.36G to 13.38G FLOPs per token, which is basically unchanged. In terms of reported gains, there were modest (but consistent) improvements, as shown in the figure below. (However, only looking at FLOPs is a bit simplistic. The widened residual state still has to be stored, moved through memory, mixed, etc. So the practical overhead can come more from memory traffic and implementation complexity than from arithmetic, which is not explicitly measured. However, given that DeepSeek V4 is all about efficiency, it seems to be a worthwhile addition.) Figure 19: Hyper-connections performance versus baseline, using an annotated figure from the hyper-connections paper, https://arxiv.org/abs/2409.19606 . Also, as shown in the figure above, metrics reached the baseline’s performance using roughly half the training tokens. The main change from regular hyper-connections (HC) to manifold-constrained hyper-connections (mHC) is that the mappings are no longer left unconstrained. In regular HC, the Res Mapping is a learned matrix that mixes the parallel residual streams, but stacking many such matrices can amplify or shrink signals unpredictably. In mHC, this residual mapping is projected onto the manifold of doubly stochastic matrices, meaning all entries are non-negative and each row and column sums to 1. This makes the residual mixing behave more like a stable redistribution of information across streams. The Pre Mapping and Post Mapping are also constrained to be non-negative and bounded, which avoids cancellation when reading from and writing back into the widened residual state. In short, mHC keeps the richer residual mixing of HC, but adds constraints so it scales more safely, which becomes more relevant for larger (deeper) models. Otherwise, the main idea of using parallel residual streams remains, as shown in the figure below. 
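To make the doubly stochastic constraint concrete, here is a small sketch that turns an arbitrary square matrix of scores into an (approximately) doubly stochastic one by alternating row and column normalization, in the style of Sinkhorn-Knopp iteration. Treat the procedure, the iteration count, and the example values as assumptions for illustration; this is not a reproduction of DeepSeek's implementation.

```python
import math


def sinkhorn_project(logits, num_iters=50):
    """Approximate a doubly stochastic matrix from unconstrained scores."""
    m = [[math.exp(v) for v in row] for row in logits]  # make all entries positive
    n = len(m)
    for _ in range(num_iters):
        m = [[v / sum(row) for v in row] for row in m]                      # rows sum to 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]   # columns sum to 1
    return m


# Hypothetical unconstrained 4 x 4 residual-mixing scores (n = 4 streams)
logits = [
    [0.5, -1.0, 0.3, 0.0],
    [1.2, 0.1, -0.7, 0.4],
    [-0.3, 0.8, 0.0, -1.1],
    [0.2, -0.5, 0.9, 0.6],
]
res_mapping = sinkhorn_project(logits)
row_sums = [sum(row) for row in res_mapping]
col_sums = [sum(res_mapping[i][j] for i in range(4)) for j in range(4)]
print([round(s, 6) for s in row_sums], [round(s, 6) for s in col_sums])
```

Because each row and column sums to 1 and no entry is negative, applying such a matrix redistributes the stream contents as weighted averages rather than scaling them up or down, which is the stability property the text refers to.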
Figure 20: Transformer block with hyper-connections (HC) and manifold-constrained hyper-connections (mHC) using annotated figures from the mHC paper, https://arxiv.org/abs/2512.24880 . In the mHC paper, using a 27B parameter model for the experiments, the DeepSeek team’s optimized implementation (with fusion, recomputation, and pipeline scheduling) adds only 6.7% additional training time overhead for 4 residual streams (n = 4) throughout all transformer blocks compared to the single-stream baseline. To sum up this section, HC/mHC changes how information is carried around these layers by replacing the single residual stream with several interacting residual streams, with the additional stability constraints added in mHC, while adding minimal compute overhead. Also, it pairs well with the CSA/HCA attention changes, which modify other parts of the transformer block, which I will discuss below. 5.2 Compressed Attention via CSA and HCA The other major DeepSeek V4 architecture change is on the attention side. Again, the motivation is that at very long context lengths, attention becomes expensive not only because of the attention score computation, but also because the KV cache grows with the sequence length. DeepSeek V4 addresses this issue with a hybrid of two compressed-attention mechanisms, Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). For a refresher, I recommend checking out my previous “ A Visual Guide to Attention Variants in Modern LLMs ” article, which covers Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA), among others. The first thing to note is that CSA/HCA in DeepSeek V4 is a different kind of compression than the MLA-style compression used in DeepSeek V2/V3. Where MLA mainly compresses the per-token KV representation, CSA and HCA compress along the sequence dimension. 
So, instead of keeping one full (or compressed) KV entry for every previous token, they summarize groups of tokens into fewer compressed KV entries. Consequently, the cache gets shorter. DeepSeek V4 also uses compact compressed entries and shared-KV attention, but the main distinction from MLA is the sequence-length compression. This is illustrated in the figure below. Figure 21: Conceptual comparison of MLA-style per-token latent caching, CSA, and HCA. MLA compresses the stored KV representation but keeps one latent entry per token. CSA shortens the sequence more mildly with m=4 and sparse top-k selection, while HCA uses much heavier sequence compression with m’=128 and dense attention over the shorter cache. The quality tradeoff for CSA/HCA is also different from MLA. As shown in the figure above, MLA compresses the representation stored for each token, but it still keeps one latent KV entry per token. CSA and especially HCA go further by reducing the number of sequence entries themselves, so the model gives up some token-level info in exchange for much lower long-context cost. Again, it’s all about reducing long-context cost, but this trade-off can hurt modeling quality if the compression is too strong, which is why DeepSeek V4 does not rely on one compression scheme alone but alternates between CSA and HCA. CSA uses a milder compression rate and a DeepSeek Sparse Attention (DSA)-style selector, HCA uses much heavier compression for cheaper global coverage, and both keep a local sliding-window branch for recent uncompressed tokens. This sparse selection in CSA builds on DeepSeek Sparse Attention (DSA), which I discussed in more detail in my earlier DeepSeek V3.2 write-up . HCA is the more aggressive variant of the two. It compresses every 128 tokens into one compressed KV entry, but then uses dense attention over those heavily compressed entries. 
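The effect of compressing along the sequence dimension is easy to see with a rough entry count. The sketch below compares one cache entry per token with the m=4 and m'=128 compression rates mentioned above, plus a sliding window of recent uncompressed tokens. The 128-token window is the value given in this article's figure caption for the sliding-window branch; the helper name and the simple "compressed history plus window" count are simplifying assumptions.

```python
import math


def cache_entries(context_len, compression=1, window=0):
    """Rough number of KV entries for one layer: compressed history plus recent raw tokens."""
    compressed = math.ceil(context_len / compression)
    return compressed + min(window, context_len)


context_len = 1_000_000
per_token = cache_entries(context_len)                         # one entry per token (MLA-style)
csa = cache_entries(context_len, compression=4, window=128)    # milder compression, m = 4
hca = cache_entries(context_len, compression=128, window=128)  # heavy compression, m' = 128
print(per_token, csa, hca)
```

This only counts how many entries exist. In CSA, each query additionally attends to just a sparse top-k subset of the compressed entries, whereas HCA attends densely over its much shorter cache.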
In other words, CSA keeps more details but uses sparse selection, while HCA keeps far fewer entries and can afford dense attention over them, as illustrated in the figure below. This makes the two mechanisms somewhat complementary, which is why DeepSeek V4 interleaves CSA and HCA layers rather than using only one of them.

Figure 22: CSA selects a sparse set of compressed history blocks, while HCA attends densely over more heavily compressed blocks. Both paths also include recent uncompressed KV entries through a 128-token sliding-window branch.

The DeepSeek V4 paper reports that, at a 1M-token context length, DeepSeek V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache size compared with DeepSeek V3.2, which uses MLA and DeepSeek Sparse Attention (DSA). DeepSeek V4-Flash is even lower, at 10% of the FLOPs and 7% of the KV cache size relative to DeepSeek V3.2.

Figure 23. Reported 1M-context efficiency numbers from the DeepSeek V4 paper, relative to DeepSeek V3.2.

By the way, I would not describe CSA/HCA as “better” than MLA in a general sense. CSA/HCA is a more aggressive long-context design. And it’s also more complicated for sure. Unfortunately, there is no ablation study in the paper. The paper does report strong overall modeling results, including DeepSeek V4-Flash-Base outperforming DeepSeek V3.2-Base on a majority of base-model benchmarks and strong 1M-token retrieval results. However, these results are for the full DeepSeek V4 recipe, which also includes better data, Muon-based optimization, mHC, precision/storage optimizations, and training/inference-system changes. Personally, for now, I would treat CSA/HCA as an efficiency-focused long-context design that appears to preserve modeling quality well in their large flagship model(s) but is not necessarily universally better than MLA.

6. Conclusion

Overall, the interesting pattern this year is that most new open-weight models try to make long-context inference cheaper without just shrinking the model in terms of total parameters. For instance, Gemma 4 reduces KV-cache memory with cross-layer KV sharing and adds capacity via per-layer embeddings. Laguna XS.2 tweaks how much attention capacity each layer gets. ZAYA1-8B moves attention into a compressed latent space. DeepSeek V4 adds constrained residual-stream mixing and compressed long-context attention.

Figure 24: The evolution from GPT-2 (2019) to DeepSeek V4-Pro (2026)

For instance, I am fairly certain that someone who is diving into LLM architectures for the first time will be totally overwhelmed when seeing the DeepSeek V4 source code. However, by starting with the original decoder-style LLM (GPT/GPT-2) and then gradually adding and learning about these new components one at a time, we can keep the learning effort manageable. The moral of the story, I guess, is to keep learning, one architecture at a time :).

By the way, I am very excited to share that I finished writing Build A Reasoning Model (From Scratch) and all chapters are in early access now. The publisher and I worked hard on the final layouts in the past month, and it’s going to be sent to the printer this week. (Good news: the print version will be in color this time!) This is probably my most ambitious book so far. I spent about 1.5 years writing it, and a large number of experiments went into it. It is also probably the book I worked hardest on in terms of time, effort, and polish, and I hope you’ll enjoy it.

Build a Reasoning Model (From Scratch) on Manning and Amazon. The main topics are evaluating reasoning models, inference-time scaling, self-refinement, reinforcement learning, and distillation. The book is available from Amazon (pre-order of Kindle ebook and print paperback) and Manning (complete book in early access, pre-final layout, 528 pages).

David Bushell 2 weeks ago

Surveys will continue until diversity improves

The web and tech industry is a veritable sausage party. We don’t need surveys to prove it but we have surveys to prove it . State of surveys have been running for a decade now. Let’s look at the 2025 survey demographics: Yes I think “sausage party” is accurate. Weißwurstfest even. And yes cock jokes are part of the problem. When I worked in London in the early 2010’s every tech meet-up was plaid shirts and IPA frosted moustaches. Larger tech conferences were better. They had a few women attending and occasionally allowed to speak and a better variety of beers. I worked and mingled with a good bunch of lads. Even good lads make cock jokes after a craft beer. Just a joke, innit? When you read accounts like Ana Rodrigues’ it’s easy to think “not my lads” but then you remember the boisterous punchlines, and that one guy… but he was more of a tagalong. Some of us grow up but the industry doesn’t. These days I work remotely and don’t get out much but I get the impression little has changed. Certainly the online bro-culture amplifies the worst traits. Now we have LLMs built by and trained on that culture. Ain’t that wonderful. The State of surveys continue to report alarming numbers. Are they a fair representation of the industry? Do they help or hinder diversity? Miriam Suzanne raised the concern in 2024. These correlations don’t tell us much without knowing how representative the data is. I’m just not sure what I’m looking at, or how it should be read. But it concerns me that browsers use surveys like this as a primary gauge of developer interest – seemingly without asking who’s represented, or who might be missing from the data. What do survey demographics tell us? - Miriam Suzanne As Miriam noted the State of surveys do influence browser vendors. The focus areas for 2026 include several areas identified as top interop issues in the State of HTML and State of CSS surveys. 
Interop 2026: Continuing to improve the web for developers - Rachel Andrew

Yet survey after survey after survey the demographics remain the same. Maybe the web industry is actually dominated by white guys (and now their new chat box companions). Oh and 60–70% of those surveyed report “None” under “Disability Status” so there’s that too. This is all kind of a big problem, obviously. Other humans need to use the web. Their voices need to influence the web platform. Maybe if we actually listened we could support more diverse needs and spend less time fast-tracking bro-tech.

So yeah I mock the State of surveys because what are we doing here? Why are we looking at these numbers and concluding: “Wow! I can’t believe Axios is still popular in [current year]!” Lack of diversity is the only relevant takeaway that means anything. I don’t know if these surveys are part of the problem. I know they’re not the solution. But who knows, if we keep asking six times a year maybe diversity will improve?

Manuel Moreale 2 weeks ago

RMF

This week on the People and Blogs series we have an interview with RMF, whose blog can be found at baccyflap.com/prs/blog.

My name's rmf. My legal name's not terribly hard to find, but I like to keep it lightly buried just so my 2006 blog isn't the first thing you find when you search for my name. I'm a native of the Netherlands, where I reside. I live in a small city with my partner; she's an archaeologist and I'm a botanist, though I currently teach museum anthropology classes. I went from doing science, to teaching science, to teaching culture. I've never believed in restricting a whole human life to one field of study, so I'm having a blast.

My computer skills have always been self-taught. While I was in middle school I fiddled with Microsoft Paint and from there on I got to grips with ever more advanced graphic software (currently GIMP and Inkscape). In high school I liked to make videos with my friends which I edited in Windows Movie Maker, which led to an ongoing on-and-off hobby of video editing (in Kdenlive). In 2002, I set up a WYSIWYG website which led to me learning HTML and later CSS and, later still, PHP. Right now I do some graphics stuff for my job in education, such as making instruction sheets, posters and some other small-time stuff, but really, pretty much all my computing is done in my free time, for fun. I think that's a blessing - I don't have to work with anything I don't want to work with and do everything I do for the love of the game.

Besides that I make soap, which is part hobby, part side job. I enjoy tinkering with technology, so I have lots of esoteric hifi equipment, some old games consoles, old calculators... if it can be tinkered with, I like it.
I enjoy writing prose and poetry and have recently been getting into fermenting and pickling, though I am subordinate to my partner in that. She's the head of pickling and fermenting, I take care of the old electronics; she draws and paints, I write; and then at the end of the day, we cook together. I started my website in 2002 and by 2003 I had a little update box to briefly communicate whatever I was doing with the site. That update box turned into a shoutbox of my random thoughts and as those got a bit longer and rantier every time, in October 2005 I turned it into a blog. Blogging was the thing to do at the time and so, at age sixteen, I figured I had enough to say to warrant a stab at the practice. It was all coded by hand: no CMS or JavaScript, just handwritten HTML with the appearance of a blog. It was all over the shop, subjectwise. A fair amount of it had to do with palaeontology and/or me being an epic atheist - ups and downs. It was simply named 'blog' and it changed over the years with the design of the site but all in all, it was very simple. No RSS, no comments, just static HTML pages updated manually. The surprising thing to me is that I had an audience - I got somewhat regular emails about my posts. I blogged until 2009. I did that classic thing of writing fewer and fewer posts and finally announcing a newer, better blog hosted at Blogger. I wrote a grand total of 4 posts for it, stopped for a year, and finally took it down. I lost interest and so, it petered out. Cut to 2026, I'm reading a few more blogs than I had been for the past several years and I start to get the blogging bug again. Or perhaps the bug was dormant and now reawakening. I'd been considering it for a while but specifically, funnily enough, after reading your article about stopping the People & Blogs series, I got inspired to pick up the pen again. Over the last decade I've written on and off for a couple of magazines and I had a regular column in a local newspaper for a while. 
I think my intrusive blogging thoughts started when that column went away - I like to write, it's something of a compulsive thing, and while the newspaper let me write practically whatever I wanted, it still had some constraints such as length, a certain form, and at the end of the day, some amount of harmlessness. It had to be a column - it could make the readers think, but not too much or about controversial things. So the blog suddenly popped into my head as a perfect fit. Whatever topic I want, whatever length, whatever form. And so in 2026, I picked up blogging again. I did write a CMS and some code for an RSS feed - other than that, I tried to keep the form of the blog as close to the original as possible. And again, to my surprise, there are people reading this blog. I'm clueless as to how they're finding it, buried in a subsection of my site as it is, but I'm getting emails again. A grand total of two people suggested I give the blog a name, which I did. It's now called 'bakelite & roses', a name I explain at baccyflap.com/prs/blog/2026/?m=03#1773065697 . My inspiration comes from whatever happens to me. So far I've written about umbrellas, tamagotchi, deadly accidents, CD collections and some other stuff - that's the most liberating thing to me, getting to write whatever the hell I want. I like it to be interesting, to have some novel (to me) observations in it, but other than that, it's just whatever occurs to me. It's comparable to the columns I used to write in that sense - I write them quick, maybe give them a quick read later on, and then just post. I'll often read them to my partner who will usually describe them as 'cute', which is good enough for me. I write wherever. Back when I had deadlines I'd slack off right until the final hour and then just use whatever's to hand. I've written a few on my phone but I suppose I mostly write on my laptop, just because it's faster. I'll do it at home, on the go, at work, wherever inspiration strikes. 
My site's hosted on a buddy's server. He runs a small IT company so he takes care of the domain too - it's an old arrangement and we're sticking with it. I pay him, he pays the bills. The blog itself is written in PHP - when I restarted in 2026 I finally wrote a backend, still pretty primitive but it makes my life a bit easier and crucially, it enabled me to provide an RSS feed. I type a post into a dirt simple little CMS and hit 'post' to add the post to a JSON file, which the RSS feed also pulls from. I may provide the source code at some point, when it's not as hokey as it is now. Well, I started it in January, which is pretty close to today, so I think I'm all good. I guess, looking back at my old posts, I do sometimes cringe at them. I added a disclaimer to those posts, just to distance myself from the bad ones. But I didn't remove them - they still reflect who I was at the time and in some weird way, who I am now. I wouldn't be honouring teen me by removing any of it and looking back I guess I could say I'd wish I'd written better stuff... but you know what, that's what I wanted to write at the time and as confident as I was of my own intellect at the time, so I am now about the public's capacity to contextualise these posts. There are wonderful, thoughtful posts in there, but also some dubious stuff, and some garbage. So short answer: I think it's perfect, wouldn't change a thing. I pay my buddy €100 a year to cover his costs and so he can write me a bill which is good for his company. It generates precisely nothing, which is how I like it. People can do whatever they want with their blogs but for me, it's just a bit of fun in my free time. No Patreons and Ko-fis for me - I know everyone wants to turn every aspect of their lives into a revenue stream these days, but for me, it's just a way to reach out. Of all blogs, the one I've been reading for the longest (22 years!) is Pharyngula . 
Out of all the 'new atheist' types, PZ Myers is one of the few who did not turn out to be a dirtbag. He stuck to his progressive guns and has as sharp a pen as ever. For the sheer dedication of the author it's worth a read, whether the range of topics is up your street or not.

I'm currently working on a podcast, a bit of a personal project that has been taking more of my time than I thought it would. Currently in the outline stage, it'll take some time before I can finally start recording. It is driving home to me that making a podcast is, at the best of times, an effortless thing that very few people know how to do well. I honestly don't like most podcasts but one I've been enjoying, one of those podcasts that springs up on you and just keeps on giving, is Bread & Bananas, a podcast about Kampung Gelam, an old neighbourhood of Singapore, made and presented by three inhabitants of said neighbourhood. And if you're wondering why on Earth this would be a topic of interest to anyone outside that neighbourhood... well, just give it a listen. It's chill, it's thoughtful, it'll surprise you. Six episodes so far, a new one every couple of months.

Maurycy 2 weeks ago

Search engine results are truly terrible

A few months ago, I had the displeasure of trying to use the modern web without an ad-blocker. Even though it's ubiquitous among computer nerds, ad blocking is quite rare even in other technical fields. This got me wondering how search engines perform without all the tricks people use to get better results.

As a test, I wrote a few queries for... common software: ... obscure, but easy to find information: What is the lowest K-alpha emission energy of Molybdenum? ... and a few normal(-ish) questions: What photodiode circuit should I use? How do airplane wings work? Why are brushed motors most efficient at high speeds?

Asking a search engine questions is almost never the best way to find good information, but it's what I've seen a lot of people do. To replicate the experience of a normie/victim I made sure to include the AI summary, sponsored results and info boxes:

TL;DR: No tool produced consistently good results. This isn't a matter of my standards being too high: good results for all these queries exist on the web, but the tools all failed to find them. They had a real problem with returning vaguely related blogspam. Having a good result in the top 3 was fifty-fifty.

For the ad blocker and molybdenum, ChatGPT was able to produce a good answer, but its responses were deeply flawed or outright incorrect for the other three questions... largely because it was rephrasing the same spam that tripped up all the others.

Marginalia generally did very poorly, but it was the only one to perform decently on the motor question: all the others returned surface-level AI slop, while it found a nice writeup on motors that answered the question.

Grading scale:

Good: First result is correct and not spam. For the questions, I'm not looking for a textbook: a single sentence explanation is perfectly fine provided that it explains the right thing and holds water.

Ok: Some spam/incorrect/incomplete/irrelevant pages, but a good result can be found in the first three links.
Just to be clear, this is not a good outcome: it means the top result was wrong or spam.

Bad: Same as ok, but using the first five links.

Crap: First five results are all wrong, spam or spammy scams. Five might not sound like a lot, but given the amount of junk in a modern search engine interface, it's really quite rare for people to scroll past those first five results.

ChatGPT isn't a search engine, so I ranked it on correctness of the answer: Good = Correct and well explained. Ok = Correct, but not very good. Bad = Incomplete. Crap = Wrong or incomplete to the point of being harmful.

Detailed results: ad blocker

For ad blockers, I'll only accept uBlock Origin or DNS based solutions. In order to work, an ad-blocking extension needs a huge amount of access to your browser: it's not a good idea to take chances. uBlock Origin is free, open source (so you can see what it's doing) and very effective: paying a difficult to cancel subscription for an inferior product is not a good idea. A lot of those shady extensions also have identical pricing plans, which makes me think they are slop-ware pumped out by one guy. I don't have proof that they are scams in the strict sense, but it is rather suspicious.

"Ad block - [...] - Chrome web store": Charges a $40/year subscription, allows "non-intrusive" advertising and collects data. "AdBlock Plus": Same deal. Infobox linking to https://getadblock[.]com/: The usual. "Get AdBlock": ditto. "uBlock Origin": Finally, a good result. Just in time to save Google from the "crap" tier, but I doubt it's early enough to stop someone from being scammed. Verdict: bad.

"Adblock Plus": Same as Google's #2. Infobox with "https://www.windowscentral[.]com/how-block-ads-and-trackers-xbox": an ad-filled blog-spam site. It does provide reasonable instructions, but good luck reading it without an ad blocker. A second infobox linking to "Adblock vs Adblock Plus - PC Guide": an ad-laden blog-spam comparing two sub-par extensions.
(both allow "acceptable ads") "uBlock Origin": Good, but why is it so far down? "AdBlock — block ads across the web": The usual scammy adblocker extension. Very similar to Google's top four results. Verdict: bad.

"Adblock Plus": same as Google's #2. "Ad block - [...] - Chrome web store": same as Google's #1. "Adblock Plus": Yet another shady adblocker with a $40/year subscription. "The Ethical Ad Blocker" (infobox): A blog post describing an ad-blocker that blocks access to any websites that have ads, which prevents any accusations of piracy. Funny and probably real, but not what users are looking for. "AdGuard Ad blocker": Yet another of those nearly identical sketchy adblockers. Kagi is the first search engine to not include uBlock in the first five results, but it does link me to someone's rather cool blog... however, I still had to scroll past quite a bit of junk to find it. Verdict: crap.

DuckDuckGo: "Adblock Plus": same as Google #2. "AdBlock — block ads across the web": same as Google #1. "uBlock Origin": Finally, in the top 3! "getadblock[.]com": More junk. "AdBlock — block ads across the web": Same as #2, but on Microsoft's extension store instead of Google's. Verdict: ok.

Marginalia: "Ghostery Ad Blocker": Yet another blocker that doesn't actually block ads, and has been caught selling data to advertisers. "Ad blockers are not allowed on YouTube": A blog post with a half-baked list of ways to get around YouTube's ad-blocker detection. Indirectly recommends uBlock, but also a lot of stuff that won't work. Not great. "Vivaldi": Chrome with a built in adblocker. Not a scam, but you don't need to install a new browser to block ads. "EasyList is in trouble and so are many ad blockers": Corporate blog post about hosting problems. "Ad Blockers - Contains Moderate Peril": A blog post about ad-blockers, recommends "AdBlocker Ultimate". Not spam, but not the best recommendation. Verdict: Crap.
Marginalia's results are quite different from all the other search engines: it pulled out two real blog posts alongside the usual spam.

(Note: I modified the prompt to "Recomend me an ad blocker.") The LLM recommended [1] uBlock Origin Lite, which is a variant of uBlock for modern Chrome, by the same author. The Lite version is technically more limited than the original, but still works very well. It also suggested [2] "AdGuard AdBlocker", but only as a fallback. Verdict: Good.

... Molybdenum: "What is the lowest K-alpha emission energy of Molybdenum?"

Despite this being a straightforward table lookup, all the LLM summaries got it wrong: The lowest energy line is Kα₂ (17,374 eV), not Kα₁ (17,479 eV). The reason for this is that X-ray lines were first observed using diffraction, and measured by wavelength, which is inversely proportional to energy: Kα₁ has a shorter wavelength, but higher energy.

Incorrect AI overview citing a paper. The paper lists both K-alpha lines, but the LLM used the wrong one. Table from Lawrence Berkeley National Lab: lists the correct value. Another table, this time from an equipment manufacturer. Lists the correct value. A paper characterizing the X-ray fluorescence spectrum of molybdenum. "Characteristic X-ray - Wikipedia": an overview of X-ray emission lines, but it does not give any specific energy values. Not a relevant result. Verdict: ok.

Wrong AI overview citing Google's #2: It made the same mistake with Kα₂ and Kα₁. "Molybdenum": A nice little page from LBL listing some technical properties of molybdenum. This is the most relevant result so far. "12.1: Fundamental Principles": an article that happens to use molybdenum as an example, but lists wavelengths instead of photon energy. "Experimental K-alpha x ray energies": a table of emission lines. The same paper as Google's #4. Verdict: ok.

A very wrong AI overview giving "0.709 eV": off by four orders of magnitude!
I suspect it took the number from Bing's #3, but instead of actually converting the wavelengths to energy, it just slapped an "eV" on. Same table as Google's #2. A good result. Same as Google's #3. A good result. A page about the theoretical calculation of X-ray lines. Does not provide an energy for molybdenum. A list of chemical properties of molybdenum. Does not mention X-rays. This nicely demonstrates the problem with LLMs: A chatbot usually gets things (mostly) right, but will occasionally be very, very wrong. Verdict: ok.

DuckDuckGo: Incorrect AI overview referencing a NIST publication. Same as Bing #2. A good result. Same as Bing #3: not relevant to the question. Same as Google #4. A good result. Some data table: a perfectly fine result. No surprises here: It's a few good sources and a slightly wrong LLM summary. Verdict: ok.

Marginalia: "Plasma catalytic non-oxidative conversion of methane into hydrogen and light hydrocarbons": A preprint paper that used X-ray equipment and mentioned molybdenum in passing. "XRF Technologies for Measuring Trace Elements in Soil and Sediment": Similar to #1. A paper that used X-ray equipment and mentions molybdenum, but does not answer the question. Marginalia doesn't try to be a comprehensive index, so it's unsurprising that it did badly on this one: only two results were returned, and neither of them included the requested number. Verdict: crap.

Chat gave 17.37 keV, which is the correct value. Good job on being the only LLM to answer a simple question correctly.

... Photodiodes: "What photodiode circuit should I use?"

Photodiodes are excellent light sensors, but their output is a small and difficult to measure current. Generally, the best way to fix this is with a transimpedance amplifier: an op-amp circuit that converts the current into an output voltage while keeping the sensor's bias constant. This provides a fast and exceedingly linear response.
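As a rough illustration of why the transimpedance amplifier gives a linear response, the sketch below uses the textbook ideal op-amp relations: the output voltage magnitude is the photocurrent times the feedback resistor, and a capacitor across that resistor sets a first-order bandwidth limit. The function names and component values are made up for illustration, and a real design also depends on the op-amp and the diode's own capacitance.

```python
import math


def tia_output_voltage(photocurrent_a: float, feedback_ohms: float) -> float:
    """Ideal transimpedance stage: |V_out| = I_photo * R_f."""
    return photocurrent_a * feedback_ohms


def feedback_pole_hz(feedback_ohms: float, feedback_farads: float) -> float:
    """Corner frequency of the R_f || C_f feedback network: 1 / (2*pi*R_f*C_f)."""
    return 1.0 / (2.0 * math.pi * feedback_ohms * feedback_farads)


# Hypothetical component values, chosen only to make the numbers readable.
R_F = 1e6    # feedback resistor, 1 Mohm
C_F = 1e-12  # feedback capacitor, 1 pF

if __name__ == "__main__":
    # Output scales in direct proportion to the photocurrent.
    for current in (1e-9, 1e-7, 1e-6):
        print(f"{current:.0e} A -> {tia_output_voltage(current, R_F):.3f} V")
    print(f"bandwidth limit: about {feedback_pole_hz(R_F, C_F) / 1e3:.0f} kHz")
```

The trade-off this exposes is that a larger feedback resistor gives more volts per amp, but with the same capacitor it also lowers the corner frequency.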
An ideal result would also mention techniques like bootstrapping (to increase bandwidth of large sensors) and logarithmic converters (to measure a wide range of light levels).

AI overview citing #4, recommending a transimpedance amplifier, but it provides a schematic of a different configuration. "Photodiode – A Beginner’s Guide": A blog-style website with circuits that don't work, are missing important details and have poor explanations. "Photodiode Basics": Ad-ridden page which does include the rough layout of a transimpedance circuit, but with no mention of feedback capacitors. These are often needed to prevent oscillation. "What are the pros and cons for the various photodiode circuit arrangements?": A forum thread that mentions transimpedance amplifiers, but doesn't give any specifics. "Photodiode Component Basics [...] - Youtube": Video with a demonstration of a photodiode working, but without any amplification or readout circuits. Verdict: Crap.

AI overview citing #2, but it recommends a bad configuration with a resistor in parallel with the diode. The output is non-linear, high-Z, and difficult to use. "Photodiode – A Beginner’s Guide": Same as Google #2. A bad result. "Photo Diode (Symbol, [...] Pros & Cons) Explained - Youtube": Another super generic video. "Fire Detection Circuit Using Photodiode": Content farm video with no schematic and no explanation. "Photodiode Construction and Working - Youtube": Another extremely generic explanation video. Does not include any circuits or even discuss the problem. Verdict: Crap.

"Photodiode – A Beginner’s Guide": Same bad article as Google's #2. "Photodiode Basics": The same as Google's #3: incomplete circuits on an ad-ridden page. "What are the pros and cons for the various photodiode circuit arrangements?": Same as Google #4, an unhelpful forum thread. "PHOTODIODE OPERATION MODES AND CIRCUITS": Provides an example of a transimpedance amplifier, but has no example values or instructions on selecting them. "Technical notes / Si Photodiodes": A PDF from a photodiode manufacturer, which provides practical circuits and a description of photodiode properties. This is the first result that provides enough information to actually build a working sensor. Verdict: Bad.

DuckDuckGo: "Photodiode – A Beginner’s Guide": The same as Google's #2, meh explanations and some of the circuits don't work. "Photodiode Basics": Same as Google's #3: Incomplete circuits on an ad-ridden page. "PHOTODIODE OPERATION MODES AND CIRCUITS": Same as Kagi #4. Not good enough to build a working circuit. "A Practical Guide to Photodiode Amplifier Circuit Design [...]": A marketing piece for an equipment manufacturer. Unlike the Hamamatsu appnote, this doesn't have any useful information. "Technical notes / Si Photodiodes": Same application note as Kagi #5. A good result. Verdict: Bad.

Marginalia: "PIN Photodiode gamma detection amplifier circuit - rectangular wave output": Forum post with a broken circuit. Not something you want to copy. "Circuit Diagram": An unrelated forum post about an XKCD comic. "Short Circuit Limiter": Unrelated blog post. "NES Cartridge Chaos: [...]": Unrelated blog post. "How can i increase the range of values that a light sensor gives?": Forum post showing an ok configuration, but with no explanation or information on how values should be selected. Verdict: Crap.

Chat gave a wall of text boiling down to "use a transimpedance amplifier", but with no explanation of what that is or why it's good for light detection. It also drew a nonsensical "schematic" which would be of no use to anyone trying to build one. Hidden in the "citations", it did link to a reference design from Texas Instruments... and an AI generated blog-spam post. I'll bin it under "Bad".

... Wings: "How do airplane wings work?"

The simplest reasonably correct answer is that wings are angled to push down on the air, which lifts the plane up.
The fluid mechanics happening around the wing are very complicated, but I'll accept a good one sentence explanation. Of course, more rigorous and detailed explanations are fine, but they must actually be rigorous: many explanations add complexity in a way that results in more gaps.

Also, there's a very common wrong answer (equal-transit) which asserts that the air takes the same amount of time to travel over the top and bottom of a wing. Therefore, since the top surface is curved, the air must move faster. By Bernoulli's principle, a higher flow velocity creates low pressure, and that low pressure region pulls the wing up. This is wrong for multiple reasons: It violates the conservation of momentum, because the wing doesn't impart any momentum to the air. Obviously, fans work. Airplanes can fly upside down... which shouldn't be possible if lift is some special property of the wing's shape. Paper or balsa-wood planes with flat airfoils work fine.

Other explanations go "something something Bernoulli", which is not technically wrong, but is deeply incomplete: Bernoulli's principle does come into play around a wing, but using it as an explanation requires showing that air speeds up as it travels over the top surface, something which can only happen because of a pre-existing low pressure region. These explanations do not hold water on their own. Would a proper analysis of the airflow over a wing be a good result? Of course. Is it enough to point at a tiny fragment of that and handwave it as an explanation? No.

I'll consider this as a bad result, because it's neither a good explanation, nor a useful model: Wrong models can be useful if the truth is complicated, but this is quite the opposite. "Planes stay up because they push the air down" is simple, correct and builds intuition. For example, it predicts that the pressure on the ground should increase as a plane flies over it... and it does. "Planes stay up because of Bernoulli" doesn't explain anything if you think about it for two seconds. All it does is bring in some math that isn't relevant until you read the rest of the textbook.

AI summary citing a TikTok video which contains the "something something Bernoulli" argument. Not entirely wrong, but needlessly complicated and incomplete. How wings really work: A professor debunking "equal transit" with an experiment... nice, but a debunk is not an explanation. "How Airplane Wings REALLY Generate Lift": A YouTube video with the correct explanation. A good result. "ELI5: how does a wing work? - Reddit": Reddit thread, most comments are correct, but many are repeating the incorrect explanation. "How Wings Work": A page with a mostly correct animation, but no explanation of what's happening. Verdict: Ok.

AI summary stating the incorrect equal-transit explanation. Seems to be referring to an old Glenn Research page with the incomplete explanation. "Airplanes": A correct article which calls out the incorrect Bernoulli argument. A good result. The same correct video from Google #3. "How Airplanes Work: A Simple Explanation for Beginners": A YouTube video giving the incomplete explanation. "How Wings Work": Same as Google's #5. Verdict: Ok.

"How Does A Wing Work? - Science Through Time": AI slop video with a bad answer. I can't tell if this is the "equal transit" model or the incomplete one, because it doesn't include anything resembling detail or logic. "How Does A Wing Actually Work?": A Veritasium video on YouTube, with the correct explanation. A good result. "How airplane wings work": A cool video showing airflow over a wing, during normal flight and a stall... but it's not an explanation. "How Does A Plane Wing Work?": Correct explanation and demo. "How do airplane wings work?": Explains the structural components of a wing, but not why it's able to create lift.
Verdict: Ok.

"Learn How Airplanes Work": A page that lists the parts of a plane, and gives the incorrect "equal-transit" explanation. How planes work: An article with a brief, but correct explanation. Dynamics of Flight: An old article from Glenn Research with the "something something Bernoulli" explanation. "How airplane wings actually work - Today Plane crash": AI slop article, wrong answer. "How wings work": an animation of airflow, but does not have an explanation. Verdict: Ok.

"How do I explain what makes an airplane fly to a non-technical person?": Forum thread of people asking the same question. A few answers are correct, but a lot aren't. I'll bin it as a bad result. "How do the Americas Cup Yachts sails work?": Forum thread about sailing. "How do I keep my futuristic racing hovercraft from becoming airplanes?": Forum thread about fantasy hovercraft. "How is the fatigue life of an airplane wing flexing during turbulence determined? How do they keep track of it?": Forum thread on accelerated life testing and maintenance of aircraft. "How do you scale a svg img to fit container?": A CSS question that just happens to be about an image of an airplane. Verdict: Crap.

Says that wings create lift, and then states that this is because the shape speeds up the airflow over the top surface (why?) and therefore, by Bernoulli’s principle, the pressure is lower on the top surface. This is the second category of bad explanations. Verdict: Crap.

... Motors: "Why are brushed motors most efficient at high speeds?"

Electric motors work by passing a current through coils, which creates a magnetic field. This magnetic field pushes against permanent magnets to create torque. To create continuous rotation, the direction of current and field must be constantly reversed to prevent the motor from locking up after half a turn. This is either done using mechanical switches (brushed motors), transistors (BLDC/stepper), or by running the device from AC power (synchronous motors).
Either way, the strength of the magnetic field inside a motor determines its torque, but the mechanical power is torque times rotational speed. However, resistive losses in the coil windings don't care about how fast the motor turns: they depend only on the current (growing with its square). Therefore, at low speeds, more losses are incurred during each rotation, and the motor is less efficient. This is why motors are almost always geared down: Even if they can produce enough torque, it's a bad idea to run them anywhere except right below their unloaded speed. (Efficiency aside, the heat produced can damage them.)

Incorrect AI summary citing the AI slop in #2. "Comparing Energy Efficiency of Brushless vs. Brushed Motors": Slop blog that claims the high speeds reduce losses in the motor's commutator, which simply isn't true. Commutator losses (arcing) generally increase with rotational speed. "Brushless Vs Brushed DC Motors: When and Why to Choose One Over the Other": AI slop advert. Does not answer the question. "What’s the difference between a brushed and brushless motor, and is one better than the other?": Reddit thread that states that brushed motors are less efficient, but gives no explanation. (Also, that's not what the question asked...) "The Advantages of Brushed Motors: Powering the World with Efficiency and Simplicity - Magmotor": AI slop, doesn't answer the question. "Brushed vs Brushless Motor: Key Differences, Performance, and How to Choose": AI slop, doesn't answer the question. This is the first time I got 5 obvious AI slop results. It's not a good sign for the rest... Verdict: Crap.

AI summary citing #2. "Brushed vs Brushless: Unraveling the Mystery of Motor Efficiency": AI slop that doesn't answer the question. It also states that motors produce more power at high speeds, which is true, but doesn't answer the question. At any given voltage, a motor has a torque at which it stalls and a maximum speed that's reached under no load.
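To put rough numbers on that stall-to-no-load range, here is a minimal sketch of the standard idealised brushed DC motor model: back-EMF proportional to speed, torque proportional to current. Every parameter value is invented for illustration, and the small no-load current standing in for friction is an assumption of this sketch, not something measured.

```python
# Hypothetical brushed DC motor parameters (illustrative only).
V = 12.0         # supply voltage, volts
R = 1.0          # winding resistance, ohms
K = 0.02         # torque constant (N*m/A), equal to the back-EMF constant (V*s/rad)
I_NO_LOAD = 0.1  # current needed just to overcome friction, amps (assumed)


def current(omega: float) -> float:
    """Winding current at speed omega: the back-EMF K*omega opposes the supply."""
    return (V - K * omega) / R


def output_power(omega: float) -> float:
    """Mechanical power at the shaft: usable torque times rotational speed."""
    return K * (current(omega) - I_NO_LOAD) * omega


def efficiency(omega: float) -> float:
    """Shaft power divided by electrical input power."""
    return output_power(omega) / (V * current(omega))


# Speed at which the usable torque drops to zero.
NO_LOAD_SPEED = (V - I_NO_LOAD * R) / K

# Sweep from stall to no-load speed and pick out the two peaks.
speeds = [NO_LOAD_SPEED * i / 1000 for i in range(1001)]
peak_power_speed = max(speeds, key=output_power)
peak_eff_speed = max(speeds, key=efficiency)

if __name__ == "__main__":
    print(f"peak power at {peak_power_speed / NO_LOAD_SPEED:.0%} of no-load speed")
    print(f"peak efficiency at {peak_eff_speed / NO_LOAD_SPEED:.0%} of no-load speed")
    print(f"best efficiency: {efficiency(peak_eff_speed):.0%}")
```

With these made-up values the efficiency is zero at both stall and no load, and its peak sits close to the top of the speed range because the resistive loss depends on current rather than speed.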
As you would expect, the motor makes the most power at roughly the half-way point between these two... but the efficiency is best at the extreme end of the speed range.
"Comparing the Efficiency of Different Electric Motor Types": AI slop, doesn't answer the question.
"Are Brushed DC Motors Still Relevant? Efficiency, Smart Control, and New Applications Explained": More AI slop. Doesn't answer the question.
"Brushed vs Brushless Motors: Comparing Efficiency, Lifespan, and Performance Metrics": AI slop. Doesn't answer the question.

Verdict: Crap.

DuckDuckGo:

AI summary citing "Brushed Motors vs. Brushless Motors": Neither answers the question.
"Brushed vs Brushless: Unraveling the Mystery of Motor Efficiency": AI slop.
"Comparing the Efficiency of Different Electric Motor Types": AI slop.
"Brushed vs Brushless Motors: Comparing Efficiency, Lifespan, and Performance Metrics": AI slop.
"Are Brushed DC Motors Still Relevant? Efficiency, Smart Control, and New Applications Explained": AI slop.

Verdict: Crap.

AI summary citing "Brushed DC Motor Theory": A page on a wiki run by Northwestern University. Talks about efficiency being zero under stall (which it is), but that's not what I asked about.
"Brushless Vs Brushed DC Motors: When and Why to Choose One Over the Other": Probably human written, but doesn't answer the question, instead comparing two motor designs. (The efficiency curve is similar for both.)
"Brushed vs Brushless: Unraveling the Mystery of Motor Efficiency": AI slop.
"What’s the difference between a brushed and brushless motor, and is one better than the other?": Forum thread that isn't about the question and doesn't answer it.
"Comparing the Efficiency of Different Electric Motor Types": AI slop.

Verdict: Crap.

Marginalia:

"Why does a Tesla car use an AC motor instead of a DC one?": A forum thread that doesn't answer the question.
Hobby CNC machining and resin casting: Lcamtuf is really good... but this isn't a page about electronics.
It does mention motors, but gives no explanation for why their efficiency curve peaks at very high RPMs.
CSC 297 Robot Construction: Driving Motors: A long and detailed website that actually answers the question! The first actually relevant result.
"Stepper motor - Wikipedia": Wiki page on a different type of motor.
"Brushless vs. Brushed Motors [New for 2026]": AI slop.

Verdict: Ok

A win for marginalia! Only a single AI slop page was returned, and two of the results were detailed write-ups on motors and robotics: not LLM generated, not surface level blogspam, but actual resources that you can use for learning.

Age is the best indicator of a quality website: if it was written decades ago, and it's still up, someone decided it was worth keeping around for all these years. The #3 result doesn't have a date, but it uses handwritten HTML, which is quite rare nowadays. I'd guess it was written somewhere between 1990 and 2010... and this one has been maintained as late as 2017, so they take some pride in what they wrote.

This is what we lose when google promotes new content: well written pages by real people who actually care instead of a 5 minute rundown for hackernews.

Chat provided a generally correct explanation, but it seems to have confused the question with: "why do motors draw less current when spinning quickly?". After some waffling about back-EMF, it handwaves that because the current decreased, the losses decreased (ok) and efficiency must be better... but that simply isn't true: Efficiency is the ratio of output power to input power. Under no-load conditions, the motor is drawing the minimum possible current, but it's also not producing any usable mechanical power, so its efficiency is zero. Not only does the LLM's logic not hold water, it's much more complicated than the truth.

Verdict: Crap.

What is the lowest K-alpha emission energy of Molybdenum?
What photodiode circuit should I use?
How do airplane wings work?
Why are brushed motors most efficient at high speeds?

https://www.cs.rochester.edu/users/faculty/nelson/courses/csc_robocon/robot_manual/motor_drivers.html : That write up.

"Ad block - [...] - Chrome web store": Charges a $40/year subscription, allows "non-intrusive" advertising and collects data.
"AdBlock Plus": Same deal.
Infobox linking to https://getadblock[.]com/ : The usual.
"Get AdBlock": Ditto.
"uBlock Origin": Finally, a good result. Just in time to save google from the "crap" tier, but I doubt it's early enough to stop someone from being scammed.
"Adblock Plus": Same as google's #2.
Infobox with "https://www.windowscentral[.]com/how-block-ads-and-trackers-xbox": An ad-filled blog-spam site. It does provide reasonable instructions, but good luck reading it without an ad blocker.
A second infobox linking to "Adblock vs Adblock Plus - PC Guide": An ad-laden blog-spam page comparing two sub-par extensions. (Both allow "acceptable ads".)
"uBlock Origin": Good, but why is it so far down?
"AdBlock — block ads across the web": The usual scammy adblocker extension. Very similar to google's top four results.
"Adblock Plus": Same as google's #2.
"Ad block - [...] - Chrome web store": Same as google's #1.
"Adblock Plus": Yet another shady adblocker with a $40/year subscription.
"The Ethical Ad Blocker" (infobox): A blog post describing an ad-blocker that blocks access to any websites that have ads, which prevents any accusations of piracy. Funny and probably real, but not what users are looking for.
"AdGuard Ad blocker": Yet another of those nearly identical sketchy adblockers.
"Adblock Plus": Same as google #2.
"AdBlock — block ads across the web": Same as google #1.
"uBlock Origin": Finally, in the top 3!
"getadblock[.]com": More junk.
"AdBlock — block ads across the web": Same as #2, but on Microsoft's extension store instead of Google's.
"Ghostery Ad Blocker": Yet another blocker that doesn't actually block ads, and has been caught selling data to advertisers.
"Ad blockers are not allowed on YouTube": A blog post with a half-baked list of ways to get around youtube's ad-blocker detection. Indirectly recommends uBlock, but also a lot of stuff that won't work. Not great.
"Vivaldi": Chrome with a built-in adblocker. Not a scam, but you don't need to install a new browser to block ads.
"EasyList is in trouble and so are many ad blockers": Corporate blog post about hosting problems.
"Ad Blockers - Contains Moderate Peril": A blog post about ad-blockers, recommends "AdBlocker Ultimate". Not spam, but not the best recommendation.

Incorrect AI overview citing a paper. The paper lists both K-alpha lines, but the LLM used the wrong one.
Table from Lawrence Berkeley National Lab: Lists the correct value.
Another table, this time from an equipment manufacturer. Lists the correct value.
A paper characterizing the X-ray fluorescence spectrum of molybdenum.
"Characteristic X-ray - Wikipedia": An overview of X-ray emission lines, but it does not give any specific energy values. Not a relevant result.
Wrong AI overview citing google's #2: It made the same mistake with Kα2 and Kα1.
"Molybdenum": A nice little page from LBL listing some technical properties of molybdenum. This is the most relevant result so far.
"12.1: Fundamental Principles": An article that happens to use molybdenum as an example, but lists wavelengths instead of photon energy.
"Experimental K-alpha x ray energies": A table of emission lines.
The same paper as google's #4.
A very wrong AI overview giving "0.709 eV": off by four orders of magnitude! I suspect it took the number from Bing's #3, but instead of actually converting the wavelengths to energy, it just slapped an "eV" on.
Same table as google's #2. A good result.
Same as google's #3. A good result.
A page about the theoretical calculation of X-ray lines.
Does not provide an energy for molybdenum.
A list of chemical properties of molybdenum. Does not mention X-rays.
Incorrect AI overview referencing a NIST publication.
Same as bing #2. A good result.
Same as bing #3: not relevant to the question.
Same as google #4. A good result.
Some data table: A perfectly fine result.

No surprises here: It's a few good sources and a slightly wrong LLM summary.

Verdict: ok.

Marginalia:

"Plasma catalytic non-oxidative conversion of methane into hydrogen and light hydrocarbons": A preprint paper that used X-ray equipment and mentioned molybdenum in passing.
"XRF Technologies for Measuring Trace Elements in Soil and Sediment": Similar to #1. A paper that used X-ray equipment and mentions molybdenum, but does not answer the question.

lcamtuf on Photodiodes

AI overview citing #4, recommending a transimpedance amplifier, but it provides a schematic of a different configuration.
"Photodiode – A Beginner’s Guide": A blog-style website with circuits that don't work, are missing important details and have poor explanations.
"Photodiode Basics": Ad-ridden page which does include the rough layout of a transimpedance circuit, but with no mention of feedback capacitors. These are often needed to prevent oscillation.
"What are the pros and cons for the various photodiode circuit arrangements?": A forum thread that mentions transimpedance amplifiers, but doesn't give any specifics.
"Photodiode Component Basics [...] - Youtube": Video with a demonstration of a photodiode working, but without any amplification or readout circuits.
AI overview citing #2, but it recommends a bad configuration with a resistor in parallel with the diode. The output is non-linear, high-Z, and difficult to use.
"Photodiode – A Beginner’s Guide": Same as google #2. A bad result.
"Photo Diode (Symbol, [...] Pros & Cons) Explained - Youtube": Another super generic video.
"Fire Detection Circuit Using Photodiode": Content farm video with no schematic and no explanation.
"Photodiode Construction and Working - Youtube": Another extremely generic explanation video. Does not include any circuits or even discuss the problem.
"Photodiode – A Beginner’s Guide": Same bad article as google's #2.
"Photodiode Basics": The same as google's #3: incomplete circuits on an ad-ridden page.
"What are the pros and cons for the various photodiode circuit arrangements?": Same as google #4, an unhelpful forum thread.
"PHOTODIODE OPERATION MODES AND CIRCUITS": Provides an example of a transimpedance amplifier, but has no example values or instructions on selecting them.
"Technical notes / Si Photodiodes": A PDF from a photodiode manufacturer, which provides practical circuits and a description of photodiode properties. This is the first result that provides enough information to actually build a working sensor.
"Photodiode – A Beginner’s Guide": The same as google's #2, meh explanations and some of the circuits don't work.
"Photodiode Basics": Same as google's #3: incomplete circuits on an ad-ridden page.
"PHOTODIODE OPERATION MODES AND CIRCUITS": Same as kagi #4. Not good enough to build a working circuit.
"A Practical Guide to Photodiode Amplifier Circuit Design [...]": A marketing piece for an equipment manufacturer. Unlike the Hamamatsu appnote, this doesn't have any useful information.
"Technical notes / Si Photodiodes": Same application note as Kagi #5. A good result.
"PIN Photodiode gamma detection amplifier circuit - rectangular wave output": Forum post with a broken circuit. Not something you want to copy.
"Circuit Diagram": An unrelated forum post about an XKCD comic.
"Short Circuit Limiter": Unrelated blog post.
"NES Cartridge Chaos: [...]": Unrelated blog post.
"How can i increase the range of values that a light sensor gives?": Forum post showing an ok configuration, but with no explanation or information on how values should be selected.

It violates the conservation of momentum, because the wing doesn't impart any momentum to the air.
Obviously, fans work.
Airplanes can fly upside down... which shouldn't be possible if lift is some special property of the wing's shape.
Paper or balsa-wood planes with flat airfoils work fine.

https://www.youtube.com/watch?v=hnvtstq3ztI : Weighing an airplane as it's flying.

AI summary citing a TikTok video which contains the "something something Bernoulli" argument. Not entirely wrong, but needlessly complicated and incomplete.
How wings really work: A professor debunking "equal transit" with an experiment... nice, but a debunk is not an explanation.
"How Airplane Wings REALLY Generate Lift": A youtube video with the correct explanation. A good result.
"ELI5: how does a wing work? - Reddit": Reddit thread, most comments are correct, but many are repeating the incorrect explanation.
"How Wings Work": A page with a mostly correct animation, but no explanation of what's happening.
AI summary stating the incorrect equal-transit explanation. Seems to be referring to an old Glenn Research page with the incomplete explanation.
"Airplanes": A correct article which calls out the incorrect Bernoulli argument. A good result.
The same correct video from Google #3.
"How Airplanes Work: A Simple Explanation for Beginners": A youtube video giving the incomplete explanation.
"How Wings Work": Same as google's #5.
"How Does A Wing Work? - Science Through Time": AI slop video with a bad answer. I can't tell if this is the "equal transit" model or the incomplete one, because it doesn't include anything resembling detail or logic.
"How Does A Wing Actually Work?": A Veritasium video on youtube, with the correct explanation. A good result.
"How airplane wings work": A cool video showing airflow over a wing, during normal flight and a stall... but it's not an explanation.
"How Does A Plane Wing Work?": Correct explanation and demo.
"How do airplane wings work?": Explains the structural components of a wing, but not why it's able to create lift.

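A rough numerical check of the brushed-motor efficiency argument from the motors section, as a minimal sketch. It assumes the textbook DC motor model (back-EMF proportional to speed, torque proportional to current) plus a constant friction torque. The function names `motor_point` and `sweep` and every parameter value are made up for illustration; none of them come from a datasheet or from the pages reviewed above.

```python
def motor_point(speed_fraction, voltage=12.0, resistance=1.0,
                k=0.02, friction_torque=0.002):
    """Return (output_power_W, efficiency) at a fraction of the ideal no-load speed.

    Hypothetical parameters: k is both the torque constant (N*m/A) and the
    back-EMF constant (V*s/rad); friction_torque is a constant drag (N*m).
    """
    omega = speed_fraction * voltage / k           # shaft speed in rad/s
    current = (voltage - k * omega) / resistance   # back-EMF reduces the current
    shaft_torque = max(k * current - friction_torque, 0.0)
    p_out = shaft_torque * omega                   # mechanical power at the shaft
    p_in = voltage * current                       # electrical power drawn
    efficiency = p_out / p_in if p_in > 0 else 0.0
    return p_out, efficiency


def sweep(steps=100):
    """Scan the speed range and return where power and efficiency peak."""
    fractions = [i / steps for i in range(steps)]
    best_power = max(fractions, key=lambda f: motor_point(f)[0])
    best_eff = max(fractions, key=lambda f: motor_point(f)[1])
    return best_power, best_eff


if __name__ == "__main__":
    best_power, best_eff = sweep()
    print(f"peak output power at {best_power:.2f} of no-load speed")
    print(f"peak efficiency at {best_eff:.2f} of no-load speed")
```

With these made-up numbers, output power peaks near the middle of the speed range while efficiency peaks close to the no-load speed. Efficiency is zero at stall, and it drops back to zero once friction uses up all the available torque, which is the point the Chat answer missed.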
Julia Evans 2 weeks ago

Moving away from Tailwind, and learning to structure my CSS

Hello! 8 years ago, I wrote excitedly about discovering Tailwind. At that time I really had no idea how to structure my CSS code and given the choice between a pile of complete chaos and Tailwind, I was really happy to choose Tailwind. It helped me make a lot of tiny sites!

I spent the last week or so migrating a couple of sites away from Tailwind and towards more semantic HTML + vanilla CSS, and it was SO fun and SO interesting, so here are some things I learned! As usual I’m not a full-time frontend developer and so all of my CSS learning has happened in fits and starts over many years.

When I started thinking about structuring CSS, I was intimidated at first: I’m not very good at structuring my CSS! But then I started reading blog posts talking about how to structure CSS (like A whole cascade of layers or How I write CSS in 2024) and I realized a couple of things:

Every CSS code base has a bunch of different things going on (layouts! fonts! colours! common components!)
It’s extremely useful to have systems or guidelines to manage each of those things, otherwise things descend into chaos
Tailwind has systems for some of these, and I already know those systems! Maybe I can imitate the systems I like!

For example, Tailwind has:

a reset stylesheet
a colour palette
a font scale
utility classes
responsive design
the build system

I’m going to talk about a few aspects of my CSS codebase and my thoughts so far on what kind of rules I want to impose on the codebase for each one. Some of them are copied from Tailwind and some aren’t.

I just copied Tailwind’s “preflight styles” by going into and copying the first 200 lines or so. I noticed that I’ve developed a relationship with Tailwind’s CSS reset over time, for example Tailwind sets box-sizing: border-box on every element (which means that an element’s width includes its padding): I think it would be a real adjustment for me to switch to writing CSS without these, and I’m sure there are lots of other things in the Tailwind reset (like ) that I’m subconsciously used to and don’t even realize are there.

This next part is the bulk of the CSS! The idea here is to organize CSS by “components”, in a way that’s spiritually related to Vue or React components. (though there might not actually be any Javascript at all in the site) Basically the idea is that:

Each “component” has a unique class
The CSS for one component never overrides the CSS for any other component
Each component has its own CSS file

So editing the CSS for one component won’t mysteriously break something in another component. And probably like 80% of the CSS that I would actually want to change is in various component files, so if I’m editing a 100-line component, I just have to think about those 100 lines. It’s way easier for me to think about.

For example, this HTML might be the “component”. And the CSS looks something like this, using nested selectors: I haven’t done anything programmatic (like web components or @scope) that ensures that components won’t interfere with each other, but just having a convention and trying my best already feels like a big improvement.

Next: conventions to maintain some consistency across the site and keep these components in line with each other!

has a bunch of variables like this which I can use as necessary. Colour is really hard and I didn’t want to revisit my use of colour in this refactor, so I left this alone. The only guideline I’m trying to enforce here is that all colours used in the site are listed in this file.

One thing I appreciated about Tailwind was that if I wanted to set a font size, I could just think “hm, I want the text to be big”, write , and be done with it! And maybe if it’s not big enough I’d use or instead. No trying to remember whether I’m using or or . So I defined a bunch of variables, taken from Tailwind, like this: Then if I want to set a font size, I can do it like this. It’s a little more verbose than Tailwind but I’m happy with it for now.

There are some things like buttons that appear in many different components. I’m calling these “utilities”. I copied some utility classes from Tailwind (like sr-only for things that should only appear for screenreader users). This section is pretty small and I try to be careful about making changes here.

“base” styles are styles that apply across the whole site that I chose myself. I have to keep this section really small because I’m not confident enough to enforce a lot of styles across the whole site. These are the only two I feel okay about right now, and I might change the one: I think for the base styles it’s going to be easiest for me to work kind of bottom up – first start with almost nothing in the base styles, and then move some styles from the components into base styles as I identify common things I want.

I haven’t completely worked out an approach to managing padding and margins yet. I’m definitely trying to be more principled than how I was doing it in Tailwind though, where I would just haphazardly put padding and margins everywhere until it looked the way I wanted. Right now I’m working towards making the outer layout components in charge of spacing as much as possible. For example if I have a with a bunch of children that I want to have space between them, I might use this to space the children evenly:

Some inspiration blog posts:

the owl selector
“no outer margin”

The way I was doing responsive design in Tailwind was to use a lot of media queries. Tailwind has this syntax that means “apply the style at sizes or larger”. I’m trying something pretty different now, which is to make more flexible CSS grid layouts that don’t need as many breakpoints. This is hard but it’s really interesting to learn about what’s possible with grid, and it’s a good example of something that I don’t think is possible with Tailwind. For example, I’ve been learning about how to use to automatically use 2 columns on a big screen and 1 column on a small screen like this: I also used a lot which is an amazing feature that I don’t think you can use with Tailwind.

Some inspiration:

A responsive grid layout with no media queries from CSS Tricks

In development, I don’t need a build system: CSS now has both built-in import statements, like this: and built-in nested selectors, like this: If I want, I can use esbuild to bundle the CSS file for production. That looks something like this. Even though I usually avoid using CSS and JS build systems, I don’t mind using esbuild (which I wrote about in 2021 here) because it’s based on web standards and because it’s a static Go binary.

A few people asked why I was migrating away from Tailwind. A few factors that contributed are:

Tailwind has become much more reliant on a build system since 2018, I think it’s impossible (?) to use newer versions of Tailwind without using a build system. So I’ve been using Tailwind v2 for years. (there’s also litewind apparently)
It’s always been true that you’re supposed to use Tailwind with a build system, but I’ve never really done that, so I have 2.8MB files in a lot of my projects and it feels a little silly.
I’m a lot better at CSS than I was when I started using Tailwind
Ultimately Tailwind is limiting: if you want to do Weird Stuff in your CSS, it’s not always possible with Tailwind. Those limits can be extremely useful (a lot of this post is about me reimplementing some of Tailwind’s limits!) but at this point I’d like to be able to pick and choose.
I ended up with sites that mixed both vanilla CSS and Tailwind in the same project and that was not fun to maintain
I got curious about what writing more semantic HTML would feel like.

While doing this I learned about a lot of CSS features that I didn’t use but am curious about learning about one day:

(from A Whole Cascade of Layers)
container queries

I still feel happy that I started using Tailwind, even if I’m moving away from it now. I learned a lot from using it and I can still use some parts from it in my sites even after deleting .

Thanks to Melody Starling who originally designed and wrote the CSS for wizardzines.com, everything cool and fun about the site is thanks to Melody. Also I read so many incredible blog posts about CSS while working on this (from CSS Tricks, Smashing Magazine, and more), I’ve tried to link some of them throughout this post and I really appreciate how much folks in the CSS community share their practices.

Susam Pal 2 weeks ago

Commenting Guidelines

When commenting on this website, please keep the following points in mind:

You may include HTML or Markdown in your comment. Comments are converted to HTML and sanitised before they are published on this website.

All submitted comments are held for review. Whether a comment is published or not is at the discretion of the author of this website. Typically, only the following types of comments are published:

Comments that add new information or insight to the topic discussed in an article.
Comments that provide a neutral, supporting or opposing viewpoint.
Comments that report typos, errors or bugs on the website.
Comments that contain good humour.
Comments that express appreciation.

Generally, rants are not published, even when the post you are commenting on is itself a rant. This website is the author's place to rant. It is not your place to rant. If you really need to rant, please do so on your own website. This guideline exists to maintain a high signal-to-noise ratio in the comments section.

All comments deemed suitable for this website by its author become publicly available on this website at two places: on the comment page for the article you commented on (example) and on the overall comment index page at comments.

Do not submit sensitive personal data in your comments.

iDiallo 2 weeks ago

Software Engineers are Obsolete

In my first interview for a developer position, I shared a link to my personal project with the interviewer. It was a website for learning how to program. I created it from the ground up. I built the PHP app, designed the database schema, made a nice design to tie it all together. I wrote down my process, and it became the first tutorial on the site. Then I collected tutorials from all over the web and displayed them on my website, which acted as a portal. There was a section for PHP tutorials, for Ruby on Rails, for .NET, etc. Each one individually curated by me. My interviewer was so impressed. I got the job. Later, I added a section where anyone could submit their own tutorials. It was fascinating how quickly people found my website and started submitting links. The tutorials were coming in so fast that I removed the verification system and let people upload links directly. But then my mind wandered. What if I start a blog? Yes, I had another blog before this one. I built an entire blog engine from scratch. A colleague found my blog. He was so excited that he shared his own with me. At lunch, we would discuss ideas, and that same evening after work, we would buy a domain name and start a new project. We shared tips and tricks on how to rank on Google. We had a skill, being web developers, and we took full advantage. When we had an idea, we would fire up our computers that same night and build it. Friends and family would come to us for validation. We were the ultimate deciders of what was a good idea and what was a bad one. We were the gatekeepers. We knew how to program, and nobody outside our circle could say otherwise. Now, friends and family don't come to us anymore. They go straight to ChatGPT, and it tells them their idea is brilliant . They launch their favorite AI agent, which builds their entire product from a single prompt. Some of them even manage to host it on the web, accessible to the world, and they are seeing their first customers. 
People who used to confuse Java with JavaScript now tell me they have a platform. People who don't even know what programming is are standing at the forefront of software innovation, advocating, evangelizing, and making money. This skill I spent years honing has been made obsolete by everyday people. We, the developers, are no longer the gatekeepers. In fact, now we need to keep up or risk being left behind. Some commenters online tell me I'm just jealous, that I need to embrace progress. I don't want to be obsolete. I'm on openclaw, moltenclaw. I have accounts on all the video generation websites. I have accounts on ChatGPT, Claude, Gemini, and Mistral. Just as I'm getting the hang of one tool, my friend who works in a warehouse tells me, "just use Perplexity for that." But Perplexity isn't enough, because another friend says GenSpark is better. For some reason I can't sign into my Manus account anymore. And apparently, to get the most out of it, I need to get Meta Ray-Bans. Everyone is empowered, no one needs me, and that's that. The developer is now obsolete. But then, I opened LinkedIn. My peers, fellow developers who for some reason all have the word "AI" in their job title, are saying the opposite. "Developers are not losing their jobs to AI," they say. "Developers are losing their jobs to other developers who use AI." They are vibe-coping to the max. The history of technology has always been a story of nearly missing out. I remember another job I applied for and totally didn't get. The company had moved all their client-facing apps to Silverlight. If you're wondering what Silverlight is, you might understand why I chuckled when the interviewer described their plight: they were struggling to find developers to help them migrate to HTML and JavaScript. I'm fairly sure that chuckle is why they never called me back. It's one thing to embrace new technology. It's another thing entirely to put all your eggs in one basket.
Companies are betting everything on Silverlight. Sorry, I mean AI. Without thinking through what happens if things don't pan out. AI has lowered the barrier to entry. That's a good thing. More people can now bring a fresh pair of eyes to the software engineering field. But there's a problem. Those new entrants won't become better engineers over time. Why? Because they are not writing code, not reading code, not debugging code. Their growth path, with time and experience, is to become better prompters. What this means is that, amid all the noise, my role as a software engineer may seem obsolete. But in the long run, we will be back to square one, where engineers writing code with their own meatware will hold all the cards. These are the people who learned the hard way: by reading documentation, by debugging broken apps, by having their seemingly perfect Stack Overflow question closed as a duplicate. These are the engineers who will hold the keys to software. Not because they're guarding secrets; there are no secrets. It's simply that the new developer is not, and will never be, interested in learning. While we pride ourselves on producing more software than ever, it doesn't take long to realize that software is never truly finished at delivery. It has to be maintained. It's strange: computers, whose entire purpose is to repeat the same process over and over, perfectly, somehow manage to degrade over time. My tutorial website, seemingly working fine, returned an error when I visited it after months of neglect. I restarted all the services and brought it back up. It was now full of spam and NSFW URLs. An application that worked perfectly yesterday is broken today. It could be a memory leak, unexpected input, or just users with fat fingers. Your completed application is suddenly incomplete, and you have to fix it. In an ideal world, we wouldn't keep producing more software. We would have working software, and less of it to maintain. AI thrives on quantity. 
If you need me, I'll be in the back, patiently waiting for you to realize you can't prompt your way out of a Silverlight migration. My rates just doubled.
