Xe Iaso

Giving your Go apps Tigris superpowers

Tigris is S3-compatible, which means you can point the AWS SDK at it and most things just work. The catch is that the Tigris-exclusive features (bucket forking, snapshots, object renaming, and the like) need verbose workarounds because the AWS SDK doesn't know they exist. So we wrote a Go SDK that does. It comes in two flavors: one package is a drop-in replacement for the standard S3 client with first-class methods for the Tigris-specific operations, and the other is a higher-level client for the common single-bucket case that infers its configuration from the environment, so you stop passing the same parameters over and over. You can adopt the Tigris features incrementally without refactoring your existing S3 code, and the simpler API still works against other S3-compatible providers. I wrote up how it works and why we built it over on the Tigris blog.


First Impressions of the Fitbit Air

A little over a week ago I took delivery of my new Fitbit Air, so I thought I'd jot down some of my thoughts after using it every day to track my health. I recently started running again, and I use my Suunto Run to track that. I've had it for a little while now and it tracks all my walks and runs. It's pretty good, but I wanted something that I could wear alongside a proper watch, so it needed to have no screen and just silently track my health, as I'm not interested in replacing a proper watch with a wrist phone. The Whoop band was an obvious contender, but the £200+ per year subscription that leaves me with a brick if I ever cancel is a deal-breaker, and there was nothing else that I could find on the market...that was until the Fitbit Air came along. (Pictured: Fitbit Air on one wrist, watch on the other.) The Fitbit Air costs £85 (~$100) and, unlike the Whoop, is a one-off purchase. You also get 3 months of Fitbit Premium, which basically adds Gemini to the app to help provide context, motivation, and workout schedules. After the 3-month freebie it's $10/month, but crucially the device and app work fine without Premium. You just don't get the "AI Coach", which is probably a positive for lots of people. 🙃 I already have a Gemini subscription that gives me access to Fitbit Premium, so I get it at no extra cost anyway. Although the "Coach" has made basic mistakes a few times - like referring to my Suunto watch as a set of smart scales, or incorrectly stating I'd done a 10km run instead of a 5km one - generally speaking I've found the extra context and advice it gives to be very useful. It has helped me to tweak some of my strength sessions and improve my form while running. My hope is that the basic mistakes the AI is making are down to teething problems. If so, I'd like to think they will improve with time. Like anything AI-generated though, it's important not to take the feedback and advice it gives as gospel. 
Whenever it's made mistakes and I've called it out, it has always responded with the correct data and context afterward. Most of the time I don't even notice I have the Air on. It's so small and light - it just chugs away in the background, doing its thing. It's also about half the width of the Whoop. I bought the rubber strap for mine too, which is more comfortable while running and less absorbent than the standard canvas strap, so hopefully no sweat will sink into it. The OEM straps are super expensive though, so I'm looking forward to aftermarket ones becoming available. Google advertise the Air as having a 1-week battery life. I can attest to that - I'd easily get a week out of this. It's also super quick to charge. Earlier in the week I was down to around 30% battery, so I chucked it on charge while I jumped in the shower. 20 mins later when I put it back on, it was nearly fully charged. This is great news, as I'll be able to keep it topped up when I shower, then pop it back on when I go to bed so it can track my sleep. The major downside to all this is that Fitbit are owned by Google, so they're likely to use the data in all kinds of unscrupulous ways. But the way I'm looking at it is that the gamification, the data, and the motivation that this little thing provides are helping me to get out and exercise. That's because I love data, so being able to review it all after my workouts and see progress is hugely motivating. So if it helps me to get fit, and stay fit, it's a price I think I'm willing to pay. I'm kind of at the point in my life now where I just want things to work for me. If there are trade-offs, so be it. Anything for an easy (and healthier) life. This post was a little all over the place. But overall, I really like the Fitbit Air. The data is keeping me motivated, and although the AI Coach makes mistakes, it is helping me navigate the data and improve my training, so I'll take that as a win. For me, it's an easy decision between this and the Whoop. 
The Fitbit wins out. Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

Unsung Today

“Artifacts from a strange moment”

Welcome to another Super Mario Sunday! This is an 11-minute video from gruz about the fascinating world of South Korean bootleg Marios, such as Super Boy, Super Bros World, and Super Bio Man – games that existed solely because of Korea’s subpar copyright law of that era. In short: the code was copyrighted, but the IP was not, so many companies rebuilt Mario for the dominant game console of the region, in the process stripping it of all of the original game’s actual craft – with “levels feeling assembled rather than built” and “getting the [visuals] right and missing almost everything underneath” – and as such these games become interesting as a reflection of the details that actually made Mario great. However, as time moves on, some of the bootleg games actually get better and better, and come into their own. It’s interesting to compare this to Nintendo’s own “clone” I mentioned before. What I wouldn’t give for some oral history of what looks like an absolutely fascinating time and place for software. #craft #details #games #history #super mario bros #youtube


Thoughts on starting new projects with LLM agents

A few months ago I wrote about using LLM agents to help restructure one of my Python projects. It's worth beginning by saying that the rewrite has been successful by all reasonable measures; I've been able to continue maintaining that project since then without an issue. In this post, I want to discuss another project I've recently completed with significant help from agents: watgo. In this project many things are different; most notably, it's a from-scratch project rather than a rewrite, and it uses a different programming language (Go). This post describes my experience working on the project, and some lessons learned along the way. This is a new project, so it required extensive design. I began by iterating on the design with the agent, starting from a sketch of the API. For this purpose, I recommend using a Markdown file committed into the repository for future reference. After that, I started asking the agent to write CLs [1] in a logical order that made sense to me, keeping them small and reviewable (more on this in the next section). Sometimes it's not easy to have a small CL, and multiple rounds of revision may confuse the agent; in this case, I commit the CL and then go back and ask the agent to modify or refactor the code, as much as needed, with separate CLs. In the worst case, the whole sequence can be reverted if I feel we've taken the wrong direction (branches could also be helpful here for more complicated scenarios). This point is worth reiterating: sometimes a single CL is a huge step forward, but requires lots of review, cleanup and refactoring to be viable. I've had multiple instances where an agent produced several days of work in a single CL, but I then spent hours instructing it to clean up and refactor. Overall, it's still a productivity gain, just not as big a one as some pundits would like us to believe. 
Given the current state of agent capabilities, I think it's worth splitting projects into two categories:

1. Low importance / prototype / throwaway projects where deep code understanding is unnecessary. These can be "vibe-coded" (submitting agent code without even reviewing it).
2. High importance projects that I actually want to maintain; here, vibe-coding is ill-advised and I insist on reviewing and guiding all code the agent writes before it's submitted (or shortly after, as discussed above).

The watgo project is a clear example of (2): I certainly intend to maintain this project in the long term, so I insist on code that I understand. With very few exceptions, no code gets in without full review and often multiple rounds of revisions. Even if the cost of writing code went down, maintaining a project is so much more than that. It's triaging and fixing bugs, it's thinking through what needs to be done rather than how to do it, it's keeping the code healthy over time, and so on. As Brian Kernighan said:

Maybe at some point agents will become good enough that projects in category (2) can be implemented and maintained completely autonomously. Maybe. But we're certainly not there yet. My hunch is that getting there will require crossing the AGI line [2], after which little in our world remains certain.

If you're using an agent to send an actual PR and only review that, it's difficult to be disciplined enough to actually perform a thorough review. I find the following method to be more reliable: I use a CLI agent running locally in my repository, and ask it to update the code there. In parallel, I have a VSCode window open in the same project, where I can:

- Review the agent's changes using VSCode's diff view
- Make my own tweaks and code changes if needed

Once I'm pleased with the change, I manually create a commit.

As mentioned above, it's imperative to keep making progress in small chunks, with CLs small enough that a human can fully understand them in a single review. It's very tempting to sprint ahead submitting thousands of lines of code every day, but this temptation has to be avoided. Coding with an agent is like speed-reading; yes, you're making more progress, but comprehension suffers the faster you go.

Particularly for refactoring, agents still take the shortest route to the destination. It's important to guide them to think about the "big picture" at all times: find all instances where X is better done as Y, not just the single place noticed during a review. This is why it's sometimes OK to have a CL submitted before you fully agree with everything, and go back to it later for several refactoring rounds. Source control works amazingly well when pair-coding with agents.

It's a key point discussed in every "how to succeed with AI" article, but still critical enough to reiterate here: a solid testing strategy is absolutely crucial for success. Agents produce - by far - the best results when they have a solid test suite to test their code against. With the pycparser rewrite, I had a large existing test suite. For watgo, the very first thing I did was think through how to adapt the test suites of the WASM spec and of the wabt project for my needs. If your project doesn't have such tests to rely on, this should be your first order of business: finding one, or building one from scratch. Beware of self-reinforcing loops though; it's dangerous to trust agents with both the tests and the implementations tested against them.

Go is a fantastic language for agents to write, because it's designed to be very readable by humans. The biggest strengths of Go are exactly what make the experience of reviewing agent code so positive:

- Go changes very infrequently, so you don't have to wonder "are we using the most modern / idiomatic approach" or "what the hell is this construct" as often as with other languages (looking at you, Python and TypeScript).
- There are relatively few ways to accomplish the same thing in Go, further lowering the mental burden.
- The standard library is rich and there's much less need to keep abreast of the package-everyone-uses du jour.
- In general, Go is designed for readability, with a mild-but-still-strong type system, uniform formatting, explicit error propagation and opinionated choices already made for you.

Since most of the time spent by humans when using agents is reading rather than writing code, these effects compound and produce a great experience. Recall the discussion of how some languages are optimized for writability (Perl) while others are optimized for readability (Go)? Well, when working on a project with an agent we live in a world of 99% reading vs. 1% writing, so this really matters. I find this aspect really crucial in light of the earlier points made in this post - namely, keeping the human in the loop by understanding and reviewing all of the agent's design choices and code.

If you're working on a subject that's completely new to you, I would strongly recommend against the approach described in this post. To really learn something, you have to work through it from scratch, yourself: reading, designing, writing the code. Agents don't change this basic fact; even before agents, if you wanted to learn X, copying it from Stack Overflow or some other project clearly wasn't the right way to go. Similarly, while agents can be used as a prop for learning, they cannot learn for you.

As a corollary, junior engineers should exercise extreme caution when relying on LLMs. There's no replacement for hard-won experience and the sweat and tears of learning new, challenging topics. Learning is supposed to be hard; if it's too easy, you're probably not learning. For senior engineers, agents are a boon: a great tool to increase productivity, avoid the boring stuff, and get unstuck from procrastination, but only when used judiciously.

Unsung Today

Face with symbols over mouth, apparently

A nice moment in the iOS emoji keyboard – after selecting an emoji from the grid, its name shows up for a second. I have small reservations here, as reusing a placeholder like this trips my “this is cheap” alarm. But otherwise I like that this – just like keyboard shortcuts in menus or tooltips – ambiently teaches you the alternative representation of the emoji, which you can use later to get to it faster. (Another way of looking at it: This is a tooltip in a place where tooltips cannot exist.) #search #touch #typography

Unsung Today

“-4.5° rather than -45°?”

Two nice follow-ups to topics we covered before. In February, Norbert Heger analyzed precisely which pixels in macOS Tahoe respond to the mouse when you try to resize a window. In April, Steve Ruiz, author of tldraw, did the same analysis, more extensively, for drawing apps like Canva, Figma, Illustrator, and so on: When a user has one or more shapes selected, we display an interactive overlay that allows the user to transform their selection: a drag inside the box will translate the selection; a drag on the edges will resize along that axis; a drag from the corner will resize along both axes; and a drag from further out on the corners will rotate the selection. Like many features in tldraw, my design here was meant to follow the conventions of design tools. This meant a broad survey of other applications, both new and old, reconciling differences between them, and picking a design that I felt best served the user while remaining conventional. Remember the “if you put the Apple icon in reverse” joke from January? Last month, Jim Nielsen pulled on that thread on his blog and showed a few more examples: Some 3rd-party apps continue to fight a good fight, even as Apple’s definition of what an icon should be — or what’s even possible — shrinks all around them. One finding from this blog post for me was that things changed. In Big Sur, the squircle form factor was encouraged, but not enforced. 
Well, it is enforced now: even shapes very similar to the squircle now end up inside “the gray box of hell.” These gray boxes are not some pedestal for icons. They’re the actual icons. Anyway, I always appreciate the efforts of people methodically documenting things, so we can all learn, notice patterns, and/or continue the work from the best possible starting point. #iconography #interface design


i made a ring!

Not a webring - maybe some day - but an even cooler one! My wife and I went to the jewelry workshop our friends gifted us for our wedding last year. I had a moonstone at home I wanted to turn into a ring (yes, I do love big chunky gems as rings, as you might have already guessed from other pics of my hands, especially the aura quartz on my thumb), and she spontaneously made a necklace with a Cthulhu game symbol on it (Protection from Elder Gods). This is the loose stone. I bought that one together with a few others in 2019 in a display case, but soon that display case kinda fell apart (honestly, I paid way too much for it), so they are now loose in my other gem collection. This one was nicely vertical and so good for a chunky ring. If you are unfamiliar with moonstones, they are comparable to labradorites in the sense that tilting them around in light produces amazing color. This was the workstation. The first step was to decide how thick the ring band was supposed to be, both the actual material thickness and how wide it's supposed to be. Then, in a second step, what ring size, and how long the band needs to be. I wanted to comfortably wear it on my middle finger and possibly some others. My fingers have been prone to swelling since my months-long stint with Prednisone, so I went with a comfortable 61mm; you can always make it smaller anyway. So: cutting and measuring the right length, then slowly twisting the silver and aligning the ends so that they are really tight, tensing and pushing against each other. It was harder than I thought. It didn't need to be perfectly round yet, just aligned. Then it gets fired up with two pieces of silver to weld it all together. The next step was measuring around the stone to know how long the metal actually surrounding and holding the stone (the bezel) needs to be. I first used a paper strip, then used that to measure some silver, and twisted it through a machine to make it thinner and longer. 
Later, I hammered the ring into shape and polished the edges. The seam could be nicer, but to be fair, you don't see it now, as that is where the stone sits on top. Now I had to bend the bezel piece into the shape of the stone. That was by far the worst part, and I just couldn't manage it on my own; when I had one part aligning correctly with the stone, the other would be bent out of shape again. And the second I lifted the stone away from the silver, I forgot what I needed to adjust and by how much. My wife had to help me with that. Now that piece needs a bottom. (don't we all) It gets put on a silver sheet, welded together, and then you cut the extra off and spend a fuckton of time sanding down the edges. Excuse the hands, this stuff is rough work, and I really should have gotten my nails done by now. The ring gets sanded down where the seam is, so there is a flat surface that the bezel fits onto. It gets welded together again with a few pieces of silver. The next step is sanding down the top edges and, if needed, shortening the bezel a little so not too much of the stone gets covered, then getting the stone in. Finally, you use a hammer and a thin chisel-like thing to collapse the bezel walls onto the stone so it is kept in place. Final result: Definitely not a ring for daily wear, but it can be nice as a statement piece. :) You might notice that this isn't the side of the stone that was showing in all the other pictures. That is because we welded the bezel bottom onto the wrong side and didn't notice for far too long. But it's fine, because the other side works too, phew. Here are some progress shots of what my wife made: And the final result: All in all, I'm glad we did that, and very thankful to our friends who made it possible! But I suck really badly at this, was close to tears multiple times, and I am glad it is over and we both got something we like out of it. We ended the trip with a visit to a cafe. Reply via email Published 06 Jun, 2026

Kev Quirk Yesterday

📝 2026-06-06 18:02

Been doing some work on the website to bring back post types on the site properly (something I haven't had since switching from Kirby). I now have articles, links, notes and books, all on the homepage and filterable. Hopefully nothing is too messed up in the RSS feed! 😟

Ahead of AI Yesterday

LLM Research Papers: The 2026 List (January to May)

As some of you know, I have a long-running habit of keeping a list of research papers I want to read, revisit, or cite in future articles and projects. Last year, I shared two organized paper lists, one covering January to June and another one covering July to December. Several readers told me that these lists were very useful, so, in a similar spirit, I prepared a new list for the first half of 2026. This one covers papers I bookmarked from January through May 2026. Please do not treat this as a complete list of everything published this year. There are so many papers published every day that this would be totally infeasible. Instead, this is a curated reference list based on papers I found interesting or relevant for my own work. I went through the titles, abstracts, and topic framing carefully while organizing the list, but I have to admit that I only read a subset of the papers in detail. Why make these lists in the first place? When I work on an article, book section, code example, or lecture, I often remember that I saw a relevant paper somewhere, but finding it again can be surprisingly annoying. A categorized Markdown list solves that problem for me, and I hope it is useful to you as well. (Even in the era of LLM-based web search, having a specific context list is still pretty useful.) This year, the list is again heavy on reasoning models, reinforcement learning, and efficient inference, because I am biased towards bookmarking papers that are related to things I am currently working on. However, compared with the 2025 lists, I also bookmarked more papers around agent harnesses, tool use, long context, diffusion language models, and practical serving infrastructure, because that’s what I am currently pretty involved in and where the field is headed. The categories for this research paper list are as follows. 
(Pro tip: In the web version of this article, you can use the table of contents on the left to jump directly to the sections that are most relevant to you.)

- Architecture and Model Design
- Efficient Training and Scaling
- Inference Efficiency and KV Cache
- Sparse Attention and Long Context
- Reasoning and Test-Time Compute
- Reinforcement Learning and RLVR
- Agent Systems and Tool Use
- Coding Agents and Software Engineering
- Diffusion Language Models
- Model Evaluation and Benchmarks

This first section collects papers on model architecture, model-release technical reports, and papers that help explain why current LLMs look the way they do. One thing I find interesting about 2026 so far is that architecture work goes beyond making transformers larger. There is a lot of work around hybrid architectures (for example, Nemotron 3 and Arcee Trinity), state space layers (Nemotron 3 and Mamba-3), MoE capacity allocation (Scaling Embeddings Outperforms Scaling Experts and Step 3.5 Flash), activation behavior (The Spike, the Sparse and the Sink), and representation geometry (Symmetry in Language Statistics Shapes the Geometry of Model Representations). All of these papers are quite interesting, which is why I bookmarked them in the first place. But if I had to pick one must-read, it'd probably be Nemotron 3 Super, because the paper is super detailed (no pun intended), and it describes techniques used in a model that is already in production. And it’s one of the best models in its size class after all. One of the interesting aspects of Nemotron 3 is its hybrid-architecture design, meaning that it alternates between regular attention layers and Mamba-2 (state space model) layers to be more efficient at long contexts. In 2026, long-context efficiency is king as more and more LLMs get plugged into agent harnesses (OpenClaw etc.), which require working with longer and longer contexts. 
That being said, at 120B-A12B (120B total parameters, 12B active per token), Nemotron 3 Super may be a bit too large for local inference on regular consumer hardware, but there is a Nemotron 3 Nano (4B) version as well.
Figure 1: Architecture of Nemotron-3 Super, which is a hybrid architecture using Mamba-2 layers.
Note that 2 days ago, Nvidia also released a scaled-up version of this, Nemotron 3 Ultra (550B-A55B), which scales the embedding and projection dimensions but otherwise uses the same building blocks. If you are interested in a visual, I posted about it on Substack Notes here. This hybrid-architecture trend, alternating attention layers with alternative layer types, is a relatively popular development this year. Probably the most popular open-weight LLM series that uses a similar hybrid design is Qwen3.6, which uses Gated DeltaNet layers instead of Mamba-2 layers for the non-attention portions. For more information, see my Hybrid Attention (https://sebastianraschka.com/llm-architecture-gallery/hybrid-attention/) write-up, which pools information from several of my previous Substack articles where I wrote about these. Also, in the paper list below, you may notice that there are now Mamba-3 and Gated DeltaNet-2 (i.e., newer versions of Mamba-2 and Gated DeltaNet), and it will be interesting to see those in upcoming open-weight LLMs (e.g., Nemotron-4 and Qwen4?). Besides describing the hybrid-architecture design, the Nemotron-3 paper contains a whole lot of other interesting ablations, for example, around multi-token prediction for speculative decoding, NVFP4 versus BF16 pretraining, synthetic MMLU-style data, and post-training quantization recipes, but covering these in detail would be out of scope for this overview. 
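To make the "alternating" idea concrete, here is a small sketch that lays out a hypothetical hybrid stack in which every fourth block is full attention and the rest are linear-time blocks such as Mamba-2 or Gated DeltaNet. The 1-in-4 ratio and the helper names are assumptions for illustration only, not the actual configuration of Nemotron 3 or Qwen3.6. What it shows is why hybrids help at long context: only the attention layers keep a KV cache that grows with sequence length, while the state-space/linear layers carry a fixed-size state.

```go
package main

import (
	"fmt"
	"strings"
)

// LayerKind distinguishes the two block types in a hybrid stack.
type LayerKind int

const (
	// Linear stands for a Mamba-2 / Gated DeltaNet-style block with a fixed-size state.
	Linear LayerKind = iota
	// Attention stands for a regular attention block whose KV cache grows with context.
	Attention
)

// String renders a layer as one letter: "A" for attention, "L" for linear.
func (k LayerKind) String() string {
	if k == Attention {
		return "A"
	}
	return "L"
}

// hybridSchedule (hypothetical) makes every attnEvery-th layer full attention
// and all others linear/state-space layers; attnEvery <= 0 yields no attention.
func hybridSchedule(numLayers, attnEvery int) []LayerKind {
	layers := make([]LayerKind, numLayers)
	for i := range layers {
		if attnEvery > 0 && (i+1)%attnEvery == 0 {
			layers[i] = Attention
		} else {
			layers[i] = Linear
		}
	}
	return layers
}

// countAttention reports how many layers would keep a growing KV cache.
func countAttention(layers []LayerKind) int {
	n := 0
	for _, k := range layers {
		if k == Attention {
			n++
		}
	}
	return n
}

// render concatenates the one-letter codes of all layers.
func render(layers []LayerKind) string {
	var b strings.Builder
	for _, k := range layers {
		b.WriteString(k.String())
	}
	return b.String()
}

func main() {
	layers := hybridSchedule(12, 4)
	fmt.Println(render(layers))                                                    // LLLALLLALLLA
	fmt.Println(countAttention(layers), "of", len(layers), "layers use attention") // 3 of 12 layers use attention
}
```

In this toy 12-layer stack, only 3 layers would need a KV cache that scales with context length, which is the efficiency argument behind hybrid designs.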
1 Jan, Deep Delta Learning, https://arxiv.org/abs/2601.00417
6 Jan, MiMo-V2-Flash Technical Report, https://arxiv.org/abs/2601.02780
13 Jan, Ministral 3, https://arxiv.org/abs/2601.08584
29 Jan, Scaling Embeddings Outperforms Scaling Experts in Language Models, https://arxiv.org/abs/2601.21204
30 Jan, LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs, https://arxiv.org/abs/2602.00462
4 Feb, ERNIE 5.0 Technical Report, https://arxiv.org/abs/2602.04705
8 Feb, ViT-5: Vision Transformers for the Mid-2020s, https://arxiv.org/abs/2602.08071 (Most of this article is LLM-focused, but I couldn’t resist including a new major vision transformer design.)
11 Feb, Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters, https://arxiv.org/abs/2602.10604
12 Feb, Nanbeige4.1-3B: A Small General Model That Reasons, Aligns, and Acts, https://arxiv.org/abs/2602.13367
16 Feb, Symmetry in Language Statistics Shapes the Geometry of Model Representations, https://arxiv.org/abs/2602.15029
17 Feb, GLM-5: From Vibe Coding to Agentic Engineering, https://arxiv.org/abs/2602.15763
18 Feb, Arcee Trinity Large Technical Report, https://www.arxiv.org/abs/2602.17004
4 Mar, The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks, https://arxiv.org/abs/2603.05498
12 Mar, Tiny Aya: Bridging Scale and Multilingual Depth, https://arxiv.org/abs/2603.11510
15 Mar, Attention Residuals, https://arxiv.org/abs/2603.15031
16 Mar, Mamba-3: Improved Sequence Modeling Using State Space Principles, https://arxiv.org/abs/2603.15569
31 Mar, Attention to Mamba: A Recipe for Cross-Architecture Distillation, https://arxiv.org/abs/2604.14191
13 Apr, Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning, https://arxiv.org/abs/2604.12374
6 May, ZAYA1-8B Technical Report, https://arxiv.org/abs/2605.05365
13 May, Delta Attention Residuals, https://arxiv.org/abs/2605.18855
21 May, Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention, https://arxiv.org/abs/2605.22791
25 May, The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence, https://arxiv.org/abs/2605.26494

This section is about training systems, adaptation methods, and scaling recipes. These papers are not (all) about pre-training from scratch. Some focus on fine-tuning, distillation, test-time training, or making training work better on constrained hardware.


Running Python code in a sandbox with MicroPython and WASM

I've been experimenting with different approaches to running code in a sandbox for several years now, but my latest attempt feels like it might finally have all of the characteristics I've been looking for. I've released it as an alpha package called micropython-wasm, and I'm using it for a code execution sandbox plugin for Datasette Agent called datasette-agent-micropython.

My key open source projects - Datasette, LLM, even sqlite-utils - all support plugins. I absolutely love plugins as a mechanism for extending software. A carefully designed plugin system reduces the risk involved in trying new things to almost nothing - even the wildest ideas won't leave a lasting influence on the core application itself. My software can grow a new feature overnight and I don't even have to review a pull request!

There's one major drawback: my plugin systems all use Python and Pluggy, and plugin code executes with full privileges within my applications. A buggy or malicious plugin could break everything or leak private data. I'd love to be able to run plugin-style code in an environment where it is unable to read unapproved files, connect to a network, or generally operate in a way that's risky or harmful to the rest of the application or the user's computer.

My interest covers more than just plugins. For Datasette in particular there are many features I'd like to support where arbitrary code execution would be useful. I've already experimented with this for Datasette Enrichments, where code can be used to transform values stored in a table. I'd love to build a mechanism where you can run code on a schedule that fetches JSON from an approved location, runs a tiny bit of code to reformat it into a list of dictionaries, then inserts those as rows in a SQLite database table. My goal is to execute code safely within my own Python applications.

Web browsers operate in the most hostile environment imaginable when it comes to malicious code.
Their job is to download and execute untrusted code from the web on almost every page load. Given this, JavaScript engines should be excellent candidates for sandboxes. Sadly those engines are also extremely complicated, and are not designed for easy embedding in other projects. Most of the V8-in-Python projects I've seen are infrequently maintained and come with warnings not to use them with completely untrusted code.

WebAssembly is a much better candidate. It was designed from the start to support all of the characteristics I care about and has been tested in browsers for nearly a decade. The wasmtime Python library is actively maintained and has binary wheels.

WebAssembly engines like wasmtime run WebAssembly binaries. Some programming languages like Rust are easy to compile directly to WebAssembly. Dynamic languages like JavaScript and Python are harder - they support language primitives that execute code constructed on the fly, which means they need a full interpreter available at runtime. To run Python we need a full Python interpreter compiled to WebAssembly, wired up in a way that makes it easy to feed it code, hook up host functions and access the results.

Pyodide offers an outstanding package for running Python using WebAssembly in the browser, but using Pyodide in server-side Python isn't supported. The most recent advice I could find was from October 2024 stating "Pyodide is built by the Emscripten toolchain and can only run in a browser or Node.js".

The other day I decided to take a look at MicroPython as an option for this. The MicroPython site says: "MicroPython is a lean and efficient implementation of the Python 3 programming language that includes a small subset of the Python standard library and is optimised to run on microcontrollers and in constrained environments." WebAssembly sure feels like a constrained environment to me!
I had GPT-5.5 Pro do some research for me, which turned up this PR against MicroPython by Yamamoto Takahashi titled "Experimental WASI support for ports/unix". It then produced this research.md document, so I let Codex Desktop and GPT-5.5 high loose on it to see what would happen.

It worked. I now had a prototype Python library that could execute Python code inside a WebAssembly sandbox!

The trickiest piece to solve was persistent interpreter state. The WASM build we are using here exposes a single entry point which starts the interpreter, runs the code and then stops the interpreter at the end. This works fine for one-off scripts, but for Datasette Agent I want variables and functions to stay resident in memory so I can reuse them across multiple code execution calls.

A neat thing about working with coding agents is that you can get from an idea to a proof of concept quickly. I prompted the agent to add this, and after some iteration we got to a version that works: Python code can now run several blocks of code against the same interpreter, with state preserved between them.

Under the hood this starts a thread, sets up a request queue and then sends a message to that queue for each command, each time waiting on a reply queue for the result of that execution. Inside WASM, the MicroPython interpreter blocks waiting for a host function to return the next piece of code, runs it, and calls back into the host once each block has executed successfully.

The other piece of complexity was supporting host functions, so my Python library could selectively expose functions that could then be called by code running in MicroPython. Codex ended up solving this with 78 lines of C, which are compiled into the 362KB WebAssembly blob I'm distributing with the package. I am by no means a C programmer, but I've read the C and had two different models explain it to me (here's Claude's explanation) and I've subjected it to a barrage of tests.
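The thread-plus-queues design described above can be sketched in plain Python. This is a hypothetical illustration, not micropython-wasm's code or API: Python's own `exec()` stands in for the MicroPython interpreter running inside WASM, so it provides no isolation at all. What it does show is how a long-lived worker thread keeps one namespace resident between calls, how each call waits on a reply queue, and how host functions can be exposed by placing them in that namespace (a stand-in for real WebAssembly imports).

```python
import queue
import threading


class PersistentSession:
    """Hypothetical sketch of the thread + request/reply queue pattern.

    Python's built-in exec() stands in for MicroPython running inside WASM,
    so this provides NO isolation; it only shows how state stays resident.
    """

    def __init__(self, host_functions=None):
        self._requests = queue.Queue()
        self._replies = queue.Queue()
        # One namespace shared by every call, like a resident interpreter.
        # Host functions are exposed simply by placing them in it.
        self._namespace = dict(host_functions or {})
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        while True:
            code = self._requests.get()   # block until the host sends code
            if code is None:              # sentinel: stop the worker
                return
            try:
                exec(code, self._namespace)
                self._replies.put(("ok", None))
            except Exception as exc:      # report errors instead of dying
                self._replies.put(("error", repr(exc)))

    def execute(self, code: str, timeout: float = 5.0) -> None:
        self._requests.put(code)
        status, detail = self._replies.get(timeout=timeout)
        if status == "error":
            raise RuntimeError(detail)

    def get(self, name: str):
        return self._namespace[name]

    def close(self) -> None:
        self._requests.put(None)
        self._worker.join()


logged = []
session = PersistentSession(host_functions={"log": logged.append})
session.execute("x = 21")
session.execute("def double(n):\n    return n * 2")
session.execute("result = double(x)\nlog(result)")
print(session.get("result"), logged)  # 42 [42]
session.close()
```

Each `execute()` call sees the variables and functions defined by earlier calls, and errors come back through the reply queue rather than killing the worker, so the session stays usable after a failed block.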
The great thing about working with WebAssembly is that if the C turns out to be fatally flawed, the worst that can happen is that the WebAssembly execution will fail with an exception. I can live with that risk.

Memory limits are directly supported by wasmtime. CPU limits are a little harder: wasmtime offers a "fuel" concept to limit how many operations a WebAssembly call can execute, and that's the correct fit for this problem, but the units are hard to reason about. I'm experimenting with a default of 20 million units of fuel now, but I'm not confident that it's the most appropriate value.

The alpha is now live on PyPI. You can try it from your own Python code as described in the README. I've also added a simple CLI mode in version 0.1a2, which means you can try it without installing it first. You can also try it in Datasette Agent: once it's running, navigate to http://127.0.0.1:8001/-/agent and run a prompt.

Having complained about immature, loosely-maintained sandboxing libraries, it's deeply ironic that I've now built my own! I deliberately slapped an alpha release version on it, and I'm not ready to recommend it to anyone who isn't willing to take a significant risk. I've put it through enough testing that I'm OK using it myself. I've shipped my first plugin that uses it, datasette-agent-micropython. I've also locked GPT-5.5 xhigh in that Datasette Agent plugin and challenged it to break out of the sandbox, and so far it has not managed to. I'm hoping this implementation can convince some companies with professional security teams and high-stakes problems to commit to using Python in WebAssembly as a sandboxing approach and open source their own solutions.
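To see why fuel units are hard to reason about, here is a toy metered interpreter in Python. It does not use wasmtime: it is a hypothetical stack machine that charges one unit of fuel per instruction and stops when the budget runs out. wasmtime's real accounting works on WebAssembly operations and is configured through its own API. The point carries over, though: a fuel budget bounds how many instructions run, not how long they take, so translating a number like 20 million into wall-clock time depends entirely on what the code is doing.

```python
class OutOfFuel(Exception):
    """Raised when a program exhausts its instruction budget."""


def run_with_fuel(program, fuel: int):
    """Run a toy stack-machine program, charging one unit of fuel per instruction.

    `program` is a list of (opcode, arg) pairs. Supported opcodes (all hypothetical):
    "push" a constant, "add" the top two values, "jmp" to an index, "halt".
    Returns (stack, remaining_fuel) or raises OutOfFuel.
    """
    stack, pc = [], 0
    while True:
        if fuel <= 0:
            raise OutOfFuel(f"budget exhausted at instruction {pc}")
        fuel -= 1
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
            pc += 1
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            return stack, fuel
        else:
            raise ValueError(f"unknown opcode: {op!r}")


finite = [("push", 2), ("push", 3), ("add", None), ("halt", None)]
print(run_with_fuel(finite, fuel=10))  # ([5], 6)

forever = [("jmp", 0)]  # an infinite loop
try:
    run_with_fuel(forever, fuel=1_000)
except OutOfFuel as exc:
    print("stopped:", exc)
```

The four-instruction program finishes with 6 units to spare, while the infinite loop is cut off after exactly 1,000 instructions instead of hanging the host.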
What I want from a sandbox:
Dependencies that cleanly install from PyPI, including binary wheels across multiple platforms if necessary. I don't want people using my software to have to take any extra steps beyond directly installing my Python package.
Executed code must be subject to both memory and CPU limits. I don't want to crash my application or the user's computer.
File access must be strictly controlled. Either no filesystem access at all, or I get to define exactly which files can be read and which files can be written to.
Network access is controlled as well. Sandboxed code should not be able to communicate with anything without going through a layer I fully control.
Support for interaction with host functions. A sandbox isn't much use if I can't carefully expose selected platform features to the code that it's running.
It has to be robust, supported, and clearly documented. I've lost count of the number of sandbox projects I've seen in repos with warnings that they aren't actively maintained!
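The file-access requirement ("I get to define exactly which files can be read and which files can be written to") can be expressed as a small allowlist. This is a hypothetical pure-Python sketch, not how micropython-wasm enforces anything: WASI runtimes typically grant filesystem access by preopening specific directories, and a real sandbox should enforce access at that boundary rather than trust a check like this. The `FilePolicy` class and its parameters are invented for illustration.

```python
from pathlib import Path


class FilePolicy:
    """Hypothetical allowlist: reads are allowed under read_roots (and write_roots),
    writes only under write_roots. Everything else is denied by default."""

    def __init__(self, read_roots=(), write_roots=()):
        # resolve() collapses ".." and follows symlinks, so checks apply to
        # where a path really points rather than how it is spelled.
        self.read_roots = [Path(p).resolve() for p in read_roots]
        self.write_roots = [Path(p).resolve() for p in write_roots]

    @staticmethod
    def _inside(path: Path, roots) -> bool:
        return any(path.is_relative_to(root) for root in roots)  # Python 3.9+

    def can_read(self, path) -> bool:
        p = Path(path).resolve()
        return self._inside(p, self.read_roots) or self._inside(p, self.write_roots)

    def can_write(self, path) -> bool:
        return self._inside(Path(path).resolve(), self.write_roots)


policy = FilePolicy(read_roots=["data"], write_roots=["output"])
print(policy.can_read("data/feed.json"))         # True
print(policy.can_write("data/feed.json"))        # False
print(policy.can_read("data/../../etc/passwd"))  # False
```

Denying by default and resolving paths before comparing them is what stops `..` segments or symlinks from escaping the approved directories.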


Communities of Not

There is a strange thing that happens in communities that gather around abstinence from something: identity from opposition. At their best these communities are not just negative: childfree spaces can be about autonomy, choice and acceptance, anti-car spaces about safer streets and transit, and LLM-skeptical developer spaces about the future of labor, code quality and slop [1]. But the thing being refused often does not go away and instead becomes the main subject of the community’s identity. That would be fine if it stayed at criticism, maybe even angry criticism, but more often than not it turns into policing and hatred towards others.

An influencer without children becomes a parent, an urban bike commuter by choice buys a Porsche, a respected developer tries LLMs, and the community feels betrayed because it assumed they were members of the same tribe. The expulsion of that person (who never signed up to be a community member) is entirely imaginary, but the punishment that the community unleashes is not: people pile on and shame them, quote them out of context and turn their weakest moments into proof that the person was always unserious, a charlatan, or someone who should not be listened to.

I do not think the answer is to tell people to stop paying attention. Cars shape cities even for people who cycle; children influence politics, workplaces and taxes even for people who do not have them. For us developers, LLMs show up in editors, issue trackers, hiring conversations, management pressure and code reviews whether we asked for them or not. Resisting that can be legitimate, but that is no excuse for using one’s rejection to justify shitty mob behavior.

I understand the thinking all too well, because I have done versions of this myself in the past. It took me a while to become more accepting of other people’s worldviews that diverge from mine. Whatever insecurities we have, finding a group of others sharing them can be comforting.
The danger is that being part of a crowd of negativity can easily make us part of collective harassment. I can only encourage you to breathe, slow down, de-escalate when given the chance, and resist the temptation to always assume the most catastrophic reading. Default to being open to new things. Being negative towards something, and making that one’s identity, is an easy trap to fall into.

[1] These examples are not meant as equivalents. The recent mob against rsync is the LLM version that prompted this post. I picked the others because I’m familiar with those communities and they all show similar cases of personal choices being interpreted as betrayal.

iDiallo Yesterday

Why all the PRs?

It’s a signal. That’s why we get AI-generated PRs.

We told everyone that, in order to get your resume taken seriously, you need to show your work. When I was getting started in my career, that meant having your own website that you contribute to regularly. So I did that. I built websites, I maintained them. I kept maintaining them even after I got the jobs, because that’s how I actually honed my web programming skills. Where else was I going to try new frameworks, a new JavaScript paradigm, or Ruby on Rails? I got the job, and I advised other developers to follow the same path.

But then GitHub became mainstream. Rather than just show a finished website, you could actually share the code that runs your project. Share a link to your GitHub project and companies can review your code and directly gauge your experience. But even better, you can show your contributions to open source projects. Not just any projects. Popular projects. GitHub stars became a metric people look for: a signal that can be used to quickly assign a value to a candidate.

But that’s the story told from the outside. I don’t think the GitHub profile link was ever important, unless it was exceptionally good. Employees focused on their work rarely have the time to maintain healthy GitHub activity. Their experience comes from their day-to-day job. So for the most part, not much attention was paid to GitHub links beyond skimming their surface-level details.

When stacks of resumes landed on my desk, the best candidates stood out because they had work experience. The good candidates had projects that they could link to, on GitHub or elsewhere. But the worst candidates had long, padded resumes that included elements of every job-application tips-and-tricks article. They had a website, but it was built in a day for the purpose of getting a job, with nothing interesting to say. They had GitHub links, but those often pointed to school projects, homework, or boilerplate code.
That’s the vast majority of GitHub links I used to get. People with active and well-maintained GitHub profiles were rare. Rare because it actually requires time, effort, and experience.

But then we got AI. There is a Go auth issue on GitHub that I contributed to. It was already a few years old when I proposed a solution that worked for my case. It wasn’t universal, so it wasn’t accepted. The discussion is revived every couple of years, each person bringing one more piece to the puzzle. But recently, someone flooded the thread with comments, and even created a PR to go with it. This was from a user who went from a dormant account to 4,000 contributions in a year. It was all AI-assisted code.

This isn’t a comment on the quality of his code, but he was clearly trying to optimize the metric. Looking at his LinkedIn profile, he doesn’t work in a software engineering role, and it’s hard to tell whether he would be a good contributor if hired. If we were to judge his resume by looking at the GitHub profile, it might catch our attention. But there is a problem. There are hundreds, even thousands, of people all doing the same thing. They are cranking up their contributions to GitHub projects using AI, so they can have a better chance at getting hired as developers.

I understand the job market is rough right now, especially for Gen Z, and anything that differentiates you is a plus. The problem is that this is being done at the expense of open source projects. The contributors are not submitting PRs to your project because they are personally invested in it. Instead, they are trying to get their name on the contributors list so that they can use it as a signal on their resume. When we are out here debating whether there is any merit in AI-generated PRs, or whether we should just judge the code, we tend to miss that their gesture is completely hollow. The PR authors’ intentions are completely misaligned with the project maintainers’. They are playing a different game.
We call it slop, or a waste of time; we ban them, and they get really vocal about their First Amendment rights. We are directly interfering with their goal of padding their resume.

I often ask: why don’t people who create those PRs just start their own projects? One answer I’m starting to believe is that nobody cares about a GitHub profile with a handful of stars. You need to contribute to a popular project. Most, if not all, AI-generated websites look the same, no matter how well you customize the prompt. Most greenfield projects from new programmers look the same; the prompter lacks the experience to do anything different.

Contributing to open source is a scary thing when you are new. Even when you have experience, it’s a deliberate act. You have to be invested in the work. Just like questions asked on Stack Overflow, issues you raise will often get closed. And when they do, you have to learn from it.

The value of an open source contributor is not in the volume of work they can perform. If you skim any important project, you’ll see that the best contributors spend more time discussing the problem than writing code. Their value is in solving problems and contributing to the collective memory of the group. But when you are doing a drive-by PR that may or may not be correct, and you are just trying to get your name on a list, you are providing zero value to the maintainer. Just more work. This is the signal every slop PR generator is after.

Farid Zakaria Yesterday

The Guix Nix Abomination: Leveraging Guix derivations in Nix

Nix and Guix look like rival ecosystems, but under the hood they’re the same “Input Output Machine”. Need proof? 🕵 How about we build a Guix derivation with Nix.

First let’s create a super basic derivation in Guix: Hello world. We then ask Nix to build it. 🪄 We ask Nix to use Guix’s store directory as the Nix store and to write its state, database and log files to alternate directories, so it does not collide or mess with Guix.

Note: It’s slightly more complicated. Nix happens to check its SQLite database for the derivation, so we need to register it first. The version of Guix (v1.5.0) I’m using relies on a dedicated user that runs inside a private mount namespace where the store is writable, but everyone else (including me) sees it as read-only. So I create a new private mount namespace of my own, where I can remount the store read-write and run the Nix command against it.

We just built a Guix derivation using Nix. 🔥

How is that possible? Both take a language frontend, Nix or Guile (Scheme), that compiles to a derivation (recipe), and pass it on to a builder (daemon) that executes it to produce an output. What makes them special is that they both promise the same thing: hermetic builds. Everything needed to build the output is declared in the recipe: sources, environment variables, dependencies, etc.

“Under Nix, a build process will only find resources that have been declared explicitly as dependencies. There’s no way it can build until everything it needs has been correctly declared. If it builds, you will know you’ve provided a complete declaration.” – NixOS website

Guix, specifically the daemon, was forked from Nix early on, and as a result the two are very similar; for instance, they share the same derivation format, ATerm.

“Guix is based on the Nix package manager” – Guix website

That’s why our earlier example of building the Guix derivation with Nix was possible without much translation. What if we could reuse an existing Guix recipe directly in Nix?
If we could convert from one recipe file to the other, we could use the existing recipes from Guix in Nix and vice versa. Turns out this is far more feasible than you would think, because Guix is Nix, or at least a superset of it. I, with the help of Claude, built a tool to do just that: guix-transfer 🤯.

guix-transfer is a CLI tool for performing bottom-up translation of GNU Guix derivations into Nix. Confused? Let us see it in action.

Note: When you unpack a tarball, tar restores each file’s original permissions, including setuid/setgid bits. Nix’s sandbox installs a seccomp filter that blocks any call that sets these bits, returning “Operation not permitted”. Guix’s early bootstrap uses a Scheme-based tar (from gash-utils) that treats this error as fatal, unlike GNU tar, which silently skips it. The fix is a Nix option that disables the filter.

If it’s not clear what we just did: we took a Guix derivation and all of its dependencies (down to the bootstrap seeds), translated it to a Nix derivation, and built it with Nix. 😲

What is this abomination and how was this possible!? To understand it, it’s worth revisiting what a derivation is and how it’s used in Nix and Guix. Let’s look at the same basic derivation from earlier, Hello World. You might want to check out my other post on Nix derivations by hand if this interests you 🤓.

When we evaluate (nix-instantiate) this derivation, we get a path to a file that contains the derivation in the ATerm format. If we look at the contents of the file, we can see the ATerm representation of the derivation. It has all the information the builder needs to build the output. At this point, it’s really not Nix-specific anymore. The same applies to Guix derivations. The derivations do not “know” whether they came from Scheme or Nix. It’s a recipe.

The insight, then, is that if we rewrite the store paths from Guix’s store prefix to Nix’s, and swap some builtins for their Nix equivalents, we can get Nix to build it identically.
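To show that a derivation really is plain data, here is a minimal Python sketch that serialises one into the `Derive(...)` ATerm shape: outputs, input derivations, input sources, system, builder, args and environment. It assumes the derivation is held as a dict with those fields; the store paths are shortened placeholders rather than real hashes, and this is not Nix’s or Guix’s own implementation. It sorts maps and sets by key (as the `.drv` files do, to my understanding) and keeps `args` in order.

```python
def aterm_str(s: str) -> str:
    """Quote a string ATerm-style: escape backslash, double quote, newline, CR and tab."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


def aterm_list(items) -> str:
    return "[" + ",".join(items) + "]"


def aterm_tuple(*items: str) -> str:
    return "(" + ",".join(items) + ")"


def to_aterm(drv: dict) -> str:
    """Serialise a derivation dict as Derive(outputs, inputDrvs, inputSrcs,
    system, builder, args, env). Maps and sets are sorted; args keep their order."""
    outputs = aterm_list(
        aterm_tuple(aterm_str(name), aterm_str(o["path"]),
                    aterm_str(o.get("hashAlgo", "")), aterm_str(o.get("hash", "")))
        for name, o in sorted(drv["outputs"].items())
    )
    input_drvs = aterm_list(
        aterm_tuple(aterm_str(path), aterm_list(aterm_str(x) for x in sorted(outs)))
        for path, outs in sorted(drv["inputDrvs"].items())
    )
    input_srcs = aterm_list(aterm_str(p) for p in sorted(drv["inputSrcs"]))
    args = aterm_list(aterm_str(a) for a in drv["args"])
    env = aterm_list(aterm_tuple(aterm_str(k), aterm_str(v))
                     for k, v in sorted(drv["env"].items()))
    return "Derive" + aterm_tuple(
        outputs, input_drvs, input_srcs,
        aterm_str(drv["system"]), aterm_str(drv["builder"]), args, env,
    )


hello = {  # hypothetical derivation with placeholder store paths
    "outputs": {"out": {"path": "/gnu/store/aaaa-hello"}},
    "inputDrvs": {},
    "inputSrcs": ["/gnu/store/bbbb-hello-builder.scm"],
    "system": "x86_64-linux",
    "builder": "/gnu/store/cccc-guile/bin/guile",
    "args": ["/gnu/store/bbbb-hello-builder.scm"],
    "env": {"name": "hello", "out": "/gnu/store/aaaa-hello"},
}
print(to_aterm(hello))
```

Nothing in the output says which frontend produced it; the store prefix appears only inside ordinary strings, which is why rewriting those strings is the heart of the translation.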
The only difference in more complex derivations is that they have dependencies, which are also derivations, and the builder references them, so they form a graph of derivations, each built by the builder in topological order. The leaves of this tree for any non-trivial derivation are the bootstrap seeds. Guix is famous for bootstrapping itself from a 357-byte binary seed [ref]. Since at no point do the bootstrap seeds depend on a particular store prefix, the translated chain builds identically under Nix.

The tool walks a Guix graph in post-order, and for each derivation:
A Guix builtin is replaced with its Nix counterpart. Same idea, different name.
Source files are added to the Nix store, with embedded paths rewritten to their equivalents.
Every reference (input drvs, builder path, args, env vars) is rewritten to the mapped path.
Output paths are blanked, since Nix recomputes them itself.
The result is serialised as JSON and registered with Nix.

That’s it. No Nix expressions are generated. No mapping of Guix packages to nixpkgs equivalents. The Guix derivation graph is translated faithfully, and Nix builds it.

Note: Interestingly, Nix’s URL-fetching builtin takes exactly one URL and cannot fall back. Guix derivations carry lists of mirrors, many of which are flaky or dead. Similar to Nix, Guix operates a content-addressed mirror that serves any source its CI has ever seen. We use that mirror instead of the original source URL.

Now that we have a way to slurp Guix packages into Nix, we can start to do some diabolical combinations by combining native Nix and Guix packages together! We can take the package we built in Nix and use it in a Nix derivation. Nix automatically scans your derivations for anything prefixed with the store directory and tracks it as an input dependency. This is similar to how store paths are tracked when you interpolate a package into a string. If writing raw store paths into the Nix expression is a bit much for you, we can build something more ergonomic pretty easily as well.
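The per-derivation steps listed above can be sketched in Python. This is not guix-transfer’s code: derivations are modelled as plain dicts loosely shaped like Nix’s derivation JSON (an assumption), the builtin names in `BUILTIN_SWAP` are assumptions to check against real derivations, and `fake_register` is a hypothetical stand-in for registering the JSON with Nix, which is what actually computes the new paths. Source files are represented only as a pre-seeded path mapping.

```python
import hashlib
import json

# Assumed builtin names; check them against real Guix and Nix derivations.
BUILTIN_SWAP = {"builtin:download": "builtin:fetchurl"}


def post_order(root, deps):
    """Return nodes with every dependency before its dependents (iterative DFS)."""
    order, seen, stack = [], set(), [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif node not in seen:
            seen.add(node)
            stack.append((node, True))
            stack.extend((dep, False) for dep in reversed(deps.get(node, [])))
    return order


def rewrite(value, pairs):
    """Replace old store paths with new ones in strings, lists and dicts (keys too)."""
    if isinstance(value, str):
        for old, new in pairs:
            value = value.replace(old, new)
        return value
    if isinstance(value, list):
        return [rewrite(v, pairs) for v in value]
    if isinstance(value, dict):
        return {rewrite(k, pairs): rewrite(v, pairs) for k, v in value.items()}
    return value


def translate(drv, mapping):
    """Translate one derivation dict, returning a new dict."""
    # Longest paths first, so "/gnu/store/x-bash.drv" is not clobbered by "/gnu/store/x-bash".
    pairs = sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True)
    new = rewrite(drv, pairs)
    new["builder"] = BUILTIN_SWAP.get(new["builder"], new["builder"])
    for name, out in new["outputs"].items():
        out["path"] = ""  # Nix recomputes output paths itself
        if name in new["env"]:
            new["env"][name] = ""  # assumption: matching env vars are blanked too
    return new


def translate_graph(root, drvs, register, source_mapping=None):
    """Translate `root` and everything it depends on, dependencies first.

    register(new_drv) -> (new_drv_path, {output_name: new_output_path}) is a
    hypothetical stand-in for handing the translated derivation to Nix.
    """
    deps = {path: sorted(d["inputDrvs"]) for path, d in drvs.items()}
    mapping = dict(source_mapping or {})
    for old_path in post_order(root, deps):
        old = drvs[old_path]
        new_drv_path, new_outputs = register(translate(old, mapping))
        mapping[old_path] = new_drv_path
        for name, out in old["outputs"].items():
            mapping[out["path"]] = new_outputs[name]
    return mapping


def fake_register(drv):
    """Hypothetical: fake the paths Nix would compute from a hash of the JSON."""
    digest = hashlib.sha256(json.dumps(drv, sort_keys=True).encode()).hexdigest()[:8]
    base = f"/nix/store/{digest}-{drv['env'].get('name', 'unnamed')}"
    return base + ".drv", {o: base if o == "out" else f"{base}-{o}" for o in drv["outputs"]}


# Two hypothetical derivations with placeholder store paths.
bash = {"outputs": {"out": {"path": "/gnu/store/b-bash"}}, "inputDrvs": {},
        "inputSrcs": [], "system": "x86_64-linux", "builder": "/gnu/store/s-seed",
        "args": [], "env": {"name": "bash", "out": "/gnu/store/b-bash"}}
hello = {"outputs": {"out": {"path": "/gnu/store/h-hello"}},
         "inputDrvs": {"/gnu/store/b-bash.drv": ["out"]}, "inputSrcs": [],
         "system": "x86_64-linux", "builder": "/gnu/store/b-bash/bin/bash",
         "args": ["-c", "echo hi > $out"], "env": {"name": "hello", "out": "/gnu/store/h-hello"}}
drvs = {"/gnu/store/b-bash.drv": bash, "/gnu/store/h-hello.drv": hello}

mapping = translate_graph("/gnu/store/h-hello.drv", drvs, fake_register,
                          source_mapping={"/gnu/store/s-seed": "/nix/store/s-seed"})
for old, new in mapping.items():
    print(old, "->", new)
```

Post-order matters because a derivation can only be rewritten once the new paths of everything it references are known; that is why `bash` is registered before `hello`.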
The tool also has a mode that emits the Nix expression for the translated derivation instead.

Let’s look at a slightly more complex example: a Guix derivation with dependencies. We can convert it to a Nix expression, and then either realise the translated derivation directly or build the Nix expression. Notice that both produce the exact same output hash.

We can now use this Guix derivation like any normal Nix expression, such as the ones you might encounter in Nixpkgs. That means we could even build a package set that exposes all of Guix’s packages for use in Nix. My mind is blown. 🤯

Nixpkgs is known as the world’s largest package repository, and now we have a way for it to become even larger by borrowing any derivation from Guix! The real power behind Nix lies in its derivations and the fact that they are hermetic, declaring every dependency they need. We’ve seen that we can transfer these recipes to any store-based system with similar qualities and preserve the reproducibility.

Chris Coyier Yesterday

Sprinter Van Phone Mount + Better CarPlay-Compatible Cable Situation

This is the stock look of my 2021 Sprinter van’s front console area. If it’s not obvious, there’s no great place for a phone. You’ve got the cup holders. Those are actually sorta workable, except for this boss battle: wired-only CarPlay.

Wired CarPlay is not ideal, but that’s all this van has. I expect upgrading the whole system would be super expensive or potentially not even possible. Wired isn’t that bad. Wired CarPlay means more immediate response from actions and heck, it charges the phone too. The problem is where that wire needs to go. Up on the dashboard, there is this little cabinet thing with a door that opens up toward the windshield. The ports are inside that cabinet, one of them being the one that has to be used for CarPlay. I’m sure the engineering thought is: plug it in, put your phone in the cabinet, shut the cabinet. And that’s kinda fine. I don’t like fiddling with my phone while driving and CarPlay means I don’t really need to. But it’s still inconvenient. I often forget my phone up there when getting out of the van. If I do need to fiddle with my phone while parked, or because something just absolutely has to be done on-device, it’s extra obnoxious to get my hands on it.

The answer is this little guy: a 3D printed part from NEXUS. It slides into the cabinet, and now the cabinet door doesn’t fully close; it closes onto this part, leaving little slits to bring the cords out from.

We’re ultimately going to move the phone to a mount, and then the question is where and how to mount it. Fortunately NEXUS is on the case again with a 1″ Ball Cubby Adapter. That weird-looking thing has nothing at all to do with your butt! It’s a clever device that fits perfectly into the useless weird cubby on the dashboard and provides this general purpose mount. NEXUS doesn’t make a phone mount. I think on purpose? The Winnebago Revel has all these RAM® Mounts all over. I had never heard of all this RAM® stuff before. They make a bunch of mounts and stuff.
To me, it all feels like a little step up from the home 3D printing feel of NEXUS. Not quite industrial, not quite hobbyist. I was happy to stay in the kinda happy path ecosystem, so I bought the RAM® Mount with the ball. Astute readers will notice… now we have two balls. And, we have two ball mounts! Those two ball mounts won’t actually connect to each other. The magic of the ball mount is that you get this 360-degree(ish) adjustability. I don’t really need that much movability, but I’ll take it. The final bit is the connector that joins the two balls together. And it doesn’t occupy any other useful space on the dashboard. Love it.

Order list: Kinda pricey for a phone mount. But I think it’s worth it. It’s a slightly tricky situation, and this solution (1) is long-term in that it will fit any future phone, (2) opens up the idea of mounting other things with ball mounts, and (3) lets more cables come out of the dashboard cabinet smoothly.

There are other options! There are other/cheaper ones that clip onto the cubby mount that look OK. There are ones that mount into the cup holder nicely. Honestly this one is super minimal and clever. I’m not good at this, which is basically why I just blogged it instead of like TikTok’d it or whatever.

Unsung Yesterday

More molly guards

Ever since I wrote a post about the molly guard, I have been on the lookout for those, and I think I have collected enough to do a little follow-up.

First, some classic industrial molly guards from a museum in Germany.

This IBM electronic typewriter had a gorgeous perspex molly guard around the power button.

Other machines opted for “softer” quasi molly guards that still aimed to prevent you from pressing a button or switch by accident, but without having to get something out of the way first.

Even softer? This next one is not a traditional molly guard, but the placement of the “I’m writing to the SD card” red light was not accidental. Ejecting the card while the camera is writing to it might cause some damage, so the light was positioned right next to the card door and the card itself, making you more likely to spot it and wait.

This one is even more clever. You know how some old floppy drives have a handle that lowers the reading/writing head so that the diskette can be used? That same handle also prevented you from pulling the disk once the head was lowered.
It felt so natural that you might not even have realized it was a molly guard doing its job: At the other end of the spectrum, the following guards are more of the “you really shouldn’t do this” variety – much closer to a disabled state in graphical user interfaces: Let’s jump into software. This is a nice situational molly guard in Finder when you press ⌘O with a lot of files selected: iPhone’s “slide to unlock” no longer graces the home screen, with one exception – stopping the alarm: There’s something about this treatment that doesn’t sit well with me. I’m not sure what it is. The text not feeling centered? The control being circular? The icon on the slider making it look like a stop button you can press?
Speaking of stuff I don’t love, you might recognize this molly guard from Chrome: This one never felt pleasant to me. You might say, “Isn’t the point of a molly guard that it doesn’t feel pleasant?” But I think one needs to separate the intent from the mechanics. I don’t mind the intent here, but the styling is ugly, the message is kind of confusing – you don’t really have to hold ⌘Q, just press it again – and you don’t get any feedback while holding. Contrast that with this extremely skeuomorphic CD-burning molly guard in early iTunes, suggested by one of the readers: And lastly, something I didn’t expect to ever see. Per this issue (page 14) of a University of Illinois alumni magazine, here’s the actual Molly with her father: #definitions #interface design #real world


Beta Testing PostgreSQL With Docker

The Postgres community values feedback from testing Beta releases, and Docker has made it easier to get pre-release versions up and running. With the recent announcement of PostgreSQL 19 Beta 1, let’s get it running and test some of the new capabilities. First, you’ll need to install Docker! Grab the version for your OS and processor architecture, for example ARM or AMD/Intel/x86. On macOS, run or in your Terminal to learn more about your hardware details. For Windows, check Install Docker Desktop on Windows. Official Postgres images for Docker are limited to fully released versions. Fortunately, @yosifkit created a PR to add 19 Beta 1 (merged by @tianon) with instructions for how to use to build pre-release versions. This command downloads and builds : With that built, I could invoke with . I named mine . I also passed the env vars below based on how I run other Docker Postgres containers (these options may not be necessary). The final command: To check if it’s running, I run . For logs I’d run: . The container is running and the logs have what we want: “database system is ready to accept connections”. Let’s connect to the database using psql on the container: We should see output like showing version 19: Great. Let’s try out some things in 19. Postgres 19 added a new system view for inspecting locks. Let’s try it out: We get a lot of new data like counts, and more. What about the new extension? First, let’s load it and then create a table to experiment with: With that in place, we can show the output via with a new parameter: I wonder why the rows estimate is 2550 by default? Let’s run . After doing that, it looks more sensible with a estimate of 1: The extension gained new capabilities in 19. Let’s try it out: Oops, we need to add it to first. We can see that’s currently not the case: One way to do that with is the parameter as follows: Now we see what we want in : We haven’t enabled the extension yet, though, given that doesn’t list it.
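The 2550 figure looks arbitrary, but my reading of the planner source is that it falls out of a fallback for tables that have never been vacuumed or analyzed: the planner assumes the table spans 10 pages and that each page is packed with as many tuples as fit. Here is that arithmetic as a short Python sketch. It assumes a default build (8 kB pages, 64-bit alignment, heap fillfactor 100) and a test table with a single `integer` column; the post doesn't show the table definition, but that row width is what produces 2550.

```python
# A minimal sketch of PostgreSQL's fallback row estimate for a heap table that
# has never been vacuumed or analyzed. Constants assume a default build:
# 8 kB pages, 64-bit MAXALIGN, and the default heap fillfactor of 100.

BLCKSZ = 8192          # page size in bytes
PAGE_HEADER = 24       # SizeOfPageHeaderData
TUPLE_HEADER = 24      # MAXALIGN(SizeofHeapTupleHeader): 23 bytes padded to 24
LINE_POINTER = 4       # sizeof(ItemIdData), one per tuple
MIN_PAGES = 10         # page count assumed for a never-vacuumed table


def fallback_row_estimate(data_width: int, fillfactor: int = 100) -> int:
    """Rows guessed for a never-analyzed table whose row data is data_width bytes."""
    usable_bytes = BLCKSZ - PAGE_HEADER
    tuple_width = data_width + TUPLE_HEADER + LINE_POINTER
    # Tuples per page, truncated with integer division as in the planner.
    tuples_per_page = (usable_bytes * fillfactor // 100) // tuple_width
    return tuples_per_page * MIN_PAGES


# Hypothetical test table with one integer (4-byte) column.
print(fallback_row_estimate(4))  # 2550
```

That is 8168 usable bytes divided by 32-byte tuples, truncated to 255 per page, times 10 pages. As I understand it, in PostgreSQL 14 and later a vacuumed or analyzed empty table no longer gets the 10-page guess, so its estimate drops to 0 rows, which `EXPLAIN` shows as the minimum of 1.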
Let’s do that: Now shows it, and we’re ready to query it. One of the additions is tracking the use of prepared statements. Let’s create a basic table and a prepared statement. Create the table again if needed: Create a simple prepared statement and execute it. The goal here is for to increment the field. Now let’s execute it: Did it work? It worked! We see was incremented. This looks very useful for monitoring the use of prepared statements. Please give this a shot and experiment with the new features in Postgres 19! Add 19.x builds (currently beta 1)

Giles's blog Yesterday

JAX backends and devices

There's nothing like writing your own code with a framework to clarify how things fit together! Continuing with my port of my PyTorch LLM code to JAX, I wanted to load up a large dataset: the 10,248,871,837 16-bit unsigned integers in the split of . That's just over 19 GiB of data. When I ran that, I got a CUDA out-of-memory error: That makes sense! The allocation it was trying to do is exactly the size of the data I was trying to load. I have an RTX 3090 with 24 GiB, but some is already used up by the OS, various apps, and a model that the code creates earlier on. But in PyTorch land, I was used to things being loaded into RAM by default, and only moved over to the GPU when I asked it to do that. JAX was clearly loading to the GPU by default. How could I stop it from doing that for this case? The load into the GPU was happening inside Safetensors, in code I couldn't directly control. Understanding how to do it helped me understand a little bit more about JAX. JAX has a function that looks relevant: . Without reading the docs, let's try running it. In my virtualenv, with the package installed, I get this: That seems a bit weird! I do indeed have a CUDA device, but I also have a CPU, obviously. Why isn't it showing up? Running the same code in another virtualenv, with just installed -- no CUDA -- gets this: OK, so it did recognise it this time. Feels like it might be time to RTFM. The docs explain things a bit: Returns a list of all devices for a given backend. If is , returns all the devices from the default backend. The default backend is generally or if available, otherwise . OK. So JAX has multiple backends -- named that because they're classes of backend hardware that XLA (the compiler behind the JIT) targets. There is a default one, which is essentially going to be the "best" one available given the hardware configuration and the parts of JAX that are installed.
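The dataset size quoted above is easy to sanity-check: every element is a 16-bit unsigned integer, so two bytes apiece. A quick Python sketch, assuming no file-format or allocator overhead:

```python
# Back-of-the-envelope size of the token array, assuming 2 bytes per uint16
# element and ignoring any file-format or allocator overhead.
NUM_TOKENS = 10_248_871_837
BYTES_PER_TOKEN = 2  # 16-bit unsigned integers

total_bytes = NUM_TOKENS * BYTES_PER_TOKEN
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
# prints: 20,497,743,674 bytes = 19.09 GiB
```

That is most of the 3090's 24 GiB before counting anything already resident on the card, which fits with the out-of-memory error.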
When I had the CUDA version installed, it made the backend default, but when I didn't, it defaulted to (and warned me). And because it only shows the devices on the default backend, when that was , I didn't see the CPU. However, you can specify which backend you want to use with that parameter, so let's go back to the virtualenv with CUDA: Great! So is there some way to list which backends are available? Apparently not -- the recommended way appears to be to try loading devices for the different possibilities, and catch to see which ones aren't available. Yuck. But maybe that's not such a big deal. In PyTorch-land I was very much used to putting code like this near the start of my code: ...then moving models to the device: ...and then moving data to the model's device as needed: What I actually wanted was essentially what JAX does -- have everything on the fastest device available at all times -- but with specific exceptions. In particular, the one that started off this investigation: how would I put this huge array of training data in the CPU's RAM rather than the GPU's VRAM? I had a bit of a false start when I spotted that the function in the Safetensors FLAX API has a parameter, but that appears to be more to do with how it loads up the file -- a backend in a different sense. And anyway, backend is not the right concept in JAX-land, as the backend means just something generic like -- for what we're trying to do, we want to load it onto a specific device. After some digging around, I discovered that JAX has a concept of a default device, which is the one used when it doesn't have any indication of where to put something. It makes sense that this will be on the default backend -- indeed, it looks like it's essentially "the first device in the list that returns for the default backend". There is a config option which you can use to set it; you'd normally use or an environment variable to change it. But what if you only want to change it temporarily?
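A minimal sketch of that probe-and-catch approach, in Python. `available_backends` is a hypothetical helper, not a JAX API, and both the candidate names and the choice of `RuntimeError` are my assumptions (it is what I understand current JAX versions raise for an unknown or uninitialisable backend). The device-listing function is injectable so the logic can be exercised without JAX installed.

```python
# Sketch: find which JAX backends are usable by asking each candidate for its
# devices and treating RuntimeError as "not available". The candidate names
# and the exception type are assumptions, not a documented JAX listing API.
from typing import Callable, Dict, List, Optional, Sequence

CANDIDATE_BACKENDS = ("cpu", "gpu", "tpu")


def available_backends(
    devices_fn: Optional[Callable[[str], Sequence[object]]] = None,
    candidates: Sequence[str] = CANDIDATE_BACKENDS,
) -> Dict[str, List[object]]:
    """Return {backend_name: devices} for every candidate that initialises."""
    if devices_fn is None:
        import jax  # imported lazily so the helper can be tested without JAX

        devices_fn = jax.devices
    found: Dict[str, List[object]] = {}
    for name in candidates:
        try:
            found[name] = list(devices_fn(name))
        except RuntimeError:
            # Assumed: JAX raises RuntimeError for unknown or failed backends.
            continue
    return found
```

With the CUDA build installed you would expect `available_backends()` to report both `"cpu"` and `"gpu"`; with the CPU-only package, just `"cpu"`. Wrapping the probing in one function at least keeps the "yuck" in a single place.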
I found this documentation for . The docs are more than a little confusing: Context manager for config option. Configure the default device for JAX operations. Set to a Device object (e.g. ) to use that Device as the default device for JAX operations and jit’d function calls (there is no effect on multi-device computations, e.g. pmapped function calls). Set to None to use the system default device. That near the start tripped me up, as I missed the words "Context manager" just below, and the odd type, and tried this: I still got the CUDA OOM, though, so I reread the docs, spotted the "context manager" bit, swore violently, and tried this: ...which works. It looks like the equals sign in the docs is being used to mean something very different to what you'd normally use it for, and they decided not to actually document the signature of the context manager. Heigh ho. I guess documentation is hard. Still, at least now I have a solution. And as I said earlier, doc grumbles aside, the shape of the code might wind up being a little less fiddly than PyTorch. The default location of things I create is the fastest hardware I have, which is what I want. And for the rare exceptions when I don't want to use that, there is a reasonably simple (now that I know it) way to say where I want things to go. I'll call that a win :-) The only thing I'll need to remember is that when, in my training loop, I want to use subsets of that in-RAM tensor, I'll need to move them to the GPU. looks like the right tool for that.
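Putting those pieces together, here is a sketch of the overall pattern: create the big array with the CPU as the default device, then copy each training batch to the accelerator with `jax.device_put`. `load_tokens` is a hypothetical stand-in for the Safetensors load, using a tiny array instead of the real dataset. The slice is also taken inside the CPU context; my understanding is that an array created under `jax.default_device` is not committed to that device, so slicing it outside the context might run on the default GPU and copy the whole array there.

```python
# Sketch: keep a large array in host RAM by creating it with the CPU as the
# default device, then copy individual batches to the accelerator.
import jax
import jax.numpy as jnp
import numpy as np


def load_tokens() -> jax.Array:
    """Hypothetical loader standing in for the Safetensors call in the post.

    A tiny uint16 array replaces the real ~19 GiB dataset.
    """
    return jnp.asarray(np.arange(1_000, dtype=np.uint16))


cpu = jax.devices("cpu")[0]
accelerator = jax.devices()[0]  # first device on the default backend (GPU if present)

# jax.default_device is a context manager: arrays created inside the block
# are placed on `cpu` instead of the default backend's first device.
with jax.default_device(cpu):
    tokens = load_tokens()


def batch_to_accelerator(start: int, length: int) -> jax.Array:
    """Slice a batch out of the in-RAM array and copy it to the accelerator."""
    with jax.default_device(cpu):
        # Take the slice while the CPU is the default, so the computation
        # (and the full array) should stay in host memory.
        piece = tokens[start:start + length]
    return jax.device_put(piece, accelerator)


batch = batch_to_accelerator(0, 128)
print(tokens.devices(), batch.devices(), batch.shape)
```

On a CPU-only machine both device sets are the CPU; with a GPU, `tokens` should stay on the CPU while each `batch` lands on the GPU.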

Stratechery 2 days ago

2026.23: Power Shifts

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week’s Stratechery video is on The SpaceX IPO and Data Centers in Space. Google and Microsoft. Three years ago Google looked hapless, scrambling to respond to ChatGPT, while Microsoft, thanks to their groundbreaking partnership with OpenAI, looked on top of the world. Now Google is pulling away in terms of market capitalization, which makes their decision to issue equity to Berkshire Hathaway a curious one: I try to figure out what it means in The Google Capital Company. As for Microsoft, my first question to CEO Satya Nadella in this week’s Stratechery Interview was a simple one: is he happy with Microsoft’s competitive position? — Ben Thompson YouTubers Take Over Hollywood. Over the past few weeks, the biggest news in the entertainment business has been the surprising success of two Gen Z YouTubers. They’ve directed the most successful movies in America and beaten (yet another) Star Wars spinoff at the box office. Monday’s Daily Update seizes on that news to explain how we got here and the various factors that explain the YouTubers’ success. We doubled back on the topic on this week’s Sharp Tech to explore not only the implications for Hollywood, but why YouTube, in response to this phenomenon, is unlikely to change a thing. — Andrew Sharp A Guide to the NBA Finals. The Knicks took Game 1 of the NBA Finals on Wednesday, and I have to say, this week’s Greatest of All Talk preview nailed the contours of exactly what we saw.
If you’re looking for what to watch going forward, the intrigue of course starts with 22-year-old Victor Wembanyama. If he can win this title, it’ll be the sort of flashpoint that we all remember for the next 50 years. But can he do it? My co-host Adam Mares explained why the Knicks are a tougher Wemby challenge than even the defending champion Thunder were, and that as great as he is, there are real limits in Wemby’s game that will be tested over the next 10 days. Let’s see how he responds. — AS YouTubers Win the Box Office, Goodbye Gatekeepers, The YouTube Bar — YouTubers are ruling the box office, and it shouldn’t be a surprise: succeeding on YouTube is a much higher bar than the gates that currently govern Hollywood. The Google Capital Company — Google has issued equity to Berkshire Hathaway in a deal that signals far more demand and a future where capital is the ultimate commodity. The Nvidia AI PC, Project Solara, Microsoft AI — The Nvidia AI PC feels like a relic of another AI era; Microsoft’s vision for devices at Build was much more compelling. An Interview with Microsoft CEO Satya Nadella About Finding Core Competencies — An interview with Microsoft CEO Satya Nadella about figuring out Microsoft’s role in AI, the relationship with OpenAI, Capex, Software, and a potential new agentic platform. Steph Curry Turns to China — Steph Curry’s move to partner with Chinese shoe brand Li-Ning is smart, lame, and ultimately a refreshing reminder of American strengths. Electric Cars and Meta Subs WWDC Questions Taiwan’s DRAM Failure Seizing The Commanding Heights; Decoding Shangri-La Dialogue; Europe Moots Trade Policy; The PRC Expels a New York Times Journalist Five Questions on the NBA Finals, Wemby and NBA History, Q&A on the Kroenkes, OKC, and Team USA What’s Google Doing With Berkshire Hathaway?, A Bubble Temperature Check, Gen Z YouTubers Take Hollywood

Andy Bell 2 days ago

I did the Standard.site thing

It seems like everyone is integrating Standard.site into their personal websites (and beyond) at the moment. It’s mainly been driven by Bluesky’s recent update to how it treats sites integrated with Standard.site. Sam has a good explainer here. It’s a cool feature! There are loads of different platforms doing cool stuff, all using different data structures, so standardisation was really needed. Integrating is fairly straightforward, too. If you’re using WordPress, for example, it’s as simple as installing a plugin. There’s an Astro plugin too, along with plugins for all of your favourites (I imagine). I’ll write up the details as part of my ongoing series. I’ll definitely be getting on that towards the end of the month, as my studio workload quietens down a bit. Was it all a bit of a faff to integrate? Yes. Did I get rate limited multiple times? Absolutely. But do I think it’s worth doing? Without a doubt. I think Standard.site is going to do wonders for website discovery.
