Latest Posts (20 found)

Thinking Machines and interaction models

Thinking Machines just released Interaction Models. This is their first real AI model release[1] after a year of work and two billion dollars of capital. What is an “interaction model”? First, it’s not a frontier model. Thinking Machines is not yet competing with OpenAI, Anthropic and Google. Instead, they’re working on the problem of better real-time interaction with models. Some parts of what they’re doing are not new at all, other parts are slightly-questionable benchmark gaming, and still other parts represent a genuine technological advancement. I’ll try to lay it all out.

If you’ve used ChatGPT in audio mode, you know that you can’t talk to it exactly how you’d talk to a human. There’s a big latency gap between when you finish talking and when the model jumps in. The model won’t interrupt you like a human, and doesn’t react to you interrupting it like a human would either. And of course you can’t give the model visual feedback like facial expressions. That’s because ChatGPT is either speaking or listening at any given time. When you’re talking, it’s in “listening” mode; when it’s talking, it’s in “speaking” mode, and isn’t absorbing any information from you. It relies on VAD (“voice activity detection”) to figure out if you’re talking.

The alternative (and what “interaction models” do) is a fully-duplex system, where the model is constantly both in listening and speaking mode at the same time. Of course, the model can’t literally do this. Like all language models, it’s either doing prefill (ingesting prompt tokens) or decode (producing completion tokens). But what fully-duplex models can do is switch from listening to speaking mode in tiny chunks, called “micro-turns”. Instead of listening for ten seconds (or however long it takes you to stop talking), then speaking for ten seconds (or however long it takes to pass the model output through TTS), the model can listen for 200ms, then output for 200ms, then listen for 200ms, and so on. While the user is speaking, the model will know to output silence - most of the time. But if it decides it’s good to interrupt you or speak at the same time as you, it’s capable of doing that.
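To picture how a micro-turn loop works, here is a toy sketch (my own illustration; the model, mic, and speaker objects are hypothetical, and this is not Thinking Machines' actual implementation):

```python
# Toy full-duplex "micro-turn" loop: alternate ~200ms of prefill (ingesting
# user audio) with ~200ms of decode (emitting audio tokens). While the user
# is talking, a well-trained model mostly decodes silence tokens, but it is
# always free to start speaking instead. All three objects are hypothetical.
MICRO_TURN_SECONDS = 0.2

def full_duplex_loop(model, mic, speaker):
    while True:
        chunk = mic.read(MICRO_TURN_SECONDS)      # prefill slice: listen
        model.prefill(chunk)
        audio = model.decode(MICRO_TURN_SECONDS)  # decode slice: speak
        speaker.play(audio)                       # often just silence
```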
So far, so unoriginal. There are plenty of examples of fully duplex audio systems that the Thinking Machines blog post already cites: Moshi, PersonaPlex, Nemotron-VoiceChat, and so on. But at least this outlines the space that “interaction models” are playing in: not “superintelligence from a frontier model”, but “better real-time conversational interaction”[2].

Given that, what is Thinking Machines doing that’s new? For existing fully-duplex models, you talk to the model itself. That’s a fairly big problem, since fully-duplex models have to be fast: fast enough that they can operate in tiny 200ms turns[3]. A model that fast cannot be particularly intelligent. Thinking Machines’ solution is to introduce an actual smart model - any regular language model will do here - in the background that the interaction model can delegate tasks to. In practice this is probably implemented as a tool call. The interaction model keeps chatting while the smart model works away, and then the smart model output is directly integrated into the interaction model’s context in the same way as audio and video input (a genuinely cool idea, I think). This is kind of neat, though it remains to be seen how well it works in practice. Will the model do a lot of “oh wait, the last thing I said was dumb, never mind” self-correction as the smarter model output trickles in? Will the fast interaction model be smart enough to delegate the right tasks at the right time? In general, the “start with a fast dumb model and have it hand off tasks” approach has been tricky for the AI labs to get right for a variety of reasons.

If I’m being uncharitable, I might say that bolting on a strong reasoning model was an easy way for Thinking Machines to post impressive values for competitive benchmarks like FD-bench V3 (where they barely beat GPT-realtime-2.0) and BigBench Audio (where introducing the reasoning model bumps their score from 76% to 96%, only 0.1% below GPT-realtime-2.0). If I’m being charitable, I might say that a model fast enough for realtime conversation will have to have some way to punt hard tasks to a slower, smarter model. Both of those things are probably true.

It’s also worth noting that Thinking Machines have also bolted on video input to their fully-duplex model. This is more exciting than it sounds, because face-to-face human conversation is very dependent on being able to read human expressions. In theory, this could unlock the ability to have genuine human-like conversations. The other reason why this is exciting is that it means Thinking Machines have been able to make a pretty big fully-duplex model (maybe twice the size of Moshi in terms of active parameters, and 40x the size in terms of total parameters). In fact, this is probably the biggest real technical achievement here. Other fully-duplex models are already doing micro-turns and interruptions, and could delegate reasoning fairly easily if they wanted to, but they aren’t doing video because they can’t. Being able to make a fully-duplex model the size of DeepSeek V4-Flash is pretty impressive. Much of the Thinking Machines blog post is dedicated to explaining how they’ve managed to do this: ingesting data in a more lightweight way, optimizing their inference libraries for tiny prefill/decode chunks, various decisions to make inference deterministic (a long-held hobbyhorse for Thinking Machines).

There’s a lot of pressure on Thinking Machines to produce a genuine AI advancement. It doesn’t seem like they’re willing or able to compete in the frontier-model space (which makes sense, I wouldn’t want to either). Given that, I can see why they’re highlighting the parts of interaction models that are impressive to laypeople - all the fully-duplex interaction stuff - even though those parts are not truly innovative.

So what are Interaction Models? A scaled-up, multimodal version of existing fully-duplex models like Moshi, with a real model bolted on for extra intelligence (and maybe better benchmarks). The scale and video parts are new and cool, and something like the overall approach has to be right. In general, I’m glad that we’ve got well-funded and high-profile AI labs tackling problems other than “build a smarter frontier model”. I think there’s a lot of low-hanging fruit waiting to be picked in other areas of AI research.

[1] People do seem to really like Tinker, which is their tooling for researchers who want to fine-tune models, but it’s not exactly the hot new frontier model that people were expecting.
[2] I think it’s at least a little shady that the Interaction Models video demo is making a big deal about some features (like real-time simultaneous translation) that are just features of fully-duplex audio models, not anything specific to their system.
[3] Even 200ms is a bit long. You can see from the demo that there’s an uncomfortable half-second lag sometimes as the model finishes its prefill slice and has to move to the decode slice.

0 views

Meet People Where They're At

There's a shopping center I sometimes walk to for lunch. It's been there long enough that it doesn't have a sidewalk (from before city ordinances required sidewalks, I imagine). A few years ago, a mixed-use complex was built next to it, complete with a sidewalk that ended right at the boundary of the old plaza. This new sidewalk has resulted in a path of trampled grass as people (like myself) walk to the restaurants in the old plaza.

Today on my way to get some "Italian food" (it's America, nothing is authentic here), I was greeted with a new gravel path at the end of the sidewalk. The path had been placed to line up with the curve of dead grass and perfectly connected both plazas.

↑ I didn't have a camera on me, so enjoy this detailed sketch done on my Palm Pilot

It seems like a small thing, but it surprised me. Just a week ago I remember wondering to myself how long it would be until a "stay off of grass" sign appeared. Instead, I was treated to a rare instance of people's needs being directly addressed.

It reminded me of a similar story around Ohio State University (the university in my city). The sidewalks built across the campus green were made to follow the paths students trekked in the early days of the campus. A similar method, named Sneckdown, is used to determine where traffic calming measures are needed based on snow that has not been touched by traffic.

I wish this were more common: identifying pain points and improving the situation. Instead, we spend hours in meetings figuring out how to fight people's goals because what they want isn't "sticky enough" or "doesn't meet business goals".

0 views
Unsung Today

“Nothing short of a magic trick.”

A fascinating 25-minute video from Mark Brown at Game Maker’s Toolkit about how the team building Grand Theft Auto 3 conquered the technical limitations of PlayStation 2.

How do you squeeze a city that occupies over 50 megabytes into the 32MB memory of the console? You simply do what The Truman Show did, and construct the city around the player as they’re moving around. This has, as you can expect, a lot of technical and even game-design consequences, and the video goes into a lot of detail on these – including Brown rebuilding the Grand Theft Auto 3 source to visualize things better.

This technique is also used in interface design, for example if you have a really long list of things that would take too much memory or GPU power to render. What the video calls “streaming” is, in the context of UI, often called “virtualization”: instead of having a full long list (or an entire world), you abstract it away – or, virtualize – into something nimbler. Some of the challenges and techniques used by Grand Theft Auto 3 apply pretty directly here, as well:

- you can use UI skeletons as “low poly” models,
- in some contexts, you can guess the user is more likely to move in one direction (for example, going through fonts in a font picker), and more eagerly preload where they’re going to look next, rather than symmetrically in both directions.

On the other hand, “speedy players” and “pop in” can’t ever be solved because any UI list is random access, and slowing users down is not typically appropriate; better to make loading as pleasant as possible than introduce any roadblocks, even if figurative ones.
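As a rough sketch of what UI virtualization computes (my own illustration, not from the video; the function and its parameters are invented), the core decision is which rows exist at all, with the preload buffer weighted toward the scroll direction:

```python
# Sketch of list virtualization: render only the rows that can be visible,
# plus a preload buffer biased toward the direction the user is scrolling.
def visible_rows(scroll_px, viewport_px, row_px, total_rows, direction, buffer=10):
    first = scroll_px // row_px
    last = (scroll_px + viewport_px) // row_px
    if direction >= 0:  # scrolling down: preload more rows below
        return max(0, first - 2), min(total_rows, last + buffer)
    else:               # scrolling up: preload more rows above
        return max(0, first - buffer), min(total_rows, last + 2)

# e.g. 10,000 rows of 24px in a 600px viewport: only ~40 rows ever exist
print(visible_rows(scroll_px=4800, viewport_px=600, row_px=24,
                   total_rows=10_000, direction=1))
```

#definitions #games #performance #youtube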

0 views

Installing JPilot on Arch

This post is a quick tip for anyone else running into issues installing the Palm Pilot desktop software, JPilot, on Arch Linux. If you just try installing via , the build will fail, as the dependency no longer builds on modern systems. The solution is to first install , then .

0 views

When Escalator Breaks, It Turns Stairs

We need resilient systems that fall back to sanity when broken / discriminating. And not whatever.

0 views

a little note on the choices we make

When I think something is bad, immoral, unethical, harmful, evil - or whatever may apply - I do it neither in private nor in public. I don’t just adhere to this rule of not doing it when I’m by myself, I also don’t do it when I’m with others, regardless of whether they might do that thing and would think it’s more comfortable for them when I partake as well. That’s what’s at the core of living within my own moral boundaries and values. Yes, it might be difficult at times or offend people, but at least I feel like neither a hypocrite nor a coward. I stay true to myself and my behavior aligns with what I expect from myself and how I wish others lived. I cannot force anything they don’t want on them, but I can lead by example and enforce my own boundaries. Do what you want, but I will not do it.

Compromising on your understanding of what’s right and wrong simply to appease others and not stand out is sad. You are betraying yourself and what you stand for for very little, temporary gain, and you rob others of being challenged and inspired. It also makes me wonder if you really stand behind what you preach; if you truly think something is cruel and unacceptable, you would not try to find loopholes to keep doing that thing, and then point fingers as to who made you do it or what exception counts.

No more excuses pointing at what others are doing, how your behavior has no impact and how hopeless or hard it is. Hard things are worth doing. It’s time that you show some respect to yourself and stop putting off making some decisions and sticking to them. Your trust in yourself erodes when you keep making promises to yourself you don’t keep. Aren’t you fucking sick of seeing other people live the way you want to? You don’t have to feel inadequate, guilty, jealous or like a hypocrite in their presence. You can avoid feeling like you have to justify yourself if you commit even for just a month and go from there. Take inspiration from the people you admire and ask them for help. Find your own path that’s similar to theirs if that’s what works. You made yourself do that. Take some accountability for your actions. You have a choice every time.

Published 11 May, 2026

0 views
Unsung Today

“They did the bare minimum and moved on.”

Since the early 2000s, Mac OS X had a few orientations of icons depending on whether they were applications, files, utilities and so on.

In 2020, macOS Big Sur unified those styles and made them more iOS-like.

A few years later, Jim Nielsen revisited the icon “Big Sur-ification”, and showed examples of apps that did the transition really well, but also those where the transition felt… lazy, essentially shoving their previous icon into a roundrect. For those, Nielsen proposes some alternatives that are delightful to see. The Word/Excel/PowerPoint/Outlook explorations are particularly nicely done.

#iconography

0 views

re: Hey you, start communicating!

David writes about the importance of reaching out to the author of blog posts and starting a conversation, and I 100% agree! I love when something I write resonates with somebody, and more often than not it turns into a continuing conversation. I see this blog-o-sphere as its own little world filled with friends across the world.

I recently ran across a blog that belonged to a YouTuber. On the "about me" section they stated the following:

NOTE: I don't answer any personal questions - Please don't send me emails.

This does not sit well with me. What's the point of creating if not to spark conversation and meet others? At that point, it feels like you're just in it for the AdSense revenue. The internet doesn't need that, it needs community (now more than ever). I don't have a problem with people making money off of their work, but it shouldn't be the only motivation.

So reach out, send an email, even if it's just a "hello". I promise, you'll make the other person's day!

0 views
iDiallo Today

We Are Not Going to Agree on AI

On one hand, I know a developer producing 30,000 lines of code a month. On the other, I know a developer who says AI is stupid. Each swears by their stance and has evidence to back it up. One has a working product and the other has a broken one.

The New York Times profiled Medvi and reported they're on track to make $1.8 billion this year. Clearly AI worked for them... if you ignore the alleged fraud for just a second. And while Microsoft now claims that at least 30% of their code is AI-generated, GitHub logged 89 incidents in 90 days (as of this writing). That doesn't exactly paint a bright picture of a technology firing on all cylinders.

If you're sitting on the sidelines trying to decide whether AI works or not, you're not going to get a clean answer. But it's still the right question to be asking. I don't think we'll ever reach a consensus, because after all the hype, what we're left with is a capable tool. And apparently, that's not enough. For AI companies, it's supposed to be the alpha and the omega. Something that will both kill us and save us, take all our jobs and liberate us at the same time. For the rest of us, if we're not afraid, it's proof we don't understand it well enough, and we'll be left behind. I think of AI as a capable tool. That's it.

I went to Home Depot once to buy what I needed to mount a TV. I asked an employee for a stud finder, and instead of just pointing me to the aisle, he walked me there himself. I hate when they do that; it usually means they're about to try to sell you something. Sure enough, in Aisle 17, his coworker was manning a caged shelf stocked with expensive-looking tools. When the cage opened, he didn't reach for the simple one I wanted. He grabbed the model loaded with 13 additional sensors. I asked if the basic one could do the job. The first employee took my side and made the case for simplicity. The second shot back: "Sure, if you don't mind drilling through a live wire!" I stood there watching these two argue, trading field stories like they were one-upping each other at a bar. One of them claimed he'd worked with a guy who didn't even need a stud finder; he'd just knock on the wall three times and know exactly where to drill.

I put both stud finders in my basket, thanked them both, and walked away. I circled the store a few times to make sure neither of them could see me before I quietly dropped the expensive one into an empty basket near the checkout. To this day, I'm a little afraid to go back to that Home Depot.

The only thing that matters to me is what I can do with a tool, not what the tool can theoretically do. When a tooltip pops up on my screen, I dismiss it before I've even finished reading it. I don't care about Jira's latest feature update on the sidebar. I don't care that AI can rewrite my already written ticket. I just want the stud finder to help me hang the TV. Everything else it can do holds no interest for me, especially when I'll use it for ten minutes and not touch it again for three years.

When I have a goal, I reach for whatever helps me get there. If you're waiting for a tool to do the work for you, you're going to be disappointed. A tool's job is to make your work easier, and sometimes it does, sometimes it doesn't. Figuring out when to reach for it is on you.

0 views

Using LLMs to find Python C-extension bugs

Link: https://lwn.net/Articles/1067234/

Jake Edge, LWN.net:

[…] Hobbyist Daniel Diniz used Claude Code to find more than 500 bugs of various sorts across nearly a million lines of code in 44 extensions; he has been working with maintainers to get fixes upstream and his methodology serves as a great example of how to keep the human in the loop—and the maintainers out of burnout—when employing LLMs.

It's worth reading Daniel Diniz's post on the Python forums in full. This is a great example of an engineer with specific domain expertise using LLMs to augment and amplify his abilities. Not just that, he's working closely with maintainers to ensure he's not inundating them with slop PRs or unreproducible bug reports.

The part I find most interesting is how Daniel's Claude Code plugin works. He writes in his forum post:

I built a Claude Code plugin called cext-review-toolkit. The key difference from traditional static analysis is that this system tracks Python-specific invariants (refcounts, GIL discipline, exception state) across control flow, and validates findings with targeted reproducers. That is done by 13 specialized analysis agents analyzing the C extension source code in parallel, with each agent targeting a different bug class. The agents use Tree-sitter for C/C++ parsing, which enables analysis that pattern matching can’t do, like tracking borrowed reference lifetimes across function calls, or cross-referencing type slot definitions with struct members. Each agent can run a scanner script to find candidates, then performs qualitative review of each candidate to confirm or dismiss it. The scripts have a ~20-40% false positive rate and the agents are there to bring that down. After the agents finish, I try to reproduce every finding from pure Python and write a reproducer appendix.

Later from the same post:

Traditional tools like clang-tidy, Coverity, and sanitizers struggle with Python C API semantics (reference ownership, exception state, GIL constraints). The analyses cext-review-toolkit performs target those invariants specifically. Besides that, the tool uses guided semantic analysis (LLM-assisted) to analyze aspects like “was that bugfix complete, and do similar bugs still lurk in the codebase?” that other tools cannot cover. The rich set of agents cover:

- Reference counting: leaked refs, borrowed-ref-across-callback, stolen-ref misuse.
- Error handling: missing NULL checks, return without exception, exception clobbering.
- NULL safety: unchecked allocations, dereference-before-check.
- GIL discipline: API calls without GIL, blocking with GIL held.
- Type slots: dealloc bugs, missing traverse/clear, -without- safety.
- PyErr_Clear: unguarded exception swallowing (MemoryError, KeyboardInterrupt).
- Module state: single-phase init, global PyObject* state.
- Version compatibility: deprecated APIs, dead version guards.
- Git history: fix completeness (same bug fixed in one place but not another).
- Plus: stable ABI compliance, resource lifecycle, complexity analysis.

So cext-review-toolkit is not just a set of prompts that tell Claude to go find bugs. It combines detailed descriptions of specific classes of bugs with scripts powered by Tree-sitter that allow Claude to extract rich semantic data from the codebase it's analyzing. The LLM is not doing all of the heavy lifting here. It works in tandem with human expertise encoded in prompts and deterministic scripts custom built for acting on those prompts.

To me, this feels like the most effective use of LLMs for domain-specific tasks that don't exist in training data: encode as much of your logic into deterministic tools as you can, encode the more squishy parts of your domain into prompts, and let an agent drive those tools. I can see a possible future where every project has its own version of cext-review-toolkit that encodes common classes of bugs the project deals with repeatedly. How much would something like this improve code quality? How much better would it be versus the generic PR review agents we use today?
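To make the scanner-plus-agent split concrete, here is a minimal sketch of the deterministic half for one bug class (unchecked allocator calls) using py-tree-sitter. This is my own illustration of the approach, not code from cext-review-toolkit:

```python
# Sketch of the deterministic half of a scanner-plus-agent pipeline: walk a
# C extension's syntax tree with Tree-sitter and flag call sites of known
# allocating CPython APIs as *candidates* for LLM review. Illustrative only;
# requires the tree-sitter and tree-sitter-c packages (API details vary
# slightly between versions).
import tree_sitter_c
from tree_sitter import Language, Parser

C_LANGUAGE = Language(tree_sitter_c.language())
parser = Parser(C_LANGUAGE)

ALLOCATORS = {b"PyList_New", b"PyDict_New", b"PyTuple_New"}

def candidate_calls(source: bytes):
    """Yield (line, function name) for each call to a known allocator."""
    tree = parser.parse(source)
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        if node.type == "call_expression":
            fn = node.child_by_field_name("function")
            if fn is not None and source[fn.start_byte:fn.end_byte] in ALLOCATORS:
                yield node.start_point[0] + 1, source[fn.start_byte:fn.end_byte]
        stack.extend(node.children)

src = b"static PyObject *f(void) { PyObject *l = PyList_New(0); return l; }"
for line, name in candidate_calls(src):
    # Each hit would be handed to an agent to decide whether the result is
    # actually NULL-checked before use (the 20-40% false-positive triage).
    print(f"line {line}: {name.decode()} result may be unchecked")
```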

0 views
Stratechery Yesterday

The Inference Shift

If you were looking for the ideal time to IPO, being a chip company in May 2026 is hard to beat. Reuters reported over the weekend:

Cerebras Systems is set to raise the size and price of its initial public offering as soon as Monday, as demand for the artificial intelligence chipmaker’s shares continues to climb, two people familiar with the matter told Reuters on Sunday. The company is considering a new IPO price range of $150-$160 a share, up from $115-$125 a share, and raising the number of shares marketed to 30 million from 28 million, said the sources, who asked not to be identified because the information isn’t public yet.

The fundamental driver of the ongoing surge in semiconductor stocks is, of course, AI, particularly the realization that agents are going to need a lot of compute. What Cerebras represents, however, is something broader: while the compute story for AI has been largely about GPUs, particularly from Nvidia, the future is going to look increasingly heterogeneous.

The story of how Graphics Processing Units became the center of AI is a well-trodden one, but in brief:

- Just as drawing pixels on a computer screen was a parallel process, which meant there was a direct connection between the number of processing units and graphics speed, making AI-related calculations was a parallel process, which meant there was a direct connection between the number of processing units and calculation speed.
- Nvidia enabled this dual-usage by making its graphics processors programmable, and created an entire software ecosystem called CUDA to make this programming accessible.
- The big difference between graphics and AI has been the size of the problem being solved — models are a lot bigger than video game textures — which has led to a dramatic expansion in high-bandwidth memory (HBM) per GPU, and dramatic innovations in terms of chip-to-chip networking to allow multiple chips to work together as one addressable system. Nvidia has been the leader in both.

The number one use case for GPUs has been training, which stresses the third point in particular. While the calculations within each training step are massively parallel, the steps themselves are serial: every GPU has to share its results with every other GPU before the next step can begin. This is why a trillion-parameter model needs to fit in the aggregate memory of tens of thousands of GPUs that can communicate as one system. Nvidia dominates both problem spaces, first by securing HBM ahead of the rest of the industry, and second thanks to its investments in networking.

Of course training isn’t the only AI workload: the other is inference. Inference has three main parts:

- Prefill encodes everything the LLM needs to know into an understandable state; this is highly parallelizable and compute matters.
- The first part of decode entails reading the KV cache — which stores context, including the output of the prefill step — to make an attention calculation. This is a serial step where bandwidth matters, but the memory requirements are variable and increasingly large.
- The second part of decode is the feed-forward computation over the model weights; this is also a serial step where bandwidth matters, and the memory requirements are defined by the size of the model.

The two decode steps alternate for every layer of the model (they’re interleaved, not in sequence), which is to say that decode is serial and memory-bandwidth bound. For every token generated, two distinct memory pools must be read: the KV cache, which stores context and grows with each token, and the model weights themselves. Both must be read in full to produce a single output token.

GPUs handle all three needs: high compute for prefill, abundant HBM for KV cache and model weights, and chip-to-chip networking to pool memory across multiple chips when a single GPU isn’t enough. In other words, what works for training works for inference — look no further than the deal SpaceX made with Anthropic. From Anthropic’s blog:

We’ve signed an agreement with SpaceX to use all of the compute capacity at their Colossus 1 data center. This gives us access to more than 300 megawatts of new capacity (over 220,000 NVIDIA GPUs) within the month. This additional capacity will directly improve capacity for Claude Pro and Claude Max subscribers.

SpaceX retains Colossus 2 — presumably for both training of future models and inference of existing ones — and can afford to do both in the same data center precisely because xAI’s models aren’t getting much usage; more pertinently to this piece, they can do both in the same data center because both training and inference can be done on GPUs. Indeed, the GPUs Anthropic is contracting for at Colossus 1 were originally used for training as well; the fact that GPUs are so flexible is a big advantage.

Cerebras makes something completely different.
While a silicon wafer has a diameter of 300mm, the “reticle limit” — the maximum area that a lithography tool can expose on that wafer — is around 26mm x 33mm. This is the effective size limit for chips; going beyond that entails linking two separate chips together over a chip-to-chip interposer, which is exactly what Nvidia has done with the B200. Cerebras, on the other hand, has invented a way to lay down wiring across the so-called “scribe lines” that are the boundary between reticle exposures, making the entire wafer into a single chip with no need for relatively slow chip-to-chip linkages. The net result is a chip with a lot of compute and a lot of SRAM that is blisteringly fast to access. To put it in numbers, the WSE-3 (Cerebras’ latest chip) has 44GB of on-chip SRAM at 21 PB/s of bandwidth; an H100 has 80GB of HBM at 3.35 TB/s. In other words, the WSE-3 has just over half the memory of an H100, but 6,000 times the memory bandwidth.

The reason to compare the WSE-3 to an H100 is that the H100 is the chip most used for inference — and inference is clearly what Cerebras is most well-suited for. You can use Cerebras chips for training, but the chip-to-chip networking story isn’t very compelling, which is to say that all of that compute and on-chip memory is mostly just sitting around; what is much more interesting is the idea of getting a stream of tokens at dramatically faster speed than you can from a GPU.

Note, however, that the limitation in terms of training also potentially applies in terms of inference: as long as everything fits in on-chip memory, Cerebras’ speed is an incredible experience; the moment you need more memory, whether that be for a larger model or, more likely, a larger KV cache, then Cerebras doesn’t make much sense, particularly given the price. That whole-wafer-as-chip technique means high yields are a massive challenge, which hugely drives up costs. At the same time, I do think there will be a market for Cerebras-style chips: right now the company is highlighting the usefulness of speed for coding — reasoning means a lot of tokens, which means that dramatically scaling up tokens-per-second equals faster thinking — but I think this is a temporary use case, for reasons I’ll explain in a bit. What does matter is how long humans are waiting for an answer, and as products like AI wearables become more of a thing, the speed of interaction, particularly for voice — which will be a function of token generation speed — will have a tangible effect on the user experience.
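A rough back-of-envelope (my own illustration, with made-up model sizes, not figures from the article) shows why bandwidth binds single-stream decode: every generated token requires reading the weights plus the KV cache, so token rate is bounded by bandwidth over bytes read per token.

```python
# Decode is bandwidth-bound: tokens/sec <= bandwidth / bytes_read_per_token.
# Illustrative only; real systems batch many requests to amortize the reads.
GB, TB, PB = 1e9, 1e12, 1e15

def max_tokens_per_sec(bandwidth, weight_bytes, kv_cache_bytes):
    return bandwidth / (weight_bytes + kv_cache_bytes)

weights, kv = 40 * GB, 2 * GB  # hypothetical model that fits in WSE-3 SRAM

print(max_tokens_per_sec(3.35 * TB, weights, kv))  # H100 HBM:   ~80 tok/s
print(max_tokens_per_sec(21 * PB, weights, kv))    # WSE-3 SRAM: ~500,000 tok/s
```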
I have previously made the case, including in Agents Over Bubbles, that we have gone through three inflection points in the LLM era:

- ChatGPT demonstrated the utility of token prediction.
- o1 introduced the idea of reasoning, where more tokens meant better answers.
- Opus 4.5 and Claude Code introduced the first usable agents, which could actually accomplish tasks, using a combination of reasoning models and a harness that utilized tools, verified work, etc.

All of this falls under the banner of “inference”, but I think it will be increasingly clear that there is a difference between providing an answer — what I will call “answer inference” — and doing a task — what I will call “agentic inference.” Cerebras’ target market is “answer inference”; in the long run, I think the architecture for “agentic inference” will look a lot different, not just from Cerebras’ approach, but from the GPU approach as well.

I mentioned above that fast inference for coding is a temporary use case. Specifically, coding with LLMs requires a human in the loop. It’s the human that defines what is to be coded, checks the work, commits the pull request, etc.; it’s not hard to envision a future, however, where all of this is completely handled by machines.

This will apply to agentic work broadly: the true power of agents will not be that they do work for humans, but rather that they do work without human involvement at all. This, by extension, will mean that the likely best approach to solving agentic inference will look a lot different than answer inference. The most important aspect for answer inference is token speed; the most important aspect for agentic inference, however, is memory. Agents need context, state, and history. Some of that will live as active KV cache; some will live in host memory or SSDs; much of it will live in databases, logs, embeddings, and object stores. The important point is that agentic inference will be less about GPUs answering a question and more about the memory hierarchy wrapped around a model.

Critically, this articulation of an agentic-specific memory hierarchy implies a necessary trade-off of speed for capacity. Here’s the thing, though: lower speed isn’t nearly as important a consideration if there isn’t a human in the loop. If an agent is waiting around for a job that is being run overnight, the agent doesn’t know or care about the user experience impact; what is most important is being able to accomplish a task, and if entirely new approaches to memory make that possible, then delays are fine. Meanwhile, if delays are fine, then all of the focus on pure compute power and high-bandwidth memory seems out of place: if latency isn’t the top priority, then slower and cheaper memory — like traditional DRAM, for example — makes a lot more sense. And if the entire system is mostly waiting on memory, then chips don’t need to be as fast as the cutting edge either.

This represents a profound shift in future architectures, but it also doesn’t mean that current architectures are going away:

- Training will continue to matter, and Nvidia’s current architecture, including high-speed compute, large amounts of high-bandwidth memory, and high-speed networking, will likely continue to dominate.
- Answer inference will be a meaningful market, albeit a relatively small one, and speed from chips like Cerebras or Groq (I explained how Nvidia is deploying Groq’s LPUs here) will be very useful.
- Agentic inference will gradually unbundle the GPU, which alternates between stranding high-bandwidth memory (during the prefill process) and stranding compute (during the decode process), in favor of increasingly sophisticated memory hierarchies dominated by high capacity and relatively lower cost memory types, with “good enough” compute; indeed, if anything it will be the speed of CPUs for things like tool use that will matter more than the speed of GPUs.

At the same time, these categories won’t be equal in size or importance. Specifically, agentic inference will be the largest market by far, because that is the market that won’t be limited by humans or time. Today’s agents are fancy answer inference; in the future true agentic inference will be work done by computers according to dictates given by other computers, and the market size scales not with humans but with compute.

To date the invocation of “scaling with compute” has implicitly meant Nvidia bullishness. However, much of Nvidia’s relative advantage to date has been a function of latency: Nvidia chips have fast compute, but keeping that compute busy has required big investments in ever-expanding HBM memory and networking. If latency isn’t the key constraint, however, then Nvidia’s approach seems less worth paying a premium for. Nvidia does recognize this shift: the company launched an inference framework called Dynamo that helps disaggregate different parts of inference, and is shipping products like standalone memory and CPU racks to enable increasingly large KV caches and faster tool use, the better to keep their expensive GPUs busy. Ultimately, however, it’s easy to see cost and simplicity being increasingly attractive to hyperscalers for agentic inference that isn’t remotely GPU-bound.

China, meanwhile, for all of its lack of leading edge compute, has everything it needs for agentic inference: fast-enough (but not leading-edge) GPUs, fast-enough (but not leading-edge) CPUs, DRAM, hard drives, etc. The challenge, of course, is compute for training; it’s also possible that answer inference is more important for national security, at least when it comes to military applications.
The other interesting angle is space: slower chips actually make space data centers more viable for a number of reasons. First, if memory can be offloaded, chips can be made much simpler and run much cooler. Second, older nodes, by virtue of being physically larger, will better withstand space radiation. Third, older nodes require less power, which means there will be less heat to dissipate via radiation. Fourth, not being on the bleeding edge will mean higher reliability, an important consideration given that satellites won’t be repairable.

Nvidia CEO Jensen Huang regularly says that “Moore’s Law is Dead”; what he means is that the future of computing speed-ups will be a function of systems innovation, which is exactly what Nvidia has done. Maybe the most profound implication of agents that act without humans in the loop, however, will be that Moore’s Law doesn’t matter, and that the way we get more compute is by realizing that the compute we have is already good enough.

0 views
Kev Quirk Yesterday

Hey you, start communicating!

by David Jamieson

David talks about why it's good to reach out to authors when you read their content. Even if it's just to say hi. Read post ➡

Hard agree with David's comments here - he and I regularly exchange emails, actually. I try to reach out to authors whenever I read something that resonates with me. I'll also try to share their work via posts like this too. For me, blogging is the original social network; just because we're on our own spaces doesn't mean we can't be socially connected. That's why I offer comments, and a reply by email link on all posts, including my RSS feed. So yeah, start communicating! 🙃

0 views

Mythos finds a curl vulnerability

yes, as in singular one.

Back in April 2026 Anthropic caused a lot of media noise when they concluded that their new AI model Mythos is dangerously good at finding security flaws in source code. Apparently Mythos was so good at this that Anthropic would not release this model to the public yet but instead trickle it out to a selected few companies for a while, to allow a few good ones(?) to get a head start and fix the most pressing problems first, before the general populace would get their hands on it. The whole world seemed to lose its marbles. Is this the end of the world as we know it? An amazingly successful marketing stunt for sure.

Part of the deal with project Glasswing was that Anthropic also offered access to their latest AI model to “Open Source projects” via Linux Foundation. Linux Foundation let their project Alpha Omega handle this part, and I was contacted by their representatives. As lead developer of curl I was offered access to the magic model and I graciously accepted the offer. Sure, I’d like to see what it can find in curl.

I signed the contract for getting access, but then nothing happened. Weeks went past and I was told there was a hiccup somewhere and access was delayed. Eventually, I was instead offered that someone else, who has access to the model, could run a scan and analysis on curl for me using Mythos and send me a report. To me, the distinction isn’t that important. It’s not that I would have a lot of time to explore lots of different prompts and do deep dive adventures anyway. Getting the tool to generate a first proper scan and analysis would be great, whoever did it. I happily accepted this offer. (I am purposely leaving out the identity of the individual(s) involved in getting the curl analysis done as it is not the point of this blog post.)

Before this first Mythos report, we had already scanned curl with several different very capable AI powered tools (I mean in addition to running a number of “normal” static code analyzers all the time, using the pickiest compiler options and doing fuzzing on it for years etc). Primarily AISLE, Zeropath and OpenAI’s Codex Security have been used to scrutinize the code with AI. These tools and the analyses they have done have triggered somewhere between two and three hundred bugfixes merged in curl throughout the recent 8-10 months or so. A bunch of the findings these AI tools reported were confirmed vulnerabilities and have been published as CVEs. Probably a dozen or more.

Nowadays we also use tools like GitHub’s Copilot and Augment Code to review pull requests, and their remarks and complaints help us land better code and avoid merging new bugs. I mean, we still merge bugs of course, but the PR review bots regularly highlight issues that we fix: our merges would be worse without them. The AI reviews are used in addition to the human reviews. They help us, they don’t replace us. We also see a high volume of high quality security reports flooding in: security researchers now use AI extensively and effectively.

Security is a top priority for us in the curl project. We follow every guideline and we do software engineering properly, to reduce the number of flaws in code. Scanning for flaws is just one of many steps to keep this ship safe. You need to search long and hard to find another software project that does as much as, or goes further than, curl for software security.
Steps involved in keeping curl secure (May 6, 2026)

It was with great anticipation we received the first source code analysis report generated with Mythos. Another chance for us to find areas to improve and bugs to fix. To make an even better curl. This initial scan was made on curl’s git repository and its master branch of a certain recent commit. It counted 178K lines of code analyzed in the src/ and lib/ subdirectories. The analysis details the several different approaches and methods it used to perform the search, and which kinds of flaws it focused on finding. A fun note in the top of the report says:

curl is one of the most fuzzed and audited C codebases in existence (OSS-Fuzz, Coverity, CodeQL, multiple paid audits). Finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely.

… and it correctly found no problems in those areas.

Completely unscientific poll on Mastodon about people’s expectations for Mythos scanning curl

The size of curl

curl is currently 176,000 lines of C code when we exclude blank lines. The source code consists of 660,000 words, which is 12% more words than the entire English edition of the novel War and Peace. On average, every single production source code line of curl has been written (and then rewritten) 4.14 times. We have polished on this. Right now, the existing production code in git master that still remains has been authored by 573 separate individuals. Over time, a total of 1,465 individuals have so far had their proposed changes merged into curl’s git repository. We have published 188 CVEs for curl up until now. curl is installed in over twenty billion instances. It runs on over 110 operating systems and 28 CPU architectures. It runs in every smart phone, tablet, car, TV, game console and server on earth.

The report concluded it found five “Confirmed security vulnerabilities”. I think using the term confirmed is a little amusing when the AI says it confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take. Five issues felt like nothing, as we had expected an extensive list. Once my curl security team fellows and I had poked at this short list for a number of hours and dug into the details, we had trimmed the list down and were left with one confirmed vulnerability. Of the other four, three were false positives (they highlighted shortcomings that are documented in API documentation) and the fourth we deemed “just a bug”.

The single confirmed vulnerability is going to end up a severity low CVE, planned to get published in sync with our pending next curl release 8.21.0 in late June. The flaw is not going to make anyone gasp for breath. All details of that vulnerability will of course not get public before then, so you need to hold out for details on that.

The Mythos report on curl also contained a number of spotted bugs that it concluded were not vulnerabilities, much like any new code analyzer does when you run it on hundreds of thousands of lines of code. All the bugs in the report are being investigated and one by one we are fixing those that we agree with. All in all about twenty bugs that are described and explained very nicely. Barely any false positives, so I presume they have had a rather high threshold for certainty. curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts.
This is only natural of course, since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones is slowly becoming harder. Additionally, a bug can be small or big, so it’s not always fair to just compare numbers.

My personal conclusion can however not end up as anything other than that the big hype around this model so far was primarily marketing. I see no evidence that this setup finds issues to any particularly higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing. This is just one source code repository and maybe it is much better on other things. I can only tell and comment on what it found here.

But allow me to highlight and reiterate what I have said before: AI powered code analyzers are significantly better at finding security flaws and mistakes in source code than any traditional code analyzers did in the past. All modern AI models are good at this now. Some of the things they do better than the old tools:

- They can spot when the comment says something about the code and then conclude that the code does not work as the comment says.
- It can check code for platforms and configurations we otherwise cannot run analyzers for.
- It “knows” details about 3rd party libraries and their APIs so it can detect abuse or bad assumptions.
- It “knows” details about protocols curl implements and can question details in the code that seem to violate or contradict protocol specifications.
- They are typically good at summarizing and explaining the flaw, something which can be rather tedious and difficult with old style analyzers.
- They can often generate and offer a patch for its found issue (even if the patch usually is not a 100% fix).

Anyone with time and some experimental spirits can find security problems now. The high quality chaos is real. Any project that has not scanned their source code with AI powered tooling will likely find a huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will, and so will many of the others. Not using AI code analyzers in your project means that you leave adversaries and attackers time and opportunity to find and exploit the flaws you don’t find.

From the report:

Zero memory-safety vulnerabilities found. Methodology note: this review is hand-driven analysis using LLM subagents for parallel file reads, with every candidate finding re-verified by direct source inspection in the main session before being recorded. The CVE to variant-hunt mapping was built from curl’s own vuln.json. No automated SAST tooling was used. This outcome is consistent with curl’s status as one of the most heavily fuzzed and audited C codebases. The defensive infrastructure (capped dynbufs everywhere, with explicit max on every numeric parse, overflow guard, CURL_PRINTF format-string enforcement, per-protocol response-size caps, pingpong 64KB line cap) systematically closes the bug classes that would normally be productive in a codebase this size. Coverage now includes: all minor protocols, all file parsers, all TLS backends’ verify paths, http/1/2/3, ftp full depth, mprintf, x509asn1, doh, all auth mechanisms, content encoding, connection reuse, session cache, CLI tool, platform-specific code, and CI/build supply chain.

It should be noted that the AI tools find the usual and established kind of errors we already know about. They just find new instances of them. We have not seen any AI so far report a vulnerability that would somehow be of a novel kind or something totally new. They do not reinvent the field in that way, but they do dig up more issues than any other tools did before.

These were absolutely not the last bugs to find or report. Just while I was writing the drafts for this blog post we have received more reports from security researchers about suspected problems. The AI tools will improve further and the researchers can find new and different ways to prompt the existing AIs to make them find more. We have not reached the end of this yet. I hope we can keep getting more curl scans done with Mythos and other AIs, over and over until they truly stop finding new problems.
Thanks to Anthropic and Alpha Omega for providing the model and the tools, and for doing the scan for us. Thanks also to the individual who did the scan for us. Much appreciated!

Top image by Jin Kim from Pixabay

Thanks for flying curl. It’s never dull.

0 views
Blargh Yesterday

Quantum safe amateur radio secure shell

I’ve previously pointed out that the AX.25 implementation in the kernel is pretty poor. It’s not really being maintained, and even when it gets fixes after I reported it, with people running LTS OSs it can take like 5 years before the fix actually reaches users, if ever. So when writing applications, you still have to work around kernel bugs from a decade ago. This makes it kind of pointless to upstream patches. The exception is security patches, and reading between the lines of why the AX.25 code is now being removed from the kernel, it sounds like maybe some LLM (like the looming “Mythos” and the related Glasswing) may have found some severe problems. But even if there aren’t any known security problems yet, having code is now more of a liability than ever. Code needs to be removed, or taken responsibility of. (tangent about ffmpeg at the bottom of this post)

With the kernel code removed, say goodbye to the old walkthrough.

Well, not “new”, per se, but “replacement”. With the socket based API about to be gone, we need some other way for applications to send packets and manage connections.

For sending raw packets to and from the modem there’s KISS. I have no real complaints about it. Not much to get wrong about sending frames. It’s implemented by most modems, like the software modem Direwolf and by some radios like the Kenwood TH-D75, so it’s not going anywhere.

For connected mode (streams of in-order data, like with TCP) the biggest contender seems to be AGW. Direwolf implements it, and I’ve made a messy implementation of an AGW client in Rust. The Rust API works, as we’ll see, but the code needs some refactoring and cleanup due to it being written exploratorily while I was deciding what it should even do, and how. The AGW protocol is not super amazing, but it gets the job done. One can build a connection API on top of it, as I have, and never have to think about the AGW protocol ever again.

There’s another protocol called RHP, specified here and here. It came out of the XRouter project. Since XRouter is closed source, I have a strong aversion to it. It seems both counter to how I see amateur radio, and anachronistic, for it to be closed source. It’s bad enough that VARA and Winlink are closed source. And people are definitely working on replacing VARA with various other modes because of it.

tl;dr: I’m going with AGW for now. If someone writes a Rust crate for RHP exposing a compatible API, I certainly wouldn’t mind adding that dependency to optionally use. I have not yet implemented AGW (or RHP) in my own AX.25 stack, but I plan to. For now that means I’ll use Direwolf.

My previous axsh implementation, since deleted, had some problems:

- it was implemented in C++, and not only do I prefer Rust, how could I even call something written in C++ “secure”? (a blog post for another day)
- used the kernel API, so that needs rewriting,
- used , which proved to be a bit “weird” when interoperating with some other APIs, and
- used crypto primitives vulnerable to quantum computers.

So with everything but terminal management needing a rewrite, this is a reason to rewrite the whole thing. The requirements for the new version:

- Don’t use kernel AX.25 sockets — this means use AGW.
- Also work on TCP (mainly for debugging) — This means using an internal framing protocol.
- Be quantum safe — Use ML-DSA+ed25519 dual signed for authentication of server and client.
- Be efficient — This means don’t use ML-DSA for per packet signatures (they are huge), at the cost of some quantum safety (see the README).
- Non-requirement: Encrypt — This would violate the amateur radio license.

And then, why not just use SSH?

If you have an AGW server, such as Direwolf, then it’s easy to run axsh. Just start a server:

Then log in:

Then wait like 30-40 seconds for the handshake to complete. The reason for the wait is the large ML-DSA signatures used in the handshake.

It can’t be the same direwolf instance, since Direwolf only shuffles packets between the radio and AGW clients, not from one AGW client to another. In my case I had one Direwolf connected to an ICom 9700, and another to a Baofeng UV5R using an AIOC (all in one cable). AIOC is highly recommended for experimentation over the air.
So yeah my test is between just about the cheapest VHF/UHF radio that exists, and maybe the most expensive one.

In addition to running :

With KISS providing packet support (and AGW providing a higher level API on top, if preferred), why not just run TCP/IP, and let the very stable OS TCP implementation take care of everything? TCP is definitely more modern, stable, and maintained, but it doesn’t scale down to slow speeds very well. A TCP+IPv4 header is at least 40 bytes, and if you don’t want to be some sort of caveman, IPv6 is another 20 bytes. At 1200bps that would be 267-400ms overhead for every packet[1]. Checking a random TCP data packet on my laptop I see that with TCP options TCP/IPv4 is actually 52 bytes, or 350ms. Counting the air time (milliseconds, not just bytes) makes this overhead problem more obvious. And because of amateur radio license reasons TCP would still need to identify the callsign, so you probably have to add 17 bytes (113ms) as a surrounding header. That leaves TCP with 69 or 89 bytes overhead per packet, meaning 460ms or 593ms. And since you don’t want to tie up the RF channel for too long (only for the whole packet to be dropped due to interference), you won’t want to send packets that are too large. Of course it’s 4x as slow if you want to do something like Bell 103 on HF.

AX.25 connected mode takes that down to 19 bytes (126ms) overhead (if using Mod 128 mode) per data packet. Because of the AX.25 segmenter, for bulk data TCP is not as bad as it may have sounded. For a 1500 byte TCP segment, fitting in just under eight 200 byte AX.25 frames (totalling about 205 bytes of overhead), this means 1367ms overhead instead of plain AX.25’s (at about 152 bytes) 1013ms. A 1500 byte payload takes 10 seconds to send, so that’s an overhead of 13.7% instead of 10.1%. But for interactive use cases, worst case a single payload packet, it’s 467ms vs 133ms. And that’s only counting the data frames, not the acknowledgments. A TCP ACK is at a minimum about 57 bytes, or 380ms. An AX.25 RR is 18-19 bytes, or 120-127ms. That makes TCP about three times less efficient, compared to AX.25.

A bigger problem with TCP, especially untweaked, is resend timers and window sizes. At 1200bps you don’t actually want too big a window size, since you don’t want to tie up the RF channel for several minutes if the other end has gone away. So a bunch of airtime tweaks are needed. And at best you’ll end up with the numbers above. Maybe you could tweak TCP to be more friendly to lower speeds, and find the other overhead acceptable. If so, then you’ll be happy to hear that axsh supports running on TCP as well.

And what about QUIC? Well first, it inherits the same problems from TCP/IP. Sure, the UDP header is smaller than the TCP header, but then on top of that there’s the QUIC header. The second problem is that QUIC is meant to be encrypted. Ripping out encryption, while staying secure, seems more dangerous than keeping it simple and just working from the requirements. Probably the whole handshake would have to be redesigned.
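To sanity-check the overhead arithmetic above, here it is as a few lines of Python (my own sketch; per the footnote, bit stuffing adds a little more on real links):

```python
# Airtime at 1200 bps for the overhead figures discussed above (approximate).
def airtime_ms(nbytes, bps=1200):
    return nbytes * 8 * 1000 / bps

print(airtime_ms(52))   # TCP/IPv4 header with options:       ~347 ms
print(airtime_ms(19))   # AX.25 mod-128 I-frame header:       ~127 ms
print(airtime_ms(205))  # TCP over 8 AX.25 frames, overhead: ~1367 ms
print(airtime_ms(152))  # plain AX.25, 8 frames of overhead: ~1013 ms
```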
Now for the promised ffmpeg tangent. AX.25 being removed from the Linux kernel reminds me of the LLM finding that bug in ffmpeg, causing all that drama. I have no dog in this fight, but in my opinion ffmpeg is in the wrong here. Their argument seems to be all about how this particular encoder is rarely used, is just a hobby project, etc. Ok, but it’s in your code base. Even if disabled by default, why would you want to ship a security footgun? Maybe some hobbyists out there build ffmpeg with all encoders enabled. Do you want them to be vulnerable to someone’s virus?

So Google should either keep quiet, or provide a patch? Well, keeping quiet because the codec is rarely used is not really an option; that’s borderline negligent, and morally culpable for when someone eventually gets hacked. So Google “should” always provide a patch in these cases? Perhaps, depending on the meaning of the word “should”. Google is rich, so it “should” be morally forced to contribute to your software, just because Google (presumably, via YouTube) is a heavy user of ffmpeg? Well, that just sounds like the (non-)problem with open source software (or free software) in general. The license permits use and profit without contribution. If you wanted a tithe, then you should have put that in the license. It sounds like you want everyone to be free, but only to do what you want. That’s not how that works. This is also why I don’t like the AGPL license. It’s not free software if it binds me in your serfdom.

[1] Actually, it’s a tiny bit more, because of the occasional bit stuffing.

Loren Stewart Yesterday

Fat Skills, Thin Harness, No Terminal

Tan's fat-skills/thin-harness architecture maps almost one-to-one onto an AI coworker for non-developers. The catch: when users don't have a repo, skills can't be files. Skills have to be the product.


Async I/O in Zig 0.16, today

Zig 0.16 shipped last month with std.Io, a cross-platform interface for I/O and concurrency. This is a big step for the ecosystem. Libraries can now be written against a standard I/O abstraction, independent of the runtime, and application developers can plug in whatever implementation they want.

The only usable implementation shipped with 0.16 is the threaded one, which uses a thread pool. When you spawn concurrent tasks, it creates OS threads to run them. A simple example: spawn 10,000 concurrent tasks, each sleeping for 10 seconds. On my machine, it completes in about 20 seconds. The overhead comes from spawning OS threads. If you try increasing this to 50,000 tasks, it will likely fail on most systems due to OS thread limits (a kernel-imposed cap on Linux).

This isn’t just an arbitrary benchmark. Asynchronous I/O exists to solve a real problem: network servers with many connected clients. You don’t want to spawn an OS thread for every client connection. That’s why we have event loops, coroutines, and async I/O.

There is also an evented implementation in the standard library, which is meant to use io_uring on Linux and kqueue on BSD/macOS. It’s still a work in progress though: it’s missing many functions and doesn’t currently compile.

I’ve written about zio before, and I’ve just released version 0.11 with a full std.Io implementation. It uses stackful coroutines and asynchronous OS-level I/O APIs (io_uring or epoll on Linux, kqueue on BSD/macOS, IOCP on Windows). The same example in zio is almost identical: you just initialize a zio runtime and ask it for its std.Io interface. With zio, the same 10,000 tasks complete in about 10 seconds. That’s the expected time, since all tasks run truly concurrently. You can increase this to 50,000 or more tasks and it will continue to work, limited only by available memory.

You can use this std.Io instance for anything you’d use the standard implementations for. To write an HTTP server with the standard library, for example, just pass zio’s std.Io and it will work the same way.

If you want to use async I/O in Zig 0.16 with the standard APIs, you don’t need to wait for the standard library’s evented implementation to be ready. Zio’s implementation is still new, so if you hit any problems, please reach out on GitHub and I’ll be happy to help.
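For a feel of the thread-per-task cost outside Zig, here is the same experiment sketched in Rust with one OS thread per task (my sketch, not from this post):

```rust
use std::thread;
use std::time::{Duration, Instant};

// 10,000 OS threads, each sleeping 10 seconds. It works, but every
// task costs a kernel thread and a stack, and pushing this to 50,000+
// starts running into system thread limits. An evented runtime
// multiplexes the same tasks over a handful of threads instead.
fn main() {
    let start = Instant::now();
    let handles: Vec<_> = (0..10_000)
        .map(|_| thread::spawn(|| thread::sleep(Duration::from_secs(10))))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("elapsed: {:?}", start.elapsed());
}
```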

Susam Pal Yesterday

The Problem of Pedagogy in Advanced Mathematics

It is a commonly held opinion that educational institutions could do more to improve the pedagogy of mathematics. This is especially true in school, when students are first exposed to new subjects. Poor exposition can turn students away from mathematics for a lifetime; only the highly motivated ones continue to engage with the subject. This is very unfortunate, because mathematics is a beautiful subject and it is filled with wonder. It also teaches rigour in reasoning, clarity of thought and the discipline of constructing arguments from first principles to obtain intricate and often beautiful results.

What is perhaps less known is that pedagogy is a problem even for graduate-level mathematics students and professional mathematicians. The proofs in many graduate-level mathematics textbooks are, in my humble opinion, not really proofs at all. They are closer to high-level outlines of proofs. The authors simply do not show their work. The student then has to put in an extraordinary amount of effort to understand and justify each line. Sometimes a 10-line argument in a textbook might expand into a 10-page proof if the student really wants to convince themselves that the argument works.

I am not a mathematician, but out of personal interest, I have worked with professional mathematicians in the past to help refine notes that explain certain intermediate steps in textbooks (for example, a specific case in Galois Theory by Stewart). I was surprised to find that it was not just me who found the intermediate steps of certain proofs obscure. Even professional mathematicians who had studied the subject for much of their lives found them obscure. It took us two days of working together to untangle a complicated argument and present it in a way that satisfied three properties: correctness, completeness and accessibility to a reasonably motivated student.

I don't mean that the books merely omit basic results from elementary topics like group theory or field theory, which students typically learn in their undergraduate courses. Even if we take all the elementary results from undergraduate courses for granted, the proofs presented in graduate-level textbooks are often nowhere near a complete explanation of why the arguments work. They are high-level outlines at best.

I find this hugely problematic, especially because students often have to learn a topic under difficult deadlines. If the exposition does not include sufficient detail, some students might never learn exactly why a proof works, because not everyone has the time to work out a 10-page proof for every 10 lines in the book. Many good universities provide accompanying notes that expand the difficult arguments by giving rigorous proofs and adding commentary to aid intuition. I think that is a great practice.

I have studied several graduate-level textbooks in the last few years, and while these textbooks are a boon to the world, because textbooks that expose the subject are better than no textbooks at all, I am also disappointed by how inaccessible such material often is. If I had unlimited time, I would write accompaniments to those textbooks that provide a detailed exposition of all the arguments. But of course, I don't have unlimited time. Even so, I am thinking of at least making a start by writing accompaniment notes for some topics whose exposition quality I feel strongly about, such as s-arc transitivity of graphs, field extensions and related topics.

#miscellaneous


Principles almost always have exceptions, often when they conflict with other principles (Rule 5)

I try hard not to lie, but would I lie to save my family from being murdered? Of course. In that case honesty loses to protecting my family. Principles almost always have exceptions, often when they conflict with other principles (which I’m calling Rule 5 in this rules series). This rule follows from Rule 1: Reality is always more complicated. Put another way, there are always edge cases.

Often edge cases between principles (personal or otherwise) get resolved via a principle priority stack, an implicit or explicit hierarchy of principles. An example is Isaac Asimov’s Three Laws of Robotics:

- First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- Second Law: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Asimov spent much of his robot fiction dissecting situations where these “Laws” explicitly conflict or otherwise get subverted by edge cases. While these are make-believe situations, the priority stack mental model explains a lot of otherwise seemingly irrational behavior in the real world.

For example, why do politicians and political parties seem to flip-flop on supposedly strongly held principles? I do not think it is always because they actually lack strongly held principles, but that there is something higher up in their principle priority stack, usually winning elections. The same is true with corporations, especially public ones. Most have profits (or a similar financial metric) at the top of their principle priority stack, above other customer-focused principles like sound privacy practices, good service, etc. Some corporations have it the other way around, like DuckDuckGo, where we’ve forgone a lot of profits because privacy is higher on our principle priority stack, or some B corps with other elevated non-financial priorities.

The priority stack model itself has edge cases! A given priority stack holds under normal conditions, but extreme conditions can reshuffle the ranking. For example, politicians do occasionally reach a breaking point where a principle that usually ranks below winning elections is about to be violated badly enough that they temporarily elevate it to the top, accepting re-election risk as a result. It’s rare, but it does happen.

What can you practically do with this Rule? I think at least two things. First, when thinking about principles, either your own or others’, you can ask how they arrange in a priority stack, especially relative to a given situation you are facing or are concerned with facing in the future. This arrangement can help clarify what you or others would actually do. Second, if you’d like to convey that you care about a particular principle, I think it helps to publicly signal it in a priority stack context, at least against one other perceived high priority. For example, I'd love to vote for politicians who name a few principles they hold higher than re-election, such that they publicly commit to voting for those principles regardless of future electoral consequences. You can't fully predict or trust the future, but a stated priority stack is more trustworthy than silence, and more trustworthy still when there's a track record behind it.

Unsung Yesterday

A preview of the future

In his latest video, Shelby from Tech Tangents unpacked, installed, and put to use a truly forgotten product: the IBM 3119, one of the first consumer flatbed scanners. The setup was a small nightmare, needing a rare hardware card installed in a specific computer, an ultra-particular combination of two operating systems working in lockstep, and even some careful memory balancing. Even after all that, a 300dpi page scanner in the late 1980s was still a force to be reckoned with. It’s hard to remember how enormous scanned files were compared to anything else then, even from a black-and-white scanner like this one. The video shows a simple 90-degree image rotation in highest quality requiring over 9 hours, and I believe it.

But deep inside the video, at precisely 19:31, for only ten seconds, something appears that is absolutely worth celebrating. The nascent scanner software has a “curves” feature that allows you to redraw the shades of gray to capture shadows, highlights, and midtones exactly how you want them. Today, the feature would look something like this, with a real-time preview:

There would be absolutely no way to do something like this in the late 1980s, when just rotating an image is an overnight operation, right? And yet:

How was this accomplished? Absolutely brilliantly. Remember the palette swapping technique? Here, the entire screen’s palette is 256 shades of gray. It’s a very particular kind of linear palette, and so you can easily take that line and… well, turn it into a curve. Since palette swapping happens on the graphics card, it takes as little as one frame of time, allowing it to react to mouse movements as they happen. This must have been mind-blowing to experience in the moment.

Sure, it’s only a preview, and actually applying the curves to the image would take many minut— No. This is the wrong frame of mind. Here’s my hot take: there are moments in software where the preview is more important than the feature following it. That’s because the preview making things faster isn’t just the difference between finishing something sooner or later. It’s the difference between doing something and not doing it at all. Would you even attempt to use curves if each adjustment took minutes or hours, especially in a land without undo?

I love this preview that hints at what the future will be. I like this clever use of extremely limited technology and the tight collaboration between engineering and design. It must have been nice to be in the room when someone had the flash of insight to use palette swapping this way.

#above and beyond #flow #graphics #history
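To make the palette trick above concrete: apply the curve to the 256-entry palette instead of to the pixels. A sketch in Rust (hypothetical code, obviously nothing like the original IBM software):

```rust
/// Some tone curve mapping 0..=255 to 0..=255; here a simple gamma.
fn curve(v: u8, gamma: f32) -> u8 {
    ((f32::from(v) / 255.0).powf(gamma) * 255.0).round() as u8
}

fn main() {
    // Identity grayscale palette: index i displays as gray level i.
    let mut palette: [u8; 256] = std::array::from_fn(|i| i as u8);

    // Per mouse movement: recompute 256 palette entries and hand the
    // new palette to the graphics card. Cheap enough to do every frame.
    for (i, entry) in palette.iter_mut().enumerate() {
        *entry = curve(i as u8, 2.2);
    }

    // What you avoid: re-mapping every pixel of the scan, e.g.
    //   for px in framebuffer.iter_mut() { *px = curve(*px, 2.2); }
    // A 300dpi letter-size page is about 2550x3300 = 8.4 million
    // pixels, versus 256 palette entries.
}
```

The preview looks identical to the full operation because the display hardware applies the palette lookup to every pixel on scan-out, for free.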

Jim Nielsen Yesterday

Out With the JS, In With the HTML

I’ve been posting about how you can make lots of HTML pages and leverage navigations over in-page, JS-dependent interactions. Now I’m gonna post another example.

On my icon sites, I have a little widget that allows you to resize the icons you’re looking at. Previously, I implemented this functionality as a web component. Its size attribute corresponded to an enumeration which mapped to actual pixel dimensions like 64×64 or 512×512. When the little widget was clicked to render icons at a different size, JavaScript changed the attribute on the custom element. From there, the web component’s JS took over, changing the dimensions of the children elements, their attributes, etc.

It all worked pretty well. However, because that was a client-side solution to my otherwise entirely pre-rendered static site, it required some templating logic and data to be duplicated and sent over the wire to every client. I didn’t love that for various reasons — like “Crap, I updated this one small part of how my icon list renders on the server, but forgot to tweak it on the client, so things are slightly broken now.”

Then one day the thought hit me: instead of relying on JS to make that interaction work (click, execute JS, modify the in-page DOM to a new list), what if I just made that interaction a navigation? Click, navigate to a new list. Instead of “every list of icons ships with some JS that allows them to re-render at four different sizes” I could do “every list of icons ships in four different sizes”. Previously: one page, with JS to re-render the icon list based on user interactions. The idea: four pages, each a different icon list size.

So I tried it. And guess what? Once I added some code to support CSS view transitions, I got a cool effect amongst the icons for free — that’s right, by removing code! Works nice on mobile too!

I know I’m not doing anything particularly novel here, but as we continue to get new, powerful primitives on the web — like CSS view transitions — I find it really interesting to revisit basic patterns and explore what’s possible now that wasn’t previously. It’s fun to ask yourself: “Could I remove some client-side JS and get a better overall experience?” If the answer is yes, I’ll bet you the development experience (and maintenance burden) is much improved too!
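To illustrate the shape of the “four pages instead of JS” idea, and only the shape, here it is as a build step sketched in Rust with made-up names (the actual sites are surely not built this way):

```rust
// Hypothetical build step: render the same icon list once per size,
// so the "resize" widget becomes plain links between /sm/, /md/,
// /lg/ and /xl/, and CSS view transitions animate the navigation.

const SIZES: [(&str, u32); 4] = [("sm", 64), ("md", 128), ("lg", 256), ("xl", 512)];

fn render_icon_list(px: u32) -> String {
    // Stand-in for the real template; the size is baked in at build time.
    format!("<ul class=\"icons\" style=\"--size:{px}px\">…</ul>")
}

fn main() -> std::io::Result<()> {
    for (name, px) in SIZES {
        std::fs::create_dir_all(name)?;
        std::fs::write(format!("{name}/index.html"), render_icon_list(px))?;
    }
    Ok(())
}
```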
