Latest Posts (20 found)

Summer Break: Week of June 29

Stratechery is on summer break the week of June 29. There will be no Weekly Article or Updates. The next Update will be on Monday, July 6. Dithering, Sharp Tech, and Sharp China will also return the week of July 6. Greatest of All Talk and Asianometry will continue to publish. The full Stratechery posting schedule is here.

Do excellent vulnerability reports

Over the years, we have received, read and handled well over one thousand vulnerability reports filed against curl. We have seen most kinds. It is time for me to try to help future reporters by providing a short guide on how to submit a truly excellent vulnerability report to an Open Source project.
We tend to call everyone who reports a security problem a security researcher, because by the act of the submission itself they fulfill the definition. There are however many different kinds of people who submit reports, from the most rookie youngster with limited experience to the multi-decade experienced senior in the field. Most reports submitted to a project like curl come from reporters who never submitted anything to the project before and are previously completely unknown. Many reporters use hacker handles or pseudonyms, so there is not a lot to learn about the person behind the report either. We don’t know the reporters’ age, experience level, employer, sex or on which continent they live. But also: none of those things matter.
When you submit a vulnerability report, consider telling the project how you want to get credited, should they consider your report real.
There is a potentially almost unlimited number of security researchers that can find problems in a project. The project receiving your report only has a small number of overloaded maintainers that take care of the reports. Consider this imbalance. Make your report as easy as possible for the team to manage.
To us maintainers who receive a steady stream of vulnerability reports, it rarely matters exactly how the problem was detected. Whether you fell over it by accident, found it by reading every single line of source code, or had an AI point it out to you, it has little relevance to the security team. The team primarily cares about whether the problem is real and, if it is, how serious the impact is.
If the problem is documented, then it likely isn’t a vulnerability.
This is a common theme in curl: people report that they can find something strange or peculiar to happen when they do something, only to have one of us point out that the action is either documented to have that side-effect, or the action was done in spite of clear warnings in the documentation. To make a good vulnerability report, you should make sure you understand what the software is supposed to do – and what the documentation says its limitations and conditions are. A good Open Source project has those things documented.
Figure out where and how to submit your report. If you found several problems, it is considered polite to ask the team how they want to receive the rest. As separate individual submissions or maybe as a curated list. Perhaps paced at a slow rate to avoid overflow. Never circumvent the submission method suggested by the project. That is impolite.
Consider the initial submitting of the issue to be the first step in a multi-step communication process with the project that will continue for as long as at least one of your reported issues has not been resolved or dismissed. This can be days, weeks or in some cases even months. Expect responses and follow-up questions. Be prepared to clarify, expand and maybe provide more code and reasoning. Remember that you submit vulnerability reports in order to help and improve the project.
These days people like to create enormously long and detailed reports that have all the details, often explained three times and with several embedded lists using bullet points describing impact and providing more or less good analysis attempts. Your first paragraph of the report should be a human-written, brief explainer of what the problem is and what badness it leads to. You should be able to explain that in just a few sentences. It is a reality-check, because if you can’t do this, if you don’t understand the flaw enough yourself to write such a paragraph, then you have homework to do.
Figure it out, then come back and write the intro paragraph. Having a quality intro saves a lot of time for the security team receiving your report.
Be aware that the Open Source project you contact may be overloaded, on vacation or seeing your report as yet another duplicate they already saw reported seven times. Be helpful and respect that you add a load to a small team that probably consists of volunteers working on this in their spare time.
Whether you used a lot of AI or just a little when finding the issue and writing up the report, you must make sure that you communicate as a human. With your human communication skills.
Your report should contain a reproducer. Ideally a fully contained and stand-alone script or source code that the security team can build and run to see the vulnerability trigger. A reproducer helps prove to the team that the problem is real or maybe already an accepted risk or behavior. It is also convenient for the developers to first understand and reproduce the issue, and then they can convert the reproducer into a project test case for the pending fix. Without providing a reproducer in your report, you instead push that work to the receiving end. We still need the reproducer. We still need a test case.
Provide a patch for the problem. If you can figure out a way to fix the code to make your finding no longer trigger, that is great information for the security team and such a patch usually helps them understand the issue better and get a speedier result. It reduces the load. Sure, such a patch is often not perfect and it can usually be improved and expanded as the developers have a different view and a more nuanced understanding of the problem and the software architecture involved. It still helps. Getting 80% towards the target is still valuable.
Usually you should look for vulnerabilities in the latest version of the software, often even using an up-to-date git repository.
Whatever version you used to find it, you need to specify that in your report. If the problem turns out to be real (which your report claims, and you should never report anything you don’t believe is real), it is also immediately interesting to know when the problem first appeared. Which is the earliest version of the software that you can trigger this problem with? The project will want to know this to write up a proper advisory for the issue. You can help figure this out by bisecting, for example.
Remain available after your initial submission. In the curl project at least, we want to work with the reporter to make sure we get every angle and detail right. First, when trying to understand and assess the initial report and agreeing on a severity for it. Then, we jointly produce and agree to a remedy (patch) for the problem, which ideally means taking the reporter’s version and massaging it into perfection. If the problem is serious enough, there could be reasons to discuss a rushed patch release at an earlier date than the pending release would otherwise happen on. To reduce the time users in the wild remain vulnerable.
Finally, we collaborate on the description and explainer for the problem that goes into the security advisory. For every CVE that is registered and assigned to a particular vulnerability, there needs to be a detailed security advisory written. It should ideally describe the issue, how it triggers, what it means, the impact, the affected version ranges and more. Everything related to the vulnerability that we can think might help users. Your job as a security researcher is to make sure the description in the advisory matches your finding, your understanding of the problem and that the description is understandable.
For every confirmed security report, the receiving project will try to learn from it and fix code and practices to avoid making the same mistake again.
As a reporter, your job is to learn from the submission experience and try to improve your reporting procedure and approach for the next time. Then submit your next report!
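The bisecting mentioned above, for finding the earliest affected version, is essentially a binary search over the release history. Here is a minimal Python sketch of the idea. It is not curl tooling: the `versions` list and the `triggers` callback are hypothetical stand-ins for an ordered release list and your reproducer, and it assumes that once the problem appears it stays present in every later version.

```python
from typing import Callable, Optional, Sequence


def first_affected(versions: Sequence[str],
                   triggers: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version for which the reproducer triggers.

    Assumes `versions` is ordered oldest to newest and that the problem,
    once introduced, is present in every later version.
    """
    if not versions or not triggers(versions[-1]):
        return None  # the newest version is not affected: nothing to bisect
    lo, hi = 0, len(versions) - 1  # invariant: versions[hi] is affected
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers(versions[mid]):
            hi = mid       # affected: the first bad version is mid or earlier
        else:
            lo = mid + 1   # not affected: the first bad version is later
    return versions[lo]


if __name__ == "__main__":
    # Hypothetical release list and a fake reproducer, for illustration only.
    releases = ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5"]
    introduced_in = releases.index("1.3")
    print(first_affected(releases, lambda v: releases.index(v) >= introduced_in))
```

With a git checkout, `git bisect` automates the same search over individual commits, so a reproducer that exits non-zero when the problem triggers goes a long way.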

Books: January to June, 2026

I stopped tracking books using apps or services, even though there are good ones out there. I have two little shelves in my bedroom: on the left I put books I want to read, on the right the ones I have read. The plan was to empty the one on the right halfway through the year and post a picture here on the site to remember what I have read. This is that picture, and those are the books I have read so far in 2026. A lot of Terzani, a lot of stories about death and suffering, about misery and tough times, but also a lot of stories about nature and mountains. The fiction-to-non-fiction ratio is probably 3:1, which is unusual for me, considering I read non-fiction almost exclusively for most of my life, but that’s fine. I look forward to filling up the shelf again and posting a second picture here on the site sometime in late December. Thank you for keeping RSS alive. You're awesome. Email me :: Sign my guestbook :: Support for 1$/month :: See my generous supporters :: Subscribe to People and Blogs

iDiallo Today

I turned my prologue into a short video

It's hard to write a whole book. So for now at least, I've turned the prologue of my book into a short video. I hope you enjoy it.

Notes from Bryan Cantrill’s “Intelligence is not Enough”

I quite enjoyed this talk from Bryan Cantrill where he discusses the difficult engineering problems they overcame while working on their company Oxide. Some of the problems they ran into were bugs. But these weren’t any ordinary bugs, they were company-destroying bugs: bugs that, if they couldn’t be fixed, would sink the entire company. And the difficulty in solving these bugs was that they had no precedent. Any documentation or knowledge they could find around the symptoms of the problem was actively incorrect. In fact, Bryan says that the team’s breakthroughs on these bugs were solutions that an artificial super intelligence would’ve never suggested because they ran against all known and available reasoning, documentation, and knowledge. His point being: intelligence isn’t everything. Human values are still incredibly important.
“Intelligence alone does not solve problems like [the ones we encountered]. Our ability to solve these problems had nothing to do with our collective intelligence as a team. We’ve got a terrific team, but it’s a lot more than just intelligence. And in particular for these [kinds of] problems, and many like them, we had to summon the elements of our character not our intelligence. Our resilience. Our teamwork. Our rigor. Our optimism. […] We talk about super intelligence, but is anyone talking about super collaboration or super teamwork? We absolutely needed teamwork [at Oxide].”
If human values like curiosity are what led to breakthroughs, not the application of synthetic intelligence, why is there so much emphasis on intelligence these days? Bryan has a curt analysis:
“This infatuation with intelligence comes from people who just don’t get outside enough.”
He notes how intelligence isn’t everything in a job interview. Like, you don’t hire people by giving out an exam and taking whoever scores highest. You try to suss out other aptitudes. Nobody looks at applicants who lack values like teamwork or optimism and says, “Well, they can’t work with anyone and they’re incredibly unpleasant to be around, but their intelligence is great. Let’s hire them!” Intelligence is great, but it’s not everything.
“We do a disservice to our own humanity when we pretend that [AI] can engineer autonomously.”
A cogent case for the values of our humanity. More like this please.

On ends

I’m sitting on a rock, in the middle of a forest. On my right, not even 30cm away from me, a dog panting like crazy, because even though it’s almost 8pm, it’s still way too warm for his liking. To be fair, anything above freezing probably fits that description. Behind me, the ruins of a church that was, and no longer is. A stone arch and a few chunks of walls are all that’s left. I don’t know what happened to this church. I could probably look it up, but I don’t need to do it. Knowing would not add anything to my experience of sitting here. Is it important to know how things end? Is it important to know when something has ended? Some things are clearly easy to know when they’re done: I have a bottle of water that’s almost empty, and the end is gonna come pretty fast. Other things are a lot trickier. When does a life end? I remember reading that the medical definition of death keeps evolving as our technology progresses and we’re able to bring people back to life. Maybe in the future we’ll be able to upload our brains to the matrix and “live” forever, who knows. I’ve been thinking a lot about the end of things lately, as my mind wandered around, stressed out by a series of things not worth discussing. And thinking about the end of myself is weirdly comforting. The classic this too shall pass. Everything is transitory after all, and life itself is impermanent. We’re here now, we might be gone tomorrow. And when gone, what’s left? Maybe just ruins, traces of our past, books left on a bookshelf, photos in a box, a blog online perhaps, destined to be washed away quickly like everything else in the digital world. If you’re wondering where I’m going with this post, I’m afraid the answer is nowhere. I’m just sitting on a rock, in the middle of nowhere, thinking about death as a way to figure out how to go through life. Thank you for keeping RSS alive. You're awesome. 

Unsung Today

“Icons that are iconic”

Apple might have undone the macOS Tahoe menu icons decision, but this wasn’t the only contentious iconography issue in their ecosystem. On his blog, Jim Nielsen writes how Apple filed away so much expression by forcing rigid icon bureaucracy in macOS. Nielsen focuses mostly on distinctiveness; previously, you could make the icon unique by its general shape or the shape of its contents, but one of these two levers has now been taken away:
“This over-emphasis on ‘systems’ design seems endemic to modern software. Systems prescribe rules because they are the easiest attributes to document, enforce, and automate – ‘All icons must use this shape, this lighting, this stroke.’ Excellence, by contrast, is harder to systematize. It requires judgment, taste, care, experience, and a sensitivity to context – all in service of meaning and purpose, not superficial similarity.”
However, one also can’t help but notice how ugly and amateurish the Creator Studio icons are, so it all feels absolutely like a net negative – the new system took something away and the proposed replacement feels low quality.
Elsewhere, on Rogue Amoeba’s blog, Paul Kafasis straight up asks Apple to undo the 2025 decision to contain macOS icons inside squircles:
“Apple’s prohibition on shapes is a step backward for both usability and creativity in app icons. Icons are now harder to distinguish because they’re no longer allowed to be distinctive. But there’s no technical reason for it. Apple could, and should, once again allow icons to take on a wide variety of shapes.”
Both these prompted me to think a bit of Apple’s app iconography as a system.
Let’s start with iOS:
I believe the rigid squircle shape of app icons starting with the first iPhone was to make them look like a grid of buttons, and also to establish apps as a new primitive, particularly with the subsequent arrival of the App Store. (Similarly how over time “a face in a circle” became recognizable as a “personal avatar,” a user proxy primitive.) Soon, the rigid shape also helped when custom Springboard wallpapers arrived in 2010 – it reduced the likelihood of apps blending with the background.
Recently, a new option has been added to remove names of apps, which is another way to disambiguate them.
Also recently, Apple’s generally unpleasant-looking theming options (color tinting and glassification) reduced color coding as a way to recognize a particular icon.
At the same time, iOS is still highly spatial. Most apps have a specific physical place on a specific page of the Springboard, or inside a specific folder. I believe that this helps a lot even if shape coding, color coding, and name disambiguation are failing or turned off to begin with.
Now, for macOS:
The original Mac OS X followed in the footsteps of the classic Mac OS and allowed arbitrary shapes, allowing for more flexible shape coding, although with some guidance on angles and styling.
However, more recently, the iOS squircle shape has been first strongly suggested (in 2020) and then rigidly enforced (in 2025) for macOS as well.
But then, the usage of app icons in macOS is different than in iOS. First of all, macOS isn’t nearly as spatial as it used to be, and I would say not as spatial as iOS. Even Dock is more malleable compared to the memory palace rigidity of the Springboard, and its overflow section with suggestions and hand-off is very fluid. ⌘Tab is completely non-spatial and just like the Dock doesn’t upfront identify apps by their names. App icons also appear in more fluid contexts like Spotlight, Finder, and the right side of the menubar (I know iOS has some of those as well, but I would imagine they’re getting much less use overall). This all increases the pressure on icons to be easily distinguishable.
At the same time, there are fewer issues with custom backgrounds on macOS. Most icon surfaces have opaque backgrounds and while you can keep your apps on the desktop or put backgrounds in Finder windows, I don’t think that’s very common.
I’m probably missing some other aspects, but this would be my summary of where we’re at:
Apple has not done a good job shepherding their app iconography system. The system feels too rigid, and some of its ostensible benefits (dark mode, color tinting, glassification) have been executed poorly. You could imagine a better tinting system that doesn’t feel like a cheap CSS filter applied to the icon, or (my dream!) a way to tint individual app icons. I personally love when apps – here Raindrop, Bear, and Retro – give you a lot of icon options in various colors, so I can invest in color coding.
People’s trust in Apple’s skillset has deteriorated after the unveiling of horrendous icon redesigns in 2025’s Tahoe, and more recently in the abovementioned Creator Studio (the 2026 updates are nice, but very minor). This is in some contrast with other controversial visually-motivated changes appearing at the same time. Say what you want about Liquid Glass, but there are moments it looks absolutely gorgeous (see the video below for perhaps my favourite Liquid Glass surface). Forced menu icons felt similar: embarrassingly naïve as a system, but with icons themselves executed well (which you can still appreciate when perusing SF Symbols). But the app icon changes seem to have been assigned to the team that delivered on neither good visual craft, nor good systems thinking.
I think it’s fair to look at Creator Studio specifically, and fear Apple is following in Microsoft’s and especially Adobe’s unforgivable footsteps in prioritizing abstract corporate identity goals over both functional and visual aspects of app iconography. Adobe’s product icons used to be beautiful and distinct before they got all shoved into the same “uppercase + lowercase letter” framework that became a canonical example of a system that took something away from the user but didn’t really give anything in return.
I also feel this feeds right into another fear of Apple’s actions steamrolling over particularly indie app developers where being able to express one’s identity via the app icon feels much more important than it would be for a huge company.
I don’t see Apple abandoning their stance on the rigid, distinctive app icon squircle shape. It’s possible that iOS apps will start appearing on touchscreen Macs outside of screen mirroring. Even without that, it just simplifies things for them, even if the jobs for macOS app icons are not the same as those for iOS app icons.
At the same time, I could see Apple allowing the app icons to stick out of the basic squircle shape, like some macOS apps did in between 2020 and 2025; I believe it would even be possible to detect programmatically if the basic squircle shape is still there in the background. This would improve shape coding, and give icon designers some clearly much-desired flexibility. The icons below still register as squircles to me – why not allow this as an option? (For both macOS and iOS.)
I wish Apple standardized app icon changing UI on iOS. Right now, each app offers their own interface in a different place – you could see that above – and rarely links to that place from the Springboard’s long-press menu. But imagine if you could nicely change app icons in situ in the same flow when you’re customizing the Springboard itself! (And then, the same for Dock and macOS.)
I think it would also be a nice gesture to allow to rename iOS Springboard apps to whatever you want the same way you can rename folders, to give some users an opportunity to disambiguate by that if everything else fails.
#apple #craft #iconography
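On the idea that the base squircle could be detected programmatically, here is a rough Python sketch of what such a check might look like. It is not Apple's method or any real API. It approximates the squircle as a superellipse with a guessed exponent of 5, takes the icon's alpha channel as a plain grid of 0 to 255 values, and checks that every pixel inside that shape is opaque. The function names, the exponent, and the opacity threshold are all assumptions made for illustration.

```python
from typing import List


def inside_squircle(x: float, y: float, exponent: float = 5.0) -> bool:
    """True if (x, y), normalized to [-1, 1], lies inside the superellipse
    |x|^n + |y|^n <= 1 (an assumed stand-in for the real squircle outline)."""
    return abs(x) ** exponent + abs(y) ** exponent <= 1.0


def has_squircle_base(alpha: List[List[int]], exponent: float = 5.0,
                      opaque_at: int = 250) -> bool:
    """Check that every pixel whose center falls inside the squircle is opaque.

    `alpha` is a square grid of alpha values (0 = transparent, 255 = opaque).
    Pixels outside the squircle are ignored, so artwork may stick out of it.
    """
    size = len(alpha)
    if size == 0 or any(len(row) != size for row in alpha):
        raise ValueError("alpha must be a non-empty square grid")
    for row_index, row in enumerate(alpha):
        for col_index, value in enumerate(row):
            # Map the pixel center to [-1, 1] on both axes.
            x = (col_index + 0.5) / size * 2.0 - 1.0
            y = (row_index + 0.5) / size * 2.0 - 1.0
            if inside_squircle(x, y, exponent) and value < opaque_at:
                return False
    return True
```

Because only the pixels inside the assumed outline are inspected, an icon whose artwork pokes out past the squircle would still pass, which is the flexibility described above.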

Kev Quirk Yesterday

The Laziest Generation

by Ibrahim Diallo
Ibrahim talks about house prices in the US, how they're only getting worse, and the perception from previous generations that kids today are somehow lazy because they can't afford a house before the age of 40.
Being the father of two young people, this worries me too. Despite this post being US-centric, the script is the same here in the UK. Unless my kids' generation come out of school on a six-figure salary, they don't have a hope in hell of buying a decent house. To put that in context, here in the UK a £100,000 salary puts you in the top 3% of earners. In the late '90s a house would cost around 4x a person's salary on average. Today it's 8x. So most can forget about saving for a deposit. Instead, younger generations will have to rely on inheritance, which will only push house-buying even later in life. Something has to give.
Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email, or leave a comment.

iDiallo Yesterday

The Laziest Generation

I don't understand why this generation can't afford a home. When my grandfather was 18, he had already saved enough money from his paper route and various odd jobs to buy his first home. By the time my father turned 26, he was already married, had his first child, and was moving into his first home. We lived frugally, and our parents taught us the value of spending wisely. Today's man-children, at the ripe old age of 40, still cannot afford a home. Yet they have no problem eating out every day, going to the movies, and buying popcorn and avocado toast. Add in subscriptions to ten different services they barely use, and that's money thrown out the window. They don't see the correlation between their spending habits and their inability to buy a home or save money in the first place. I understand that my grandfather's house only cost $12,000, and my father bought his for $50,000. Mine was much more expensive, I paid $150,000, and that house is now worth a million. I understand that by the time my children are 26, it will probably be worth $10 million. If they start saving now, they'll have a shot. But I can tell they will choose reckless spending over saving, and I simply do not understand this generation. Last week I found a flyer wedged into my front door from a real estate agent in the neighborhood. On it was a list of homes she had sold, each entry showing a picture of the house and its sale price. The cheapest was $970k. For her, this was a record of her work. "Hire me and I'll sell your house," a calling card of bragging rights. For me, it was a nightmare. I don't live in an affluent neighborhood, yet somehow all the homes are worth a million dollars. Thirteen years ago, a colleague of mine bought hers in this same neighborhood for around $200k. It was a savvy investment. If she sells now, she'll get at least five times what she paid. 
While that price was reasonable at the time, meaning you could dedicate a third of your salary to your mortgage, at a million dollars, you're paying far more. That's between $7,000 and $10,000 per month. Good luck finding a job that pays three times that. To satisfy that requirement, you'd need to earn $250k to $360k a year. Cutting back on avocado toast or prepping your own meals won't save you nearly enough. If you squint and stretch your imagination, maybe it's possible to afford these homes, not by cutting back, but by finding new sources of income. But what about the next generation? My kids. When they're in their 20s and 30s, how much will houses cost? If we continue at this pace, the wooden houses in this neighborhood are going to cost at least $10 million each. And we'll call the next generation even lazier. Maybe we'll tell them they're splurging on water bottles. "Back in my day, we drank tap water." Or maybe they're not using Grok enough to come up with a smarter financial strategy. I don't think this is sustainable. The only way forward may be for everything to collapse first. See you at the homeless camp where we'll all end up.
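The $250k to $360k range above follows from the one-third rule applied to the monthly payment. A quick Python sketch of that arithmetic, using only the figures in the post (the function name is made up for illustration):

```python
def required_annual_income(monthly_payment: float,
                           income_multiple: float = 3.0) -> float:
    """Annual income needed so the payment is 1/income_multiple of pay."""
    return monthly_payment * income_multiple * 12


for payment in (7_000, 10_000):
    print(f"${payment:,}/month -> ${required_annual_income(payment):,.0f}/year")
```

That works out to $252,000 and $360,000 a year, which the post rounds to $250k and $360k.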

Unsung Yesterday

The curious case of the disappearing Polish S

Speaking of remastering (and diacritics), I grabbed my older Medium deep dive called The curious case of the disappearing Polish S, and put it on the new site. It looks so much better than on Medium and while I was at it, I’ve redone all the visuals, and updated it a little bit. It’s still probably one of my favourite bugs I’ve encountered. I hope you enjoy! #bug deep dives #keyboard #localization #marcin wichary #text editing

0 views
ava's blog 2 days ago

enduring the heat wave in germany

I live in an apartment that first gets heated up on one side before noon, then later from the other side. My kitchen is especially hot each year because it has a huge bay window with no shutters installed. My strategies for keeping cool have been to air out everything at night, and if possible draw in and circulate air via a fan during some of it. Then as soon as the sun is coming up, closing windows, lowering the existing outside shutters so the sun can’t heat up the glass or insides, and always keeping the kitchen door closed so the heat is contained within. I avoid opening the windows during the day to not let heat in, except if I really need fresh air or the humidity is too high. Humidity is the thing that is wrecking us the most in this, which is why it is often futile to ask people elsewhere how they deal with these high temperatures when those people live in very dry climates. The humidity messes with your body’s ability to exude heat, and in worst case, results in the wet bulb effect . That is also why even people from hotter countries can suddenly struggle elsewhere (like Europe), together with the angle at which sunlight hits Earth at that area being different (a lower sun angle spreads the same amount of energy over a larger area, making it feel cooler, while a higher angle concentrates energy on a smaller area, increasing warmth). This is why fans with water cooling and tips like hanging a wet T-shirt in front of a fan, constantly misting yourself or wearing wet clothes etc. can sort of backfire and make your home a bit more unbearable, depending on the circumstances. I also have a fan with water cooling with optional cooling bricks/batteries, and it’s currently on because we hang out in front of it, but I’m mindful of when I turn that mode on and for how long. In the next few weeks, we are planning to add sun protection foil to some windows, and when the extreme demand is over in fall, I’ll buy a Midea Porta Split and install it in the living room. 
Good tips in general, some summarized from above: Hydrate a lot, even before you are actually thirsty. Stay inside if possible. Keep the added humidity to a minimum. Know what you are trying to do with drinks and showers. Cool drinks and showers offer relief, but can make you heat up after. Hot beverages and showers can make everything feel cooler after and help you sweat. I like both, depending on the situation. Wrap ice packs or similar stuff in a towel and put them under your feet or in your armpits. If possible, lower shutters so the sun cannot heat up the interior and the glass. Maybe install sun protection foil on windows (most are plant-friendly). I’ve also seen others provisionally use those reflecting covers for cars on their windows, or aluminium foil. Make sure that if it’s behind the glass, the heat won’t be trapped and make the glass crack, so preferably attach it on the outside. Sunscreen, wide breathable and covering clothing, sun umbrellas and hats. During fall/winter, maybe during Black Friday sales, get a portable split cooling system. Portables do not need structural changes to the building, which is why they tend to be allowed in rental units as they can be removed without a trace and aren’t in use all year. Shitty landlords might get mad to see it in your window, but in many countries, there already is positive case law about them and the usual AC dismissals don’t apply to them. Set out flat bowls of water in the shade for wild animals and refill. Consider different ones for different sizes (a flat one with stone pebbles for insects, a relatively flat but water-only one for hedgehogs etc., one bird bath…). Use cool tiles and cooling mats for pets. Keep an eye out for baby birds who flee their overheated nests too early; maybe you can save some of them. Especially birds who live in attics and roofs are dying right now (swifts etc.). If possible and you can plan the shipment, avoid deliveries. Keep water around for delivery personnel.
Eat smaller snacks and portions spread out throughout the day instead of big meals so your body doesn’t heat up as much during digestion. Don’t: Leave the windows open all day. Let the sun heat up your interior, if possible; try at least covering windows with blankets if there are no shutters. Set out water for animals where it heats up drastically, or in a container where they might become trapped and drown. Walk your dog when the ground is heated up - asphalt burns happen quickly past 25 degrees Celsius. Fall for scalpers, scammers and increased prices for ACs and fans who are using the current demand and availability issues for profit. The Porta Split I mean to get can be bought for 550-750 Euro under normal circumstances, now during the heat wave, prices have exploded to over 1.4k. Only buy that if it is an emergency. Think fans or ACs can make you sick. This is a widely held belief especially in older generations in Germany at least, together with the myth that any wind can cause a cold and stiff neck. It is bullshit. It’s a big reason why this country is not prepared for this heat and there’s a 20% adoption rate for ACs here. Think you need to keep the fan off or not buy one at all because of the electricity bill. The increase is lower for newer models and for the few days you need to use it (more) (for now). You are also not meaningfully contributing to climate change with this increased energy use. Like, come on, they wanna build entire data centers eating away gigawatts, your heat protection is not the issue here. Still, all of these tend to be hyperindividualistic solutions, just like when Covid happened, and we need more widespread, structural solutions. Not everyone can stay home; many people still have to work and commute. You might tell people to hydrate as much as possible, but their work doesn’t offer free (or extra) water to them, and many places like restaurants and cafés still don’t.
We tell people to invest in ACs and fans, but landlords and workplaces don’t want to install any, forbid the use, or don’t cover the price of these things. It’s like heat management is still an incredibly personal thing where everyone has to feel like they are fending for themselves, investing their own money into stockpiling resources and tech, and utilizing the privilege to avoid a lot of the heat by working from home/working inside, taking time off, calling in sick and so on. More collective heat management can look like: Free water in establishments everywhere, and drinking fountains spread throughout cities, with signs pointing to the next one. Designating libraries, community centers, schools, transit hubs and big shops like huge supermarkets as cooling centers during heat waves. Keeping trees, bushes, grass etc. intact and adding more. They help keep cities cooler, together with reflective roofs and lighter pavements. Legally mandating landlords to install ACs in rental units, especially ones directly below the roof (attic/loft/penthouse apartments), and cover specific windows in protective foil or external shutters. Requiring new(er) buildings to have specific insulation that helps in summer as well as winter, ventilation strategies, ACs, etc. and updating building codes so new housing remains habitable during prolonged heat waves, even without continuous air conditioning. More shaded areas in crowded places, waiting spots (public transportation), shaded pathways between major destinations. Rollout of functioning and resilient AC in all public transportation, hospitals, schools, universities, elderly homes etc. Extending opening hours into the early morning and late evening during extreme heat, with closure in between (or at the bare minimum, siestas). Temperature thresholds that trigger additional protections or suspension of certain work or studies.
Preparing railroads, normal roads and other parts of the public from the intense heat effects or making them more heat resistant; otherwise you risk bent rails, melting bitumen etc. Distributing fans or subsidizing cooling equipment where appropriate. Strengthening electrical grids to cope with increased cooling demand, subsidizing electricity costs during declared heat emergencies, expanding renewable generation to reduce the emissions associated with increased cooling needs. And likely more I forgot. Yes, people will cry that this costs soooo much money. But remember that we have no problem investing that money into wars, AI, data centers, expensive proprietary software licenses, politicians’ money schemes and making billionaires richer. Landlords need to invest the rent into the property instead of enriching themselves and getting other people to pay off their mortgage. These aren’t one-time events, it will continue to get worse. Earlier in the year, longer, higher. Many people and animals will die. Everyone has to start preparing and learning from it now, and stop buying into the bullshit that “it was hot when I was a child too, we are just complaining more!!1!”. Your government is failing you if they are not acting now, and it is intentional, as the heat affects vulnerable and powerless groups the most. Make sure you check on old, sick, disabled people and people you know who take medication that makes them more vulnerable to the sun and/or heat. For example, diuretics, beta blockers, anticholinergics, and some antidepressants and stimulants. Published 27 Jun, 2026

0 views
Kev Quirk 2 days ago

3D Printers are actually very useful

I recently started getting into 3D printing, but so far I've spent most of my time getting set up and learning the ropes. I've now completed my first little project with the 3D printers and I'm really happy with the result. As a biker of many years I have a number of helmets lying around. This is because you're supposed to replace a helmet every 5 years, because the protective foam inside degrades over time. So I have a small collection of lids and nothing to really do with them. So, I decided to print myself some helmet stands and mount them in my office. There were 3 helmets I wanted to display. A reasonably well rated helmet stand on Amazon costs around £11. So for the 3 lids, I'd be looking at £33 (~$45). Instead of handing over 33 of my finest pounds to Jeff Bezos, I decided to have a nose on Maker World and found this helmet stand that was very well rated. So I downloaded the files and set my printers to work, and a day or so later I had these little beauties. They feel really solid and have no problem holding a helmet on the wall. Better yet, they only cost me around £2.50 ($3.30) each in filament (~750g of filament in total), so way cheaper than the Amazon option. Today I finally had time to mount the lids to the wall, and I think they look great! Sure, I could have pissed about making toy dragons or whatever, but I think these are a far better use of my 3D printers, and really why I bought them. I'm so glad that the printed results are good enough to be useable. I already have some ideas of things I want to create next, but I'm going to have to start familiarising myself with FreeCAD for that project. We'll see how that goes...

0 views
Ahead of AI 2 days ago

Using Local Coding Agents

Many people reached out to me in the past asking about my local agent stack and how I set it up. So, I thought it might be useful to put together a little tutorial on how to set up a local (coding) agent using open-source tools and open-weight LLMs. Figure 1: Overview of the local stack, that is, a coding agent harness that uses a local model hosted through an inference engine / runtime server. This article is a tutorial on setting up a production-ready coding agent with a fully local stack. We will use a locally served LLM together with a local coding harness that can read files, make edits, run commands, and verify changes as shown in the figure above. Here, we can think of the LLM as the engine that provides the reasoning and code generation. And the surrounding harness provides the operating environment that allows the LLM to do meaningful coding work in our local projects. Why local? For many coding workflows, a local setup is an interesting alternative to proprietary services such as GPT in Codex or Opus in Claude Code. The local setup is transparent, inspectable, and free to run apart from hardware and electricity costs. It also stays fully under your control, and you can modify the coding harness in any way you like. Plus, it’s a lot of fun! By the way, in case you want a bit more background information on coding agent harnesses, I covered the core components of coding agents (and building a coding agent from scratch for learning purposes) here: I have to admit that I still primarily alternate between Codex and Claude Code as my daily drivers, for now (and just to keep up with the new tooling and functions that are constantly being added). Also, the plan limits (especially for Codex) are still so generous that I haven’t had to worry about costs so far. However, I’ve been using local solutions for a while, too, to test things and because it somehow gives me joy to have and use a fully local setup (versus proprietary services).
Either way, local solutions become more and more attractive each day. One aspect is the costs. If you have the hardware, they are practically free to run. And then there’s, of course, the privacy angle. For example, for organizing and processing my receipts, I’d be more comfortable with a local model ingesting them rather than sending the data over to OpenAI or Anthropic. (Then, if we keep in mind that Anthropic was recently throttling their flagship model’s performance for LLM research , proprietary services may become more restrictive over time, and it’s maybe a good idea to be comfortable with open-weight alternatives as a backup.) And there are many, many additional reasons and use cases like that. Your motivations for using local LLMs and coding harnesses may include: Predictable, fixed costs if you reach your subscription plan limits, and immunity to API price changes. Reproducibility; sometimes it’s nice if a model is upgraded (e.g., GPT 5.4 -> GPT 5.5 -> GPT 5.6) and it solves all your queries more reliably. However, this can also break existing workflows. Offline use in the classic airplane flight scenario with slow or no internet, or when going on a coding/writing retreat in the cabin in the woods w/o a Starlink subscription. And there are probably several others. So, in this article, we will set up and use popular harnesses like Codex and Claude Code with open-weight models and investigate whether using a model-specific harness (like Qwen-Code for Qwen3.6) brings any additional benefits. (Of course, there are many more harnesses like OpenCode, Cline, Pi, and Noumena Code, but I thought that most people already have muscle memory with either Codex or Claude Code, which makes switching to open-weight models a bit smoother). Most coding agent harnesses follow similar principles and have more or less the same features and functionality. 
However, the implementation details may differ, and certain LLMs have usually been primarily optimized for a specific harness. Of course, many open-weight LLMs like GLM 5.2, for example, would run in Claude Code, etc. However, if an LLM developer also develops a coding harness, it is somewhat safe to assume that their model is optimized for their own harness first (while also supporting others). Here, I am primarily going to use Qwen3.6 with the Qwen-Code coding client. However, I will also go over other options for using a local LLM with other agent harnesses, for example, Claude Code, Codex, and the increasingly popular Cline, but more on that later. The reason why I am primarily using Qwen-Code when working with Qwen models is that: it is open-source, like Codex ( https://github.com/openai/codex ) but unlike Claude Code; Qwen models have been specifically optimized for the Qwen-Code harness (more information below); I can run both Codex (with the latest GPT model) and Qwen-Code with a local Qwen model side by side on the same machine without having to switch manually back and forth between models. Regarding the second point in the list above, that Qwen models work better in Qwen-Code, Nvidia’s Polar: Agentic RL on Any Harness at Scale paper (May 2026) has a benchmark showing that the Qwen3.5-4B base model has the best coding performance in said Qwen-Code harness (both before and after their Polar-RL training), which I included below. Figure 2: Qwen model performance in different coding harnesses via Polar: Agentic RL on Any Harness at Scale ( https://arxiv.org/abs/2605.24220 ) The benchmark in the table above is for an older Qwen3.5 model, and I am assuming that the latest Qwen3.6 models are even further optimized to do well in Qwen-Code specifically. However, Pi ( https://github.com/earendil-works/pi ) also seems to be a very interesting candidate that I need to play around with in the future.
By the way, Qwen3.6 35B-A3B is about 22 GB to download, requires roughly 30-40 GB of RAM, and runs pretty swiftly on both a Mac Mini with M4 and a DGX Spark. Based on the recent benchmarks shared by Cohere earlier in June, it is currently the best local model in its size class. Figure 3: Cohere benchmark from North Mini Code report published in June ( https://huggingface.co/blog/CohereLabs/introducing-north-mini-code ) As seen above, Qwen3.6 35B-A3B dominates all but one benchmark in this size class. However, that being said, Qwen Code is a general harness and also supports other types of models. For instance, we could also connect North Mini Code or Gemma 4 in Qwen Code. Figure 4: Yes, Qwen3.6 35B-A3B is a really good model! (Via x.com/pupposandro/status/2064707907489272147/) Architecture-wise, the Qwen3.6 35B-A3B model has hybrid attention similar to Qwen3-Coder and Qwen3.5. I wrote more about it in Beyond Standard LLMs . Figure 5: Qwen3.6 architecture and fact sheet from my LLM gallery . Alternatively, if you don’t want to use Qwen3.6, Cohere’s North Mini Code is probably the most interesting, capable alternative at this size class right now. I will go over this model in the next local LLM setup section as well. Figure 6: North Mini Code architecture and fact sheet from my LLM gallery . No matter what agent harness we use (Qwen-Code, Codex, or Claude Code), we have to set up a local LLM, such as Qwen3.6 35B-A3B, first. There are several options like Ollama, LM Studio, vLLM, SGLang, MLX, etc to serve models locally. You know from my Build A Large Language Model (From Scratch) and Build A Reasoning Model (From Scratch) projects that I like to code these myself. Implementing a model from scratch has the benefits that we understand the whole stack, plus we can modify and further train and fine-tune it. 
However, here, we just look for a model serving framework that has been super optimized for inference speed and resource needs since we don’t plan to do any training or fine-tuning at this point. (We could, as an extra step, convert and import our own from-scratch fine-tuned model into these efficient serving stacks, but this is out of the scope for this article.) For this tutorial, we will use Ollama as our efficient model serving engine because it’s relatively easy to install and use from the command line across different operating systems (although LM Studio also added a non-GUI client, but I am less familiar with it). By the way, I am not affiliated with any of the tools mentioned in this article, but one nice thing about Ollama is that they also optionally support open-weight models hosted in the cloud, including the currently strongest open-weight model, GLM 5.2, which is too large to run locally on consumer hardware. (The cloud models are not free, of course, but have similar subscription plans as ChatGPT and Claude; it’s still nice though that this option exists to conveniently test the latest state-of-the-art open-weight models “locally.”) Anyways, setting up Ollama is pretty straightforward, and you can find the official macOS/Linux/Windows download instructions on their download page. After installing, I recommend downloading a model for a quick test run. For instance, on macOS, we can use the ollama app to download models directly via the GUI: Figure 7: Using the Ollama app to find and download models. Otherwise, this can be done on the command line as well via ollama pull followed by the model tag, for example, ollama pull qwen3.6:35b-mlx. By the way, the above-mentioned qwen3.6:35b-mlx is a model using Apple’s Metal performance shaders, i.e., optimized for Macs with Apple silicon chips. I highly recommend using *-mlx versions of models when working on Macs (if available). Figure 8: Prefer the MLX version when using a Mac (with an Apple Silicon chip).
On a Linux machine, use the non-MLX version of the model instead. Then, to make sure that it works, you can either use the GUI again or launch Ollama from the command line. Figure 9: Running Ollama in the terminal. You can exit this session via the /bye command. As mentioned before, the currently best alternative to this Qwen3.6 35B-A3B model is North Mini Code 1.0 of similar size. Figure 10: North Mini Code 1.0 as an alternative to Qwen3.6 35B A3B. Before deciding on whether to use an LLM as a local coding agent, it’s usually not a bad idea to run a quick speed and quality assessment. Here, for the speed assessment, I would look for tokens/sec performance. Additionally, I’d also make sure this stays stable for (very) long contexts, which is what we are usually dealing with during agentic coding workflows (as opposed to simpler chatbots). Of course, we don’t want the memory cost to explode either. You could run my ollama_speed_memory_bench.py script to do a quick check. In a nutshell, it sends different prompts (ranging from 1k to 50k words) to an Ollama model and asks it to generate up to 8k tokens by default. It reports simple statistics like prefill speed from Ollama’s prompt evaluation metrics, generation speed from output-token timing, and memory use from the Ollama process plus NVIDIA GPU memory when available. For example, to evaluate the model on macOS, if you downloaded or cloned the scripts from https://github.com/rasbt/local-coding-agent-evals , we can run the script against the MLX model, which takes about 5 minutes. On Linux, we can run it against the non-MLX model. Note that this assumes that you already downloaded the respective model as explained in the previous section. Also, depending on your system, if you have less than 30 GB RAM, you may have to use a smaller model like gemma4:e2b, which uses up to about 8 GB RAM on long contexts. (Of course, there are also many smaller models, but in my experience, they make pretty bad local coding agents.)
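To make the speed check more concrete, here is a minimal Python sketch of how prefill and generation speed can be derived from the timing fields that Ollama's /api/generate endpoint returns (prompt_eval_count, prompt_eval_duration, eval_count, eval_duration, with durations in nanoseconds). This is not the ollama_speed_memory_bench.py script itself; the function names are illustrative, and the default host assumes Ollama is listening on its standard local port 11434.

```python
import json
import urllib.request

NS_PER_SEC = 1_000_000_000  # Ollama reports durations in nanoseconds


def throughput_from_ollama_response(resp: dict) -> dict:
    """Compute prefill and generation speed (tokens/sec) from Ollama timing fields."""

    def rate(count_key: str, duration_key: str) -> float:
        count = resp.get(count_key, 0)
        duration_ns = resp.get(duration_key, 0)
        # Guard against missing or zero durations (e.g., fully cached prompts).
        return count / (duration_ns / NS_PER_SEC) if duration_ns else 0.0

    return {
        "prefill_tok_per_sec": rate("prompt_eval_count", "prompt_eval_duration"),
        "generation_tok_per_sec": rate("eval_count", "eval_duration"),
    }


def generate_once(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to a local Ollama server and return the JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Requires a running Ollama server and an already downloaded model.
    reply = generate_once("qwen3.6:35b-mlx", "Explain what a coding agent harness does.")
    print(throughput_from_ollama_response(reply))
```

Running the same request with progressively longer prompts is a simple way to see whether the generation speed stays stable at long contexts.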
Note that for models, the RSS RAM report is not super accurate on macOS (especially for mlx model variants that utilize the Metal backend), and I suggest keeping an eye on the activity monitor’s RAM usage for Ollama during the run as well. In this case, the RAM usage fluctuated between 20 - 29 GB. Anyways, the bottom line is that for 50k contexts, the Qwen3.6 and North Mini Code models use up to 30 GB RAM and generate output with about 40 tok/sec on a recent Mac Mini and 30 tok/sec on a DGX. Below is a visual summary of the different runs. Figure 11: Quick speed comparison of the different models on different systems. Note that the macOS RAM consumption is not super accurate there. Also, note that the Qwen 35B-A3B model is faster on Mac than on the DGX Spark (which is the other way around for the Gemma 4 E2B model) thanks to the optimized MLX version. Code to reproduce: https://github.com/rasbt/local-coding-agent-evals Another interesting question is how Qwen 35B-A3B compares to the similarly-sized Cohere North Mini model? If we take similarly quantized models into account (above, I was using the Qwen3.6 default), they are pretty similar, although North Mini is perhaps slightly ahead overall, as shown below. Figure 12: Q4-quantized Qwen3.6 35B vs North Mini Code. Code to reproduce: https://github.com/rasbt/local-coding-agent-evals Anyway, the bottom line is that, in my opinion, anything faster than 20-30 tok/sec is pretty reasonable for local agent work. This is about the same speed as GPT 5.5 with “high” reasoning . In this case, both models clear the bar easily. By the way, personally, I run my agents almost exclusively on my DGX Spark because I don’t want my Mac Mini to get too hot and I want to have the RAM available for other tasks. Of course, there are always ways to optimize this more with different frameworks (other than Ollama), quantizations, MTP, and so on. 
However, Ollama is a good plug & play allrounder with minimal setup time that connects easily to various coding agent frameworks and where it’s super simple to swap and try out different models. After checking that the model is fast enough for convenient local work, I recommend doing a quick modeling performance assessment. Sure, there are many standardized benchmarks out there we could take a look at and even run ourselves. Usually, you can find the numbers for relevant benchmarks in the model’s technical report or model hub page. Usually, I also find it useful to look at a relative comparison with other models on https://artificialanalysis.ai/models/ . Figure 13: Benchmark from https://artificialanalysis.ai/models/ . Average performance (top), coding performance (center), agentic performance (bottom). Based on the figure above, we can see that Qwen3 35B-A3B is much more capable than the Gemma 4 E4B and E2B models, for example. Note that the Artificial Intelligence Index numbers keep changing over time as they swap benchmarks and update the weighting, so there are no “absolute” numbers we could use as a reference point for deciding which model is “good enough”. Rather, I would compare a new, interesting model to a model you used before as an anchor or reference point. Beyond standard benchmarks, I would also curate a personal set of tasks that are relevant to you to do a quick check whether this model is even suitable for any type of work that you might want it to perform. Below are the outputs of a reasoning- and code-related set of questions that also test the tool calling capabilities of the models. Here, the model returns the tool call but doesn’t execute the code itself. For instance, we can say that gets the conceptual debugging and security-review tasks right, but still struggles with agentic judgment around “what file/action first” tasks. is usable but not fully reliable for autonomous tool use. 
But a harness that constrains actions, adds retries, and maybe gives stronger project context could make it pretty usable. On the other hand, failing is a strong signal that it is less suitable for this kind of tool-use reasoning, even if it is fast. Note that the failures are not just formatting issues. It looks like it chooses the wrong tool, asks for clarification when enough context is present, etc. I would probably not use it as a coding-agent model beyond very narrow or heavily constrained tasks.

Now, after this lengthy preamble setting up a local LLM, let’s get back to the main topic, the coding agent harness. As mentioned at the beginning of this article, we will use the qwen-code ( https://github.com/QwenLM/qwen-code ) harness, as Qwen models have been optimized for it.

Figure 14: Next, we are trying to connect the locally served model to the coding agent harness.

If you are familiar with Claude Code, it’s basically the same thing but fully open-source. However, I will also go over how to connect the local Qwen3.6 model to Codex and Claude Code in the next sections.

Note that coding harnesses are much more capable than LLMs by themselves. This is where I recommend being more careful about what you are running and where. For instance, when trying new (coding) agents, I like to

1. Do an audit of the (open-source) agent code base first.
2. Run it on separate hardware (e.g., my DGX Spark) or a separate user account and/or virtual environment on my machine at the very least.

Regarding the audit, I recommend looking for data sharing/egress and the default blast radius when it comes to file permissions, as well as some baseline robustness to prompt injection. The figure below attempts to summarize the main points.

Figure 15: Practical audit checklist before running an installed coding agent harness.

Similar concerns apply to the local model serving engine (e.g., Ollama) as well.
However, coding agents require even more attention as they can directly read data from your machine and manipulate files. To do a basic audit, I recommend the following:

1. Clone the repo:
2. Ask a trusted agent you used before (like GPT 5.5 in Codex or Opus 4.8 in Claude Code) to review it with a focused prompt. Something like the following:

You are auditing ./qwen-code before I install or run the agent on my machine.

Focus only on practical local-machine risk from the installed agent and the code paths that create it:

- install scripts and package lifecycle hooks
- shell command execution by the agent
- file read/write boundaries at runtime
- secret handling and environment-variable inheritance
- how repo files, project instructions, and tool output can influence the agent
- MCP, plugin, extension, or tool integrations
- network calls and telemetry
- update mechanisms after installation
- terminal escape/output handling
- data egress and data residency

Ignoring internet downloads that are strictly required for installation, check whether the installed agent can send prompts, files, telemetry, logs, identifiers, or metadata to remote servers when I use a local model through Ollama. Ignore cloud-model configurations. Do not infer risk from the project owner alone. Identify concrete endpoints, SDKs, default providers, environment variables, config defaults, and docs that control network behavior, including any endpoints operated in foreign countries or by third-party companies.

Do not do broad style review. Do not refactor.

Produce:

- high-risk findings with file/line references
- medium-risk concerns
- network/data-egress findings, including any foreign, third-party, or China-linked endpoints or defaults
- commands I should avoid running until reviewed
- settings or environment variables that reduce local-machine risk
- a short recommendation: safe to test in sandbox, safe to use, or do not run

For each item, say whether it is expected behavior for a coding agent or inherently riskier than Codex or Claude Code.

Below is a summary of the main findings (because the full report may be a bit boring and too long for this article):

1. Local execution: Qwen Code can run shell commands on our machine through its shell tool but there are strict approval controls unless permissive modes such as are enabled. This is expected for a coding agent, and it’s actually what makes it useful in practice. But of course it becomes risky if run unsandboxed or with a full environment containing secrets.

2. Data egress: Even with local Ollama, Qwen Code can send usage telemetry and metadata to Alibaba/Aliyun endpoints unless usage statistics and telemetry are disabled (more on that below). This is riskier than a local-only setup because model prompts may stay local, but session IDs, tool metadata, model info, and local base URL metadata can still leave the machine. But again, this is also common among all kinds of tools (yes, Codex and Claude do that as well).

3. File and secret boundaries: Workspace files are readable by default, while writes generally require approval and include some overwrite protections. This is good and standard agent practice.

4. Prompt injection surfaces: Repo instructions, tool output, MCP tools, extensions, and project config can influence the agent’s behavior. Prompt injection attacks can be reduced via the approval gates mentioned above.
This is normal for coding agents, but untrusted repos should be treated as hostile by default because they can steer the agent toward reading files, running commands, or sending data through approved tools. Regarding the main privacy concerns in point 2, most of it is fixable via a custom with the following contents: The setting is a tradeoff. Security fixes will not be installed automatically, but I prefer having explicit control over when updates happen instead of letting the tool pull and apply new code in the background. By the way, cline ( https://github.com/Cline/Cline ), Codex ( https://github.com/openai/codex ), and Claude Code have similar telemetry data sharing defaults that would need to be disabled explicitly. (Note that Claude Code doesn’t have an official open-source version of their codebase, which makes trusting it even trickier, and it does seem to send data to both Anthropic and Datadog.) Either way, overall, it seems Qwen-Code follows standard practices, and as of this writing, there is no particular concern that is non-standard for coding agents. If we accept the reported findings and risks (personally, I didn’t see any red flags), we can now proceed with the installation and hook up our local Qwen3.6-35B-A3B model to Qwen Code (and Codex and Claude Code in the next sections). As mentioned before, I preferably experiment with and run coding agents, which can read and edit local files, on a separate machine (in my case a DGX Spark, but it could also be a separate Mac or Linux workstation). Alternatively, I would run it in a VM or set up a separate macOS or Linux user account as a practical middle ground. (I heard from some friends that they also rent servers for that, like Linode or Heroku, for tinkering purposes. 
However, instead of the monthly hosting costs for a somewhat capable machine, I would probably rather get a relatively cheap $200-500 hardware box, or even an old retired laptop, and run a local harness and then use a stronger open-weight model hosted in the cloud via Ollama cloud models, OpenRouter, etc if you are looking for alternatives to GPT or Claude.) Anyways, let’s install Qwen-Code. The listed options include, e.g., However, running the commands above assumes that the published artifacts match the code we just reviewed in the GitHub repo. If we are extra careful/paranoid, we can also build it ourselves from the GitHub repo. Be warned, this is more manual/messier though (I recommend executing them one at a time instead of copy & pasting the whole block into the terminal): After completing the installation, we can now launch the Qwen-Code client via the qwen command from the terminal to complete the setup and connect to the locally served LLM. For this, after running the qwen command, we select “Custom Provider”, as shown below. Figure 16: Choose “Custom Provider,” which lets us connect the Ollama LLM. Ollama uses the OpenAI API standard. So, next, we follow the on-screen setup guide and choose the “OpenAI-compatible” option. Figure 17: Since Ollama follows the OpenAI API standard, we choose “OpenAI-compatible” here. Next, we need to provide the API endpoint of the running Ollama application that serves our local LLM. Usually that’s the local address by default. We enter (including the /v1) since that’s the OpenAI-compatible base URL. Figure 18: Configure Qwen Code to use Ollama’s local OpenAI-compatible endpoint, . Next, we enter as our custom provider. Figure 19: Enter as the API key placeholder for the local custom provider. Next, we can select the available models. These are the ones that we downloaded via . You can enter only a single model or multiple ones separated by commas. You can double-check the list of downloaded models via . 
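The same double-check can also be done programmatically. Below is a minimal sketch that asks a running Ollama server for its downloaded models via the native `/api/tags` endpoint and prints them as a comma-separated string, which is the format the setup dialog accepts. It assumes that Ollama listens on its default local address, `http://localhost:11434`; the helper names `parse_model_names` and `list_ollama_models` are made up for this illustration and are not part of Ollama or Qwen Code.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: dict) -> list[str]:
    """Extract the model names from an /api/tags-style JSON payload."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Query the Ollama server for the locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))


if __name__ == "__main__":
    try:
        names = list_ollama_models()
    except OSError as err:  # covers connection errors and timeouts
        print(f"Could not reach Ollama at {OLLAMA_URL}: {err}")
    else:
        # Comma-separated list, ready to paste into the model-selection step
        print(", ".join(names))
```

If the script reports that it cannot reach Ollama, the server is probably not running (or it is listening on a different port), and the harness would not be able to connect either.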
By the way, you can always add more models easily later (I’ll explain after completing the setup). Figure 20: Select the local Ollama models that Qwen Code should make available through the custom provider. We are almost done! In step 5/6, we of course select “Enable thinking” mode, which will result in higher token usage but the better resulting problem-solving capabilities are worth it. Figure 21: Enable thinking mode for the local model provider. And that’s basically it. Step 6 is basically a review step that we can confirm by pressing “Enter”. Congratulations, you should now have a working fully-local LLM workflow set up. The usage is pretty much similar to Claude Code, where you can use / commands for various functionality. E.g., you can switch models via the command, as shown below. Figure 22: Use to switch models. By the way, as I mentioned before, it’s relatively easy to add new models from ollama. Once you pull a new model via , you can add it as a new entry in . Here, just copy & paste an existing entry into the file and change the “id” and “name” to that of the Ollama model name. Figure 23: We can add new ollama models by editing the config file. Here, is the name of the ollama model name, e.g., . By the way, to update the qwen-code tool once in a while, if we used the git clone & local build route, we can pull a recent GitHub snapshot and update it as follows: Now that we have a fully working, local coding agent, the question is: how well does it perform, and is it actually good enough for my tasks? Of course, there are benchmarks for this, but in my opinion, nothing beats trying it for yourself on some of your workflow. In other words, this basically means using it for a day or two to decide whether it meets your bar. I also recommend compiling a small set of tasks that reflect your common coding agent usage. 
And if you come upon a particularly challenging one when working on a given project, it may not be a bad idea to add it to this set to evaluate future models. As an example of what I mean, I shared a relatively small, simple, and general set of tasks we can use to test the agents here on GitHub: https://github.com/rasbt/local-coding-agent-evals/tree/main/agent-problem-pack . This is basically an extension of the tasks from the Local LLM Setup section. The details on how to run these are in the GitHub README: https://github.com/rasbt/local-coding-agent-evals/tree/main/agent-problem-pack#quick-start-running-benchmarks-manually .

Below is the outcome for the different LLMs tested in Qwen-Code.

Figure 24: Small local agent capability benchmark using Qwen-Code. Code to reproduce: https://github.com/rasbt/local-coding-agent-evals

As we can see, both the Qwen3.6 and North Mini Code 35B-A3B models solve 4 out of 5 of these problems. Gemma 4 E2B fails a lot. Out of curiosity, I also added the somewhat older Nemotron 3 Nano model. It is similar in size and compute performance to the aforementioned Qwen and North models, and it performs similarly well.

Figure 25: Nemotron 3 Nano architecture overview from my LLM Gallery

After setting up the local coding agent (and the article exceeding 5000 words), this would probably be a reasonable place to stop. However, as a bonus, I also thought it might be interesting to add brief Codex and Claude Code notes for completeness.

Unfortunately, as far as I know, the Codex UI does not support non-OpenAI models, but we can use the Codex CLI to run our Ollama models. If you haven’t installed the OpenAI Codex CLI yet, you can get and install it analogously to qwen-code from their open-source GitHub repository: https://github.com/openai/codex (Yes, the Codex CLI is open source!) I will spare you the lengthy listing of the commands and recommend checking the repo’s README instead for the official instructions.
(Cloning the repo and running an audit similar to the one for qwen-code is not a bad idea here, either.) Then, once installed, there are multiple ways to enable local model use. In my opinion, the most convenient way is to set up a separate config (inside the existing folder) with some default options:

Figure 26: Set up a separate Ollama profile for Codex for convenience.

Then, we can still use to launch the regular “Codex with GPT 5.5” mode and use our Ollama model via .

Figure 27: Launch Codex using a local Ollama model.

When rerunning the test cases from the Agent Capability Assessment section, to my surprise, Qwen3.6 actually performs better via Codex than in its “native” Qwen-Code coding harness, as shown below.

Figure 28: Small local agent capability benchmark in Codex.

Even though this is just a small set of benchmarks, it suggests that using Codex as the universal coding agent harness may not be such a bad idea after all.

Of course, there is also the popular Claude Code agent harness that we could use as a harness around our local LLMs. While very popular and capable, this is probably my least favorite option for local setups because the codebase is proprietary. That also means we cannot readily inspect and/or disable Anthropic’s data logging practices.

To set it up, if you don’t have Claude Code already installed on your machine, I suggest checking the official docs for recommended installation commands: https://code.claude.com/docs/en/quickstart . Claude Code itself does not expose the same local-provider configuration path as Codex. However, Ollama provides an integration via : https://docs.ollama.com/integrations/claude-code I.e., we can execute to run the Claude Code harness with an Ollama model. By the way, this also works for codex via , but I personally prefer the route we discussed earlier, as it gives me a bit more insight into and control over how things work.

Figure 29: Claude Code with a local Qwen3.6 model through Ollama.
However, as a user, it feels like Claude Code takes much longer to come up with a solution. It probably has a much higher token usage. So, below, I additionally looked at the token usage of all three harnesses. As we can see, Claude Code uses by far the most tokens on average, Codex the least.

Figure 30: Average token usage of the three harnesses for different LLMs. Code to reproduce: https://github.com/rasbt/local-coding-agent-evals

When it comes to the little agent capability assessment benchmark, the Qwen and North Mini Code models also get 5/5, and even the small Gemma 4 model does ok! Interestingly, we can also see that the token usage is largely driven by the harness, not the LLM itself. I.e., the three LLMs that are capable of solving (almost) all 5 tasks all use roughly the same number of tokens (e.g., Qwen3.6 uses roughly the same number of tokens as North Mini Code and Nemotron 3 Nano when used inside Claude Code). Only Gemma 4 uses fewer tokens, but it also fails almost all tasks, likely because of insufficient tool-calling capabilities where the tasks interrupt early. For reference, below is again the summarized task-success rate.

Figure 31: Summarized task success rates.

Anyway, the takeaway here is that if more tokens help the model-harness combination to solve more (and more complex) problems, great! But if two harnesses have an equal task success rate, the one that uses 50% fewer tokens (e.g., Codex over Claude Code) is a huge win, because at the same token throughput, tasks will run roughly twice as fast. However, the big caveat here is that task correctness is a necessary criterion, but it doesn’t measure code quality and readability, which are hard to assess automatically.

PS: I tried to analyze why Claude Code uses more tokens, and it seems that the difference mainly comes from input tokens rather than output tokens. In other words, Claude is not writing twice as much.
The logs suggest that Claude is repeatedly feeding more context back into the model across turns, including previous messages, tool calls, command outputs, and file contents. For example, one Claude run used about 578k input tokens but only about 4.5k output tokens across 25 turns. So the likely explanation is that Claude’s harness accumulates or accounts for a larger prompt-side history during multi-step agent runs.

So far, all the setups we discussed assumed that we were running the local LLM on the same machine as the coding harness. However, what if we developed some trust in the coding agent harness and want to use it on our main Mac while the model itself is hosted on a different machine, e.g., a DGX Spark?

In my opinion, the best (or most convenient) setup is an SSH tunnel from the Mac to the DGX. First, I suggest quitting Ollama on the Mac or changing the to something else below. Assuming we quit the Ollama app on the Mac, check that the following returns an empty output to indicate that Ollama is not available:

Then run the following command in a terminal window on the Mac:

That command means that we open an SSH connection to as user , which you need to adjust to whatever your username and machine name are. Then, the command forwards the Mac’s local port to on the DGX because of . Note that this is the Ollama address.

The terminal running will look like it is hanging. That is normal. Keep it open while you use Qwen Code, Codex, or Claude Code. Press to stop the tunnel.

Once the tunnel is running, use this on your Mac to see if the Mac can indeed access the ollama models from the DGX:

If that returns the DGX models, your Mac tools can use the DGX Ollama server as if it were local. Then, use Qwen Code and Codex just like above. For Claude via , the key is that the Mac-side command must see the tunneled endpoint. If needed:

We focused on Qwen Code, Codex, and Claude Code because they are the most direct fit for coding-agent workflows.
OpenClaw and Hermes are also capable, but they are broader agent harnesses. They are better suited when you want one agent to coordinate across tools, apps, browsers, terminals, and longer-running workflows. For coding work, I recommend starting with Qwen Code, Codex, or Claude Code first (and there are also many other interesting coding harnesses like OpenCode, Cline, Pi, and Noumena Code). And I would treat OpenClaw and Hermes as interesting follow-up options for things beyond coding rather than the first baseline for this local coding-agent setup. This was a long article with lots of information and configuration. If there are a few main takeaways, I’d say that it’s not the mechanistic setup pipeline but rather the considerations when running coding agents locally. That is, the most important part is not getting one specific tool installed, but understanding the model-serving layer, the agent harness, the permission model, and how to evaluate whether the setup actually solves coding tasks reliably. Of course, GPT 5.5 and Opus 4.8 are currently better than smaller open-weight models that run on a Mac or DGX Spark. But the newer Mixture-of-Experts models in the 30-35B range (such as Qwen3.6, North Mini Code, and Nemotron 3 Nano) are all very, very capable and really sufficient for a lot of tasks. And yes, they run with the same token speed as GPT 5.5 through a Pro subscription, so it should not necessarily slow down your workflows. The main consideration when setting up local agents, besides the model itself, is also which harness we want to use. The common perception is that models are usually optimized more for a specific harness than others (e.g., Qwen3.6 may work better in Qwen Code than Claude Code, for example). Based on the small agent assessment, this may not necessarily be true, though (this is only a very small benchmark, so take it with a big grain of salt). 
So, if you are more comfortable with a different harness that you have a lot of muscle memory with, like Codex and Claude Code, maybe it’s not a bad idea to just stick the model into that one and give it a try! Anyways, I hope the article was useful, and it got you interested in doing some tinkering with open-weight models. They are becoming more capable by the day, and it’s for some inexplicable reason just fun to run models locally. If you want to try the benchmarks yourself, the code and small evaluation tasks used in this article are available here: https://github.com/rasbt/local-coding-agent-evals Also, my Build a Reasoning Model (From Scratch) book has now gone to print and started shipping. I wanted to post a picture, but it will be 3 more days until it arrives. Build a Reasoning Model (From Scratch) If you liked my previous Build a Large Language Model (From Scratch) book, this is essentially a sequel implementing inference-time scaling techniques and reinforcement learning algorithms from scratch. And if you want to support future long-form articles like this one, consider becoming a paid subscriber . It helps me keep writing these independent deep dives and sharing the accompanying code, figures, and experiments. Figure 1: Overview of the local stack, that is, a coding agent harness that uses a local model hosted through an inference engine / runtime server. This article is a tutorial on setting up a production-ready coding agent with a fully local stack. We will use a locally served LLM together with a local coding harness that can read files, make edits, run commands, and verify changes as shown in the figure above. Here, we can think of the LLM as the engine that provides the reasoning and code generation. And the surrounding harness provides the operating environment that allows the LLM to do meaningful coding work in our local projects. Why local? 
For many coding workflows, a local setup is an interesting alternative to proprietary services such as GPT in Codex or Opus in Claude Code. The local setup is transparent, inspectable, and free to run apart from hardware and electricity costs. It also stays fully under your control, and you can modify the coding harness in any way you like. Plus, it’s a lot of fun! By the way, in case you want a bit more background information on coding agent harnesses, I covered the core components of coding agents (and building a coding agent from scratch for learning purposes) here: 1. Intro I have to admit that I still primarily alternate between Codex and Claude Code as my daily drivers, for now (and just to keep up with the new tooling and functions that are constantly being added). Also, the plan limits (especially for Codex) are still so generous that I haven’t had to worry about costs so far. However, I’ve been using local solutions for a while, too, to test things and because it somehow gives me joy to have and use a fully local setup (versus proprietary services). Either way, local solutions become more and more attractive each day. One aspect is the costs. If you have the hardware, they are practically free to run. And then there’s, of course, the privacy angle. For example, for organizing and processing my receipts, I’d be more comfortable with a local model ingesting them rather than sending the data over to OpenAI or Anthropic. (Then, if we keep in mind that Anthropic was recently throttling their flagship model’s performance for LLM research , proprietary services may become more restrictive over time, and it’s maybe a good idea to be comfortable with open-weight alternatives as a backup.) And there are many, many additional reasons and use cases like that. Your motivations for using local LLMs and coding harnesses may include: Predictable, fixed costs if you reach your subscription plan limits, and immunity to API price changes. 
Reproducibility; sometimes it’s nice if a model is upgraded (e.g., GPT 5.4 -> GPT 5.5 -> GPT 5.6) and it solves all your queries more reliably. However, this can also break existing workflows. Offline use in the classic airplane flight scenario with slow or no internet, or when going on a coding/writing retreat in the cabin in the woods w/o a Starlink subscription. it is open-source, like Codex ( https://github.com/openai/codex ) but unlike Claude Code; Qwen models have been specifically optimized for the Qwen-Code harness (more information below); I can run both Codex (with the latest GPT model) and Qwen-Code with a local Qwen model side by side on the same machine without having to switch manually back and forth between models. Figure 3: Cohere benchmark from North Mini Code report published in June ( https://huggingface.co/blog/CohereLabs/introducing-north-mini-code ) As seen above, Qwen3.6 35B-A3B dominates all but one benchmark in this size class. However, that being said, Qwen Code is a general harness and also supports other types of models. For instance, we could also connect North Mini Code or Gemma 4 in Qwen Code. Figure 4: Yes, Qwen3.6 35B-A3B is a really good model! (Via x.com/pupposandro/status/2064707907489272147/) Architecture-wise, the Qwen3.6 35B-A3B model has hybrid attention similar to Qwen3-Coder and Qwen3.5. I wrote more about it in Beyond Standard LLMs . Figure 5: Qwen3.6 architecture and fact sheet from my LLM gallery . Alternatively, if you don’t want to use Qwen3.6, Cohere’s North Mini Code is probably the most interesting, capable alternative at this size class right now. I will go over this model in the next local LLM setup section as well. Figure 6: North Mini Code architecture and fact sheet from my LLM gallery . 3. Local LLM Setup No matter what agent harness we use (Qwen-Code, Codex, or Claude Code), we have to set up a local LLM, such as Qwen3.6 35B-A3B, first. 
There are several options like Ollama, LM Studio, vLLM, SGLang, MLX, etc to serve models locally. You know from my Build A Large Language Model (From Scratch) and Build A Reasoning Model (From Scratch) projects that I like to code these myself. Implementing a model from scratch has the benefits that we understand the whole stack, plus we can modify and further train and fine-tune it. However, here, we just look for a model serving framework that has been super optimized for inference speed and resource needs since we don’t plan to do any training or fine-tuning at this point. (We could, as an extra step, convert and import our own from-scratch fine-tuned model into these efficient serving stacks, but this is out of the scope for this article.) For this tutorial, we will use Ollama as our efficient model serving engine because it’s relatively easy to install and use from the command line across different operating systems (although LM Studio also added a non-GUI client, but I am less familiar with it). By the way, I am not affiliated with any of the tools mentioned in this article, but one nice thing about Ollama is that they also optionally support open-weight models hosted in the cloud, including the currently strongest open-weight model, GLM 5.2, which is too large to run locally on consumer hardware. (The cloud models are not free, of course, but have similar subscription plans as ChatGPT and Claude; it’s still nice though that this option exists to conveniently test the latest state-of-the-art open-weight models “locally.”) Anyways, setting up Ollama is pretty straightforward, and you can find the official macOS/Linux/Windows download instructions on their download page. After installing, I recommend downloading a model for a quick test run. 
For instance, on macOS, we can use the ollama app to download models directly via the GUI: Figure 7: Using the Ollama app to find and download models Otherwise, this can be done on the command line as well via By the way, the above-mentioned qwen3.6:35b-mlx is a model using Apple’s Metal performance shaders, i.e., optimized for Macs with Apple silicon chips. I highly recommend using *-mlx versions of models working on Macs (if available). Figure 8: Prefer the MLX version when using a Mac (with an Apple Silicon chip). On a Linux machine, use the non-MLX version: Then, to make sure that it works, you can either use the GUI again or launch Ollama from the command line. Figure 9: Running Ollama in the terminal. You can exit this session via the command. As mentioned before, the currently best alternative to this Qwen3.6 35B-A3B model is North Mini Code 1.0 of similar size. Figure 10: North Mini Code 1.0 as an alternative to Qwen3.6 35B A3B. 4. Simple Speed Performance Assessment Before deciding on whether to use an LLM as a local coding agent, it’s usually not a bad idea to run a quick speed and quality assessment. Here, for the speed assessment, I would look for tokens/sec performance. Additionally, I’d also make sure this stays stable for (very) long contexts, which is what we are usually dealing with during agentic coding workflows (as opposed to simpler chatbots). Of course, we also don’t want the memory cost to explode either. You could run my ollama_speed_memory_bench.py script to do a quick check. In a nutshell, it sends different prompts (ranging from 1k to 50k words) to an Ollama model and asks it to generate up to 8k tokens by default. It reports simple statistics like prefill speed from Ollama’s prompt evaluation metrics, generation speed from output-token timing, and memory use from the Ollama process plus NVIDIA GPU memory when available. 
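To make these statistics a bit more concrete, here is a minimal sketch of how prefill and generation speed can be derived from the timing fields that Ollama includes in a non-streaming `/api/generate` response (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration`, with durations given in nanoseconds). Note that this is not the ollama_speed_memory_bench.py script itself; the function name `tokens_per_second` and the sample numbers are made up purely for illustration.

```python
NS_PER_SEC = 1_000_000_000  # Ollama reports durations in nanoseconds


def tokens_per_second(metrics: dict) -> dict:
    """Convert Ollama's token counts and durations into tokens/sec."""

    def rate(count_key: str, duration_key: str) -> float:
        count = metrics.get(count_key, 0)
        duration_ns = metrics.get(duration_key, 0)
        # Guard against missing or zero durations (e.g., fully cached prompts)
        return count / (duration_ns / NS_PER_SEC) if duration_ns else 0.0

    return {
        "prefill_tok_per_sec": rate("prompt_eval_count", "prompt_eval_duration"),
        "generation_tok_per_sec": rate("eval_count", "eval_duration"),
    }


# Hypothetical response fields (illustrative values, not measured results)
sample = {
    "prompt_eval_count": 50_000,              # prompt tokens processed
    "prompt_eval_duration": 100 * NS_PER_SEC,  # time spent on prefill
    "eval_count": 800,                        # generated tokens
    "eval_duration": 20 * NS_PER_SEC,         # time spent generating
}

speeds = tokens_per_second(sample)
print(speeds)
```

Keeping prefill and generation separate matters for agentic workloads, because long contexts can make prefill the dominant cost even when the generation speed looks fine.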
For example, to evaluate the on macOS, if you downloaded or cloned the scripts from https://github.com/rasbt/local-coding-agent-evals , we can run the following, which takes about 5 minutes: On Linux, we can run: Note that this assumes that you already downloaded the respective model as explained in the previous section. Also, depending on your system, if you have less than 30 GB RAM, you may have to use a smaller model like gemma4:e2b, which uses up to about 8 GB RAM on long contexts. Of course, there are also many smaller models, but in my experience, they make pretty bad local coding agents.
Code to reproduce: https://github.com/rasbt/local-coding-agent-evals Anyway, the bottom line is that, in my opinion, anything faster than 20-30 tok/sec is pretty reasonable for local agent work. This is about the same speed as GPT 5.5 with “high” reasoning . In this case, both models clear the bar easily. By the way, personally, I run my agents almost exclusively on my DGX Spark because I don’t want my Mac Mini to get too hot and I want to have the RAM available for other tasks. Of course, there are always ways to optimize this more with different frameworks (other than Ollama), quantizations, MTP, and so on. However, Ollama is a good plug & play allrounder with minimal setup time that connects easily to various coding agent frameworks and where it’s super simple to swap and try out different models. 5. Simple Benchmark Performance Assessment After checking that the model is fast enough for convenient local work, I recommend doing a quick modeling performance assessment. Sure, there are many standardized benchmarks out there we could take a look at and even run ourselves. Usually, you can find the numbers for relevant benchmarks in the model’s technical report or model hub page. Usually, I also find it useful to look at a relative comparison with other models on https://artificialanalysis.ai/models/ . Figure 13: Benchmark from https://artificialanalysis.ai/models/ . Average performance (top), coding performance (center), agentic performance (bottom). Based on the figure above, we can see that Qwen3 35B-A3B is much more capable than the Gemma 4 E4B and E2B models, for example. Note that the Artificial Intelligence Index numbers keep changing over time as they swap benchmarks and update the weighting, so there are no “absolute” numbers we could use as a reference point for deciding which model is “good enough”. Rather, I would compare a new, interesting model to a model you used before as an anchor or reference point. 
Beyond standard benchmarks, I would also curate a personal set of tasks that are relevant to you to do a quick check whether this model is even suitable for any type of work that you might want it to perform. Below are the outputs of a reasoning- and code-related set of questions that also test the tool calling capabilities of the models. Here, the model returns the tool call but doesn’t execute the code itself. For instance, we can say that gets the conceptual debugging and security-review tasks right, but still struggles with agentic judgment around “what file/action first” tasks. is usable but not fully reliable for autonomous tool use. But a harness that constrains actions, adds retries, and maybe gives stronger project context could make it pretty usable. On the other hand, failing is a strong signal that it is less suitable for this kind of tool-use reasoning, even if it is fast. Note that the failures are not just formatting issues. It looks like it chooses the wrong tool, asks for clarification when enough context is present, etc. I would probably not use it as a coding-agent model beyond very narrow or heavily constrained tasks. 6. Agent Code Base Audit Now, after this lengthy preamble setting up a local LLM, let’s get back to the main topic, the coding agent harness. As mentioned at the beginning of this article, we will use the qwen-code ( https://github.com/QwenLM/qwen-code ) harness, as Qwen models have been optimized for it. Figure 14: Next, we are trying to connect the locally served model to the coding agent harness. If you are familiar with Claude Code, it’s basically the same thing but fully open-source. However, I will also go over how to connect the local Qwen3.6 model to Codex and Claude Code in the next sections. Note that coding harnesses are much more capable than LLMs by themselves. This is where I recommend being more careful about what you are running and where. 
For instance, when trying new (coding) agents, I like to
- Do an audit of the (open-source) agent code base first.
- Run it on separate hardware (e.g., my DGX Spark) or, at the very least, a separate user account and/or virtual environment on my machine.
Figure 15: Practical audit checklist before running an installed coding agent harness.
Similar concerns apply to the local model serving engine (e.g., Ollama) as well. However, coding agents require even more attention as they can directly read data from your machine and manipulate files. To do a basic audit, I recommend the following:
Clone the repo:
Ask a trusted agent you used before (like GPT 5.5 in Codex or Opus 4.8 in Claude Code) to review it with a focused prompt. Something like the following:
- install scripts and package lifecycle hooks
- shell command execution by the agent
- file read/write boundaries at runtime
- secret handling and environment-variable inheritance
- how repo files, project instructions, and tool output can influence the agent
- MCP, plugin, extension, or tool integrations
- network calls and telemetry
- update mechanisms after installation
- terminal escape/output handling
- data egress and data residency
- high-risk findings with file/line references
- medium-risk concerns
- network/data-egress findings, including any foreign, third-party, or China-linked endpoints or defaults
- commands I should avoid running until reviewed
- settings or environment variables that reduce local-machine risk
- a short recommendation: safe to test in sandbox, safe to use, or do not run
Local execution: Qwen Code can run shell commands on our machine through its shell tool, but strict approval controls apply unless a permissive mode is enabled. This is expected for a coding agent, and it’s actually what makes it useful in practice. But of course it becomes risky if run unsandboxed or with a full environment containing secrets.
Data egress: Even with local Ollama, Qwen Code can send usage telemetry and metadata to Alibaba/Aliyun endpoints unless usage statistics and telemetry are disabled (more on that below). This is riskier than a local-only setup: model prompts may stay local, but session IDs, tool metadata, model info, and local base URL metadata can still leave the machine. But again, this is common among all kinds of tools (yes, Codex and Claude do that as well).
File and secret boundaries: Workspace files are readable by default, while writes generally require approval and include some overwrite protections. This is good and standard agent practice.
Prompt injection surfaces: Repo instructions, tool output, MCP tools, extensions, and project config can influence the agent’s behavior. The approval gates mentioned above reduce the risk of prompt injection attacks. This is normal for coding agents, but untrusted repos should be treated as hostile by default because they can steer the agent toward reading files, running commands, or sending data through approved tools.
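One item from the audit prompt above, "install scripts and package lifecycle hooks", can also be checked by hand before installing anything. Below is a minimal sketch, assuming the cloned harness is an npm-style project with one or more `package.json` files (an assumption about the repo layout, not something verified here). It lists the npm scripts that can run automatically during `npm install`. The function name `find_install_hooks` and the choice of hooks to flag are illustrative.

```python
import json
from pathlib import Path

# npm scripts that can run automatically as part of `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def find_install_hooks(repo_root: str) -> list[tuple[str, str, str]]:
    """Return (manifest path, hook name, command) for each install-time hook found."""
    root = Path(repo_root)
    findings = []
    for manifest in sorted(root.rglob("package.json")):
        # Skip vendored dependencies; those deserve a separate review.
        if "node_modules" in manifest.relative_to(root).parts:
            continue
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip rather than crash
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if not isinstance(scripts, dict):
            continue
        for hook in INSTALL_HOOKS:
            if hook in scripts:
                findings.append((str(manifest), hook, str(scripts[hook])))
    return findings


# Example usage on a freshly cloned repo (the path is a placeholder):
# for path, hook, command in find_install_hooks("./qwen-code"):
#     print(f"{path}: {hook} -> {command}")
```

An empty result does not mean the package is safe; it only rules out one of the more common places where code runs before you have had a chance to read it.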

0 views
iDiallo 2 days ago

All Chinese Models Will Be Illegal in 3... 2... 1...

The Washington Post reported that the US government will decide who can use state-of-the-art LLMs. After the ban of Fable and the limitations coming to ChatGPT 5.6, what's next? My bet is Chinese models. For all of Anthropic's doomsaying and propping up of their secret model Mythos, several open-weight models have proven capable of similar feats, and at a fraction of the cost. DeepSeek rocked the AI world in December 2024 with their initial release, nearly sending shockwaves through American stock markets. Last year, I looked into getting a BYD electric car. At the price they were selling for, I figured that even with a 100% tariff slapped on top, it would still be a bargain. Then I discovered that not only is there a steep import tariff, you simply cannot register the car in the United States. The car itself is illegal. According to reviews from people who actually own one, it's a fantastic vehicle that would outcompete most cars on the US market. Because of that, the US simply banned it. So what does this mean for large language models? If we're now told that state-of-the-art LLMs are too dangerous for the general public, what happens to Chinese models that are equally powerful? People will start flocking to DeepSeek and zAI. The quality matches OpenAI and Anthropic, the models are open-weight, and the cost is dramatically lower. The logical next step, if you're a DC lobbyist on retainer for a San Francisco AI lab, is to ban them. We don't live in rational times. The only path to an IPO for Anthropic and OpenAI is to kick the ladder out from under everyone else and get Washington to call it "safety policy." Download the models while you still can, because once the regulation drops, owning a local copy of DeepSeek might just make you a dissident.

0 views
Unsung 2 days ago

Noise as information and information as noise

In 1982, the videogame Yars’ Revenge for the Atari 2600 needed to show a “neutral zone” in the middle of the screen. The console was so primitive – an entire great book was written about this – that it didn’t have any video memory. Any cheap effect would do, even random noise… but something as simple as generating noise was also too much for the underpowered system. So the creator of the game decided to do something that in any other situation would mean at the very least trouble, if not a downright security disaster. He crossed the wires and output on screen… the game’s own source code: The source code looked noisy enough, and the problem was solved. (Somewhat recently, Retro Game Mechanics Explained analyzed it carefully in this YouTube video, to make sure it’s not just a myth.) A similar approach was used in the Nintendo GameCube game Metroid Prime, at a moment when the protagonist’s visor needed to appear disrupted. It was two decades later, but the team still bounced off of hardware limitations, this time around memory: The GameCube only has 24MB of RAM, so every texture has to be carefully considered. If we used a low resolution texture (64x64) to save memory the “static” would be blurry and not crisp. One engineer on the team came up with a great idea: what if we just use the memory holding the Metroid Prime code itself! We quickly tried it out and it looked amazing. When you see Samus’s visor affected by electrical “noise” in game, you’re actually seeing the bits and bytes of the Metroid Prime software code itself being rendered on the screen. Turns out machine code is sufficiently random to work great as a static noise texture!
This is how it looked: A few years later, in 2008, people working on Xbox 360 were testing a new interface for their entire console. It was called NXE – New Xbox Experience – and in the bottom-right corner it showed delightful ripples: …or, not just delightful. While NXE was tested internally, the ripples actually encoded the serial number of the console, to prevent leaks. Apparently, it was built specifically so that Microsoft needed just two images to find out the entire serial number. A less surreptitious version of this idea exists today – for example, setting up a new Apple Watch shows a pretty pattern… …that also happens to encode enough information to identify that specific watch. It really appears to be nothing more than an obfuscated QR Code, and “boy, have they patented it.” I know concealing a message inside another message is called steganography. I don’t think all of these fall under that umbrella, and I don’t even know if all of the above can be called “hacks.” I just thought they were interesting examples of information masquerading as noise, and noise pretending to be information. #games #graphics #hacks #security #youtube

0 views
Phil Eaton 2 days ago

The feature in OxCaml that more languages should steal

This is an external post of mine. Click here if you are not redirected.

0 views

Premium: Notes From The Bubble, Volume 1

It’s been an incredibly long few weeks, and as a result my previously-planned Hater’s Guide just isn’t possible within what little time I have left in this week, which is why I’m starting an ongoing series — Notes From The Bubble — where I’m going to dig into the various stories that have stood out to me in the last few weeks and what they mean for the greater tech ecosystem. It’ll be my weapon of choice going forward for the (few) weeks where a greater narrative is taking longer to pull together than usual. I also think it’s time for something a little more light-hearted after a few hundred thousand words of deeply-researched financial nightmare fuel. As serious as the tech industry’s descent into cargo cultism has become, it’s really important to laugh at how disordered and goofy everybody has become as they realize that we’re flat out of hypergrowth ideas. Every time you see something stupid, desperate, ridiculous or disconnected from reality, know it’s a symptom of the greater fear that AI isn’t the next big thing, and that everything is an attempt to put off accepting that truth or, alternatively, create another hype cycle so we can avoid talking about it. I know this all sounds a little reductive, but look at the current state of the tech industry. Meta is creating a Polymarket competitor. Snap is launching its third generation of AR glasses that nobody wants, I assume to compete with Meta’s AI glasses that are exclusively owned by influencers and people that should be banned from public restrooms. Microsoft has gone from loving OpenAI to loving Anthropic to loving open source LLMs and decrying the idea that any one company could control the entire AI ecosystem, somehow missing that Microsoft is the largest AI infrastructure provider in the world and is the reason that this industry exists. Google invested $75 million in movie studio A24 as part of some sort of nebulous AI partnership that will likely result in very little actually happening.
Oh, and you can now watch Instagram on your TV. This is the modern tech industry: a series of cobbled-together ideas pushed out by also-rans with massive monopolies and talent suffocated by executives that haven’t had a human experience in decades. Can you imagine Satya Nadella or Mark Zuckerberg buying something from a hardware store? Do you think they know how to use a vending machine? When did any of these people last pay a bill, or worry about anything other than shareholder value and stock-based compensation? How often do you think Sundar Pichai actually uses Google, Google Docs, or any other products blighted with a Gemini pop-up? Today’s newsletter will be a longer-form column, a series of thoughts on the current state of the tech industry. Welcome…to Notes From The Bubble.

0 views
Stratechery 3 days ago

2026.26: Summer Vibes

Welcome back to This Week in Stratechery! As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings. On that note, here were a few of our favorites this week. This week’s Stratechery video is on Anthropic’s Safety Superpower. A Vibe Coding Adventure. It is thrilling to be an analyst in the age of AI, particularly because the questions seem so weighty. Are software companies doomed? Will white-collar work exist in a decade? Might chip policy lead to war in the Taiwan Strait? All valid! And, at the same time, fretting about the future can foreclose an appreciation of how incredibly awesome this technology is, and that the possibilities really are endless. You can do anything — even organize your garage. That might sound silly, but technology, for all of its importance, is also fun, and I’m having a blast. — Ben Thompson Apple in Europe (but not Siri AI). It was a footnote to Apple’s announcements at WWDC two weeks ago, but as expected, the now-fully-functional Apple Intelligence products — aka Siri AI — will not be released in Europe because of the company’s ongoing battle with European regulators over the Digital Markets Act. On Dithering Tuesday, Ben and Gruber had a great 15-minute discussion about how maddening the situation continues to be, but I also appreciated the end of Ben’s Daily Update on Tuesday, which covered the same topic and explained why Apple’s own policies may well be what creates the long-term competitive changes the EU hopes to see. — Andrew Sharp A Midsummer Mailbag on Sharp Tech. Every time a major holiday approaches, we try to celebrate on Sharp Tech with an extended mailbag that, thanks to the listeners, tends to be a lot of fun.
Ben and I did that again for this week’s episode, and in addition to thoughts on the future of the memory chip market and more of Ben’s experience with vibe coding, we hit questions on our daily caffeine intake, Sam Altman’s PR strategy, data centers in the ocean, and how to improve international soccer. Come for both substance and pre-vacation goofiness, and whether you’re traveling next week or not, happy 4th of July! — AS
Apple Price Increases, Apple Intelligence and the E.U. — Apple is (finally) raising prices, but they’re not shipping Siri AI to the E.U.
Memory Chips and China, Microsoft and Chinese Models — The big three memory makers may come to regret opening up the door to Chinese memory makers; Microsoft, meanwhile, is very incentivized to use Chinese models.
My Vibe Coding Adventure, The App and the Experience, Ten Takeaways — My experience and reflections on vibe coding an app that I plan on actually using regularly.
An Interview with Figma CEO Dylan Field About Design and AI — An interview with Figma CEO Dylan Field about building Figma, and why he believes AI gives the company a tailwind.
Hopes, Fears, and the Wizards — A window into Washington Wizards fandom during a very big week, after a very long decade.
No Siri for EU Price Hikes Embedded Memories: The Next Generation Party Building and Xi’s Dominance; Memory Chips and ASML Accusations; Germany’s Puzzling Push for Plaza Accords Draft Week Winners and Losers, Miami Gets Giannis and Boston Gets Awkward, Micah Nori and a Blazers Experiment A Summer Break Mailbag: Memory Mania, Vibe Coding, Mafia PR, Caffeine Intake, Garages, and How to Fix Soccer

0 views

New iPad

After my last post, I pulled the trigger and went with an iPad Pro 11” with Apple Pencil Pro and Magic Keyboard case. Thankfully I got it the night before the massive Apple price hikes (although it still cost way too much). I gotta say, I love this thing! Obviously it’s a huge upgrade, I jumped forward 6 years in tech from my last iPad. The form factor is much nicer as well, the 12.9” was simply too big. 11” is perfect for getting work done, sketching, gaming and using it as an e-reader. I’m planning to sell off my Kindle Oasis and Supernote Nomad, the new iPad has easily replaced both. In addition to the iPad, I super splurged and grabbed a new lens for my Sony camera. Both purchases are in preparation for our trip to China in August. My goal is to pack light, since we’ll be traveling with two kids. The iPad replaces the need for a computer + e-reader + game console (hey, it’s a long trip)! The camera lens is significantly smaller and less bulky than my other lenses, increasing the likelihood I’ll carry the camera and snap more photos. I’ve already tested out photo editing on the iPad with Pixelmator Pro and the RAW files from my Sony. The experience is excellent, especially with the Apple Pencil in the mix. The M5 processor rips through any task I throw at it (it’s funny my iPad is now significantly more powerful than my MacBook Pro). Outside of our trip, I expect my traditional computers (desktop + laptops) will see a lot less usage. At this stage in my life, the iPad does 90% of what I need. For example, my entire blog publishing flow is now possible on this tablet. I can connect my SD card, edit photos with Pixelmator Pro, write the post and upload a draft with iA Writer, then attach the photos and publish via the Micro Blog website (yes, I changed again in preparation for the trip). I’m excited to use this setup in the “field”. I’ll have to find a nice cafe in Baotou to write and edit photos from 😜.

0 views