Posts in Css (20 found)
iDiallo 4 days ago

The Server Older than my Kids!

This blog runs on two servers. One is the main PHP blog engine that handles the logic and the database, while the other serves all static files. Many years ago, an article I wrote reached the top position on both Hacker News and Reddit. My server couldn't handle the traffic . I literally had a terminal window open, monitoring the CPU and restarting the server every couple of minutes. But I learned a lot from it. The page receiving all the traffic had a total of 17 assets. So in addition to the database getting hammered, my server was spending most of its time serving images, CSS and JavaScript files. So I decided to set up additional servers to act as a sort of CDN to spread the load. I added multiple servers around the world and used MaxMindDB to determine a user's location to serve files from the closest server . But it was overkill for a small blog like mine. I quickly downgraded back to just one server for the application and one for static files. Ever since I set up this configuration, my server never failed due to a traffic spike. In fact, in 2018, right after I upgraded the servers to Ubuntu 18.04, one of my articles went viral like nothing I had seen before . Millions of requests hammered my server. The machine handled the traffic just fine. It's been 7 years now. I've procrastinated long enough. An upgrade was long overdue. What kept me from upgrading to Ubuntu 24.04 LTS was that I had customized the server heavily over the years, and never documented any of it. Provisioning a new server means setting up accounts, dealing with permissions, and transferring files. All of this should have been straightforward with a formal process. Instead, uploading blog post assets has been a very manual affair. I only partially completed the upload interface, so I've been using SFTP and SCP from time to time to upload files. It's only now that I've finally created a provisioning script for my asset server. I mostly used AI to generate it, then used a configuration file to set values such as email, username, SSH keys, and so on. With the click of a button, and 30 minutes of waiting for DNS to update, I now have a brand new server running Ubuntu 24.04, serving my files via Nginx. Yes, next months Ubuntu 26.04 LTS comes out, and I can migrate it by running the same script. I also built an interface for uploading content without relying on SFTP or SSH, which I'll be publishing on GitHub soon. It's been 7 years running this server. It's older than my kids. Somehow, I feel a pang of emotion thinking about turning it off. I'll do it tonight... But while I'm at it, I need to do something about the 9-year-old and 11-year-old servers that still run some crucial applications.

0 views
Max Woolf 2 weeks ago

An AI agent coding skeptic tries AI agent coding, in excessive detail

You’ve likely seen many blog posts about AI agent coding/ vibecoding where the author talks about all the wonderful things agents can now do supported by vague anecdata, how agents will lead to the atrophy of programming skills, how agents impugn the sovereignty of the human soul, etc etc. This is NOT one of those posts. You’ve been warned. Last May, I wrote a blog post titled As an Experienced LLM User, I Actually Don’t Use Generative LLMs Often as a contrasting response to the hype around the rising popularity of agentic coding. In that post, I noted that while LLMs are most definitely not useless and they can answer simple coding questions faster than it would take for me to write it myself with sufficient accuracy, agents are a tougher sell: they are unpredictable, expensive, and the hype around it was wildly disproportionate given the results I had seen in personal usage. However, I concluded that I was open to agents if LLMs improved enough such that all my concerns were addressed and agents were more dependable. In the months since, I continued my real-life work as a Data Scientist while keeping up-to-date on the latest LLMs popping up on OpenRouter . In August, Google announced the release of their Nano Banana generative image AI with a corresponding API that’s difficult to use, so I open-sourced the gemimg Python package that serves as an API wrapper. It’s not a thrilling project: there’s little room or need for creative implementation and my satisfaction with it was the net present value with what it enabled rather than writing the tool itself. Therefore as an experiment, I plopped the feature-complete code into various up-and-coming LLMs on OpenRouter and prompted the models to identify and fix any issues with the Python code: if it failed, it’s a good test for the current capabilities of LLMs, if it succeeded, then it’s a software quality increase for potential users of the package and I have no moral objection to it. The LLMs actually were helpful: in addition to adding good function docstrings and type hints, it identified more Pythonic implementations of various code blocks. Around this time, my coworkers were pushing GitHub Copilot within Visual Studio Code as a coding aid, particularly around then-new Claude Sonnet 4.5 . For my data science work, Sonnet 4.5 in Copilot was not helpful and tended to create overly verbose Jupyter Notebooks so I was not impressed. However, in November, Google then released Nano Banana Pro which necessitated an immediate update to for compatibility with the model. After experimenting with Nano Banana Pro, I discovered that the model can create images with arbitrary grids (e.g. 2x2, 3x2) as an extremely practical workflow, so I quickly wrote a spec to implement support and also slice each subimage out of it to save individually. I knew this workflow is relatively simple-but-tedious to implement using Pillow shenanigans, so I felt safe enough to ask Copilot to , and it did just that although with some errors in areas not mentioned in the spec (e.g. mixing row/column order) but they were easily fixed with more specific prompting. Even accounting for handling errors, that’s enough of a material productivity gain to be more optimistic of agent capabilities, but not nearly enough to become an AI hypester. In November, just a few days before Thanksgiving, Anthropic released Claude Opus 4.5 and naturally my coworkers were curious if it was a significant improvement over Sonnet 4.5. It was very suspicious that Anthropic released Opus 4.5 right before a major holiday since companies typically do that in order to bury underwhelming announcements as your prospective users will be too busy gathering with family and friends to notice. Fortunately, I had no friends and no family in San Francisco so I had plenty of bandwidth to test the new Opus. One aspect of agents I hadn’t researched but knew was necessary to getting good results from agents was the concept of the AGENTS.md file: a file which can control specific behaviors of the agents such as code formatting. If the file is present in the project root, the agent will automatically read the file and in theory obey all the rules within. This is analogous to system prompts for normal LLM calls and if you’ve been following my writing, I have an unhealthy addiction to highly nuanced system prompts with additional shenanigans such as ALL CAPS for increased adherence to more important rules (yes, that’s still effective). I could not find a good starting point for a Python-oriented I liked, so I asked Opus 4.5 to make one: I then added a few more personal preferences and suggested tools from my previous failures working with agents in Python: use and instead of the base Python installation, use instead of for data manipulation, only store secrets/API keys/passwords in while ensuring is in , etc. Most of these constraints don’t tell the agent what to do, but how to do it. In general, adding a rule to my whenever I encounter a fundamental behavior I don’t like has been very effective. For example, agents love using unnecessary emoji which I hate, so I added a rule: Agents also tend to leave a lot of redundant code comments, so I added another rule to prevent that: My up-to-date file for Python is available here , and throughout my time working with Opus, it adheres to every rule despite the file’s length, and in the instances where I accidentally query an agent without having an , it’s very evident. It would not surprise me if the file is the main differentiator between those getting good and bad results with agents, although success is often mixed . As a side note if you are using Claude Code , the file must be named instead because Anthropic is weird; this blog post will just use for consistency. With my file set up, I did more research into proper methods of prompting agents to see if I was missing something that led to the poor performance from working with Sonnet 4.5. From the Claude Code quickstart . Anthropic’s prompt suggestions are simple, but you can’t give an LLM an open-ended question like that and expect the results you want! You, the user, are likely subconsciously picky, and there are always functional requirements that the agent won’t magically apply because it cannot read minds and behaves as a literal genie . My approach to prompting is to write the potentially-very-large individual prompt in its own Markdown file (which can be tracked in ), then tag the agent with that prompt and tell it to implement that Markdown file. Once the work is completed and manually reviewed, I manually commit the work to , with the message referencing the specific prompt file so I have good internal tracking. I completely ignored Anthropic’s advice and wrote a more elaborate test prompt based on a use case I’m familiar with and therefore can audit the agent’s code quality. In 2021, I wrote a script to scrape YouTube video metadata from videos on a given channel using YouTube’s Data API , but the API is poorly and counterintuitively documented and my Python scripts aren’t great. I subscribe to the SiIvagunner YouTube account which, as a part of the channel’s gimmick ( musical swaps with different melodies than the ones expected), posts hundreds of videos per month with nondescript thumbnails and titles, making it nonobvious which videos are the best other than the view counts. The video metadata could be used to surface good videos I missed, so I had a fun idea to test Opus 4.5: The resulting script is available here , and it worked first try to scrape up to 20,000 videos (the max limit). The resulting Python script has very Pythonic code quality following the copious rules provided by the , and it’s more robust than my old script from 2021. It is most definitely not the type of output I encountered with Sonnet 4.5. There was a minor issue however: the logging is implemented naively such that the API key is leaked in the console. I added a rule to but really this is the YouTube API’s fault for encouraging API keys as parameters in a GET request . I asked a more data-science-oriented followup prompt to test Opus 4.5’s skill at data-sciencing: The resulting Jupyter Notebook is…indeed thorough. That’s on me for specifying “for all columns”, although it was able to infer the need for temporal analysis (e.g. total monthly video uploads over time) despite not explicitly being mentioned in the prompt. The monthly analysis gave me an idea: could Opus 4.5 design a small webapp to view the top videos by month? That gives me the opportunity to try another test of how well Opus 4.5 works with less popular frameworks than React or other JavaScript component frameworks that LLMs push by default. Here, I’ll try FastAPI , Pico CSS for the front end (because we don’t need a JavaScript framework for this), and HTMX for lightweight client/server interactivity: The FastAPI webapp Python code is good with logical integration of HTMX routes and partials, but Opus 4.5 had fun with the “YouTube-themed” aspect of the prompt: the video thumbnail simulates a YouTube thumbnail with video duration that loads an embedded video player when clicked! The full code is open-source in this GitHub repository . All of these tests performed far better than what I expected given my prior poor experiences with agents. Did I gaslight myself by being an agent skeptic? How did a LLM sent to die finally solve my agent problems? Despite the holiday, X and Hacker News were abuzz with similar stories about the massive difference between Sonnet 4.5 and Opus 4.5, so something did change. Obviously an API scraper and data viewer alone do not justify an OPUS 4.5 CHANGES EVERYTHING declaration on social media, but it’s enough to be less cynical and more optimistic about agentic coding. It’s an invitation to continue creating more difficult tasks for Opus 4.5 to solve. From this point going forward, I will also switch to the terminal Claude Code, since my pipeline is simple enough and doesn’t warrant a UI or other shenanigans. If you’ve spent enough time on programming forums such as Hacker News, you’ve probably seen the name “Rust”, often in the context of snark. Rust is a relatively niche compiled programming language that touts two important features: speed, which is evident in framework benchmarks where it can perform 10x as fast as the fastest Python library, and memory safety enforced at compile time through its ownership and borrowing systems which mitigates many potential problems. For over a decade, the slogan “Rewrite it in Rust” became a meme where advocates argued that everything should be rewritten in Rust due to its benefits, including extremely mature software that’s infeasible to actually rewrite in a different language. Even the major LLM companies are looking to Rust to eke out as much performance as possible: OpenAI President Greg Brockman recently tweeted “rust is a perfect language for agents, given that if it compiles it’s ~correct” which — albeit that statement is silly at a technical level since code can still be logically incorrect — shows that OpenAI is very interested in Rust, and if they’re interested in writing Rust code, they need their LLMs to be able to code well in Rust. I myself am not very proficient in Rust. Rust has a famously excellent interactive tutorial , but a persistent issue with Rust is that there are few resources for those with intermediate knowledge: there’s little between the tutorial and “write an operating system from scratch.” That was around 2020 and I decided to wait and see if the ecosystem corrected this point (in 2026 it has not), but I’ve kept an eye on Hacker News for all the new Rust blog posts and library crates so that one day I too will be able to write the absolutely highest performing code possible. Historically, LLMs have been poor at generating Rust code due to its nicheness relative to Python and JavaScript. Over the years, one of my test cases for evaluating new LLMs was to ask it to write a relatively simple application such as but even without expert Rust knowledge I could tell the outputs were too simple and half-implemented to ever be functional even with additional prompting. However, due to modern LLM postraining paradigms, it’s entirely possible that newer LLMs are specifically RLHF-trained to write better code in Rust despite its relative scarcity. I ran more experiments with Opus 4.5 and using LLMs in Rust on some fun pet projects, and my results were far better than I expected. Here are four such projects: As someone who primarily works in Python, what first caught my attention about Rust is the PyO3 crate: a crate that allows accessing Rust code through Python with all the speed and memory benefits that entails while the Python end-user is none-the-wiser. My first exposure to was the fast tokenizers in Hugging Face tokenizers , but many popular Python libraries now also use this pattern for speed, including orjson , pydantic , and my favorite polars . If agentic LLMs could now write both performant Rust code and leverage the bridge, that would be extremely useful for myself. I decided to start with a very simple project: a project that can take icons from an icon font file such as the ones provided by Font Awesome and render them into images at any arbitrary resolution. I made this exact project in Python in 2021, and it’s very hacky by pulling together several packages and cannot easily be maintained. A better version in Rust with Python bindings is a good way to test Opus 4.5. The very first thing I did was create a for Rust by telling Opus 4.5 to port over the Python rules to Rust semantic equivalents. This worked well enough and had the standard Rust idioms: no to handle lifetimes poorly, no unnecessary , no code, etc. Although I am not a Rust expert and cannot speak that the agent-generated code is idiomatic Rust, none of the Rust code demoed in this blog post has traces of bad Rust code smell. Most importantly, the agent is instructed to call clippy after each major change, which is Rust’s famous linter that helps keep the code clean, and Opus is good about implementing suggestions from its warnings. My up-to-date Rust is available here . With that, I built a gigaprompt to ensure Opus 4.5 accounted for both the original Python implementation and a few new ideas I had, such as supersampling to antialias the output. It completed the assignment in one-shot, accounting for all of the many feature constraints specified. The “Python Jupyter Notebook” notebook command at the end is how I manually tested whether the bridge worked, and it indeed worked like a charm. There was one mistake that’s my fault however: I naively chose the fontdue Rust crate as the renderer because I remember seeing a benchmark showing it was the fastest at text rendering. However, testing large icon generation exposed a flaw: achieves its speed by only partially rendering curves, which is a very big problem for icons, so I followed up: Opus 4.5 used its Web Search tool to confirm the issue is expected with and implemented ab_glyph instead which did fix the curves. icon-to-image is available open-source on GitHub . There were around 10 prompts total adding tweaks and polish, but through all of them Opus 4.5 never failed the assignment as written. Of course, generating icon images in Rust-with-Python-bindings is an order of magnitude faster than my old hacky method, and thanks to the better text rendering and supersampling it also looks much better than the Python equivalent. There’s a secondary pro and con to this pipeline: since the code is compiled, it avoids having to specify as many dependencies in Python itself; in this package’s case, Pillow for image manipulation in Python is optional and the Python package won’t break if Pillow changes its API. The con is that compiling the Rust code into Python wheels is difficult to automate especially for multiple OS targets: fortunately, GitHub provides runner VMs for this pipeline and a little bit of back-and-forth with Opus 4.5 created a GitHub Workflow which runs the build for all target OSes on publish, so there’s no extra effort needed on my end. When I used word clouds in Rust as my test case for LLM Rust knowledge, I had an ulterior motive: I love word clouds. Back in 2019, I open-sourced a Python package titled stylecloud : a package built on top of Python’s word cloud, but with the added ability to add more color gradients and masks based on icons to easily conform it into shapes (sound familiar?) However, stylecloud was hacky and fragile, and a number of features I wanted to add such as non-90-degree word rotation, transparent backgrounds, and SVG output flat-out were not possible to add due to its dependency on Python’s wordcloud / matplotlib , and also the package was really slow. The only way to add the features I wanted was to build something from scratch: Rust fit the bill. The pipeline was very similar to above: ask Opus 4.5 to fulfill a long list of constraints with the addition of Python bindings. But there’s another thing that I wanted to test that would be extremely useful if it worked: WebAssembly (WASM) output with wasm-bindgen . Rust code compiled to WASM allows it to be run in any modern web browser with the speed benefits intact: no dependencies needed, and therefore should be future-proof. However, there’s a problem: I would have to design an interface and I am not a front end person, and I say without hyperbole that for me, designing even a simple HTML/CSS/JS front end for a project is more stressful than training an AI. However, Opus 4.5 is able to take general guidelines and get it into something workable: I first told it to use Pico CSS and vanilla JavaScript and that was enough, but then I had an idea to tell it to use shadcn/ui — a minimalistic design framework normally reserved for Web Components — along with screenshots from that website as examples. That also worked. After more back-and-forth with design nitpicks and more features to add, the package is feature complete. However, it needs some more polish and a more unique design before I can release it, and I got sidetracked by something more impactful… was another Rust stress test I gave to LLMs: command line terminals can’t play audio, right? Turns out, it can with the rodio crate. Given the success so far with Opus 4.5 I decided to make the tasks more difficult: terminals can play sound, but can it compose sound? So I asked Opus 4.5 to create a MIDI composer and playback DAW within a terminal, which worked. Adding features forced me to learn more about how MIDIs and SoundFonts actually work, so it was also educational! miditui is available open-sourced on GitHub , and the prompts used to build it are here . During development I encountered a caveat: Opus 4.5 can’t test or view a terminal output, especially one with unusual functional requirements. But despite being blind, it knew enough about the ratatui terminal framework to implement whatever UI changes I asked. There were a large number of UI bugs that likely were caused by Opus’s inability to create test cases, namely failures to account for scroll offsets resulting in incorrect click locations. As someone who spent 5 years as a black box Software QA Engineer who was unable to review the underlying code, this situation was my specialty. I put my QA skills to work by messing around with , told Opus any errors with occasionally a screenshot, and it was able to fix them easily. I do not believe that these bugs are inherently due to LLM agents being better or worse than humans as humans are most definitely capable of making the same mistakes. Even though I myself am adept at finding the bugs and offering solutions, I don’t believe that I would inherently avoid causing similar bugs were I to code such an interactive app without AI assistance: QA brain is different from software engineering brain. One night — after a glass of wine — I had another idea: one modern trick with ASCII art is the use of Braille unicode characters to allow for very high detail . That reminded me of ball physics simulations, so what about building a full physics simulator also in the terminal? So I asked Opus 4.5 to create a terminal physics simulator with the rapier 2D physics engine and a detailed explanation of the Braille character trick: this time Opus did better and completed it in one-shot, so I spent more time making it colorful and fun . I pessimistically thought the engine would only be able to handle a few hundred balls: instead, the Rust codebase can handle over 10,000 logical balls! I explicitly prompted Opus to make the Colors button have a different color for each letter. ballin is available open-sourced on GitHub , and the prompts used to build it are here . The crate also published a blog post highlighting a major change to its underlying math engine , in its 0.32.0 version so I asked Opus 4.5 to upgrade to that version…and it caused crashes, yet tracing the errors showed it originated with itself. Upgrading to 0.31.0 was fine with no issues: a consequence of only using agentic coding for this workflow is that I cannot construct a minimal reproducible test case to file as a regression bug report or be able to isolate it as a side effect of a new API not well-known by Opus 4.5. The main lesson I learnt from working on these projects is that agents work best when you have approximate knowledge of many things with enough domain expertise to know what should and should not work. Opus 4.5 is good enough to let me finally do side projects where I know precisely what I want but not necessarily how to implement it. These specific projects aren’t the Next Big Thing™ that justifies the existence of an industry taking billions of dollars in venture capital, but they make my life better and since they are open-sourced, hopefully they make someone else’s life better. However, I still wanted to push agents to do more impactful things in an area that might be more worth it. Before I wrote my blog post about how I use LLMs, I wrote a tongue-in-cheek blog post titled Can LLMs write better code if you keep asking them to “write better code”? which is exactly as the name suggests. It was an experiment to determine how LLMs interpret the ambiguous command “write better code”: in this case, it was to prioritize making the code more convoluted with more helpful features, but if instead given commands to optimize the code, it did make the code faster successfully albeit at the cost of significant readability. In software engineering, one of the greatest sins is premature optimization , where you sacrifice code readability and thus maintainability to chase performance gains that slow down development time and may not be worth it. Buuuuuuut with agentic coding, we implicitly accept that our interpretation of the code is fuzzy: could agents iteratively applying optimizations for the sole purpose of minimizing benchmark runtime — and therefore faster code in typical use cases if said benchmarks are representative — now actually be a good idea? People complain about how AI-generated code is slow, but if AI can now reliably generate fast code, that changes the debate. Multiplication and division are too slow for Opus 4.6. As a data scientist, I’ve been frustrated that there haven’t been any impactful new Python data science tools released in the past few years other than . Unsurprisingly, research into AI and LLMs has subsumed traditional DS research, where developments such as text embeddings have had extremely valuable gains for typical data science natural language processing tasks. The traditional machine learning algorithms are still valuable, but no one has invented Gradient Boosted Decision Trees 2: Electric Boogaloo. Additionally, as a data scientist in San Francisco I am legally required to use a MacBook, but there haven’t been data science utilities that actually use the GPU in an Apple Silicon MacBook as they don’t support its Metal API; data science tooling is exclusively in CUDA for NVIDIA GPUs. What if agents could now port these algorithms to a) run on Rust with Python bindings for its speed benefits and b) run on GPUs without complex dependencies? This month, OpenAI announced their Codex app and my coworkers were asking questions. So I downloaded it, and as a test case for the GPT-5.2-Codex (high) model, I asked it to reimplement the UMAP algorithm in Rust. UMAP is a dimensionality reduction technique that can take in a high-dimensional matrix of data and simultaneously cluster and visualize data in lower dimensions. However, it is a very computationally-intensive algorithm and the only tool that can do it quickly is NVIDIA’s cuML which requires CUDA dependency hell. If I can create a UMAP package in Rust that’s superfast with minimal dependencies, that is an massive productivity gain for the type of work I do and can enable fun applications if fast enough. After OpenAI released GPT-5.3-Codex (high) which performed substantially better and faster at these types of tasks than GPT-5.2-Codex, I asked Codex to write a UMAP implementation from scratch in Rust, which at a glance seemed to work and gave reasonable results. I also instructed it to create benchmarks that test a wide variety of representative input matrix sizes. Rust has a popular benchmarking crate in criterion , which outputs the benchmark results in an easy-to-read format, which, most importantly, agents can easily parse. Example output from . At first glance, the benchmarks and their construction looked good (i.e. no cheating) and are much faster than working with UMAP in Python. To further test, I asked the agents to implement additional different useful machine learning algorithms such as HDBSCAN as individual projects, with each repo starting with this 8 prompt plan in sequence: The simultaneous constraints of code quality requirements via , speed requirements with a quantifiable target objective, and an output accuracy/quality requirement, all do succeed at finding meaningful speedups consistently (atleast 2x-3x) Codex 5.3 after optimizing a principal component analysis implementation. I’m not content with only 2-3x speedups: nowadays in order for this agentic code to be meaningful and not just another repo on GitHub, it has to be the fastest implementation possible . In a moment of sarcastic curiosity, I tried to see if Codex and Opus had different approaches to optimizing Rust code by chaining them: This works . From my tests with the algorithms, Codex can often speed up the algorithm by 1.5x-2x, then Opus somehow speeds up that optimized code again to a greater degree. This has been the case of all the Rust code I’ve tested: I also ran the and the word cloud crates through this pipeline and gained 6x cumulative speed increases in both libraries. Can these agent-benchmaxxed implementations actually beat the existing machine learning algorithm libraries, despite those libraries already being written in a low-level language such as C/C++/Fortran? Here are the results on my personal MacBook Pro comparing the CPU benchmarks of the Rust implementations of various computationally intensive ML algorithms to their respective popular implementations, where the agentic Rust results are within similarity tolerance with the battle-tested implementations and Python packages are compared against the Python bindings of the agent-coded Rust packages: I’ll definitely take those results with this unoptimized prompting pipeline! In all cases, the GPU benchmarks are unsurprisingly even better and with wgpu and added WGSL shaders the code runs on Metal without any additional dependencies, however further testing is needed so I can’t report numbers just yet. Although I could push these new libraries to GitHub now, machine learning algorithms are understandably a domain which requires extra care and testing. It would be arrogant to port Python’s scikit-learn — the gold standard of data science and machine learning libraries — to Rust with all the features that implies. But that’s unironically a good idea so I decided to try and do it anyways. With the use of agents, I am now developing (extreme placeholder name), a Rust crate that implements not only the fast implementations of the standard machine learning algorithms such as logistic regression and k-means clustering , but also includes the fast implementations of the algorithms above: the same three step pipeline I describe above still works even with the more simple algorithms to beat scikit-learn’s implementations. This crate can therefore receive Python bindings and even expand to the Web/JavaScript and beyond. This also gives me the oppertunity to add quality-of-life features to resolve grievances I’ve had to work around as a data scientist, such as model serialization and native integration with pandas/polars DataFrames. I hope this use case is considered to be more practical and complex than making a ball physics terminal app. Many people reading this will call bullshit on the performance improvement metrics, and honestly, fair. I too thought the agents would stumble in hilarious ways trying, but they did not. To demonstrate that I am not bullshitting, I also decided to release a more simple Rust-with-Python-bindings project today: nndex, an in-memory vector “store” that is designed to retrieve the exact nearest neighbors as fast as possible (and has fast approximate NN too), and is now available open-sourced on GitHub . This leverages the dot product which is one of the simplest matrix ops and is therefore heavily optimized by existing libraries such as Python’s numpy …and yet after a few optimization passes, it tied even though leverages BLAS libraries for maximum mathematical performance. Naturally, I instructed Opus to also add support for BLAS with more optimization passes and it now is 1-5x numpy’s speed in the single-query case and much faster with batch prediction. 3 It’s so fast that even though I also added GPU support for testing, it’s mostly ineffective below 100k rows due to the GPU dispatch overhead being greater than the actual retrieval speed. Comparison of Python to numpy on test workloads. measures result matches (perfect match) and measure the largest difference between calculated cosine similarities (effectively zero). One of the criticisms about AI generated code is that it “just regurgitates everything on GitHub” but by construction, if the code is faster than what currently exists, then it can’t have been stolen and must be an original approach. Even if the explicit agentic nature of makes it risky to adopt downstream, the learnings from how it accomplishes its extreme speed are still valuable. Like many who have hopped onto the agent train post-Opus 4.5, I’ve become nihilistic over the past few months, but not for the typical reasons. I actually am not hitting burnout and I am not worried that my programming skills are decaying due to agents: on the contrary, the session limits intended to stagger server usage have unintentionally caused me to form a habit of coding for fun an hour every day incorporating and implementing new ideas. However, is there a point to me writing this blog post and working on these libraries if people will likely just reply “tl;dr AI slop” and “it’s vibecoded so it’s automatically bad”? The real annoying thing about Opus 4.6/Codex 5.3 is that it’s impossible to publicly say “Opus 4.5 (and the models that came after it) are an order of magnitude better than coding LLMs released just months before it” without sounding like an AI hype booster clickbaiting, but it’s the counterintuitive truth to my personal frustration. I have been trying to break this damn model by giving it complex tasks that would take me months to do by myself despite my coding pedigree but Opus and Codex keep doing them correctly. On Hacker News I was accused of said clickbaiting when making a similar statement with accusations of “I haven’t had success with Opus 4.5 so you must be lying.” The remedy to this skepticism is to provide more evidence in addition to greater checks and balances, but what can you do if people refuse to believe your evidence? A year ago, I was one of those skeptics who was very suspicious of the agentic hype, but I was willing to change my priors in light of new evidence and experiences, which apparently is rare. Generative AI discourse has become too toxic and its discussions always end the same way, so I have been experimenting with touching grass instead, and it is nice. At this point, if I’m not confident that I can please anyone with my use of AI, then I’ll take solace in just pleasing myself. Continue open sourcing my projects, writing blog posts, and let the pieces fall as they may. If you want to follow along or learn when releases, you can follow me on Bluesky . Moment of introspection aside, I’m not sure what the future holds for agents and generative AI. My use of agents has proven to have significant utility (for myself at the least) and I have more-than-enough high-impact projects in the pipeline to occupy me for a few months. Although certainly I will use LLMs more for coding apps which benefit from this optimization, that doesn’t imply I will use LLMs more elsewhere: I still don’t use LLMs for writing — in fact I have intentionally made my writing voice more sardonic to specifically fend off AI accusations. With respect to Rust, working with agents and seeing how the agents make decisions/diffs has actually helped me break out of the intermediate Rust slog and taught me a lot about the ecosystem by taking on more ambitious projects that required me to research and identify effective tools for modern Rust development. Even though I have technically released Rust packages with many stars on GitHub, I have no intention of putting Rust as a professional skill on my LinkedIn or my résumé. As an aside, how exactly do résumés work in an agentic coding world? Would “wrote many open-source libraries through the use of agentic LLMs which increased the throughput of popular data science/machine learning algorithms by an order of magnitude” be disqualifying to a prospective employer as they may think I’m cheating and faking my expertise? My obligation as a professional coder is to do what works best, especially for open source code that other people will use. Agents are another tool in that toolbox with their own pros and cons. If you’ve had poor experiences with agents before last November, I strongly urge you to give modern agents another shot, especially with an tailored to your specific coding domain and nuances (again here are my Python and Rust files, in conveient copy/paste format). Overall, I’m very sad at the state of agentic discourse but also very excited at its promise: it’s currently unclear which one is the stronger emotion. Two subtle ways agents can implicitly negatively affect the benchmark results but wouldn’t be considered cheating/gaming it are a) implementing a form of caching so the benchmark tests are not independent and b) launching benchmarks in parallel on the same system. I eventually added rules to ideally prevent both.  ↩︎ The crate beat the agent-optimized GBT crate by 4x on my first comparison test, which naturally I took offense: I asked Opus 4.6 to “Optimize the crate such that wins in ALL benchmarks against .” and it did just that.  ↩︎ Currently, only the macOS build has BLAS support as Win/Linux BLAS support is a rabbit hole that needs more time to investigate. On those platforms, numpy does win, but that won’t be the case for long!  ↩︎ Implement the package with the specific functional requirements and design goals; afterwards, create benchmarks with specific matrix sizes that are representative of typical use cases Do a second pass to clean up the code/comments and make further optimizations Scan the crate to find areas of algorithmic weaknesses in extreme cases, and write a sentence for each describing the problem, the potential solution, and quantifying the impact of the solution Leveraging the findings found, optimize the crate such that ALL benchmarks run 60% or quicker (1.4x faster). Use any techniques to do so, and repeat until benchmark performance converges, but don’t game the benchmarks by overfitting on the benchmark inputs alone 1 Create custom tuning profiles that take advantage of the inherent quantities of the input data and CPU thread saturation/scheduling/parallelization to optimize the crate such that ALL benchmarks run 60% or quicker (1.4x faster). You can use the flamegraph crate to help with the profiling Add Python bindings using 0.27.2 and , with relevant package-specific constraints (specifying the version is necessary to ensure compatability with Python 3.10+) Create corresponding benchmarks in Python, and write a comparison script between the Python bindings and an existing Python package Accuse the agent of potentially cheating its algorithm implementation while pursuing its optimizations, so tell it to optimize for the similarity of outputs against a known good implementation (e.g. for a regression task, minimize the mean absolute error in predictions between the two approaches) Instruct Codex to optimize benchmarks to 60% of runtime Instruct Opus to optimize benchmarks to 60% of runtime Instruct Opus to minimize differences between agentic implementation and known good implementation without causing more than a 5% speed regression on any benchmarks UMAP: 2-10x faster than Rust’s fast-umap , 9-30x faster than Python’s umap HDBSCAN (clustering algorithm): 23-100x faster than the hdbscan Rust crate, 3x-10x faster than Python’s hdbscan GBDT (tree-boosting algorithm): 1.1x-1.5x faster fit/predict than the treeboost Rust crate 2 , 24-42x faster fit/1-5x faster predict than Python’s xgboost Two subtle ways agents can implicitly negatively affect the benchmark results but wouldn’t be considered cheating/gaming it are a) implementing a form of caching so the benchmark tests are not independent and b) launching benchmarks in parallel on the same system. I eventually added rules to ideally prevent both.  ↩︎ The crate beat the agent-optimized GBT crate by 4x on my first comparison test, which naturally I took offense: I asked Opus 4.6 to “Optimize the crate such that wins in ALL benchmarks against .” and it did just that.  ↩︎ Currently, only the macOS build has BLAS support as Win/Linux BLAS support is a rabbit hole that needs more time to investigate. On those platforms, numpy does win, but that won’t be the case for long!  ↩︎

0 views
fLaMEd fury 2 weeks ago

Making fLaMEd fury Glow Everywhere With an Eleventy Transform

What’s going on, Internet? I originally added a CSS class as a fun way to make my name (fLaMEd) and site name (fLaMEd fury) pop on the homepage. shellsharks gave me a shoutout for it, which inspired me to take it further and apply the effect site-wide. Site wide was the original intent and the problem was it was only being applied manually in a handful of places, and I kept forgetting to add it whenever I wrote a new post or created a new page. Classic. Instead of hunting through templates and markdown files, I’ve added an Eleventy HTML transform that automatically applies the glow up. I had Claude Code help me figure out the regex and the transform config. This allowed me to get this done before the kids came home. Don't @ me. The effect itself is a simple utility class using : Swap for whatever gradient custom property you have defined. The repeats the gradient across the text for a more dynamic flame effect. The transform lives in its own plugin file and gets registered in . It runs after Eleventy has rendered each page, tokenises the HTML by splitting on tags, tracks a skip-tag stack, and only replaces text in text nodes. Tags in the set, along with any span already carrying the class, push onto the stack. No replacement happens while the stack is non-empty, so link text, code examples, the page , and already-wrapped instances are all left alone. HTML attributes like and are never touched because they sit inside tag tokens, not text nodes. A single regex handles everything in one pass. The optional group matches " fury" (with space) or “fury” (without), so “flamed fury” and “flamedfury” (as it appears in the domain name) are both wrapped as a unit. The flag covers every capitalisation variant (“fLaMEd fury”, “Flamed Fury”, “FLAMED FURY”) with the original casing preserved in the output. This helps because I can be inconsistent with the styling at times. Export the plugin from wherever you manage your Eleventy plugins: Then register it in . Register it before any HTML prettify transform so the spans are in place before reformatting runs: That’s it. Any mention of the site name (fLaMEd fury) in body text gets the gradient automatically, in posts, templates, data-driven content, wherever. Look out for the easter egg I’ve dropped in. Later. Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP , or send a webmention . Check out the posts archive on the website.

0 views
Kev Quirk 2 weeks ago

Introducing Pure Comments (and Pure Commons)

A few weeks ago I introduced Pure Blog a simple PHP based blogging platform that I've since moved to and I'm very happy. Once Pure Blog was done, I shifted my focus to start improving my commenting system . I ended that post by saying: At this point it's battle tested and working great. However, there's still some rough edges in the code, and security could definitely be improved. So over the next few weeks I'll be doing that, at which point I'll probably release it to the public so you too can have comments on your blog, if you want them. I've now finished that work and I'm ready to release Pure Comments to the world. 🎉 I'm really happy with how Pure Comments has turned out; it slots in perfectly with Pure Blog, which got me thinking about creating a broader suite of apps under the Pure umbrella. I've had Simple.css since 2022, and now I've added Pure Blog and Pure Comments to the fold. So I decided I needed an umbrella to house these disparate projects. That's where Pure Commons comes in. My vision for Pure Commons is to build it into a suite of simple, privacy focussed tools that are easy to self-host, and have just what you need and no more. Well, concurrent to working on Pure Comments, I've also started building a fully managed version that people will be able to use for a small monthly fee. That's about 60% done at this point, so I should be releasing that over the next few weeks. In the future I plan to add a managed version of Pure Blog too, but that will be far more complex than a managed version of Pure Comments. So I think that will take some time. I'm also looking at creating Pure Guestbook , which will obviously be a simple, self-hosted guestbook along the same vein as the other Pure apps. This should be relatively simple to build, as a guestbook is basically a simplified commenting system, so most of the code is already exists in Pure Comments. Looking beyond Pure Guestbook I have some other ideas, but you will have to wait and see... In the meantime, please take a look as Pure Comments - download the source code , take it for a spin, and provide any feedback/bugs you find. If you have any ideas for apps I could add to the Pure Commons family, please get in touch. Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email , or leave a comment .

0 views
Thomasorus 2 weeks ago

Is frontend development a dead-end career?

It's been almost a year at my current job. I was hired as a frontend developer and UI/UX coordinator, but I've been slowly shifting to a project manager role, which I enjoy a lot and where I think I contribute more. We build enterprise web apps, the kinds you will never see online, never hear about, but that power entire companies. The config application for the car assembly line, the bank counsellor website that outputs your mortgage rate, the warehouse inventory systems... That's the kind of thing we do. For backend engineers that's a very interesting job. You get to understand how entire professional fields work, and try to reproduce them in code. But for us the frontend guys? The interest is, limited to put it simply. We're talking about pages with multi-step conditional forms, tables with lots of columns and filters, 20 nuances of inputs, modals... The real challenge isn't building an innovative UI or UX, it's maintaining a consistency in project that can last years and go into the hands of multiple developers. Hence my UI/UX coordinator role where I look at my colleagues work and sternly say "That margin should be , not " . Because here's the thing: this type of client doesn't care if it's pretty or not and won't pay for design system work or maintenance. To them, stock Bootstrap or Material Design is amazingly beautiful compared to their current Windev application. What they want is stability and predictability, they care it works the same when they encounter the same interface. Sometimes, if a process is too complex for new hires, they will engage into talks to make it more user friendly, but that's it. Until recently, the only generated code we used were the types for TypeScript and API calls functions generated from the backend, which saved us a lot of repetitive work. We made experiments with generative AI and found out we could generate a lot of our template code. All that's left to do is connect both, the front of the frontend and the back of the frontend , mostly click events, stores, reactivity, and so on. People will say that's where the fun is, and sometimes yes, I agree. I've been on projects where building the state was basically building a complex state machine out of dozens of calls from vendor specific APIs. But how often do you do that? And why would you do that if you are developing the backend yourself and can pop an endpoint with all the data your frontend needs? And so I've been wondering about the future. With frameworks, component libraries, LLMs, the recession pushing to deliver fast even if mediocre code and features, who needs someone who can write HTML, CSS, JS? Who can pay for the craft of web development? Are the common frontend developers folks, not the already installed elite freelancers building websites for prestigious clients , only destined to do assembly line of components by prompting LLMs before putting some glue between code blocks they didn't write?

0 views
Preah's Website 2 weeks ago

Learning Java Again

Java was my first programming language I learned, it’s my baby. Well also HTML and CSS. But for scripting, I’ve always enjoyed writing Java code. I’ve become pretty familiar with Python at this point, and haven’t touched Java in ages, but really feel the itch to pick it up seriously again since it is what taught me programming and computer science concepts to begin with. I actually still recommend Java as a first programming language over Python, since it touches a lot of concepts that I think are good to start with from the beginning. It’s easier to move to Python than to move to Java or C++ from Python. Anyone have project ideas or recommendations for writing more Java? Let me know :) Subscribe via email or RSS

1 views
Jim Nielsen 2 weeks ago

Making Icon Sets Easy With Web Origami

Over the years, I’ve used different icon sets on my blog. Right now I use Heroicons . The recommended way to use them is to copy/paste the source from the website directly into your HTML. It’s a pretty straightforward process: If you’re using React or Vue, there are also npm packages you can install so you can import the icons as components. But I’m not using either of those frameworks, so I need the raw SVGs and there’s no for those so I have to manually grab the ones I want. In the past, my approach has been to copy the SVGs into individual files in my project, like: Then I have a “component” for reading those icons from disk which I use in my template files to inline the SVGs in my HTML. For example: It’s fine. It works. It’s a lot of node boilerplate to read files from disk. But changing icons is a bit of a pain. I have to find new SVGs, overwrite my existing ones, re-commit them to source control, etc. I suppose it would be nice if I could just and get the raw SVGs installed into my folder and then I could read those. But that has its own set of trade-offs. For example: So the project’s npm packages don’t provide the raw SVGs. The website does, but I want a more programatic way to easily grab the icons I want. How can I do this? I’m using Web Origami for my blog which makes it easy to map icons I use in my templates to Heroicons hosted on Github. It doesn’t require an or a . Here’s an snippet of my file: As you can see, I name my icon (e.g. ) and then I point it to the SVG as hosted on Github via the Heroicons repo. Origami takes care of fetching the icons over the network and caching them in-memory. Beautiful, isn’t it? It kind of reminds me of import maps where you can map a bare module specifier to a URL (and Deno’s semi-abandoned HTTP imports which were beautiful in their own right). Origami makes file paths first-class citizens of the language — even “remote” file paths — so it’s very simple to create a single file that maps your icon names in a codebase to someone else’s icon names from a set, whether those are being installed on disk via npm or fetched over the internet. To simplify my example earlier, I can have a file like : Then I can reference those icons in my templates like this: Easy-peasy! And when I want to change icons, I simply update the entries in to point somewhere else — at a remote or local path. And if you really want to go the extra mile, you can use Origami’s caching feature: Rather than just caching the files in memory, this will cache them to a local folder like this: Which is really cool because now when I run my site locally I have a folder of SVG files cached locally that I can look at and explore (useful for debugging, etc.) This makes vendoring really easy if I want to put these in my project under source control. Just run the file once and boom, they’re on disk! There’s something really appealing to me about this. I think it’s because it feels very “webby” — akin to the same reasons I liked HTTP imports in Deno. You declare your dependencies with URLs, then they’re fetched over the network and become available to the rest of your code. No package manager middleman introducing extra complexity like versioning, transitive dependencies, install bloat, etc. What’s cool about Origami is that handling icons like this isn’t a “feature” of the language. It’s an outcome of the expressiveness of the language. In some frameworks, this kind of problem would require a special feature (that’s why you have special npm packages for implementations of Heroicons in frameworks like react and vue). But because of the way Origami is crafted as a tool, it sort of pushes you towards crafting solutions in the same manner as you would with web-based technologies (HTML/CSS/JS). It helps you speak “web platform” rather than some other abstraction on top of it. I like that. Reply via: Email · Mastodon · Bluesky Go to the website Search for the icon you want Click to “Copy SVG” Go back to your IDE and paste it Names are different between icon packs, so when you switch, names don’t match. For example, an icon might be named in one pack and in another. So changing sets requires going through all your templates and updating references. Icon packs are often quite large and you only need a subset. might install hundreds or even thousands of icons I don’t need.

0 views
David Bushell 3 weeks ago

Everything you never wanted to know about visually-hidden

Nobody asked for it but nevertheless, I present to you my definitive “it depends” tome on visually-hidden web content. I’ll probably make an amendment before you’ve finished reading. If you enjoy more questions than answers, buckle up! I’ll start with the original premise, even though I stray off-topic on tangents and never recover. I was nerd-sniped on Bluesky. Ana Tudor asked : Is there still any point to most styles in visually hidden classes in ’26? Any point to shrinking dimensions to and setting when to nothing via / reduces clickable area to nothing? And then no dimensions = no need for . @anatudor.bsky.social Ana proposed the following: Is this enough in 2026? As an occasional purveyor of the class myself, the question wriggled its way into my brain. I felt compelled to investigate the whole ordeal. Spoiler: I do not have a satisfactory yes-or-no answer, but I do have a wall of text! I went so deep down the rabbit hole I must start with a table of contents: I’m writing this based on the assumption that a class is considered acceptable for specific use cases . My final section on native visually-hidden addresses the bigger accessibility concerns. It’s not easy to say where this technique is appropriate. It is generally agreed to be OK but a symptom of — and not a fix for — other design issues. Appropriate use cases for are far fewer than you think. Skip to the history lesson if you’re familiar. , — there have been many variations on the class name. I’ve looked at popular implementations and compiled the kitchen sink version below. Please don’t copy this as a golden sample. It merely encompasses all I’ve seen. There are variations on the selector using pseudo-classes that allow for focus. Think “skip to main content” links, for example. What is the purpose of the class? The idea is to hide an element visually, but allow it to be discovered by assistive technology. Screen readers being the primary example. The element must be removed from layout flow. It should leave no render artefacts and have no side effects. It does this whilst trying to avoid the bugs and quirks of web browsers. If this sounds and looks just a bit hacky to you, you have a high tolerance for hacks! It’s a massive hack! How was this normalised? We’ll find out later. I’ll whittle down the properties for those unfamiliar. Absolute positioning is vital to remove the element from layout flow. Otherwise the position of surrounding elements will be affected by its presence. This crops the visible area to nothing. remains as a fallback but has long been deprecated and is obsolete. All modern browsers support . These two properties remove styles that may add layout dimensions. This group effectively gives the element zero dimensions. There are reasons for instead of and negative margin that I’ll cover later. Another property to ensure no visible pixels are drawn. I’ve seen the newer value used but what difference that makes if any is unclear. This was added to address text wrapping inside the square (I’ll explain later). So basically we have and a load of properties that attempted to make the element invisible. We cannot use or or because those remove elements from the accessibility tree. So the big question remains: why must we still ‘zero’ the dimensions? Why is not sufficient? To make sense of this mystery I went back to the beginning. It was tricky to research this topic because older articles have been corrected with modern information. I recovered many details from the archives and mailing lists with the help of those involved. They’re cited along the way. Our journey begins November 2004. A draft document titled “CSS Techniques for WCAG 2.0” edited by Wendy Chisholm and Becky Gibson includes a technique for invisible labels. While it is usually best to include visual labels for all form controls, there are situations where a visual label is not needed due to the surrounding textual description of the control and/or the content the control contains. Users of screen readers, however, need each form control to be explicitly labeled so the intent of the control is well understood when navigated to directly. Creating Invisible labels for form elements ( history ) The following CSS was provided: Could this be the original class? My research jumped through decades but eventually I found an email thread “CSS and invisible labels for forms” on the W3C WAI mailing list. This was a month prior, preluding the WCAG draft. A different technique from Bob Easton was noted: The beauty of this technique is that it enables using as much text as we feel appropriate, and the elements we feel appropriate. Imagine placing instructive text about the accessibility features of the page off left (as well as on the site’s accessibility statement). Imagine interspersing “start of…” landmarks through a page with heading tags. Or, imagine parking full lists off left, lists of access keys, for example. Screen readers can easily collect all headings and read complete lists. Now, we have a made for screen reader technique that really works! Screenreader Visibility - Bob Easton (2003) Easton attributed both Choan Gálvez and Dave Shea for their contributions. In same the thread, Gez Lemon proposed to ensure that text doesn’t bleed into the display area . Following up, Becky Gibson shared a test case covering the ideas. Lemon later published an article “Invisible Form Prompts” about the WCAG plans which attracted plenty of commenters including Bob Easton. The resulting WCAG draft guideline discussed both the and ideas. Note that instead of using the nosize style described above, you could instead use postion:absolute; and left:-200px; to position the label “offscreen”. This technique works with the screen readers as well. Only position elements offscreen in the top or left direction, if you put an item off to the right or the bottom, many browsers will add scroll bars to allow the user to reach the content. Creating Invisible labels for form elements Two options were known and considered towards the end of 2004. Why not both? Indeed, it appears Paul Bohman on the WebAIM mailing list suggested such a combination in February 2004. Bohman even discovered possibly the first zero width bug. I originally recommended setting the height and width to 0 pixels. This works with JAWS and Home Page Reader. However, this does not work with Window Eyes. If you set the height and width to 1 pixel, then the technique works with all browsers and all three of the screen readers I tested. Re: Hiding text using CSS - Paul Bohman Later in May 2004, Bohman along with Shane Anderson published a paper on this technique. Citations within included Bob Easton and Tom Gilder . Aside note: other zero width bugs have been discovered since. Manuel Matuzović noted in 2023 that links in Safari were not focusable . The zero width story continues as recently as February 2026 (last week). In browse mode in web browsers, NVDA no longer treats controls with 0 width or height as invisible. This may make it possible to access previously inaccessible “screen reader only” content on some websites. NVDA 2026.1 Beta TWO now available - NV Access News Digger further into WebAIM’s email archive uncovered a 2003 thread in which Tom Gilder shared a class for skip navigation links . I found Gilder’s blog in the web archives introducing this technique. I thought I’d put down my “skip navigation” link method down in proper writing as people seem to like it (and it gives me something to write about!). Try moving through the links on this page using the keyboard - the first link should magically appear from thin air and allow you to quickly jump to the blog tools, which modern/visual/graphical/CSS-enabled browsers (someone really needs to come up with an acronym for that) should display to the left of the content. Skip-a-dee-doo-dah - Tom Gilder Gilder’s post links to a Dave Shea post which in turn mentions the 2002 book “Building Accessible Websites” by Joe Clark . Chapter eight discusses the necessity of a “skip navigation” link due to table-based layout but advises: Keep them visible! Well-intentioned developers who already use page anchors to skip navigation will go to the trouble to set the anchor text in the tiniest possible font in the same colour as the background, rendering it invisible to graphical browsers (unless you happen to pass the mouse over it and notice the cursor shape change). Building Accessible Websites - 08. Navigation - Joe Clark Clark expressed frustration over common tricks like the invisible pixel. It’s clear no class existed when this was written. Choan Gálvez informed me that Eric Meyer would have the css-discuss mailing list. Eric kindly searched the backups but didn’t find any earlier discussion. However, Eric did find a thread on the W3C mailing list from 1999 in which Ian Jacobs (IBM) discusses the accessibility of “skip navigation” links. The desire to visually hide “skip navigation” links was likely the main precursor to the early techniques. In fact, Bob Easton said as much: As we move from tag soup to CSS governed design, we throw out the layout tables and we throw out the spacer images. Great! It feels wonderful to do that kind of house cleaning. So, what do we do with those “skip navigation” links that used to be attached to the invisible spacer images? Screenreader Visibility - Bob Easton (2003) I had originally missed that in my excitement seeing the class. I reckon we’ve reached the source of the class. At least conceptually. Technically, the class emerged from several ideas, rather than a “eureka” moment. Perhaps more can be gleaned from other CSS techniques such a the desire to improve accessibility of CSS image replacement . Bob Easton retired in 2008 after a 40 year career at IBM. I reached out to Bob who was surprised to learn this technique was still a topic today † . Bob emphasised the fact that it was always a clumsy workaround and something CSS probably wasn’t intended to accommodate . I’ll share more of Bob’s thoughts later. † I might have overdone the enthusiasm Let’s take an intermission! My contact page is where you can send corrections by the way :) The class stabilised for a period. Visit 2006 in the Wayback Machine to see WebAIM’s guide to invisible content — Paul Bohman’s version is still recommended. Moving forward to 2011, I found Jonathan Snook discussing the “clip method”. Snook leads us to Drupal developer Jeff Burnz the previous year. […] we still have the big problem of the page “jump” issue if this is applied to a focusable element, such as a link, like skip navigation links. WebAim and a few others endorse using the LEFT property instead of TOP, but this no go for Drupal because of major pain-in-the-butt issues with RTL. In early May 2010 I was getting pretty frustrated with this issue so I pulled out a big HTML reference and started scanning through it for any, and I mean ANY property I might have overlooked that could possible be used to solve this thorny issue. It was then I recalled using clip on a recent project so I looked up its values and yes, it can have 0 as a value. Using CSS clip as an Accessible Method of Hiding Content - Jeff Burnz It would seem Burnz discovered the technique independently and was probably the first to write about it. Burnz also notes a right-to-left (RTL) issue. This could explain why pushing content off-screen fell out of fashion. 2010 also saw the arrival of HTML5 Boilerplate along with issue #194 in which Jonathan Neal plays a key role in the discussion and comments: If we want to correct for every seemingly-reasonable possibility of overflow in every browser then we may want to consider [code below] This was their final decision. I’ve removed for clarity. This is very close to what we have now, no surprise since HTML5 Boilterplate was extremely popular. I’m leaning to conclude that the additional properties are really just there for the “possibility” of pixels escaping containment as much as fixing any identified problem. Thierry Koblentz covered the state of affairs in 2012 noting that: Webkit, Opera and to some extent IE do not play ball with [clip] . Koblentz prophesies: I wrote the declarations in the previous rule in a particular order because if one day clip works as everyone would expect, then we could drop all declarations after clip, and go back to the original Clip your hidden content for better accessibility - Thierry Koblentz Sound familiar? With those browsers obsolete, and if behaves itself, can the other properties be removed? Well we have 14 years of new bugs features to consider first. In 2016, J. Renée Beach published: Beware smushed off-screen accessible text . This appears to be the origin of (as demonstrated by Vispero .) Over a few sessions, Matt mentioned that the string of text “Show more reactions” was being smushed together and read as “Showmorereactions”. Beach’s class did not include the kitchen sink. The addition of became standard alongside everything else. Aside note: the origin of remains elusive. One Bootstrap issue shows it was rediscovered in 2018 to fix a browser bug. However, another HTML5 Boilterplate issue dated 2017 suggests negative margin broke reading order. Josh Comeau shared a React component in 2024 without margin. One of many examples showing that it has come in and out of fashion. We started with WCAG so let’s end there. The latest WCAG technique for “Using CSS to hide a portion of the link text” provides the following code. Circa 2020 the property was added as browser support increased and became deprecated. An obvious change I not sure warrants investigation (although someone had to be first!) That brings us back to what we have today. Are you still with me? As we’ve seen, many of the properties were thrown in for good measure. They exist to ensure absolutely no pixels are painted. They were adapted over the years to avoid various bugs, quirks, and edge cases. How many such decisions are now irrelevant? This is a classic Chesterton’s Fence scenario. Do not remove a fence until you know why it was put up in the first place. Well we kinda know why but the specifics are practically folklore at this point. Despite all that research, can we say for sure if any “why” is still relevant? Back to Ana Tudor’s suggestion. How do we know for sure? The only way is extensive testing. Unfortunately, I have neither the time nor skill to perform that adequately here. There is at least one concern with the code above, Curtis Wilcox noted that in Safari the focus ring behaves differently. Other minimum viable ideas have been presented before. Scott O’Hara proposed a different two-liner using . JAWS, Narrator, NVDA with Edge all seem to behave just fine. As do Firefox with JAWS and NVDA, and Safari on macOS with VoiceOver. Seems also fine with iOS VO+Safari and Android TalkBack with Firefox or Chrome. In none of these cases do we get the odd focus rings that have occurred with other visually hidden styles, as the content is scaled down to zero. Also because not hacked into a 1px by 1px box, there’s no text wrapping occurring, so no need to fix that issue. transform scale(0) to visually hide content - Scott O’Hara Sounds promising! It turns out Katrin Kampfrath had explored both minimum viable classes a couple of years ago, testing them against the traditional class. I am missing the experience and moreover actual user feedback, however, i prefer the screen reader read cursor to stay roughly in the document flow. There are screen reader users who can see. I suppose, a jumping read cursor is a bit like a shifting layout. Exploring the visually-hidden css - Katrin Kampfrath Kampfrath’s limited testing found the read cursor size differs for each class. The technique was favoured but caution is given. A few more years ago, Kitty Giraudel tested several ideas concluding that was still the most accessible for specific text use. This technique should only be used to mask text. In other words, there shouldn’t be any focusable element inside the hidden element. This could lead to annoying behaviours, like scrolling to an invisible element. Hiding content responsibly - Kitty Giraudel Zell Liew proposed a different idea in 2019. Many developers voiced their opinions, concerns, and experiments over at Twitter. I wanted to share with you what I consolidated and learned. A new (and easy) way to hide content accessibly - Zell Liew Liew’s idea was unfortunately torn asunder. Although there are cases like inclusively hiding checkboxes where near-zero opacity is more accessible. I’ve started to go back in time again! I’m also starting to question whether this class is a good idea. Unless we are capable and prepared to thoroughly test across every combination of browser and assistive technology — and keep that information updated — it’s impossible to recommend anything. This is impossible for developers! Why can’t browser vendors solve this natively? Once you’ve written 3000 words on a twenty year old CSS hack you start to question why it hasn’t been baked into web standards by now. Ben Myers wrote “The Web Needs a Native .visually-hidden” proposing ideas from HTML attributes to CSS properties. Scott O’Hara responded noting larger accessibility issues that are not so easily handled. O’Hara concludes: Introducing a native mechanism to save developers the trouble of having to use a wildly available CSS ruleset doesn’t solve any of those underlying issues. It just further pushes them under the rug. Visually hidden content is a hack that needs to be resolved, not enshrined - Scott O’Hara Sara Soueidan had floated the topic to the CSS working group back in 2016. Soueidan closed the issue in 2025, coming to a similar conclusion. I’ve been teaching accessibility for a little less than a decade now and if there’s one thing I learned is that developers will resort to using utility to do things that are more often than not just bad design decisions. Yes, there are valid and important use cases. But I agree with all of @scottaohara’s points, and most importantly I agree that we need to fix the underlying issues instead of standardizing a technique that is guaranteed to be overused and misused even more once it gets easier to use. csswg-drafts comment - Sara Soueidan Adrian Roselli has a blog post listing priorities for assigning an accessible name to a control. Like O’Hara and Soueidan, Roselli recognises there is no silver bullet. Hidden text is also used too casually to provide information for just screen reader users, creating overly-verbose content . For sighted screen reader users , it can be a frustrating experience to not be able to find what the screen reader is speaking, potentially causing the user to get lost on the page while visually hunting for it. My Priority of Methods for Labeling a Control - Adrian Roselli In short, many believe that a native visually-hidden would do more harm than good. The use-cases are far more nuanced and context sensitive than developers realise. It’s often a half-fix for a problem that can be avoided with better design. I’m torn on whether I agree that it’s ultimately a bad idea. A native version would give software an opportunity to understand the developer’s intent and define how “visually hidden” works in practice. It would be a pragmatic addition. The technique has persisted for over two decades and is still mentioned by WCAG. Yet it remains hacks upon hacks! How has it survived for so long? Is that a failure of developers, or a failure of the web platform? The web is overrun with inaccessible div soup . That is inexcusable. For the rest of us who care about accessibility — who try our best — I can’t help but feel the web platform has let us down. We shouldn’t be perilously navigating code hacks, conflicting advice, and half-supported standards. We need more energy money dedicated to accessibility. Not all problems can be solved with money. But what of the thousands of unpaid hours, whether volunteered or solicited, from those seeking to improve the web? I risk spiralling into a rant about browser vendors’ financial incentives, so let’s wrap up! I’ll end by quoting Bob Easton from our email conversation: From my early days in web development, I came to the belief that semantic HTML, combined with faultless keyboard navigation were the essentials for blind users. Experience with screen reader users bears that out. Where they might occasionally get tripped up is due to developers who are more interested in appearance than good structural practices. The use cases for hidden content are very few, such as hidden information about where a search field is, when an appearance-centric developer decided to present a search field with no visual label, just a cute unlabeled image of a magnifying glass. […] The people promoting hidden information are either deficient in using good structural practices, or not experienced with tools used by people they want to help. Bob ended with: You can’t go wrong with well crafted, semantically accurate structure. Ain’t that the truth. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds. Accessibility notice Class walkthrough Where it all began Further adaptations Minimum viable technique Native visually-hidden Zero dimensions Position off-screen

6 views
David Bushell 3 weeks ago

Web font choice and loading strategy

When I rebuilt my website I took great care to optimise fonts for both performance and aesthetics. Fonts account for around 50% of my website (bytes downloaded on an empty cache). I designed and set a performance budget around my font usage. I use three distinct font families and three different methods to load them. Web fonts are usually defined by the CSS rule. The property allows us some control over how fonts are loaded. The value has become somewhat of a best practice — at least the most common default. The CSS spec says: Gives the font face an extremely small block period (100ms or less is recommended in most cases) and an infinite swap period . In other words, the browser draws the text immediately with a fallback if the font face isn’t loaded, but swaps the font face in as soon as it loads. CSS Fonts Module Level 4 - W3C That small “block period”, if implemented by the browser, renders an invisible font temporarily to minimise FOUC . Personally I default to and don’t change unless there are noticeable or measurable issues. Most of the time you’ll use swap. If you don’t know which option to use, go with swap. It allows you to use custom fonts and tip your hand to accessibility. font-display for the Masses - Jeremy Wagner Google Fonts’ default to which has performance gains. In effect, this makes the font files themselves asynchronous—the browser immediately displays our fallback text before swapping to the web font whenever it arrives. This means we’re not going to leave users looking at any invisible text (FOIT), which makes for both a faster and more pleasant experience. Speed Up Google Fonts - Harry Roberts Harry further notes that a suitable fallback is important, as I’ll discover below. My three fonts in order of importance are: Ahkio for headings. Its soft brush stroke style has a unique hand-drawn quality that remains open and legible. As of writing, I load three Ahkio weights at a combined 150 KB. That is outright greed! Ahkio is core to my brand so it takes priority in my performance budget (and financial budget, for that matter!) Testing revealed the 100ms † block period was not enough to avoid FOUC, despite optimisation techniques like preload . Ahkio’s design is more condensed so any fallback can wrap headings over additional lines. This adds significant layout shift. † Chrome blog mention a zero second block period . Firefox has a config preference default of 100ms. My solution was to use instead of which extends the block period from a recommended 0–100ms up to a much longer 3000ms. Gives the font face a short block period (3s is recommended in most cases) and an infinite swap period . In other words, the browser draws “invisible” text at first if it’s not loaded, but swaps the font face in as soon as it loads. CSS Fonts Module Level 4 - W3C This change was enough to avoid ugly FOUC under most conditions. Worst case scenario is three seconds of invisible headings. With my website’s core web vitals a “slow 4G” network can beat that by half. For my audience an extended block period is an acceptable trade-off. Hosting on an edge CDN with good cache headers helps minimised the cost. Update: Richard Rutter suggested which gives more fallback control than I knew. I shall experiment and report back! Atkinson Hyperlegible Next for body copy. It’s classed as a grotesque sans-serif with interesting quirks such as a serif on the lowercase ‘i’. I chose this font for both its accessible design and technical implementation as a variable font . One file at 78 KB provides both weight and italic variable axes. This allows me to give links a subtle weight boost. For italics I just go full-lean. I currently load Atkinson Hyperlegible with out of habit but I’m strongly considering why I don’t use . Gives the font face an extremely small block period (100ms or less is recommended in most cases) and a short swap period (3s is recommended in most cases). In other words, the font face is rendered with a fallback at first if it’s not loaded, but it’s swapped in as soon as it loads. However, if too much time passes, the fallback will be used for the rest of the page’s lifetime instead. CSS Fonts Module Level 4 - W3C The browser can give up and presumably stop downloading the font. The spec actually says that and “[must/should] only be used for small pieces of text.” Although it notes that most browsers implement the default with similar strategies to . 0xProto for code snippets. If my use of Ahkio was greedy, this is gluttonous! A default would be acceptable. My justification is that controlling presentation of code on a web development site is reasonable. 0xProto is designed for legibility with a personality that compliments my design. I don’t specify 0xProto with the CSS rule. Instead I use the JavaScript font loading API to conditionally load when a element is present. Note the name change because some browsers aren’t happy with a numeric first character. Not shown is the event wrapper around this code. I also load the script with both and attributes. This tells the browser the script is non-critical and avoids render blocking. I could probably defer loading even later without readers noticing the font pop in. Update: for clarity, browsers will conditionally load but JavaScript can purposefully delay the loading further to avoid fighting for bandwidth. When JavaScript is not available the system default is fine. There we have it, three fonts, three strategies, and a few open questions and decisions to make. Those may be answered when CrUX data catches up. My new website is a little chunkier than before but its well within reasonable limits. I’ll monitor performance and keep turning the dials. Web performance is about priorities . In isolation it’s impossible to say exactly how an individual asset should be loaded. There are upper limits, of course. How do you load a one megabyte font? You don’t. Unless you’re a font studio providing a complete type specimen. But even then you could split the font and progressive load different unicode ranges. I wonder if anyone does that? Anyway I’m rambling now, bye. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

0 views
Dominik Weber 3 weeks ago

Lighthouse update February 16th

## Website to feed The past week had first and foremost one improvement, website to feed conversion. It enables users to subscribe to websites that don't provide an RSS feed. This feature consists of multiple areas. The backbone is extracting items from a website based on CSS selectors, and then putting those items through the same pipeline as items of an RSS feed. Meaning extracting full content, calculating reading time, creating a summary, creating an about sentence, interpreting language and topic, and so on. Additional areas are all about making it easier to use. Showing the website and letting users select items simplifies the feature, for many websites it's not necessary to even know about the selectors. This also required some heuristics about which elements to select and how to find the repeating items from just one selection. The user experience can always be improved, but I think as it is right now it's already quite decent. The next step for this feature is to automatically find the relevant items, without the user having to select anything. ## Next steps An ongoing thing is the first user experience. It's not where I want it to be, but honestly it's difficult to know or imagine how it should be. One issue that came up repeatedly is the premium trial, and that users don't want to provide their credit card just to start the trial. That's fair. Though Paddle, the payment system Lighthouse uses, doesn't provide another option. They have it as private beta, but I didn't get invited to that unfortunately. So I'm going to bite the bullet and implement this myself. Won't be as great as if Paddle does it, but at least users will get the premium experience for 2 weeks after signup. An improvement I had my eyes on for some time is using the HTML of RSS feed items for the preview. Lighthouse attempts to parse the full content for all items, but that's not always possible. If websites disallow it via robots.txt, or block via bot protection, Lighthouse doesn't get the content. In these cases it shows that access was blocked. But if the feed contains some content, that could be displayed. Feeds usually don't contain the full content, but it's at least something. One more thing I wanted to do for a long time, and can finally make time for, is creating collections of feeds for specific topics. For example "Frontier AI labs", "Company engineering blogs", "JS ecosystem", and so on. The [blogroll editor](https://lighthouseapp.io/tools/blogroll-editor) is the basis for that. It lets you create a collection of websites and feeds, and export OPML from that. I'm going to improve its UX a bit and then start creating these collections.

0 views
Kev Quirk 4 weeks ago

Updates to My Commenting System

I've been making some updates to my personal commenting system . Before they looked like this: It was just a simple table that contained every comment, both original and replies, in one table. As more people commented, it got a little unwieldy and confusing, so I've changed the layout to that it's all nested comment threads now: I've been using it for around 6 months and have received well over a hundred comments, plus moving it from Jekyll to Pure Blog was extremely simple - just some CSS changes, actually. At this point it's battle tested and working great. However, there's still some rough edges in the code, and security could definitely be improved. So over the next few weeks I'll be doing that, at which point I'll probably release it to the public so you too can have comments on your blog, if you want them.

0 views
David Bushell 1 months ago

Declarative Dialog Menu with Invoker Commands

The off-canvas menu — aka the Hamburger , if you must — has been hot ever since Jobs’ invented mobile web and Ethan Marcott put a name to responsive design . Making an off-canvas menu free from heinous JavaScript has always been possible, but not ideal. I wrote up one technique for Smashing Magazine in 2013. Later I explored in an absurdly titled post where I used the new Popover API . I strongly push clients towards a simple, always visible, flex-box-wrapping list of links. Not least because leaving the subject unattended leads to a multi-level monstrosity. I also believe that good design and content strategy should allow users to navigate and complete primary goals without touching the “main menu”. However, I concede that Hamburgers are now mainstream UI. Jason Bradberry makes a compelling case . This month I redesigned my website . Taking the menu off-canvas at all breakpoints was a painful decision. I’m still not at peace with it. I don’t like plain icons. To somewhat appease my anguish I added big bold “Menu” text. The HTML for the button is pure declarative goodness. I added an extra “open” prefix for assistive tech. Aside note: Ana Tudor asked do we still need all those “visually hidden” styles? I’m using them out of an abundance of caution but my feeling is that Ana is on to something. The menu HTML is just as clean. It’s that simple! I’ve only removed my opinionated class names I use to draw the rest of the owl . I’ll explain more of my style choices later. This technique uses the wonderful new Invoker Command API for interactivity. It is similar to the I mentioned earlier. With a real we get free focus management and more, as Chris Coyier explains . I made a basic CodePen demo for the code above. So here’s the bad news. Invoker commands are so new they must be polyfilled for old browsers. Good news; you don’t need a hefty script. Feature detection isn’t strictly necessary. Keith Cirkel has a more extensive polyfill if you need full API coverage like JavaScript events. My basic version overrides the declarative API with the JavaScript API for one specific use case, and the behaviour remains the same. Let’s get into CSS by starting with my favourite: A strong contrast outline around buttons and links with room to breath. This is not typically visible for pointer events. For other interactions like keyboard navigation it’s visible. The first button inside the dialog, i.e. “Close (menu)”, is naturally given focus by the browser (focus is ‘trapped’ inside the dialog). In most browsers focus remains invisible for pointer events. WebKit has bug. When using or invoker commands the style is visible on the close button for pointer events. This seems wrong, it’s inconsistent, and clients absolutely rage at seeing “ugly” focus — seriously, what is their problem?! I think I’ve found a reliable ‘fix’. Please do not copy this untested . From my limited testing with Apple devices and macOS VoiceOver I found no adverse effects. Below I’ve expanded the ‘not open’ condition within the event listener. First I confirm the event is relevant. I can’t check for an instance of because of the handler. I’d have to listen for keyboard events and that gets murky. Then I check if the focused element has the visible style. If both conditions are true, I remove and reapply focus in a non-visible manner. The boolean is Safari 18.4 onwards. Like I said: extreme caution! But I believe this fixes WebKit’s inconsistency. Feedback is very welcome. I’ll update here if concerns are raised. Native dialog elements allow us to press the ESC key to dismiss them. What about clicking the backdrop? We must opt-in to this behaviour with the attribute. Chris Ferdinandi has written about this and the JavaScript fallback . That’s enough JavaScript! My menu uses a combination of both basic CSS transitions and cross-document view transitions . For on-page transitions I use the setup below. As an example here I fade opacity in and out. How you choose to use nesting selectors and the rule is a matter of taste. I like my at-rules top level. My menu also transitions out when a link is clicked. This does not trigger the closing dialog event. Instead the closing transition is mirrored by a cross-document view transition. The example below handles the fade out for page transitions. Note that I only transition the old view state for the closing menu. The new state is hidden (“off-canvas”). Technically it should be possible to use view transitions to achieve the on-page open and close effects too. I’ve personally found browsers to still be a little janky around view transitions — bugs, or skill issue? It’s probably best to wrap a media query around transitions. “Reduced” is a significant word. It does not mean “no motion”. That said, I have no idea how to assess what is adequately reduced! No motion is a safe bet… I think? So there we have it! Declarative dialog menu with invoker commands, topped with a medley of CSS transitions and a sprinkle of almost optional JavaScript. Aren’t modern web standards wonderful, when they work? I can’t end this topic without mentioning Jim Nielsen’s menu . I won’t spoil the fun, take a look! When I realised how it works, my first reaction was “is that allowed?!” It work’s remarkably well for Jim’s blog. I don’t recall seeing that idea in the wild elsewhere. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

0 views

The LLM Context Tax: Best Tips for Tax Avoidance

Every token you send to an LLM costs money. Every token increases latency. And past a certain point, every additional token makes your agent dumber. This is the triple penalty of context bloat: higher costs, slower responses, and degraded performance through context rot, where the agent gets lost in its own accumulated noise. Context engineering is very important. The difference between a $0.50 query and a $5.00 query is often just how thoughtfully you manage context. Here’s what I’ll cover: Stable Prefixes for KV Cache Hits - The single most important optimization for production agents Append-Only Context - Why mutating context destroys your cache hit rate Store Tool Outputs in the Filesystem - Cursor’s approach to avoiding context bloat Design Precise Tools - How smart tool design reduces token consumption by 10x Clean Your Data First (Maximize Your Deductions) - Strip the garbage before it enters context Delegate to Cheaper Subagents (Offshore to Tax Havens) - Route token-heavy operations to smaller models Reusable Templates Over Regeneration (Standard Deductions) - Stop regenerating the same code The Lost-in-the-Middle Problem - Strategic placement of critical information Server-Side Compaction (Depreciation) - Let the API handle context decay automatically Output Token Budgeting (Withholding Tax) - The most expensive tokens are the ones you generate The 200K Pricing Cliff (The Tax Bracket) - The tax bracket that doubles your bill overnight Parallel Tool Calls (Filing Jointly) - Fewer round trips, less context accumulation Application-Level Response Caching (Tax-Exempt Status) - The cheapest token is the one you never send With Claude Opus 4.6, the math is brutal: That’s a 10x difference between cached and uncached inputs. Output tokens cost 5x more than uncached inputs. Most agent builders focus on prompt engineering while hemorrhaging money on context inefficiency. In most agent workflows, context grows substantially with each step while outputs remain compact. This makes input token optimization critical: a typical agent task might involve 50 tool calls, each accumulating context. The performance penalty is equally severe. Research shows that past 32K tokens, most models show sharp performance degradation. Your agent isn’t just getting expensive. It’s getting confused. This is the single most important metric for production agents: KV cache hit rate. The Manus team considers this the most important optimization for their agent infrastructure, and I agree completely. The principle is simple: LLMs process prompts autoregressively, token by token. If your prompt starts identically to a previous request, the model can reuse cached key-value computations for that prefix. The killer of cache hit rates? Timestamps. A common mistake is including a timestamp at the beginning of the system prompt. It’s a simple mistake but the impact is massive. The key is granularity: including the date is fine. Including the hour is acceptable since cache durations are typically 5 minutes (Anthropic default) to 10 minutes (OpenAI default), with longer options available. But never include seconds or milliseconds. A timestamp precise to the second guarantees every single request has a unique prefix. Zero cache hits. Maximum cost. Move all dynamic content (including timestamps) to the END of your prompt. System instructions, tool definitions, few-shot examples, all of these should come first and remain identical across requests. For distributed systems, ensure consistent request routing. Use session IDs to route requests to the same worker, maximizing the chance of hitting warm caches. Context should be append-only. Any modification to earlier content invalidates the KV cache from that point forward. This seems obvious but the violations are subtle: The tool definition problem is particularly insidious. If you dynamically add or remove tools based on context, you invalidate the cache for everything after the tool definitions. Manus solved this elegantly: instead of removing tools, they mask token logits during decoding to constrain which actions the model can select. The tool definitions stay constant (cache preserved), but the model is guided toward valid choices through output constraints. For simpler implementations, keep your tool definitions static and handle invalid tool calls gracefully in your orchestration layer. Deterministic serialization matters too. Python dicts don’t guarantee order. If you’re serializing tool definitions or context as JSON, use sort_keys=True or a library that guarantees deterministic output. A different key order = different tokens = cache miss. Cursor’s approach to context management changed how I think about agent architecture. Instead of stuffing tool outputs into the conversation, write them to files. In their A/B testing, this reduced total agent tokens by 46.9% for runs using MCP tools. The insight: agents don’t need complete information upfront. They need the ability to access information on demand. Files are the perfect abstraction for this. We apply this pattern everywhere: Shell command outputs : Write to files, let agent tail or grep as needed Search results : Return file paths, not full document contents API responses : Store raw responses, let agent extract what matters Intermediate computations : Persist to disk, reference by path When context windows fill up, Cursor triggers a summarization step but exposes chat history as files. The agent can search through past conversations to recover details lost in the lossy compression. Clever. A vague tool returns everything. A precise tool returns exactly what the agent needs. Consider an email search tool: The two-phase pattern: search returns metadata, separate tool returns full content. The agent decides which items deserve full retrieval. This is exactly how our conversation history tool works at Fintool. It passes date ranges or search terms and returns up to 100-200 results with only user messages and metadata. The agent then reads specific conversations by passing the conversation ID. Filter parameters like has_attachment, time_range, and sender let the agent narrow results before reading anything. The same pattern applies everywhere: Document search : Return titles and snippets, not full documents Database queries : Return row counts and sample rows, not full result sets File listings : Return paths and metadata, not contents API integrations : Return summaries, let agent drill down Each parameter you add to a tool is a chance to reduce returned tokens by an order of magnitude. Garbage tokens are still tokens. Clean your data before it enters context. For emails, this means: For HTML content, the gains are even larger. A typical webpage might be 100KB of HTML but only 5KB of actual content. CSS selectors that extract semantic regions (article, main, section) and discard navigation, ads, and tracking can reduce token counts by 90%+. Markdown uses significantly fewer tokens than HTML , making conversion valuable for any web content entering your pipeline. For financial data specifically: Strip SEC filing boilerplate (every 10-K has the same legal disclaimers) Collapse repeated table headers across pages Remove watermarks and page numbers from extracted text Normalize whitespace (multiple spaces, tabs, excessive newlines) Convert HTML tables to markdown tables The principle: remove noise at the earliest possible stage, not after tokenization. Every preprocessing step that runs before the LLM call saves money and improves quality. Not every task needs your most expensive model. The Claude Code subagent pattern processes 67% fewer tokens overall due to context isolation. Instead of stuffing every intermediate search result into a single global context, workers keep only what’s relevant inside their own window and return distilled outputs. Tasks perfect for cheaper subagents: Data extraction : Pull specific fields from documents Classification : Categorize emails, documents, or intents Summarization : Compress long documents before main agent sees them Validation : Check outputs against criteria Formatting : Convert between data formats The orchestrator sees condensed results, not raw context. This prevents hitting context limits and reduces the risk of the main agent getting confused by irrelevant details. Scope subagent tasks tightly. The more iterations a subagent requires, the more context it accumulates and the more tokens it consumes. Design for single-turn completion when possible. Every time an agent generates code from scratch, you’re paying for output tokens. Output tokens cost 5x input tokens with Claude. Stop regenerating the same patterns. Our document generation workflow used to be painfully inefficient: OLD APPROACH: User: “Create a DCF model for Apple” Agent: *generates 2,000 lines of Excel formulas from scratch* Cost: ~$0.50 in output tokens alone NEW APPROACH: User: “Create a DCF model for Apple” Agent: *loads DCF template, fills in Apple-specific values* Cost: ~$0.05 The template approach: Skill references template : dcf_template.xlsx in /public/skills/dcf/ Agent reads template once : Understands structure and placeholders Agent fills parameters : Company-specific values, assumptions WriteFile with minimal changes : Only modified cells, not full regeneration For code generation, the same principle applies. If your agent frequently generates similar Python scripts, data processing pipelines, or analysis frameworks, create reusable functions: # Instead of regenerating this every time: def process_earnings_transcript(path): # 50 lines of parsing code... # Reference a skill with reusable utilities: from skills.earnings import parse_transcript, extract_guidance The agent imports and calls rather than regenerates. Fewer output tokens, faster responses, more consistent results. Subscribe now LLMs don’t process context uniformly. Research shows a consistent U-shaped attention pattern: models attend strongly to the beginning and end of prompts while “losing” information in the middle. Strategic placement matters: System instructions : Beginning (highest attention) Current user request : End (recency bias) Critical context : Beginning or end, never middle Lower-priority background : Middle (acceptable loss) For retrieval-augmented generation, this means reordering retrieved documents. The most relevant chunks should go at the beginning and end. Lower-ranked chunks fill the middle. Manus uses an elegant hack: they maintain a todo.md file that gets updated throughout task execution. This “recites” current objectives at the end of context, combating the lost-in-the-middle effect across their typical 50-tool-call trajectories. We use a similar architecture at Fintool. As agents run, context grows until it hits the window limit. You used to have two options: build your own summarization pipeline, or implement observation masking (replacing old tool outputs with placeholders). Both require significant engineering. Now you can let the API handle it. Anthropic’s server-side compaction automatically summarizes your conversation when it approaches a configurable token threshold. Claude Code uses this internally, and it’s the reason you can run 50+ tool call sessions without the agent losing track of what it’s doing. The key design decisions: Trigger threshold : Default is 150K tokens. Set it lower if you want to stay under the 200K pricing cliff, or higher if you need more raw context before summarizing. Custom instructions : You can replace the default summarization prompt entirely. For financial workflows, something like “Preserve all numerical data, company names, and analytical conclusions” prevents the summary from losing critical details. Pause after compaction : The API can pause after generating the summary, letting you inject additional context (like preserving the last few messages verbatim) before continuing. This gives you control over what survives the compression. Compaction also stacks well with prompt caching. Add a cache breakpoint on your system prompt so it stays cached separately. When compaction occurs, only the summary needs to be written as a new cache entry. Your system prompt cache stays warm. The beauty of this approach: context depreciates in value over time, and the API handles the depreciation schedule for you. Output tokens are the most expensive tokens. With Claude Sonnet, outputs cost 5x inputs. With Opus, they cost 5x inputs that are already expensive. Yet most developers leave max_tokens unlimited and hope for the best. # BAD: Unlimited output response = client.messages.create( model=”claude-sonnet-4-20250514”, max_tokens=8192, # Model might use all of this messages=[...] ) # GOOD: Task-appropriate limits TASK_LIMITS = { “classification”: 50, “extraction”: 200, “short_answer”: 500, “analysis”: 2000, “code_generation”: 4000, } Structured outputs reduce verbosity. JSON responses use fewer tokens than natural language explanations of the same information. Natural language: “The company’s revenue was 94.5 billion dollars, which represents a year-over-year increase of 12.3 percent compared to the previous fiscal year’s revenue of 84.2 billion dollars.” Structured: {”revenue”: 94.5, “unit”: “B”, “yoy_change”: 12.3} For agents specifically, consider response chunking. Instead of generating a 10,000-token analysis in one shot, break it into phases: Outline phase : Generate structure (500 tokens) Section phases : Generate each section on demand (1000 tokens each) Review phase : Check and refine (500 tokens) This gives you control points to stop early if the user has what they need, rather than always generating the maximum possible output. With Claude Opus 4.6 and Sonnet 4.5, crossing 200K input tokens triggers premium pricing. Your per-token cost doubles: Opus goes from $5 to $10 per million input tokens, and output jumps from $25 to $37.50. This isn’t gradual. It’s a cliff. This is the LLM equivalent of a tax bracket. And just like tax planning, the right strategy is to stay under the threshold when you can. For agent workflows that risk crossing 200K, implement a context budget. Track cumulative input tokens across tool calls. When you approach the cliff, trigger aggressive compression: observation masking, summarization of older turns, or pruning low-value context. The cost of a compression step is far less than doubling your per-token rate for the rest of the conversation. Every sequential tool call is a round trip. Each round trip re-sends the full conversation context. If your agent makes 20 tool calls sequentially, that’s 20 times the context gets transmitted and billed. The Anthropic API supports parallel tool calls: the model can request multiple independent tool calls in a single response, and you execute them simultaneously. This means fewer round trips for the same amount of work. The savings compound. With fewer round trips, you accumulate less intermediate context, which means each subsequent round trip is also cheaper. Design your tools so that independent operations can be identified and batched by the model. The cheapest token is the one you never send to the API. Before any LLM call, check if you’ve already answered this question. At Fintool, we cache aggressively for earnings call summarizations and common queries. When a user asks for Apple’s latest earnings summary, we don’t regenerate it from scratch for every request. The first request pays the full cost. Every subsequent request is essentially free. This operates above the LLM layer entirely. It’s not prompt caching or KV cache. It’s your application deciding that this query has a valid cached response and short-circuiting the API call. Good candidates for application-level caching: Factual lookups : Company financials, earnings summaries, SEC filings Common queries : Questions that many users ask about the same data Deterministic transformations : Data formatting, unit conversions Stable analysis : Any output that won’t change until the underlying data changes The cache invalidation strategy matters. For financial data, earnings call summaries are stable once generated. Real-time price data obviously isn’t. Match your cache TTL to the volatility of the underlying data. Even partial caching helps. If an agent task involves five tool calls and you can cache two of them, you’ve cut 40% of your tool-related token costs without touching the LLM. The Meta Lesson Context engineering isn’t glamorous. It’s not the exciting part of building agents. But it’s the difference between a demo that impresses and a product that scales with decent gross margin. The best teams building sustainable agent products are obsessing over token efficiency the same way database engineers obsess over query optimization. Because at scale, every wasted token is money on fire. The context tax is real. But with the right architecture, it’s largely avoidable. Subscribe now Every token you send to an LLM costs money. Every token increases latency. And past a certain point, every additional token makes your agent dumber. This is the triple penalty of context bloat: higher costs, slower responses, and degraded performance through context rot, where the agent gets lost in its own accumulated noise. Context engineering is very important. The difference between a $0.50 query and a $5.00 query is often just how thoughtfully you manage context. Here’s what I’ll cover: Stable Prefixes for KV Cache Hits - The single most important optimization for production agents Append-Only Context - Why mutating context destroys your cache hit rate Store Tool Outputs in the Filesystem - Cursor’s approach to avoiding context bloat Design Precise Tools - How smart tool design reduces token consumption by 10x Clean Your Data First (Maximize Your Deductions) - Strip the garbage before it enters context Delegate to Cheaper Subagents (Offshore to Tax Havens) - Route token-heavy operations to smaller models Reusable Templates Over Regeneration (Standard Deductions) - Stop regenerating the same code The Lost-in-the-Middle Problem - Strategic placement of critical information Server-Side Compaction (Depreciation) - Let the API handle context decay automatically Output Token Budgeting (Withholding Tax) - The most expensive tokens are the ones you generate The 200K Pricing Cliff (The Tax Bracket) - The tax bracket that doubles your bill overnight Parallel Tool Calls (Filing Jointly) - Fewer round trips, less context accumulation Application-Level Response Caching (Tax-Exempt Status) - The cheapest token is the one you never send That’s a 10x difference between cached and uncached inputs. Output tokens cost 5x more than uncached inputs. Most agent builders focus on prompt engineering while hemorrhaging money on context inefficiency. In most agent workflows, context grows substantially with each step while outputs remain compact. This makes input token optimization critical: a typical agent task might involve 50 tool calls, each accumulating context. The performance penalty is equally severe. Research shows that past 32K tokens, most models show sharp performance degradation. Your agent isn’t just getting expensive. It’s getting confused. Stable Prefixes for KV Cache Hits This is the single most important metric for production agents: KV cache hit rate. The Manus team considers this the most important optimization for their agent infrastructure, and I agree completely. The principle is simple: LLMs process prompts autoregressively, token by token. If your prompt starts identically to a previous request, the model can reuse cached key-value computations for that prefix. The killer of cache hit rates? Timestamps. A common mistake is including a timestamp at the beginning of the system prompt. It’s a simple mistake but the impact is massive. The key is granularity: including the date is fine. Including the hour is acceptable since cache durations are typically 5 minutes (Anthropic default) to 10 minutes (OpenAI default), with longer options available. But never include seconds or milliseconds. A timestamp precise to the second guarantees every single request has a unique prefix. Zero cache hits. Maximum cost. Move all dynamic content (including timestamps) to the END of your prompt. System instructions, tool definitions, few-shot examples, all of these should come first and remain identical across requests. For distributed systems, ensure consistent request routing. Use session IDs to route requests to the same worker, maximizing the chance of hitting warm caches. Append-Only Context Context should be append-only. Any modification to earlier content invalidates the KV cache from that point forward. This seems obvious but the violations are subtle: The tool definition problem is particularly insidious. If you dynamically add or remove tools based on context, you invalidate the cache for everything after the tool definitions. Manus solved this elegantly: instead of removing tools, they mask token logits during decoding to constrain which actions the model can select. The tool definitions stay constant (cache preserved), but the model is guided toward valid choices through output constraints. For simpler implementations, keep your tool definitions static and handle invalid tool calls gracefully in your orchestration layer. Deterministic serialization matters too. Python dicts don’t guarantee order. If you’re serializing tool definitions or context as JSON, use sort_keys=True or a library that guarantees deterministic output. A different key order = different tokens = cache miss. Store Tool Outputs in the Filesystem Cursor’s approach to context management changed how I think about agent architecture. Instead of stuffing tool outputs into the conversation, write them to files. In their A/B testing, this reduced total agent tokens by 46.9% for runs using MCP tools. The insight: agents don’t need complete information upfront. They need the ability to access information on demand. Files are the perfect abstraction for this. We apply this pattern everywhere: Shell command outputs : Write to files, let agent tail or grep as needed Search results : Return file paths, not full document contents API responses : Store raw responses, let agent extract what matters Intermediate computations : Persist to disk, reference by path The two-phase pattern: search returns metadata, separate tool returns full content. The agent decides which items deserve full retrieval. This is exactly how our conversation history tool works at Fintool. It passes date ranges or search terms and returns up to 100-200 results with only user messages and metadata. The agent then reads specific conversations by passing the conversation ID. Filter parameters like has_attachment, time_range, and sender let the agent narrow results before reading anything. The same pattern applies everywhere: Document search : Return titles and snippets, not full documents Database queries : Return row counts and sample rows, not full result sets File listings : Return paths and metadata, not contents API integrations : Return summaries, let agent drill down For HTML content, the gains are even larger. A typical webpage might be 100KB of HTML but only 5KB of actual content. CSS selectors that extract semantic regions (article, main, section) and discard navigation, ads, and tracking can reduce token counts by 90%+. Markdown uses significantly fewer tokens than HTML , making conversion valuable for any web content entering your pipeline. For financial data specifically: Strip SEC filing boilerplate (every 10-K has the same legal disclaimers) Collapse repeated table headers across pages Remove watermarks and page numbers from extracted text Normalize whitespace (multiple spaces, tabs, excessive newlines) Convert HTML tables to markdown tables The Claude Code subagent pattern processes 67% fewer tokens overall due to context isolation. Instead of stuffing every intermediate search result into a single global context, workers keep only what’s relevant inside their own window and return distilled outputs. Tasks perfect for cheaper subagents: Data extraction : Pull specific fields from documents Classification : Categorize emails, documents, or intents Summarization : Compress long documents before main agent sees them Validation : Check outputs against criteria Formatting : Convert between data formats Scope subagent tasks tightly. The more iterations a subagent requires, the more context it accumulates and the more tokens it consumes. Design for single-turn completion when possible. Reusable Templates Over Regeneration (Standard Deductions) Every time an agent generates code from scratch, you’re paying for output tokens. Output tokens cost 5x input tokens with Claude. Stop regenerating the same patterns. Our document generation workflow used to be painfully inefficient: OLD APPROACH: User: “Create a DCF model for Apple” Agent: *generates 2,000 lines of Excel formulas from scratch* Cost: ~$0.50 in output tokens alone NEW APPROACH: User: “Create a DCF model for Apple” Agent: *loads DCF template, fills in Apple-specific values* Cost: ~$0.05 The template approach: Skill references template : dcf_template.xlsx in /public/skills/dcf/ Agent reads template once : Understands structure and placeholders Agent fills parameters : Company-specific values, assumptions WriteFile with minimal changes : Only modified cells, not full regeneration Strategic placement matters: System instructions : Beginning (highest attention) Current user request : End (recency bias) Critical context : Beginning or end, never middle Lower-priority background : Middle (acceptable loss) The key design decisions: Trigger threshold : Default is 150K tokens. Set it lower if you want to stay under the 200K pricing cliff, or higher if you need more raw context before summarizing. Custom instructions : You can replace the default summarization prompt entirely. For financial workflows, something like “Preserve all numerical data, company names, and analytical conclusions” prevents the summary from losing critical details. Pause after compaction : The API can pause after generating the summary, letting you inject additional context (like preserving the last few messages verbatim) before continuing. This gives you control over what survives the compression. Outline phase : Generate structure (500 tokens) Section phases : Generate each section on demand (1000 tokens each) Review phase : Check and refine (500 tokens) This is the LLM equivalent of a tax bracket. And just like tax planning, the right strategy is to stay under the threshold when you can. For agent workflows that risk crossing 200K, implement a context budget. Track cumulative input tokens across tool calls. When you approach the cliff, trigger aggressive compression: observation masking, summarization of older turns, or pruning low-value context. The cost of a compression step is far less than doubling your per-token rate for the rest of the conversation. Parallel Tool Calls (Filing Jointly) Every sequential tool call is a round trip. Each round trip re-sends the full conversation context. If your agent makes 20 tool calls sequentially, that’s 20 times the context gets transmitted and billed. The Anthropic API supports parallel tool calls: the model can request multiple independent tool calls in a single response, and you execute them simultaneously. This means fewer round trips for the same amount of work. The savings compound. With fewer round trips, you accumulate less intermediate context, which means each subsequent round trip is also cheaper. Design your tools so that independent operations can be identified and batched by the model. Application-Level Response Caching (Tax-Exempt Status) The cheapest token is the one you never send to the API. Before any LLM call, check if you’ve already answered this question. At Fintool, we cache aggressively for earnings call summarizations and common queries. When a user asks for Apple’s latest earnings summary, we don’t regenerate it from scratch for every request. The first request pays the full cost. Every subsequent request is essentially free. This operates above the LLM layer entirely. It’s not prompt caching or KV cache. It’s your application deciding that this query has a valid cached response and short-circuiting the API call. Good candidates for application-level caching: Factual lookups : Company financials, earnings summaries, SEC filings Common queries : Questions that many users ask about the same data Deterministic transformations : Data formatting, unit conversions Stable analysis : Any output that won’t change until the underlying data changes

1 views
fLaMEd fury 1 months ago

The Summer Took Hold

What’s going on, Internet? I was going to have this post as the December wrap up post but the summer took hold of me and here we are at the end of January and a week into Feb, lol. December kicked off with a Tuesday night gig, Lewis Capaldi At The Spark Arena , a fantastic show. I finished off a post about what music ownership means to me , I was happy to get that one out of drafts and published. Then we were straight into finishing up work for the year and I took some time to reflect on the beers I drunk , the books I read , and the music I enjoyed throughout the year. Christmas was an exciting time. With a four year old and a two year old we were in full hype mode. I also got into the hype and had a blast. We had a pre-Christmas lunch at my brother’s place with his family and my parents, two days of work and then straight into Christmas Eve prep which we had at my parent-in-law’s place with the kids, their aunties and all their grandparents. They actually managed to go to sleep and woke up super excited and we got to do it all over again on Christmas Day. I braved the mall on Boxing Day and managed to pick up the last copy of the 2025 Indie Store exclusive re-issue of Get Rich Or Die Tryin’ pressed on red vinyl. The mall itself wasn’t too bad, but the traffic into the mall was horrific. Worth it though. With Christmas out of the way we had a couple days down time before we started our road trip down south. This was so much fun. Our end destination was the Marlborough Sounds, and we broke the trip up into stages so we could explore some of the country and also make sure that the kids weren’t stuck in a car non-stop for days on end. On day one we headed to the Hawke’s Bay via Orakei Korako Cave & Geothermal Park . We stayed one night in Havelock North and visited the national aquarium in Napier before continuing on to Martinborough where we spent two nights. The kids got some grandparent time and my wife and I managed to head over the hill to Wellington to catch up with some friends. On New Year’s Eve we packed up the car again and made our way with the kids to the ferry terminal in Wellington to cross the Cook Strait and make the drive to a remote part of the Marlborough Sounds . After four nights down there enjoying the remote tranquility of the top of the South Island, we headed back to the ferry terminal in Picton to catch the boat back to Wellington and drove straight back to Martinborough for a week of working remote rather than cutting the road trip short. We then packed the car up for a final time and made our way back up north, back to Havelock North again for one more night. Before heading north again the next morning we took the kids to Splash Planet and had a fun time in the water and playing mini golf. We stopped off in Hastings for some ice cream before heading back to Taupō where we stopped off at the Hooker Falls and Cobb & Co for dinner. My wife and I have great memories of family dinners and birthday parties at the Cobb and were hoping the kids would have a blast, but I think they were over it and were just tired. Luckily it was only three more hours back to Tāmaki Makaurau so I made the drive home while the family slept. Then it was back home to unpack and get back to the reality of the weekly grind of school and work. What an epic road trip though. I look forward to more of them, and they’ll only get better as the kids get older. I’ve been watching a lot of great shows recently, old and new. I saw Tulsa King on Cory’s website, gave it a go and ended up watching the entirety of season one over a couple days. I’m going to get right onto season two as soon as I can. The second season of The Vince Staples Show was like a fever dream. Short episodes, easy to watch. My favourite universe, the Power Universe had season three of Power Book IV: Force on the screen and I loved it. It’s so over the top and the premise is wild. Great end to the series and looking forward to Power Book Legacy! I watched all four episodes of Sean Combs: The Reckoning - it really shows what a psycho that guy is. The fourth season of Industry has started and I’m watching my way through that, I have no idea if I actually like this show anymore - it’s kinda fever dream like itself, but I’ll keep watching it. Shrinking is back for its third season. And finally A Knight of the Seven Kingdoms started, I loved the books and I’m loving this show. Set a hundred years before the events of A Song Of Ice And Fire, it’s a bit more lighthearted and funny than Game Of Thrones was. Been a good run of movie watching over here recently. Partly due to Mr 4 taking more of an interest in them. He’d been coming home from preschool talking about KPop Demon Hunters , and we’d been listening to the music and so we sat down and watched the movie together one afternoon. I really enjoyed it. He did too. He’s also been into Paw Patrol recently so as a treat he’s been able to watch some of the Paw Patrol movies starting with PAW Patrol: The Movie , which I also ended up really enjoying. Surprising at the big names doing the voices. Totally different to the budget voices on the Tonies 🤣 So yeah, the movie was really good, although he got quite upset when Chase lost his confidence. It was cute. Christmas was also a bigger than usual thing in our household this year and so on Christmas Eve we all sat down together and watched Home Alone with him. A fun movie which he enjoyed, and maybe laughed a bit too hard at parts. When the kids did eventually go to bed I enjoyed the annual screening of The Guardians of the Galaxy Holiday Special . During the roadtrip we stayed at a motel on the way down south and we took advantage of the yearly screening of Grease - one of my all time favourite movies. After the kids sleeping for hours in the car we knew they weren’t going to bed easily so we had a family movie night. Grease isn’t a kids movie, but it’s also quite tame compared to movies today. I enjoyed it, they enjoyed it and we all had a good time, and they eventually went to sleep. The other movies I enjoyed were Woodstock 99 which dove into the absolute state of the 99 Woodstock festival and how it eventually turned into what it was. What a shit time that would have been. On the lighter side I enjoyed Good Fortune and then Shithouse , which has a shit name, but explored the isolating side of social interaction. Eleven records added to the collection across December and January. A lot of anniversary editions and special pressings landed on the shelf. Here’s all the posts I enjoyed recently. You can check out more over on the Bookmarks archive. Shellsharks is back and publishing Scrolls again so go check that out. Hyde is still publishing his Over/Under series and has just published issue #50 with Bobby Hiltz . I really love the way Hyde personalises each Over/Under interview with his guest - it makes for an interesting read and shows that he does go out of his way to read their websites. xandra has also just dropped the Autumn/Winter 2026 issue of Good Internet magazine. I can’t wait to get stuck in and read all the new articles while I wait for my physical edition to arrive. Plenty of hacking on the website over the break. I’ve been messing with the homepage layout. I’m still not entirely happy with the bento, but it’s fine for now. I’ve added a “Stuck In My Head” section that highlights the six most played songs from last.fm . I’ve also moved the webrings into the bento and revived my buttons collection - I’ve got more to add here later. I’ve started adding garden indicators for living posts that I intend to keep working on over time. I will eventually create a /garden/ page to list these pages for easy browsing. I guess that tag page serves that purpose at the moment. Along with that I think I’ve improved the post metadata on each post that helps tell the story better with category, tags, and dates. Let me know what you think. I spent some (a lot) of time unifying the look and feel across Beer Fridge , Recordshelf , Bookshelf , and Comics pages with cleaned up CSS and introducing a common layout for the stat cards. That damn comics page needs a lot of work. This update was brought to you by Girl in Stilettos by Annah Mac Hey, thanks for reading this post in your feed reader! Want to chat? Reply by email or add me on XMPP , or send a webmention . Check out the posts archive on the website. App selection criteria - Cory Dransfeldt on his criteria for choosing apps. Delete Spotify? Sure, But Don’t Just Replace it With Another Subscription - Stephanie Vee on thinking beyond just swapping one subscription for another. Why RSS matters - Ben Werdmuller on why RSS still matters. What happened to the comment section? - The History of the Web looks at what happened to comment sections. Three Predictions For The Web - Three predictions for the future of the web. IndieWeb Carnival December 2025 – The IndieWeb in 2030 - The Frugal Gamer hosts the December 2025 IndieWeb Carnival. Static sites killed the blog comment star - Did static sites kill blog comments, and can old tech bring them back? An Island in the Net in 2030? - Khürt Williams imagines the IndieWeb in 2030. Drinking The Largest Beer At The Airport Makes Everything Better - A celebration of the airport beer. Backing up Spotify - How to back up your Spotify library. 30 Years of br Tags - A look back at 30 years of the humble tag. Looking beyond collective blogging - V.H. Belvadi looks at what comes after collective blogging. Readers, writers and blogging ethics - On saying what you mean and being damned. How and Why Do I Blog? - Reflections on the how and why of blogging. Building an IndieWeb house (I): introduction - Starting the journey of building an IndieWeb house. The IndieWeb Doesn’t Need to ‘Take Off’ - Susam Pal on why the IndieWeb doesn’t need mass adoption. Experiences with social medias - Reflections on experiences with social media. Writing Hyperlinks: Salient, Descriptive, Start with Keyword - Nielsen Norman Group on writing better hyperlinks. Why 90s Movies Feel More Alive Than Anything on Netflix - Why 90s movies hit different. You should start a blog today - A nudge to start blogging. Self-hosting versus lots of small indieweb providers - The trade-offs between self-hosting and using small IndieWeb providers.

0 views
David Bushell 1 months ago

Big Design, Bold Ideas

I’ve only gone and done it again! I redesigned my website. This is the eleventh major version. I dare say it’s my best attempt yet. There are similarities to what came before and plenty of fresh CSS paint to modernise the style. You can visit my time machine to see the ten previous designs that have graced my homepage. Almost two decades of work. What a journey! I’ve been comfortable and coasting for years. This year feels different. I’ve made a career building for the open web. That is now under attack. Both my career, and the web. A rising sea of slop is drowning out all common sense. I’m seeing peers struggle to find work, others succumb to the chatbot psychosis. There is no good reason for such drastic change. Yet change is being forced by the AI industrial complex on its relentless path of destruction. I’m not shy about my stance on AI . No thanks! My new homepage doubles down. I won’t be forced to use AI but I can’t ignore it. Can’t ignore the harm. Also I just felt like a new look was due. Last time I mocked up a concept in Adobe XD . Adobe in now unfashionable and Figma, although swank, has that Silicon Valley stench . Penpot is where the cool kids paint pretty pictures of websites. I’m somewhat of an artist myself so I gave Penpot a go. My current brand began in 2016 and evolved in 2018 . I loved the old design but the rigid layout didn’t afford much room to play with content. I spent a day pushing pixels and was quite chuffed with the results. I designed my bandit game in Pentpot too (below). That gave me the confidence to move into real code. I’m continuing with Atkinson Hyperlegible Next for body copy. I now license Ahkio for headings. I used Komika Title before but the all-caps was unwieldy. I’m too lazy to dig through backups to find my logotype source. If you know what font “David” is please tell me! I worked with Axia Create on brand strategy. On that front, we’ll have more exciting news to share later in the year! For now what I realised is that my audience here is technical. The days of small business owners seeking me are long gone. That market is served by Squarespace or Wix. It’s senior tech leads who are entrusted to find and recruit me, and peers within the industry who recommend me. This understanding gave me focus. To illustrate why AI is lame I made an interactive mini-game! The slot machine metaphor should be self-explanatory. I figured a bit of comedy would drive home my AI policy . In the current economy if you don’t have a sparkle emoji is it even a website? The game is built with HTML canvas, web components, and synchronised events I over-complicated to ensure a unique set of prizes. The secret to high performance motion blur is to cheat with pre-rendered PNGs. In hindsight I could have cheated more with a video. I commissioned Declan Chidlow to create a bespoke icon set. Declan delivered! The icons look so much better than the random assortment of placeholders I found. I’m glad I got a proper job done. I have neither the time nor skill for icons. Declan read my mind because I received a 88×31 web badge bonus gift. I had mocked up a few badges myself in Penpot. Scroll down to see them in the footer. Declan’s badge is first and my attempts follow. I haven’t quite nailed the pixel look yet. My new menu is built using with invoker commands and view transitions for a JavaScript-free experience. Modern web standards are so cool when the work together! I do have a tiny JS event listener to polyfill old browsers. The pixellated footer gradient is done with a WebGL shader. I had big plans but after several hours and too many Stack Overflow tabs, I moved on to more important things. This may turn into something later but I doubt I’ll progress trying to learn WebGL. Past features like my Wasm static search and speech synthesis remain on the relevant blog pages. I suspect I’ll be finding random one-off features I forgot to restyle. My homepage ends with another strong message. The internet is dominated by US-based big tech. Before backing powers across the Atlantic, consider UK and EU alternatives. The web begins at home. I remain open to working with clients and collaborators worldwide. I use some ‘big tech’ but I’m making an effort to push for European alternatives. US-based tech does not automatically mean “bad” but the absolute worst is certainly thriving there! Yeah I’m English, far from the smartest kind of European, but I try my best. I’ve been fortunate to find work despite the AI threat. I’m optimistic and I refuse to back down from calling out slop for what it is! I strongly believe others still care about a job well done. I very much doubt the touted “10x productivity” is resulting in 10x profits. The way I see it, I’m cheaper, better, and more ethical than subsidised slop. Let me know on the socials if you love or hate my new design :) P.S. I published this Sunday because Heisenbugs only appear in production. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

0 views
iDiallo 1 months ago

Microsoft Should Watch The Expanse

My favorite piece of technology in science fiction isn't lightsabers, flying spaceships, or even robots. It's AI. But not just any AI. My favorite is the one in the TV show The Expanse . If you watch The Expanse, the most advanced technology is, of course, the Epstein drive (an unfortunate name in this day and age). In their universe, humanity can travel to distant planets, the Belt, and Mars. Mars has the most high-tech military, which is incredibly cool. But the AI is still what impresses me most. If you watched the show, you're probably wondering what the hell I'm talking about right now. Because there is no mention of AI ever. The AI is barely visible. In fact, it's not visible at all. Most of the time, there aren't even voices. Instead, their computer interfaces respond directly to voice and gesture commands without returning any sass. In Season 1, Miller (the detective) is trying to solve a crime. Out of the blue, he just says, "Plot the course the Scopuli took over the past months." The course is plotted right there in his living room. No fuss, no interruptions, no "OK Google." And when he finally figures it out, no one says "You are absolutely right!" He then interacts with the holographic display in real time, asking for additional information and manipulating the data with gestures. At no point does he anthropomorphize the AI. It's always there, always available, always listening, but it never interrupts. This type of interaction is present throughout the series. In the Rocinante, James Holden will give commands like "seal bulkhead," "plot intercept course," or "scan for life signs," and the ship's computer simply executes. There are no loading screens, no chatbot personality trying to be helpful. The computer doesn't explain what it's doing or ask for confirmation on routine tasks. It just works. When Holden needs tactical information during a firefight, he doesn't open an app or navigate menus. He shouts questions, and relevant data appears on his helmet display. When Naomi needs to calculate a complex orbital maneuver, she doesn't fight with an interface. She thinks out loud, and the system provides the calculations she needs. This is the complete opposite of Microsoft's Copilot... Yes, this is about Copilot. In Microsoft's vision, they think they're designing an AI assistant, an AI copilot that's always there to help. You have Copilot in Excel, in Edge, in the taskbar. It's everywhere, yet it's as useless as you can imagine. What is Copilot? Is it ChatGPT or a wrapper around it? Is it a code assistant? Is it a search engine? Or wait, is it all of Microsoft Office now? It's attached to every application, yet it hasn't been particularly helpful. We now use Teams at work, and I see Copilot popping up every time to offer to help me, just like Clippy. OK, fine, I asked for the meaning of a term I hear often in this company. Copilot doesn't know. Well, it doesn't say it doesn't know. Instead, it gives me the definition of what it thinks the term means in general. Imagine for a second you're a manager and you hear developers talking about issues with Apache delaying a project. You don't know what Apache is, so you ask Copilot. It tells you that the Apache are a group of Native American tribes known for their resilience in the Southwest. If you don't know any better, you might take that definition at face value, never knowing that Copilot has does not have access to any of the company data. Now in the project retro, you'll blame a native American tribe for delaying the project. Copilot is everywhere, yet it is nowhere. Nobody deliberately opens it to solve a problem. Instead, it's like Google Plus from back in the day. If you randomly clicked seven times on the web, you would somehow end up with a Google Plus account and, for some reason, two YouTube accounts. Copilot is visible when it should be invisible, and verbose when it should be silent. It interrupts your workflow to offer help you didn't ask for, then fails to provide useful answers when you actually need them. It's the opposite of the AI in The Expanse. It doesn't fade in the background. It is constantly reminding you that you need to use it here and now. In The Expanse , the AI doesn't have a personality because it doesn't need one. It's not trying to be your friend or impress you with its conversational abilities. It's a tool, refined to perfection. It is not trying to replace your job, it is there to support you. Copilot only exists to impress you, and it fails at it every single time. Satya should binge-watch The Expanse. I'm not advocating for AI everything, but I am all for creating useful tools. And Copilot, as it currently exists, is one of the least useful implementations of AI I've encountered. The best technology is invisible. It doesn't announce itself, doesn't demand attention, and doesn't try to be clever. It simply works when you need it and disappears when you don't. I know Microsoft won't read this or learn from it. Instead, I expect Windows 12 to be renamed Microsoft Copilot OS. In The Expanse, the AI turn people into heroes. In our world, Copilot, Gemini, ChatGPT, all want to be the heroes. And they will differentiate themselves by trying to be the loudest.

0 views
HeyDingus 1 months ago

7 Things This Week [#182]

A weekly list of interesting things I found on the internet, posted on Sundays. Sometimes themed, often not. 1️⃣ Jose Munoz has a good tip for not getting sucked into doom-scrolling apps by Siri Suggestions in Search and the App Library: simply hide them from those areas. [ 🔗 josemunozmatos.com ] 2️⃣ I love a good stats-based pitch. Herman provides one for the benefits of morning exercise. [ 🔗 herman.bearblog.dev ] 3️⃣ Jason Fried explains a clever design detail about the power reserve indicator on a mechanical watch. [ 🔗 world.hey.com ] 4️⃣ I found myself nodding along to Chris Coyier’s list of words you should probably avoid using in your writing. [ 🔗 css-tricks.com ] 5️⃣ I spent a surprising amount of time recently perusing the depths of Louie Mantia’s portfolio and blog after reading his People & Blogs interview . He’s worked on so many cool things, lots of which have touched my life. [ 🔗 lmnt.me ] 6️⃣ Robert Birming made me feel a little better about my less-than-tidy house. [ 🔗 robertbirming.com ] 7️⃣ I’m not going to buy it, but I’m certainly intrigued by this tiny eReader that attaches via MagSafe onto the back of your phone. I love my Kobo, but it so often gets left behind. This would be a remedy. [ 🔗 theverge.com ] Thanks for reading 7 Things . If you enjoyed these links or have something neat to share, please let me know . And remember that you can get more links to internet nuggets that I’m finding every day by following me @jarrod on the social web. HeyDingus is a blog by Jarrod Blundy about technology, the great outdoors, and other musings. If you like what you see — the blog posts , shortcuts , wallpapers , scripts , or anything — please consider leaving a tip , checking out my store , or just sharing my work. Your support is much appreciated! I’m always happy to hear from you on social , or by good ol' email .

0 views
Kev Quirk 1 months ago

I've Moved to Pure Blog!

In my last post I introduced Pure Blog and ended the post by saying: I'm going to take a little break from coding for a few days, then come back and start migrating this site over to Pure Blog. Dogfoodin' yo! Yeah, I didn't take a break. Instead I've pretty much spent my entire weekend at the computer migrating this site from Jekyll to Pure Blog, and trying to make sure everything works ok. Along the way there were features that I wanted to add into Pure Blog to make my live easier, which I've now done, these include: As well as all this, I've also changed the way Pure Blog is formatted so that it's easier for people to update their Pure Blog version. While I was there, I also added a simple little update page in settings so people can see if they're running the latest version or not: Finally, I decided to give the site a new lick of paint. Which was by far the easiest part of this whole thing - just some custom CSS in the CMS and I ended up with this nice (albeit brutal) new design. The way I've architected Pure Blog should allow me to very easily change the design going forward, which is just fantastic for a perpetual fiddler, such as myself. OK, that's enough for one weekend. I hope publishing this post doesn't bring any other issues to the surface, but we shall see. Now I really am going to take a break from coding. This has been so much fun, and I continue to learn a lot. For now though, my brain needs a rest. Oh, if you're using Pure Blog, please do let me know - I'd love to hear your feedback. The reply button below should be working fine. 🙃 Thanks for reading this post via RSS. RSS is great, and you're great for using it. ❤️ You can reply to this post by email , or leave a comment . Hooks so I can automatically purge Bunny CDN cache when posts are published/updated. Implementing data files so I can generate things like my Blogroll and Projects pages from YML lists. Adding shortcodes so I can have a site wide email setting and things like my Reply by email button works at the bottom of every post. Post layout partial so I can add custom content below my posts without moving away from Pure Blog's upstream code.

0 views
Jim Nielsen 1 months ago

The Browser’s Little White Lies

So I’m making a thing and I want it to be styled different if the link’s been visited. Rather than build something myself in JavaScript, I figure I’ll just hook into the browser’s mechanism for tracking if a link’s been visited (a sensible approach, if I do say so myself ). Why write JavaScript when a little CSS will do? So I craft this: But it doesn’t work. is relatively new, and I’ve been known to muff it, so it’s probably just a syntax issue. I start researching. Wouldn’t you know it? We can’t have nice things. doesn’t always work like you’d expect because we (not me, mind you) exploited it. Here’s MDN : You can style visited links, but there are limits to which styles you can use. While is not mentioned specifically, other tricks like sibling selectors are: When using a sibling selector, such as , the adjacent element ( in this example) is styled as though the link were unvisited. Why? You guessed it. Security and privacy reasons. If it were not so, somebody could come along with a little JavaScript and uncover a user’s browsing history (imagine, for example, setting styles for visited and unvisited links, then using and checking style computations). MDN says browsers tell little white lies: To preserve users' privacy, browsers lie to web applications under certain circumstances So, from what I can tell, when I write the browser is telling the engine that handles styling that all items have never been (even if they have been). So where does that leave me? Now I will abandon CSS and go use JavaScript for something only JavaScript can do. That’s a good reason for JS. Reply via: Email · Mastodon · Bluesky

0 views