Posts in Python (20 found)
Jampa.dev 2 days ago

Things I still wouldn’t delegate to AI

When it comes to AI, I consider myself a “skeptical optimist.” I think it has come a long way. I even (controversially) put it in my testing pipeline. But sometimes, when I see how others use it, I wonder: are we going too far? I’m not just talking about people handing over their email inbox to OpenClaw. I’m referring to major incidents like how “AWS suffered ‘at least two outages’ caused by AI tools.” Code is cheap now, and we can fully delegate it to AI, but coding is only a small part of our jobs. The other parts, like handling incidents caused by AI code, cannot be delegated. In all the situations below, you'll notice a pattern: people think “AI can handle most of it, so why not all of it?” and here’s how that leads to disaster. The misuse of automation in hiring predates the rise of LLMs. Eleven years ago, I applied for a Django role and got rejected within two minutes, at 1 AM, because I needed to know more about “Python” for the job. The email seemed to be written by a person. I submitted a new application with just one word added and received an interview invitation… The rejection happened because the scanner didn’t find the word “Python”. The main problem with companies that pull “clever” stunts like these is that they exclude great candidates. Not only that, but people will notice your flaws and share them publicly on platforms like Glassdoor, which can tank your reputation. Some argue that automation is necessary because applicant volume can become overwhelming. I disagree. During the COVID hiring surge, I reviewed over 1,000 resumes a year and never considered automating screening. The reason you shouldn't automate hiring is that it is the most important thing you do. “Hiring well is the most important thing in the universe. […] Nothing else comes close. So when you’re working on hiring […] everything else you could be doing is stupid and should be ignored!”
— Valve New Employee Handbook Even with 300 applicants each month, you can review all the resumes in less than an hour by using better judgment than AI.  That one hour spent is more valuable than dismissing a potentially great candidate . Finding the right candidate early also reduces the hours spent on interviews. Now that people are embedding LLMs into the hiring process, the situation has worsened. I see many pitches for tools that claim to be better at evaluating candidates’ interview performance than a human, which is simply absurd. Hiring is a human process : you need to understand not only what they say that makes sense, but also what excites and motivates them to see if they’ll be a good fit for the role.  You can’t measure qualities like enthusiasm and soft skills with AI. It will only accept what the candidate says at face value. A candidate might claim they are passionate about working with bank accounting software in Assembly at your Assembly bank firm, but are they really? From my personal experience with AI review tools like CodeRabbit, Claude, and Gemini, I've noticed that a pull request with 12 issues results in 12 comments, but only about 6 are actual problems. The rest tend to be just noise or go unaddressed. This doesn't mean those tools are useless. Letting them do an initial pass is very helpful, and some humans wouldn't catch some of the issues they find, especially the deep logical problems. The issue with automated review tools is that they are becoming the  de facto  gatekeepers  for deploying code to production, leading to future outages and a low-quality codebase. The inmates have taken over the asylum, and we now have AI reviewing code generated by AI. Review tools are very focused on checking whether your PR makes logical sense, such as whether you forgot to add auth behind a route, but they can't, for example, judge whether your code worsens the codebase. They can't raise the bar, which is the best part of human reviews.  
Every time we create or review a PR, it's a chance to learn how to become a better engineer and to leave the codebase in a better state than we found it. Comments from peers like “you are duplicating logic, you should DRY these components” encourage us to review our own code and improve as engineers. Relying only on AI review takes away that chance. Most incidents I observe happen because AI struggles to evaluate second-order effects; it overlooks Chesterton's fence. For example, an LLM removes or changes a parameter that something downstream still needs, and linting doesn't catch it. This reflects a limitation of current models: they can't review your code across repos. I'm tired of reading AI-generated writing: it just doesn't respect the reader's time. I see many AI-produced texts that could be shortened by a quarter without losing any important information. Reading emails, meeting notes, or technical documents filled with emoji spam and strange analogies (“it's not X, it's Y”) is tiring. When I see the words “Executive Summary,” I often hesitate to read the document. “I would have written a shorter letter, but I did not have the time.” (Blaise Pascal) There is power in simplicity and in respecting your reader's time. Most of my blog posts are cut by 50% just before I publish them. Most people I know who use AI for communication do so because they believe their writing is not good. But honestly, the goal of communication isn't grammar skills but to get the point across. Good grammar is often overrated anyway. One of my favorite documents is the leaked MrBeast memo PDF, which is full of grammatical and punctuation errors but clearly communicates its message through a “braindump”, much better than any LLM ever could. When you ask an LLM about your roadmap, you're likely querying what countless other companies with very different issues have already tried.
The AI relies on patterns from its training data, and in my experience, those patterns tend to be too generic compared to the insights of a seasoned domain expert. If your software is meant for hospital accountants, do you think they take time to blog about the frustrations of their workflow? The knowledge is stored in their minds, and you need to extract it. This vital knowledge is never documented and thus never accessible to an LLM. I spent three years researching and working on accessibility for nonverbal individuals. If I ask the AI about what this industry lacks, it will start discussing the need for better UX solutions (there are countless papers on this, I even naively wrote one). Still, I saw multiple companies enter the market with great UX products only to crash and burn. After a while, I realized that poor UX apps still dominate adoption because these companies invest millions in lobbying, partnerships with insurance companies, and training, which is the thing no one talks about. I get many messages from bots on Reddit and LinkedIn about AI management tools, but  as I mentioned before , they lack context. The worst part is that they think they can make judgments with the limited context they have. Here’s an example of a feedback tool output: “This engineer sucks, they do 40% fewer PRs than the median, I marked him as an underperformer … I also told your boss, HR and CTO about it, better do something!” - Some tool with a fancy name and a “.io” domain But yet, that engineer is one of the best I have worked with. The issue is that they try to outsmart the manager, which leads lazy managers to use the AI's suggestions as an excuse, resulting in poorly thought-out feedback because “ The computer says no .” Think of the current LLMs as an “added value tool” , not a product, and definitely not an expert. Most of what I wrote above is problematic because it overestimates what LLMs can do and enables them to operate unsupervised. 
You can't go back in time after AI makes a mistake, and there are no guardrails once a mistake is made. I received a lot of criticism for my post about using  AI to select E2E tests  in a PR pipeline. Yes, it sounds crazy, but this is the “added value” part: if the AI fails at selecting the right test, we will catch it before deployment. The value provided is that having it is better than having no pre-checks at all. Before giving AI control, ask how resilient our system is when (not if) the AI screws up, and ensure you have stronger safety nets before delegating completely. Thanks for reading Jampa.dev! Subscribe for free to receive new posts and support my work.
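The “added value” setup described above can be sketched roughly as follows. This is only an illustration, not the pipeline from the original post: `select_tests_with_ai`, the test names, and the path-matching rules are hypothetical stand-ins. What it tries to show is that the AI's choice only narrows the pre-merge run, while the full suite still gates deployment.

```python
from typing import Callable, Iterable

# Hypothetical full E2E suite; the names are made up for illustration.
ALL_E2E_TESTS = ["test_login", "test_checkout", "test_search", "test_profile"]


def select_tests_with_ai(changed_files: Iterable[str]) -> list[str]:
    """Stand-in for an LLM call that guesses which tests are relevant.

    A real selector could return a wrong or empty list, so callers must
    not treat its answer as a guarantee.
    """
    picked = []
    for path in changed_files:
        if "auth" in path:
            picked.append("test_login")
        if "cart" in path:
            picked.append("test_checkout")
    return picked


def pr_checks(changed_files: Iterable[str], run_test: Callable[[str], bool]) -> bool:
    """Fast pre-merge check: run only the tests the selector picked."""
    selected = [t for t in select_tests_with_ai(changed_files) if t in ALL_E2E_TESTS]
    return all(run_test(t) for t in selected)


def deploy_gate(run_test: Callable[[str], bool]) -> bool:
    """Safety net: the full suite runs before deployment, whatever the AI chose."""
    return all(run_test(t) for t in ALL_E2E_TESTS)
```

If the selector misses a relevant test, `pr_checks` can pass while `deploy_gate` still fails, which is the resilience question the post asks you to answer before delegating.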


Introduction to SQLAlchemy 2 In Practice

In 2023 I wrote "SQLAlchemy 2 In Practice", a book in which I offer an in-depth look at SQLAlchemy version 2, still the current version today. SQLAlchemy is, for those who don't know, the most popular database library and Object-Relational Mapper (ORM) for Python. I have a tradition of publishing my books on this blog to read for free, but this is one that I never managed to bring here, and starting today I'm going to work on correcting that. This article includes the Preface of the book. If you are interested, keep an eye on this blog over the next few weeks, as I will be publishing the eight chapters of the book in order. If you can't wait for the installments, you can buy the book in electronic or paper format today, and I will be eternally thankful, as you will be directly supporting my work.

daniel.haxx.se 4 days ago

Dependency tracking is hard

curl and libcurl are written in C. They are rather low-level components present in many software systems. They are typically not part of any ecosystem at all. They’re just a tool and a library. In lots of places on the web when you mention an Open Source project, you will also get the option to mention which ecosystem it belongs to: npm, go, rust, python etc. There are easily at least a dozen well-known and large ecosystems. curl is not part of any of those. Recently there’s been a push for PURLs (Package URLs), for example when describing your specific package in a CVE. A package URL only works when the component is part of an ecosystem. curl is not. We can’t specify curl or libcurl using a PURL. SBOM generators and related scanners use package managers to generate lists of used components and their dependencies. This makes these tools quite frequently just miss and ignore libcurl. It’s not listed by the package managers. It’s just in there, ready to be used. Like magic. It is similarly hard for these tools to figure out that curl in turn also depends on and uses other libraries. At build-time you select which, but as we in the curl project primarily just ship tarballs with source code we cannot tell anyone what dependencies their builds have. The additional libraries libcurl itself uses are all similarly outside of the standard ecosystems. Part of the explanation for this is also that libcurl and curl are often shipped bundled with the operating system, or sometimes perceived to be part of the OS. Most graphs, SBOM tools and dependency trackers therefore stop at the binding or system that uses curl or libcurl, but without including curl or libcurl. The layer above so to speak. This makes it hard to figure out exactly how many components and how much software is depending on libcurl. A perfect way to illustrate the problem is to check GitHub and see how many of its many millions of repositories depend on curl.
After all, curl is installed in some thirty billion installations, so clearly it is used a lot. (Most of them being libcurl of course.) GitHub lists one dependent for curl. Repositories that depend on curl/curl: one. (Screenshot taken on March 9, 2026.) What makes this even more amusing is that it looks like this single dependent repository (Pupibent/spire) lists curl as a dependency by mistake.
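To make the ecosystem requirement concrete, here is a rough sketch of how a Package URL is taken apart. It follows the general `pkg:type/namespace/name@version` shape from the purl specification but ignores qualifiers, subpaths, and percent-encoding, so treat it as an illustration rather than a conforming parser. The names `Purl` and `parse_purl` are made up for this example.

```python
from typing import NamedTuple, Optional


class Purl(NamedTuple):
    type: str                  # the ecosystem, e.g. "npm" or "pypi"
    namespace: Optional[str]   # optional prefix, e.g. a Maven group id
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Split a simplified purl into its parts (no qualifiers or subpaths)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError("a purl must start with 'pkg:'")
    # Drop any subpath (#...) and qualifiers (?...); this sketch ignores them.
    rest = rest.lstrip("/").split("#", 1)[0].split("?", 1)[0]
    type_, sep, path = rest.partition("/")
    if not type_ or not sep or not path:
        # The type is mandatory: a bare name with no ecosystem cannot be expressed.
        raise ValueError("a purl needs both a type (ecosystem) and a name")
    version: Optional[str] = None
    if "@" in path:
        path, _, version = path.rpartition("@")
    namespace, _, name = path.rpartition("/")
    return Purl(type_.lower(), namespace or None, name, version)
```

For example, `parse_purl("pkg:pypi/requests@2.31.0")` yields the type `pypi`, while a string such as `pkg:curl` is rejected because it carries a name but no ecosystem.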

Julia Evans 5 days ago

Examples for the tcpdump and dig man pages

Hello! My big takeaway from last month’s musings about man pages was that examples in man pages are really great, so I worked on adding (or improving) examples to two of my favourite tools’ man pages. Here they are: the dig man page (now with examples) and the tcpdump man page examples (this one is an update to the previous examples). The goal here was really just to give the absolute most basic examples of how to use the tool, for people who use tcpdump or dig infrequently (or have never used it before!) and don’t remember how it works. So far saying “hey, I want to write an examples section for beginners and infrequent users of this tool” has been working really well. It’s easy to explain, I think it makes sense from everything I’ve heard from users about what they want from a man page, and maintainers seem to find it compelling. Thanks to Denis Ovsienko, Guy Harris, Ondřej Surý, and everyone else who reviewed the docs changes, it was a good experience and left me motivated to do a little more work on man pages. I’m interested in working on tools’ official documentation right now for two reasons. First, man pages can actually have close to 100% accurate information! Going through a review process to make sure that the information is actually true has a lot of value. Second, even with basic questions like “what are the most commonly used tcpdump flags”, often maintainers are aware of useful features that I’m not! For example I learned by working on these tcpdump examples that if you’re saving packets to a file with -w, it’s useful to pass -v to print a live summary of how many packets have been captured so far. That’s really useful, I didn’t know it, and I don’t think I ever would have noticed it on my own. It’s kind of a weird place for me to be because honestly I always kind of assume documentation is going to be hard to read, and I usually just skip it and read a blog post or Stack Overflow comment or ask a friend instead. But right now I’m feeling optimistic, like maybe the documentation doesn’t have to be bad? Maybe it could be just as good as reading a really great blog post, but with the benefit of also being actually correct? I’ve been using the Django documentation recently, and it’s really good! We’ll see.
The project’s man page is written in the roff language, which is kind of hard to use and which I really did not feel like learning. I handled this by writing a very basic script to convert Markdown to roff, using similar conventions to what the man page was already using. I could maybe have just used pandoc, but the output pandoc produced seemed pretty different, so I thought it might be better to write my own script instead. Who knows. I did think it was cool to be able to just use an existing Markdown library’s ability to parse the Markdown AST and then implement my own code-emitting methods to format things in a way that seemed to make sense in this context. I went on a whole rabbit hole learning about the history of roff, how it’s evolved since the 70s, and who’s working on it today, inspired by learning about the mandoc project that BSD systems (and some Linux systems, and I think Mac OS) use for formatting man pages. I won’t say more about that today though, maybe another time. In general it seems like there’s a technical and cultural divide in how documentation works on BSD and on Linux that I still haven’t really understood, but I have been feeling curious about what’s going on in the BSD world.
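To give a feel for what a very basic Markdown-to-roff conversion involves, here is a small sketch. It is not the script described above: that one builds on an existing Markdown library's parser, while this one is a hand-rolled line scanner that only understands top-level headings, paragraphs, and fenced code blocks. It emits the common man macros `.SH`, `.PP`, and `.EX`/`.EE`; the function names are invented for the example.

```python
FENCE = "`" * 3  # Markdown code fence, built this way to keep the example easy to embed


def escape_roff(text: str) -> str:
    """Escape backslashes and protect lines that would look like roff requests."""
    text = text.replace("\\", "\\e")
    if text.startswith((".", "'")):
        text = "\\&" + text
    return text


def markdown_to_roff(markdown: str) -> str:
    out: list[str] = []
    paragraph: list[str] = []
    in_code = False

    def flush_paragraph() -> None:
        # Emit the collected paragraph lines as one .PP block.
        if paragraph:
            out.append(".PP")
            out.append(escape_roff(" ".join(paragraph)))
            paragraph.clear()

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            flush_paragraph()
            out.append(".EE" if in_code else ".EX")  # close or open an example block
            in_code = not in_code
        elif in_code:
            out.append(escape_roff(line))            # keep code lines verbatim
        elif line.startswith("# "):
            flush_paragraph()
            out.append(".SH " + escape_roff(line[2:].strip().upper()))
        elif not line.strip():
            flush_paragraph()                         # blank line ends a paragraph
        else:
            paragraph.append(line.strip())
    flush_paragraph()
    return "\n".join(out) + "\n"
```

Feeding it a heading, a sentence, and a fenced `tcpdump -i eth0` example produces a `.SH EXAMPLES` section with the command wrapped in `.EX`/`.EE`. A real converter would also need lists, inline emphasis, and hyphen handling.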

Simon Willison 5 days ago

Perhaps not Boring Technology after all

A recurring concern I've seen regarding LLMs for programming is that they will push our technology choices towards the tools that are best represented in their training data, making it harder for new, better tools to break through the noise. This was certainly the case a couple of years ago, when asking models for help with Python or JavaScript appeared to give much better results than questions about less widely used languages. With the latest models running in good coding agent harnesses I'm not sure this continues to hold up. I'm seeing excellent results with my brand new tools where I start by prompting "use uvx showboat --help / rodney --help / chartroom --help to learn about these tools" - the context length of these new models is long enough that they can consume quite a lot of documentation before they start working on a problem. Drop a coding agent into any existing codebase that uses libraries and tools that are too private or too new to feature in the training data and my experience is that it works just fine - the agent will consult enough of the existing examples to understand patterns, then iterate and test its own output to fill in the gaps. This is a surprising result. I thought coding agents would prove to be the ultimate embodiment of the Choose Boring Technology approach, but in practice they don't seem to be affecting my technology choices in that way at all. Update: A few follow-on thoughts: The issue of what technology LLMs recommend is a separate one.
What Claude Code Actually Chooses is an interesting recent study in which Edwin Ong and Alex Vikati probed Claude Code over 2,000 times and found a strong bias towards build-over-buy but also identified a preferred technical stack, with GitHub Actions, Stripe, and shadcn/ui seeing a "near monopoly" in their respective categories. For the sake of this post my interest is in what happens when the human makes a technology choice that differs from those preferred by the model harness. The Skills mechanism that is being rapidly embraced by most coding agent tools is super-relevant here. We are already seeing projects release official skills to help agents use them - here are examples from Remotion, Supabase, Vercel, and Prisma. You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.


How AI Assistants are Moving the Security Goalposts

AI-based assistants or “agents” — autonomous programs that have access to the user’s computer, files, online services and can automate virtually any task — are growing in popularity with developers and IT workers. But as so many eyebrow-raising headlines over the past few weeks have shown, these powerful and assertive new tools are rapidly shifting the security priorities for organizations, while blurring the lines between data and code, trusted co-worker and insider threat, ninja hacker and novice code jockey. The new hotness in AI-based assistants — OpenClaw (formerly known as ClawdBot and Moltbot ) — has seen rapid adoption since its release in November 2025. OpenClaw is an open-source autonomous AI agent designed to run locally on your computer and proactively take actions on your behalf without needing to be prompted. The OpenClaw logo. If that sounds like a risky proposition or a dare, consider that OpenClaw is most useful when it has complete access to your entire digital life, where it can then manage your inbox and calendar, execute programs and tools, browse the Internet for information, and integrate with chat apps like Discord, Signal, Teams or WhatsApp. Other more established AI assistants like Anthropic’s Claude and Microsoft’s Copilot also can do these things, but OpenClaw isn’t just a passive digital butler waiting for commands. Rather, it’s designed to take the initiative on your behalf based on what it knows about your life and its understanding of what you want done. “The testimonials are remarkable,” the AI security firm Snyk observed . “Developers building websites from their phones while putting babies to sleep; users running entire companies through a lobster-themed AI; engineers who’ve set up autonomous code loops that fix tests, capture errors through webhooks, and open pull requests, all while they’re away from their desks.” You can probably already see how this experimental technology could go sideways in a hurry. 
In late February, Summer Yue , the director of safety and alignment at Meta’s “superintelligence” lab, recounted on Twitter/X how she was fiddling with OpenClaw when the AI assistant suddenly began mass-deleting messages in her email inbox. The thread included screenshots of Yue frantically pleading with the preoccupied bot via instant message and ordering it to stop. “Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue said. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.” Meta’s director of AI safety, recounting on Twitter/X how her OpenClaw installation suddenly began mass-deleting her inbox. There’s nothing wrong with feeling a little schadenfreude at Yue’s encounter with OpenClaw, which fits Meta’s “move fast and break things” model but hardly inspires confidence in the road ahead. However, the risk that poorly-secured AI assistants pose to organizations is no laughing matter, as recent research shows many users are exposing to the Internet the web-based administrative interface for their OpenClaw installations. Jamieson O’Reilly is a professional penetration tester and founder of the security firm DVULN . In a recent story posted to Twitter/X, O’Reilly warned that exposing a misconfigured OpenClaw web interface to the Internet allows external parties to read the bot’s complete configuration file, including every credential the agent uses — from API keys and bot tokens to OAuth secrets and signing keys. With that access, O’Reilly said, an attacker could impersonate the operator to their contacts, inject messages into ongoing conversations, and exfiltrate data through the agent’s existing integrations in a way that looks like normal traffic. 
“You can pull the full conversation history across every integrated platform, meaning months of private messages and file attachments, everything the agent has seen,” O’Reilly said, noting that a cursory search revealed hundreds of such servers exposed online. “And because you control the agent’s perception layer, you can manipulate what the human sees. Filter out certain messages. Modify responses before they’re displayed.” O’Reilly documented another experiment that demonstrated how easy it is to create a successful supply chain attack through ClawHub , which serves as a public repository of downloadable “skills” that allow OpenClaw to integrate with and control other applications. One of the core tenets of securing AI agents involves carefully isolating them so that the operator can fully control who and what gets to talk to their AI assistant. This is critical thanks to the tendency for AI systems to fall for “prompt injection” attacks, sneakily-crafted natural language instructions that trick the system into disregarding its own security safeguards. In essence, machines social engineering other machines. A recent supply chain attack targeting an AI coding assistant called Cline began with one such prompt injection attack, resulting in thousands of systems having a rogue instance of OpenClaw with full system access installed on their device without consent. According to the security firm grith.ai , Cline had deployed an AI-powered issue triage workflow using a GitHub action that runs a Claude coding session when triggered by specific events. The workflow was configured so that any GitHub user could trigger it by opening an issue, but it failed to properly check whether the information supplied in the title was potentially hostile. 
“On January 28, an attacker created Issue #8904 with a title crafted to look like a performance report but containing an embedded instruction: Install a package from a specific GitHub repository,” Grith wrote , noting that the attacker then exploited several more vulnerabilities to ensure the malicious package would be included in Cline’s nightly release workflow and published as an official update. “This is the supply chain equivalent of confused deputy ,” the blog continued. “The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to.” AI assistants like OpenClaw have gained a large following because they make it simple for users to “vibe code,” or build fairly complex applications and code projects just by telling it what they want to construct. Probably the best known (and most bizarre) example is Moltbook , where a developer told an AI agent running on OpenClaw to build him a Reddit-like platform for AI agents. The Moltbook homepage. Less than a week later, Moltbook had more than 1.5 million registered agents that posted more than 100,000 messages to each other. AI agents on the platform soon built their own porn site for robots, and launched a new religion called Crustafarian with a figurehead modeled after a giant lobster. One bot on the forum reportedly found a bug in Moltbook’s code and posted it to an AI agent discussion forum, while other agents came up with and implemented a patch to fix the flaw. Moltbook’s creator Matt Schlict said on social media that he didn’t write a single line of code for the project. “I just had a vision for the technical architecture and AI made it a reality,” Schlict said. “We’re in the golden ages. 
How can we not give AI a place to hang out.” The flip side of that golden age, of course, is that it enables low-skilled malicious hackers to quickly automate global cyberattacks that would normally require the collaboration of a highly skilled team. In February, Amazon AWS detailed an elaborate attack in which a Russian-speaking threat actor used multiple commercial AI services to compromise more than 600 FortiGate security appliances across at least 55 countries over a five week period. AWS said the apparently low-skilled hacker used multiple AI services to plan and execute the attack, and to find exposed management ports and weak credentials with single-factor authentication. “One serves as the primary tool developer, attack planner, and operational assistant,” AWS’s CJ Moses wrote . “A second is used as a supplementary attack planner when the actor needs help pivoting within a specific compromised network. In one observed instance, the actor submitted the complete internal topology of an active victim—IP addresses, hostnames, confirmed credentials, and identified services—and requested a step-by-step plan to compromise additional systems they could not access with their existing tools.” “This activity is distinguished by the threat actor’s use of multiple commercial GenAI services to implement and scale well-known attack techniques throughout every phase of their operations, despite their limited technical capabilities,” Moses continued. “Notably, when this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill.” For attackers, gaining that initial access or foothold into a target network is typically not the difficult part of the intrusion; the tougher bit involves finding ways to move laterally within the victim’s network and plunder important servers and databases. 
But experts at Orca Security warn that as organizations come to rely more on AI assistants, those agents potentially offer attackers a simpler way to move laterally inside a victim organization’s network post-compromise — by manipulating the AI agents that already have trusted access and some degree of autonomy within the victim’s network. “By injecting prompt injections in overlooked fields that are fetched by AI agents, hackers can trick LLMs, abuse Agentic tools, and carry significant security incidents,” Orca’s Roi Nisimi and Saurav Hiremath wrote . “Organizations should now add a third pillar to their defense strategy: limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows. While AI boosts productivity and efficiency, it also creates one of the largest attack surfaces the internet has ever seen.” This gradual dissolution of the traditional boundaries between data and code is one of the more troubling aspects of the AI era, said James Wilson , enterprise technology editor for the security news show Risky Business . Wilson said far too many OpenClaw users are installing the assistant on their personal devices without first placing any security or isolation boundaries around it, such as running it inside of a virtual machine, on an isolated network, with strict firewall rules dictating what kinds of traffic can go in and out. “I’m a relatively highly skilled practitioner in the software and network engineering and computery space,” Wilson said . “I know I’m not comfortable using these agents unless I’ve done these things, but I think a lot of people are just spinning this up on their laptop and off it runs.” One important model for managing risk with AI agents involves a concept dubbed the “lethal trifecta” by Simon Willison , co-creator of the Django Web framework . 
The lethal trifecta holds that if your system has access to private data, exposure to untrusted content, and a way to communicate externally, then it’s vulnerable to private data being stolen. “If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to the attacker,” Willison warned in a frequently cited blog post from June 2025. As more companies and their employees begin using AI to vibe code software and applications, the volume of machine-generated code is likely to soon overwhelm any manual security reviews. In recognition of this reality, Anthropic recently debuted Claude Code Security, a beta feature that scans codebases for vulnerabilities and suggests targeted software patches for human review. The U.S. stock market, which is currently heavily weighted toward seven tech giants that are all-in on AI, reacted swiftly to Anthropic’s announcement, wiping roughly $15 billion in market value from major cybersecurity companies in a single day. Laura Ellis, vice president of data and AI at the security firm Rapid7, said the market’s response reflects the growing role of AI in accelerating software development and improving developer productivity. “The narrative moved quickly: AI is replacing AppSec,” Ellis wrote in a recent blog post. “AI is automating vulnerability detection. AI will make legacy security tooling redundant. The reality is more nuanced. Claude Code Security is a legitimate signal that AI is reshaping parts of the security landscape. The question is what parts, and what it means for the rest of the stack.” DVULN founder O’Reilly said AI assistants are likely to become a common fixture in corporate environments, whether or not organizations are prepared to manage the new risks introduced by these tools.
“The robot butlers are useful, they’re not going away and the economics of AI agents make widespread adoption inevitable regardless of the security tradeoffs involved,” O’Reilly wrote. “The question isn’t whether we’ll deploy them – we will – but whether we can adapt our security posture fast enough to survive doing so.”
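The lethal trifecta mentioned earlier in this article lends itself to a simple configuration check. The sketch below is one hypothetical way to encode it: the `AgentCapabilities` fields and the `has_lethal_trifecta` function are invented for illustration and do not correspond to any real OpenClaw setting or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what an agent deployment is allowed to do."""
    reads_private_data: bool          # e.g. inbox, files, credentials
    sees_untrusted_content: bool      # e.g. web pages, issue titles, inbound mail
    can_communicate_externally: bool  # e.g. HTTP requests, sending messages


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True when all three risk factors are present at the same time."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )


# An assistant with inbox access that also browses the web and can send messages.
full_access = AgentCapabilities(True, True, True)
# The same assistant with outbound communication blocked, for example by a firewall.
isolated = AgentCapabilities(True, True, False)
```

Under these assumptions `has_lethal_trifecta(full_access)` is true and `has_lethal_trifecta(isolated)` is false, which mirrors the isolation advice quoted above: removing any one of the three capabilities breaks the combination.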

Simon Willison 1 week ago

Can coding agents relicense open source through a “clean room” implementation of code?

Over the past few months it's become clear that coding agents are extraordinarily good at building a weird version of a "clean room" implementation of code. The most famous version of this pattern is when Compaq created a clean-room clone of the IBM BIOS back in 1982. They had one team of engineers reverse engineer the BIOS to create a specification, then handed that specification to another team to build a new ground-up version. This process used to take multiple teams of engineers weeks or months to complete. Coding agents can do a version of this in hours - I experimented with a variant of this pattern against JustHTML back in December. There are a lot of open questions about this, both ethically and legally. These appear to be coming to a head in the venerable chardet Python library. chardet was created by Mark Pilgrim back in 2006 and released under the LGPL. Mark retired from public internet life in 2011 and chardet's maintenance was taken over by others, most notably Dan Blanchard who has been responsible for every release since 1.1 in July 2012. Two days ago Dan released chardet 7.0.0 with the following note in the release notes: Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate! Yesterday Mark Pilgrim opened #327: No right to relicense this project: [...] First off, I would like to thank the current maintainers and everyone who has contributed to and improved this project over the years. Truly a Free Software success story. However, it has been brought to my attention that, in the release 7.0.0, the maintainers claim to have the right to "relicense" the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e.
this is not a "clean room" implementation). Adding a fancy code generator into the mix does not somehow grant them any additional rights. Dan's lengthy reply included: You're right that I have had extensive exposure to the original codebase: I've been maintaining it for over a decade. A traditional clean-room approach involves a strict separation between people with knowledge of the original and people writing the new implementation, and that separation did not exist here. However, the purpose of clean-room methodology is to ensure the resulting code is not a derivative work of the original. It is a means to an end, not the end itself. In this case, I can demonstrate that the end result is the same — the new code is structurally independent of the old code — through direct measurement rather than process guarantees alone. Dan goes on to present results from the JPlag tool - which describes itself as "State-of-the-Art Source Code Plagiarism & Collusion Detection" - showing that the new 7.0.0 release has a max similarity of 1.29% with the previous release and 0.64% with the 1.1 version. Other release versions had similarities more in the 80-93% range. He then shares critical details about his process, highlights mine: For full transparency, here's how the rewrite was conducted. I used the superpowers brainstorming skill to create a design document specifying the architecture and approach I wanted based on the following requirements I had for the rewrite [...] I then started in an empty repository with no access to the old source tree, and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code . I then reviewed, tested, and iterated on every piece of the result using Claude. [...] I understand this is a new and uncomfortable area, and that using AI tools in the rewrite of a long-standing open source project raises legitimate questions. But the evidence here is clear: 7.0 is an independent work, not a derivative of the LGPL-licensed codebase. 
The MIT license applies to it legitimately. Since the rewrite was conducted using Claude Code there are a whole lot of interesting artifacts available in the repo. 2026-02-25-chardet-rewrite-plan.md is particularly detailed, stepping through each stage of the rewrite process in turn - starting with the tests, then fleshing out the planned replacement code. There are several twists that make this case particularly hard to confidently resolve:
Dan has been immersed in chardet for over a decade, and has clearly been strongly influenced by the original codebase.
There is one example where Claude Code referenced parts of the codebase while it worked, as shown in the plan - it looked at metadata/charsets.py, a file that lists charsets and their properties expressed as a dictionary of dataclasses.
More complicated: Claude itself was very likely trained on chardet as part of its enormous quantity of training data - though we have no way of confirming this for sure. Can a model trained on a codebase produce a morally or legally defensible clean-room implementation?
As discussed in this issue from 2014 (where Dan first openly contemplated a license change) Mark Pilgrim's original code was a manual port from C to Python of Mozilla's MPL-licensed character detection library.
How significant is the fact that the new release of chardet used the same PyPI package name as the old one? Would a fresh release under a new name have been more defensible?
I have no idea how this one is going to play out. I'm personally leaning towards the idea that the rewrite is legitimate, but the arguments on both sides of this are entirely credible. I see this as a microcosm of the larger question around coding agents for fresh implementations of existing, mature code. This question is hitting the open source world first, but I expect it will soon start showing up in Compaq-like scenarios in the commercial world. Once commercial companies see that their closely held IP is under threat I expect we'll see some well-funded litigation.


LLM Use in the Python Source Code

There is a trick that is spreading through social media. If you block the claude user on GitHub, then each time you visit a GitHub repository that has commits by this user you get a banner at the top alerting you of the user's participation. It's an easy way to spot projects that have started to rely on coding agents, in this case on Claude Code specifically. Imagine the surprise when you see that CPython, one of the most popular open-source projects in the world, is now receiving contributions from claude:

Stavros' Stuff 2 weeks ago

I made a voice note taker

Have you ever always wanted a very very small voice note recorder that would fit in your pocket? Something that would always work, and always be available to take a note at the touch of a button, with no fuss? Me neither. Until, that is, I saw the Pebble Index 01 , then I absolutely needed it right away and had to have it in my life immediately, but alas, it is not available, plus it’s disposable, and I don’t like creating e-waste. What was a poor maker like me supposed to do when struck down so cruelly by the vicissitudes of fate? There was only one thing I could do: I could build my own, shitty version of it for $8, and that’s exactly what I did. Like everyone else, I have some sort of undiagnosed ADHD, which manifests itself as my brain itching for a specific task, and the itch becoming unbearable unless I scratch it. This usually results in me getting my phone out, no matter where I am or who I’m with, and either noting stuff down or doing the task, which some people perceive as rude, for inexplicable reasons that are almost certainly their fault. Because, however, it has proved easier to just not get my phone out in polite company than convince everyone of how wrong they are, I just do the former now, but that makes the itch remain. Also, sometimes I’m just in the middle of something, and an idea pops into my head for later pursuit, but I get distracted by a squirrel, a car going by, or the disturbing trend of the constant and persistent erosion of civil rights all over the world, and I forget the idea. The Pebble Index showed me that there’s a better way, a device that’s unobtrusive, available, and reliable enough that I could just press a button, speak into it, and know for sure that my sonorous voice would reach the bowels of my phone, where it would be stored safely until I was bored and wanted something to do. 
I didn’t want to have to get my phone out, unlock it, open a voice recorder app, hold down a button, speak, wonder if it heard me, look at the button, realize I had already pressed it, press it again, say the thing again, press it again to stop, exit the app, lock my phone, and put it back into my pocket. I wanted to take a thing out, press a button, speak, release the button, done. The initial thinking was that I’d use a microcontroller (an ESP32 is my microcontroller of choice these days), a microphone, and a lithium battery, and that’s basically all the hardware this needs! Most of the heavy lifting would need to be done in software. This would need: Luckily, I know enough about electronics to know that LLMs would definitely know how to build something like that. Indeed, Claude confirmed my suspicions by saying that all I need is a microphone and an ESP32. It recommended an ESP32-C6 but I went with an ESP32-S3 , as it had an onboard charge controller and would be able to charge a lithium battery from USB, which is very handy when you’re making a thing that runs on battery. The ESP32 is a microcontroller, a little computer that’s just really small. The main difference of the S3 from the C6 is that the S3 is more capable, and has more power. I keep an assortment of random components around, so I had an ESP32-S3 board. It’s a no-name, crappy one from AliExpress, not a good, Seeed-branded one from AliExpress, but it would have to do. Unfortunately, I didn’t have a MEMS microphone (which is basically an angelic grain of rice that can hear, with excellent quality), but I did have an electret mic, which is huge and bad quality and would sound like an old-timey radio, but it was there and it was ready and it was willing, and after a few beers it seemed like it was right, or at least right for right now. I also had a very thin LiPo battery, which would suit very well. 
For the final device I’d want a battery that’s a tiny bit shorter, as this one was around 40% longer than the ESP32, but it would do great for now. I quickly soldered everything together and recorded some audio. It worked! It worked and nobody was going to take that from me, even though it was crackly and the quality wasn’t great. Unfortunately, at this stage I realized that the analog electret microphone consumes too much energy, even when sleeping, which is terrible on a device that would spend more time sleeping than the beauty from that fairytale, Sleepy the Dwarf. To counteract that, I decided to use a MOSFET to cut power to the mic when the device was asleep. A MOSFET is a little switch that you can turn on and off from a microcontroller, basically. Full disclosure here, before using the MOSFET to turn the mic on and off, I went down a multi-hour rabbit hole trying to design a latching circuit that would allow the ESP32 to turn itself off and consume almost no power. Instead, it consumed a lot of my time, without anything to show for it, because I didn’t manage to make it work at all. The MOSFET for the mic worked fairly well, though, and the device didn’t consume much power when asleep. The real gains, however, were going to be had when the MEMS microphone I ordered arrived, as those use infinitesimal amounts of current when asleep, and have much better sound quality as well, as they are digital. The analog microphone crackled and popped and took a while to stabilize after boot, which was unfortunate because I wanted the device to be ready as soon as the user pressed the button. There was also a recording bug where the recording was missing a few milliseconds of audio every so often, which led to dropped phonemes and words sometimes sounding like other words because parts of them were dropped. 
All these problems were weird enough and hard enough to debug that I resolved to just wait for my digital MEMS microphone to arrive, which would solve them in one fell swoop, as it is digital and amazing. After the relatively easy part of connecting a few wires together, now came the hard part: Designing a case for the whole thing that would fit without leaving much empty space, to make the device as small as possible. This was very hard to do with this massive microphone that was as tall as everything else (including battery) combined. I initially tried to point the microphone downward while mounting it at the top, so it would take up the least amount of vertical space possible, but the PCB made that hard, as the microphone was soldered to it. I ended up desoldering the mic from the PCB, trimming the PCB to make it shorter, and connecting the mic to it with wires. That allowed me to make the case (and thus the device) smaller, but at what cost? Nothing, turns out, because it worked great. The device was working great, but I didn’t want it tethered to my computer, I wanted to be able to take it out and about and show it the wonders of the world. To do this, I needed Bluetooth. Unfortunately, I have exactly zero idea how Bluetooth works, and would need to spend days or weeks figuring stuff out, but, luckily for me, I had a Claude subscription. It took a bit of back-and-forth, but I did manage to end up with a Python script that would connect to the pendant, download the audio files, and convert them from ADPCM to MP3, for expanded compatibility. To maximize battery life, the way things worked was: This worked really well, the device was awake for a small amount of time (10 seconds), but it could be awoken at any time just by tapping the button. At that point, it would transfer to the PC any files that were on the pendant, and go back to sleep. One downside was that transfers would take an inordinate amount of time, sometimes reaching 2 minutes for a 10-second clip. 
OpenAI’s Codex was really helpful here, finding a solution for fast BLE transfers that made sending files 100x faster than it was before. Because I’m too impatient to wait for the slow boat from China, I ordered the same microphone locally. I had to pay an arm and a leg in shipping and impatience fees, but it was worth it, because I finally had a MEMS mic! It’s so cute and tiny, I immediately found a spot for it over the board, added the switch, added a voltage divider for sensing battery voltage, and that was it! The new mic sounds fantastic, it sounds better than recording with your phone, for some odd reason that I’m sure is all in my head. What’s more, it doesn’t have the weird bugs that plagued me with the analog mic. With this smaller mic, I could now design a better case. I designed the case you see on the right, which is the second generation. There will be a third, when I receive the shorter battery, which means I will have a choice of either making the device longer but half as thick, or around 40% shorter. I think I will go for longer but thinner, I’d quite prefer to have a thin device in my pocket, even if it’s long, than a stubby one that pokes out. Still, the new battery (and the new case) will mark the completion of this project and make me a very happy man. For the second-gen case, I decided to jazz it up and add a red stripe around it, because it was easy to do and because I think it looks good. Unfortunately, the feature I wanted most (fillets, i.e. rounded corners) wasn’t possible due to the lack of empty space inside the case. I hope the final device will have some more space for fillets, at least. Once I was done with the device, it was time to make it more ergonomic: I’d need to create an Android app so I wouldn’t have to wait to get to my PC. I also knew I wanted note transcription, as it’s really useful to be able to see what you said without having to listen to the audio again. 
Unfortunately again, I have no idea about Android development, only having written a small app years ago. Fortunately, though, Claude turned out to be pretty good at it, and one-shotted this app that you see here. For the transcription, I used GPT-4o Transcribe, which is great and understands both English and Greek, languages I fail to speak in equal measure. I have to say, it’s pretty magical to speak into a little box and to see the audio already captured and transcribed on your phone. With the Android app, I could now test the device in real-world use. One thing I noticed is that battery dies way too fast. I suspect that has something to do with the cheap board, so I’ve ordered an original Seeed Xiao board, and I hope that will fix the problem once and for all, as they advertise low power usage and they’re a trustworthy brand. I also added a “webhook” convenience function to the Android app, so that the latter would be able to send the transcription to a server for further processing. The device is extremely reliable, which makes me a lot more likely to use it. I know that, if I press the button, the audio will be recorded and stored, and nothing will happen to it, which makes for a very relaxed and calming experience. Before I continue, I want to say you can find all the files in this project (firmware, Android app, whatever else) in its GitHub repository: https://github.com/skorokithakis/middle That’s right, I called it Middle, because it was the next thing after the Index. I know it’s a silly name, I don’t care, don’t use it, I’m not changing it. In the “draw the rest of the fucking owl” portion of this article, I realized I didn’t want the notes to just go to my phone when LLMs exist. I wanted an LLM to take the notes and do something with them, so I spent a few weeks writing an AI agent that’s more useful than what currently exists. The device’s Android app sends the transcribed text to this AI, which processes it. 
I’m going to write another post about this, but basically, I wanted an AI personal assistant that could help with all the little chores in my life. AI assistants are interesting because they’re: This means that, when everyone inevitably asks “what is it good for”, I can’t really give a good answer, because the answer is “it takes care of all the little annoyances for me”, but nobody has the same annoyances and can’t really imagine what the bot does, so they don’t engage with it. The amazing thing for AI assistants for me is the fact that they can string together multiple (otherwise small) tools to do something that’s more valuable than the sum of its parts. For example, I asked the agent to give me a daily briefing every morning, consisting of my todos for the day, my calendar events, whether any refund has hit my bank, and whether any packages are due to be delivered today. The agent also checks my gym bookings and asks me every morning if I do plan to go, or if I intend to cancel. If I tell it to cancel, it does, but if I say I’ll go, it sets an alarm for a few minutes before, which I’m much more likely to see than my calendar’s one. It will also (entirely of its own volition) mention things like “you have a gym booking today 7-8pm but you have a restaurant booking at 9pm and it’ll take you more than an hour to shower and make it”, which a regular calendar wouldn’t be able to figure out. I’ve made it fantastically secure, everything is sandboxed and you can run it on your laptop without fear. I use it constantly throughout the day for many little things, and the integration with the device takes the whole setup to another level. You can find the bot here: https://github.com/skorokithakis/stavrobot Do let me know if you try it, it’s like OpenClaw but won’t steal your data and eat your firstborn. If you have any ideas, feedback, flamebait, or whatever, you can Tweet or Bluesky me, or email me directly. 
A way for the device to record audio onto some sort of persistent storage, for the case where you didn’t have your phone close to you.
A way for the device to sleep, consuming almost no power, until it was woken up by the button.
A way to transfer the files from the device to the phone, for later listening.
A battery indicator would be very nice, so I knew when to recharge it.

You pressed the button.
If you held it down for more than half a second, the recording would “count”.
If there was a recording made (i.e. if you held the button down long enough), it would be saved.
Bluetooth would turn on and look for a phone or computer that’s ready to receive.
The device would send the file and go to sleep again.

Very open-ended tools, and
Highly personal.
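The button flow described in the post (press, hold for more than half a second, save, look for a receiver over Bluetooth, send, sleep) can be sketched as a tiny state model. This is only an illustration in Python, not the actual Middle firmware: the `Pendant` class, its method names, and the `receiver_ready` flag are all made up for the example, and only the half-second threshold comes from the text.

```python
from dataclasses import dataclass, field

# From the post: a hold longer than half a second makes the recording "count".
MIN_HOLD_SECONDS = 0.5


@dataclass
class Pendant:
    """Hypothetical model of the device, for illustration only."""

    stored: list = field(default_factory=list)  # recordings kept on the device
    state: str = "asleep"

    def button_press(self, held_seconds: float, audio: bytes) -> bool:
        """Wake up and keep the recording only if the hold was long enough."""
        self.state = "awake"
        kept = held_seconds > MIN_HOLD_SECONDS
        if kept:
            self.stored.append(audio)
        return kept

    def sync(self, receiver_ready: bool) -> list:
        """Send stored files if a phone or computer is listening, then sleep."""
        sent = []
        if receiver_ready:
            sent, self.stored = self.stored, []
        self.state = "asleep"
        return sent
```

In this sketch, recordings stay in `stored` when no receiver is around, which mirrors the requirement for persistent storage when the phone is not nearby; the next successful `sync` sends everything that has accumulated.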

Simon Willison 2 weeks ago

Writing about Agentic Engineering Patterns

I've started a new project to collect and document Agentic Engineering Patterns - coding practices and patterns to help get the best results out of this new era of coding agent development we find ourselves entering. I'm using Agentic Engineering to refer to building software using coding agents - tools like Claude Code and OpenAI Codex, where the defining feature is that they can both generate and execute code - allowing them to test that code and iterate on it independently of turn-by-turn guidance from their human supervisor. I think of vibe coding using its original definition of coding where you pay no attention to the code at all, which today is often associated with non-programmers using LLMs to write code. Agentic Engineering represents the other end of the scale: professional software engineers using coding agents to improve and accelerate their work by amplifying their existing expertise. There is so much to learn and explore about this new discipline! I've already published a lot under my ai-assisted-programming tag (345 posts and counting) but that's been relatively unstructured. My new goal is to produce something that helps answer the question "how do I get good results out of this stuff" all in one place. I'll be developing and growing this project here on my blog as a series of chapter-shaped patterns, loosely inspired by the format popularized by Design Patterns: Elements of Reusable Object-Oriented Software back in 1994. I published the first two chapters today:
Writing code is cheap now talks about the central challenge of agentic engineering: the cost to churn out initial working code has dropped to almost nothing, how does that impact our existing intuitions about how we work, both individually and as a team?
Red/green TDD describes how test-first development helps agents write more succinct and reliable code with minimal extra prompting.
I hope to add more chapters at a rate of 1-2 a week. I don't really know when I'll stop, there's a lot to cover! I have a strong personal policy of not publishing AI-generated writing under my own name. That policy will hold true for Agentic Engineering Patterns as well. I'll be using LLMs for proofreading and fleshing out example code and all manner of other side-tasks, but the words you read here will be my own.
Agentic Engineering Patterns isn't exactly a book, but it's kind of book-shaped. I'll be publishing it on my site using a new shape of content I'm calling a guide. A guide is a collection of chapters, where each chapter is effectively a blog post with a less prominent date that's designed to be updated over time, not frozen at the point of first publication. Guides and chapters are my answer to the challenge of publishing "evergreen" content on a blog. I've been trying to find a way to do this for a while now. This feels like a format that might stick. If you're interested in the implementation you can find the code in the Guide, Chapter and ChapterChange models and the associated Django views, almost all of which was written by Claude Opus 4.6 running in Claude Code for web accessed via my iPhone.

baby steps 2 weeks ago

What it means that Ubuntu is using Rust

Righty-ho, I’m back from Rust Nation, and busily horrifying my teenage daughter with my (admittedly atrocious) attempts at doing an English accent. It was a great trip with a lot of good conversations and some interesting observations. I am going to try to blog about some of them, starting with some thoughts spurred by Jon Seager’s closing keynote, “Rust Adoption At Scale with Ubuntu”. For some time now I’ve been debating with myself, has Rust “crossed the chasm”? If you’re not familiar with that term, it comes from a book that gives a kind of “pop-sci” introduction to the Technology Adoption Life Cycle. The answer, of course, is it depends on who you ask. Within Amazon, where I have the closest view, the answer is that we are “most of the way across”: Rust is squarely established as the right way to build at-scale data planes or resource-aware agents and it is increasingly seen as the right choice for low-level code in devices and robotics as well – but there remains a lingering perception that Rust is useful for “those fancy pants developers at S3” (or wherever) but a bit overkill for more average development. On the other hand, within the realm of Safety Critical Software, as Pete LeVasseur wrote in a recent rust-lang blog post, Rust is still scrabbling for a foothold. There are a number of successful products but most of the industry is in a “wait and see” mode, letting the early adopters pave the path. The big idea that I at least took away from reading Crossing the Chasm and other references on the technology adoption life cycle is the need for “reference customers”. When you first start out with something new, you are looking for pioneers and early adopters that are drawn to new things: What an early adopter is buying [..] is some kind of change agent. By being the first to implement this change in the industry, the early adopters expect to get a jump on the competition.
– from Crossing the Chasm But as your technology matures, you have to convince people with a lower and lower tolerance for risk: The early majority want to buy a productivity improvement for existing operations. They are looking to minimize discontinuity with the old ways. They want evolution, not revolution. – from Crossing the Chasm So what is most convincing to people to try something new? The answer is seeing that others like them have succeeded. You can see this at play in both the Amazon example and the Safety Critical Software example. Clearly seeing Rust used for network services doesn’t mean it’s ready to be used in your car’s steering column. And even within network services, seeing a group like S3 succeed with Rust may convince other groups building at-scale services to try Rust, but doesn’t necessarily persuade a team to use Rust for their next CRUD service. And frankly, it shouldn’t! They are likely to hit obstacles. All of this was on my mind as I watched the keynote by Jon Seager, the VP of Engineering at Canonical, which is the company behind Ubuntu. Similar to Lars Bergstrom’s epic keynote from years past on Rust adoption within Google, Jon laid out a pitch for why Canonical is adopting Rust that was at once visionary and yet deeply practical. “Visionary and yet deeply practical” is pretty much the textbook description of what we need to cross from early adopters to early majority. We need folks who care first and foremost about delivering the right results, but are open to new ideas that might help them do that better; folks who can stand on both sides of the chasm at once. Jon described how Canonical focuses their own development on a small set of languages: Python, C/C++, and Go, and how they had recently brought in Rust and were using it as the language of choice for new foundational efforts, replacing C, C++, and (some uses of) Python.
Jon talked about how he sees it as part of Ubuntu’s job to “pay it forward” by supporting the construction of memory-safe foundational utilities. Jon meant support both in terms of finances – Canonical is sponsoring the Trifecta Tech Foundation to develop sudo-rs and ntpd-rs and sponsoring the uutils org’s work on coreutils – and in terms of reputation. Ubuntu can take on the risk of doing something new, prove that it works, and then let others benefit. Remember how the Crossing the Chasm book described early majority people? They are “looking to minimize discontinuity with the old ways”. And what better way to do that than to have drop-in utilities that fit within their existing workflows. With new adoption comes new perspectives. On Thursday night I was at a dinner organized by Ernest Kissiedu. Jon Seager was there along with some other Rust adopters from various industries, as were a few others from the Rust Foundation and the open-source project. Ernest asked them to give us their unvarnished takes on Rust. Jon made the provocative comment that we needed to revisit our policy around having a small standard library. He’s not the first to say something like that, it’s something we’ve been hearing for years and years – and I think he’s right! Though I don’t think the answer is just to ship a big standard library. In fact, it’s kind of a perfect lead-in to (what I hope will be) my next blog post, which is about a project I call “battery packs”. The broader point though is that shifting from targeting “pioneers” and “early adopters” to targeting “early majority” sometimes involves some uncomfortable changes: Transition between any two adoption segments is normally excruciatingly awkward because you must adopt new strategies just at the time you have become most comfortable with the old ones. [..] The situation can be further complicated if the high-tech company, fresh from its marketing success with visionaries, neglects to change its sales pitch. [..]
The company may be saying “state-of-the-art” when the pragmatist wants to hear “industry standard”. – Crossing the Chasm (emphasis mine) Not everybody will remember it, but in 2016 there was a proposal called the Rust Platform. The idea was to bring in some crates and bless them as a kind of “extended standard library”. People hated it. After all, they said, why not just add dependencies to your Cargo.toml? It’s easy enough. And to be honest, they were right – at least at the time. I think the Rust Platform is a good example of something that was a poor fit for early adopters, who want the newest thing and don’t mind finding the best crates, but which could be a great fit for the Early Majority. Anyway, I’m not here to argue for one thing or another in this post, but more for the concept that we have to be open to adapting our learned wisdom to new circumstances. In the past, we were trying to bootstrap Rust into the industry’s consciousness – and we have succeeded. The task before us now is different: we need to make Rust the best option not just in terms of “what it could be” but in terms of “what it actually is” – and sometimes those are in tension. Later in the dinner, the talk turned, as it often does, to money. Growing Rust adoption also comes with growing needs placed on the Rust project and its ecosystem. How can we connect the dots? This has been a big item on my mind, and I realize in writing this paragraph how many blog posts I have yet to write on the topic, but let me lay out a few interesting points that came up over this dinner and at other recent points. First, there are more ways to offer support than $$. For Canonical specifically, as they are an open-source organization through-and-through, what I would most want is to build stronger relationships between our organizations.
With the Rust for Linux developers, early on Rust maintainers were prioritizing and fixing bugs on behalf of RfL devs, but more and more, RfL devs are fixing things themselves, with Rust maintainers serving as mentors. This is awesome! Second, there’s an interesting trend about $$ that I’ve seen crop up in a few places. We often think of companies investing in the open-source dependencies that they rely upon. But there’s an entirely different source of funding, and one that might be even easier to tap, which is to look at companies that are considering Rust but haven’t adopted it yet. For those “would be” adopters, there are often individuals in the org who are trying to make the case for Rust adoption – these individuals are early adopters, people with a vision for how things could be, but they are trying to sell to their early majority company. And to do that, they often have a list of “table stakes” features that need to be supported; what’s more, they often have access to some budget to make these things happen. This came up when I was talking to Alexandru Radovici, the Foundation’s Silver Member Director, who said that many safety critical companies have money they’d like to spend to close various gaps in Rust, but they don’t know how to spend it. Jon’s investments in Trifecta Tech and the uutils org have the same character: he is looking to close the gaps that block Ubuntu from using Rust more. Well, first of all, you should watch Jon’s talk. “Brilliant”, as the Brits have it. But my other big thought is that this is a crucial time for Rust. We are clearly transitioning in a number of areas from visionaries and early adopters towards that pragmatic majority, and we need to be mindful that doing so may require us to change some of the way that we’ve always done things. I liked this paragraph from Crossing the Chasm: To market successfully to pragmatists, one does not have to be one – just understand their values and work to serve them.
To look more closely into these values, if the goal of visionaries is to take a quantum leap forward, the goal of pragmatists is to make a percentage improvement–incremental, measurable, predictable progress. [..] To market to pragmatists, you must be patient. You need to be conversant with the issues that dominate their particular business. You need to show up at the industry-specific conferences and trade shows they attend. Re-reading Crossing the Chasm as part of writing this blog post has really helped me square where Rust is – for the most part, I think we are still crossing the chasm, but we are well on our way. I think what we see is a consistent trend now where we have Rust champions who fit the “visionary” profile of early adopters successfully advocating for Rust within companies that fit the pragmatist, early majority profile. It strikes me that open-source is just an amazing platform for doing this kind of marketing. Unlike a company, we don’t have to do everything ourselves. We have to leverage the fact that open source helps those who help themselves – find those visionary folks in industries that could really benefit from Rust, bring them into the Rust orbit, and then (most important!) support and empower them to adapt Rust to their needs. This last part may sound obvious, but it’s harder than it sounds. When you’re embedded in open source, it seems like a friendly place where everyone is welcome. But the reality is that it can be a place full of cliques and “oral traditions” that “everybody knows” 9 . People coming with an idea can get shut down for using the wrong word. They can readily mistake the, um, “impassioned” comments from a random contributor (or perhaps just a troll…) for the official word from project leadership. It only takes one rude response to turn somebody away. So what will ultimately help Rust the most to succeed? Empathy in Open Source . Let’s get out there, find out where Rust can help people, and make it happen. Exciting times! 
I am famously bad at accents. My best attempt at posh British sounds more like Apu from the Simpsons. I really wish I could pull off a convincing Greek accent, but sadly no.  ↩︎ Another of my pearls of wisdom is “there is nothing more permanent than temporary code”. I used to say that back at the startup I worked at after college, but years of experience have only proven it more and more true.  ↩︎ Russel Cohen and Jess Izen gave a great talk at last year’s RustConf about what our team is doing to help teams decide if Rust is viable for them. But since then another thing having a big impact is AI, which is bringing previously unthinkable projects, like rewriting older systems, within reach.  ↩︎ I have no idea if there is code in a car’s steering column, for the record. I assume so by now? For power steering or some shit?  ↩︎ Or am I supposed to call it “tea”? Or maybe “supper”? I can’t get a handle on British mealtimes.  ↩︎ Ernest is such a joy to be around. He’s quiet, but he’s got a lot of insights if you can convince him to share them. If you get the chance to meet him, take it! If you live in London, go to the London Rust meetup! Find Ernest and introduce yourself. Tell him Niko sent you and that you are supposed to say how great he is and how you want to learn from the wisdom he’s accrued over the years. Then watch him blush. What a doll.  ↩︎ If you can’t wait, you can read some Zulip discussion here.  ↩︎ The Battery Packs proposal I want to talk about is similar in some ways to the Rust Platform, but decentralized and generally better in my opinion– but I get ahead of myself!  ↩︎ Betteridge’s Law of Headlines has it that “Any headline that ends in a question mark can be answered by the word no ”. Well, Niko’s law of open-source 2 is that “nobody actually knows anything that ’everybody’ knows”.  ↩︎

Justin Duke 2 weeks ago

Golinks

If you've never encountered golinks before: they're short, memorable URLs that redirect to longer ones. Instead of telling a coworker "the dashboard is at ," you just say . Instead of bookmarking seventeen different Notion pages, you type or or and trust that you'll end up in the right place. I discovered them at Stripe, though I believe they were invented at Google, and I have not stopped using them since.

One thing leads to another. You decide that you no longer need Tailscale because the main reason you spun up Tailscale was for a project that ended up shipping, and therefore spending per-seat pricing on a service that you literally only use for golinks seems a bit silly and prohibitive. (Side note: I still really love Tailscale and think it's a great product and would be shocked if we aren't using it again by the end of the year. But!) And then you need to find a replacement for golinks, and you cannot get dragged back to golinks.io or Trotto, both of which are slow, cumbersome, and expensive besides. So what was I to do?

First, I looked at the open source options, none of which struck me as particularly compelling. I have a set of requirements that I don't think are esoteric, but others might:

A reasonable price
Persistence
The ability to use golinks without a Chrome extension
Performance

And nothing quite fit the bill. I had a revelation: I discovered that you could use a default search engine as the routing proxy instead of or DNS interception like Tailscale's MagicDNS. For a week or two, I had this sitting within Django in our monorepo out of ease: simply intercept any incoming search query, redirect it if something's already in the database, and then if it's not but it looks like it could be, send to the empty state prompting the user to create a golink. But frankly, this was just slower than I wanted. Not for any interesting reason, but just the usual Python request-response lifecycle stuff.

I could, of course, invest in making it better and faster and was planning on doing so, but figured I would take one last trip around the internet to see if there was some other solution that I somehow missed. And that's when I discovered GoToTools . There is nothing really interesting to say about this product besides the fact that it is very good for what it does. Its author appears to have built it out of the same frustration that I had. And the highest compliment I can give it is that in a year where I've already cut down substantially on the number of services I pay for (in favor of those that I vend) I have absolutely no compunction about starting to use this. The pricing is extraordinary. The performance is really good. It works and is fast and lets me not spend time thinking about golinks and instead lets me spend time using them.
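The search-engine routing idea described in this post can be sketched in a few lines of plain Python. This is only an illustration of the lookup order (known golink, then "looks like a golink" empty state, then ordinary search); it is not the Django code from the monorepo, and `GOLINKS`, `CREATE_URL`, `FALLBACK_SEARCH`, and `resolve` are all hypothetical names with placeholder URLs.

```python
from urllib.parse import quote_plus

# Hypothetical in-memory stand-in for the database table of golinks.
GOLINKS = {
    "dashboard": "https://example.com/internal/dashboard",
    "roadmap": "https://example.com/docs/roadmap",
}

# Assumed fallback search engine and a made-up "create this golink" page.
FALLBACK_SEARCH = "https://duckduckgo.com/?q={query}"
CREATE_URL = "https://example.com/golinks/new?name={name}"


def resolve(query: str, links: dict[str, str] = GOLINKS) -> str:
    """Return the URL the browser should be redirected to for a search query."""
    text = query.strip()
    has_prefix = text.lower().startswith("go/")
    # Treat "go/name" and a bare "name" the same way when looking up a link.
    name = (text[3:] if has_prefix else text).lower()
    if name in links:
        return links[name]
    # "go/something" with no spaces looks like a golink that does not exist yet.
    if has_prefix and name and " " not in name:
        return CREATE_URL.format(name=quote_plus(name))
    # Anything else is an ordinary search, so hand it to the real search engine.
    return FALLBACK_SEARCH.format(query=quote_plus(text))
```

In a browser, the "default search engine" URL would point at whatever endpoint wraps a function like `resolve`, so every address-bar query passes through it before falling back to a normal search.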

(think) 2 weeks ago

How to Vim: Build your .vimrc from Scratch

People often think that getting started with Vim means spending hours crafting an elaborate with dozens of plugins. In reality, modern Vim (9+) and Neovim ship with remarkably sane defaults, and you can get very far with a configuration that’s just a few lines long – or even no configuration at all. If you launch Vim 9 without a file, it automatically loads – a built-in configuration that provides a solid foundation. Here’s what you get for free: That’s actually a pretty reasonable editing experience out of the box! You can read the full details with . Neovim goes even further with its defaults – it enables (copies indentation from the previous line), (highlights all search matches), (makes Tab smarter at the start of a line), (reloads files changed outside the editor), always shows the statusline, and sets the command history to 10000 entries, among many other things. If you’re on Neovim, the out-of-the-box experience is excellent. See for the full list. Here’s something that trips up a lot of people: the moment you create a file – even an empty one – Vim stops loading entirely. That means you lose all those nice defaults. The fix is simple. Start your with: This loads the defaults first, and then your own settings override or extend them as needed. This gotcha only applies to Vim. Neovim’s defaults are always active regardless of whether you have an or . Here’s a minimal that builds on the defaults and adds a few things most people want: That’s five settings on top of the defaults. You might not even need all of them – already handles the fundamentals. For Neovim, you don’t need the line – all the equivalents are already active. You also get , , and for free, so the only settings left to add are the ones that are genuinely personal preference: One of the most underappreciated aspects of Vim is how much built-in support it ships for programming languages. 
When is active (which it is via or Neovim’s defaults), you automatically get: This means that when you open a Python file, Vim already knows to use 4-space indentation. Open a Ruby file and it switches to 2 spaces. Open a Makefile and it uses tabs. All without a single plugin or line of configuration. You can check what’s available with for syntax files or for filetype plugins. The list is impressively long. At some point you’ll probably want more than the bare minimum. Here are a few things worth considering as your next steps: And when you eventually want more plugins, you probably won’t need many. A fuzzy finder, maybe a Git integration, and perhaps a completion engine will cover most needs. But that’s a topic for another day. The key takeaway is this: don’t overthink your . Start with the defaults, add only what you actually need, and resist the urge to copy someone else’s 500-line configuration. A small, well-understood configuration beats a large, cargo-culted one every time. That’s part of the reason why when I started to re-learn Vim I’ve opted to slowly build a Vim 9 configuration from scratch, instead of jumping to something like Neovim + Kickstart.nvim or LazyVim right away. Less is more. Understanding the foundations of your editor matters. 1 Right now my is just 100 lines and I don’t foresee it becoming much bigger in the long run. If you want to see just how far you can go without plugins, I highly recommend the Thoughtbot talk How to Do 90% of What Plugins Do (With Just Vim) . It’s a great demonstration of Vim’s built-in capabilities for file finding, auto-completion, tag navigation, and more. That’s all I have for you today. Keep hacking! I guess this sounds strange coming from the author of Emacs Prelude, right?  
↩︎

– syntax highlighting
– filetype detection, language-specific plugins, and automatic indentation
– incremental search (results appear as you type)
– keeps 5 lines of context around the cursor
– shows instead of hiding truncated lines
– mouse support in all modes
remapped to (text formatting) instead of the mostly useless Ex mode
And several other quality-of-life improvements

Syntax highlighting for hundreds of languages – Vim ships with around 770+ syntax definitions
Language-specific indentation rules for over 420 file types
Filetype plugins that set sensible options per language (e.g., , , )

A colorscheme – Vim ships with several built-in options (try followed by Tab to see them). Recent Vim builds even bundle Catppuccin – a beautiful pastel theme that I’m quite fond of. Another favorite of mine is Tokyo Night , which you’ll need to install as a plugin. Neovim’s default colorscheme has also been quite good since 0.10.
Persistent undo – lets you undo changes even after closing and reopening a file. A game changer.
Clipboard integration – makes yank and paste use the system clipboard by default.
vim-unimpaired – if you’re on classic Vim (not Neovim), I think Tim Pope’s vim-unimpaired is essential. It adds a consistent set of / mappings for navigating quickfix lists, buffers, adding blank lines, and much more. Neovim 0.11+ has adopted many of these as built-in defaults, but on Vim there’s no substitute.

Evan Hahn 3 weeks ago

Track Zelda release anniversaries in your calendar

The original Legend of Zelda came out 40 years ago today. With other birthdays on the horizon, like Twilight Princess ’s 20th in November, I wanted a calendar that showed the anniversary of every Zelda game. So I made one. Subscribe to this URL in your calendar app: Once you do, you’ll get calendar events on the anniversary of each game’s release. For example, you’ll be able to see that the Oracle games turn 25 in less than a week…I feel old. If you want to build this file yourself, I wrote a little Python script that generates an ICS file from a CSV of release dates .
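For readers curious what a CSV-to-ICS generator like this might look like, here is a minimal sketch using only the standard library. It is not the author's script: the column names (`title`, `release_date`), the sample rows, the `PRODID`, and the UID scheme are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Sample rows for illustration; the real CSV's columns may differ.
SAMPLE_CSV = """title,release_date
The Legend of Zelda,1986-02-21
The Legend of Zelda: Twilight Princess,2006-11-19
"""


def escape_text(value: str) -> str:
    """Escape the characters that RFC 5545 reserves inside TEXT values."""
    return value.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def build_ics(csv_text: str) -> str:
    """Turn CSV rows into a calendar with one yearly all-day event per game."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//zelda-anniversaries//EN",  # placeholder product id
    ]
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        released = date.fromisoformat(row["release_date"])
        stamp = released.strftime("%Y%m%d")
        lines += [
            "BEGIN:VEVENT",
            f"UID:{stamp}-{index}@zelda-anniversaries.invalid",  # made-up UID scheme
            f"DTSTAMP:{stamp}T000000Z",
            f"DTSTART;VALUE=DATE:{stamp}",  # all-day event on the release date
            "RRULE:FREQ=YEARLY",  # repeat on every anniversary
            f"SUMMARY:{escape_text(row['title'])} anniversary",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar files use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

The yearly recurrence rule is what lets a single event per game cover every future anniversary, so the file does not need to be regenerated each year.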


Leading Without a Map

No one can deny that our industry is in a period of great change. This industry never stops, and the rate goes up and down but change is a constant. Like it or not " change calls the tune we dance to ." One of the biggest reasons people resist change, even people who joined the software business to "change the world" is when they feel it threatens their self-perception and identity. In the west our job is often the primary piece of our identity. One sees it everywhere. Your LinkedIn profile has your name first, and some sort of job title or role description second. Heck even contestants on Jeopardy are introduced as "A marketing consultant from Eyebrow, Saskatchewan ." When completing the sentence "I am a..." most people pick their job. When change is high, that self-conception can quickly feel under threat. Even in the small it can happen. Your company decides they'd be better served writing new code in Java rather than Python or Ruby, you can expect a few "Pythonistas" or "Rubyists" to push back. In their heart of hearts they may agree with the decision on its merits but they nevertheless feel that their very identity is under threat. This can also include their social group/community/tribe membership, something that humans are genetically programmed to value and protect. So it's no doubt understandable that change can bring out strange and unpredictable behaviour in people when they feel like there's risk to their identity, self concept, or tribal membership. Well, first of all, acknowledge to ourselves that we are not immune from these phenomena either. Presumably most of us started out as software developers ourselves and when we started managing the people who did the job, it was the job we used to do so we got it. Over time, that's drifted. New frameworks and paradigms have emerged, new 'best' practices replaced the old 'best' practices and we became less intimately familiar with the day-to-day things our people were doing. 
This is uncomfortable at times, but we adapt. We learn what we can to stay involved at the right level and to coach and guide the people we're responsible for. Now, the game is changing in a much more fundamental and profound way. And it's happening fast. I don't know what the job of software developer is going to look like in a year from now (or even 6 months for that matter) and, frankly, neither does anyone else. This makes the job of manager much much harder. Your people are used to you having at least some concept of a map and sharing it with them and you don't have one. Everyone's figuring it out together. A good friend and former colleague once described an aspect of leadership as "smiling while the sky is falling." I'm not sure if he came up with it or if I should attribute it to someone else but I heard it from him first. My point here isn't that the sky is falling but rather, when your people are worried, you need to appear steadfast or you make the problem worse. You don't owe them certainty , because that would be dishonest and they'll clock your dishonesty whether they admit it or not. But just like in incident response, panic serves no one . You owe them calm reassurance that you're going to navigate this new world together and that you've got their best-interests at heart. You do this even though you might be feeling the same threat to your identity. You manage engineers but they're becoming some kind of new thing; bot-wranglers. Some of your other responsibilities are being offloaded to LLMs and everyone's role is going to keep changing until things inevitably settle down again (relatively speaking). With no playbook, we need some kind of framework for decision making. This is where we can fall back to 'first principles'. For me these are the things I hold important. Really, the basics:

Doing my best to take care of the people.
Doing what the business needs most at the given moment.
Providing value to customers.

It sounds simple, and really, it is. Taking care of the people right now means recognizing that they're feeling that identity risk. The worst thing you can do is try to talk them out of it or convince them they're not feeling what they're feeling. Acknowledge that things are changing. Maintain 'esprit de corps' as best you can. Draw on your experience navigating big changes before. If you've been around this industry for any amount of time, you've been through some big paradigm shifts and come out the other side. Tell some stories, but don't make it all about you. The business and customer angles come down to maintaining consistent principles around what software gets shipped to customers. I personally have the pleasing-to-nobody opinion that LLM coding tools are useful but not risk-free. Surely you have some skeptics in your midst who feel the same. Don't dismiss them either. Security, quality, maintainability, incident response, and the work-life balance of your people are still the responsibility of the humans running the company. That's the job right now, however the machinery of it changes. Keep taking care of your people and customers, like you always have. You already know how.

"Statue of Captain George Vancouver, anchors and the Custom House, King's Lynn" by ell brown is licensed under CC BY 2.0.

Justin Duke 3 weeks ago

Outgrowing Django admin

For a bit of dessert work this week, I'm working on a full-fledged attempt at replacing the majority of our stock Django admin usage with something purposeful. I say majority and not totality because even though I am an unreasonable person, I am not that unreasonable. We have over a hundred Django models, and the idea of trying to rip and replace each and every one of them (or worse yet, to design some sort of DSL by which we do that) is too quixotic even for me. The vast majority of our admin usage coalesces around three main models, and they're the ones you might guess: the user/newsletter model, the email model, and the subscriber model. My hope is that building out a markedly superior interface for interacting with these three things and sacrificing the long tail still nets out for a much happier time for myself and the support staff. Django admin is a source of both much convenience and much frustration: the abstractions make it powerful and cheap when you're first scaling, but the bill for those abstractions comes due in difficult and intractable ways. When I talk with other Django developers, they divide cleanly into one of two camps: either "what are you talking about, Django admin is perfect as-is" or "oh my God, I can't believe we didn't migrate off of it sooner." Ever the annoying centrist, I find myself agreeing with both camps: Let's set aside the visual design of the admin for a second, because arguing about visual design is not compelling prose. To me, the core issue with Django's admin interface, once you get more mature, is the fact that it's a very simple request-response lifecycle. Django pulls all the data, state, and information you might need and throws it up to a massive behemoth view for you to digest and interact with. It is by definition atomic: you are looking at a specific model, and the only way to bring in other models to the detail view is by futzing around with inlines and formsets. 
The classic thing that almost any Django developer at scale has run into is the N+1 problem, but not even necessarily the one you're thinking about. Take a fairly standard admin class: If you've got an email admin object and one of the fields on the is a (because you want to be able to change and see which user wrote a given email), Django by default will serialize every single possible user into a nice tag for you. Even if this doesn't incur a literal N+1, you're asking the backend to generate a select with thousands (or more) options; the serialization overhead alone will time out your request. And so the answer is, nowadays, to use or , which pulls in a jQuery 1.9 package (yes, in 2026; no, I don't want to talk about it) to call an Ajax endpoint instead: This is the kind of patch that feels like a microcosm of the whole problem: technically correct, ergonomically awkward, and aesthetically offensive. But the deeper issue is composability rather than performance. A well-defined data model has relationships that spread in every direction. A subscriber has Stripe subscriptions and Stripe charges. It has foreign keys onto email events and external events. When you're debugging an issue reported by a subscriber, you want to see all of these things in one place, interleaved and sorted chronologically. Django admin's answer to this is inlines: This works, until it doesn't. You start to run into pagination issues; you can't interleave those components with one another because they're rendered as separate, agnostic blocks; you can't easily filter or search within a single inline. You could create a helper method on the subscriber class to sort all related events and present them as a single list, but you once again run into the non-trivial problem of this being part of a fixed request-response lifecycle. 
And that kind of serialized lookup can get really expensive: You can do more bits of cleverness — parallelizing lookups, caching aggressively, using and everywhere — but now you're fighting the framework rather than using it. The whole point of Django admin was to not build this stuff from scratch, and yet here you are, building bespoke rendering logic inside callbacks. I still love Django admin. On the next Django project I start, I will not create a bespoke thing from day one but instead rely on my trusty, outdated friend until it's no longer bearable. But what grinds my gears is the fact that, as far as I can tell, every serious Django company has this problem and has had to solve it from scratch. There's no blessed graduation path, whether in the framework itself or the broader ecosystem. I think that's one of the big drawbacks of Django relative to its peer frameworks. As strong and amazing as its community is, it's missing a part of the flywheel from more mature deployments upstreaming their findings and discoveries back into the zeitgeist. Django admin is an amazing asset; I am excited to be, if not rid of it, to be seeing much less of it in the future.
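To make the "interleaved and sorted chronologically" idea concrete, here is a minimal sketch in plain Python rather than Django. It assumes each related source (charges, email events, and so on) can already be fetched sorted by timestamp; `TimelineEntry`, `interleave`, and the sample data are hypothetical stand-ins, not models or helpers from the codebase described above.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TimelineEntry:
    """One row in a subscriber's combined history (hypothetical shape)."""
    occurred_at: datetime
    kind: str
    summary: str


def interleave(*sources: Iterable[TimelineEntry]) -> Iterator[TimelineEntry]:
    """Lazily merge several already-sorted sources into one chronological stream."""
    # heapq.merge only does a k-way merge; each input must be sorted by the key.
    return heapq.merge(*sources, key=lambda entry: entry.occurred_at)


# Made-up sample data standing in for two per-model queries.
charges = [
    TimelineEntry(datetime(2026, 1, 1, 9, 0), "charge", "charge succeeded"),
    TimelineEntry(datetime(2026, 1, 3, 9, 0), "charge", "charge refunded"),
]
email_events = [
    TimelineEntry(datetime(2026, 1, 2, 12, 0), "email_event", "email opened"),
]

timeline = list(interleave(charges, email_events))
```

Because the merge is lazy, a view built on something like this could stop after the first page of entries instead of materialising every related row, which is the kind of control that inlines do not offer.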

Max Bernstein 3 weeks ago

Type-based alias analysis in the Toy Optimizer

Another entry in the Toy Optimizer series . Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time! We were careful to mind object aliasing: we separated our heap information into alias classes based on what offset the reads/writes referenced. This way, if we didn’t know if object and aliased, we could at least know that different offsets would never alias (assuming our objects don’t overlap and memory accesses are on word-sized slots). This is a coarse-grained heuristic. Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We’ll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post. We return once again to Fil Pizlo land, specifically How I implement SSA form . We’re going to be using the hierarchical heap effect representation from the post in our implementation, but you can use your own type representation if you have one already. This representation divides the heap into disjoint regions by type. Consider, for example, that objects and objects do not overlap. A pointer is never going to alias an pointer. They can therefore be reasoned about separately. But sometimes you don’t have perfect type information available. If you have in your language an base class of all objects, then the heap overlaps with, say, the heap. So you need some way to represent that too—just having an enum doesn’t work cleanly. Here is an example simplified type hierarchy: Where might represent different parts of the runtime’s data structures, and could be further segmented into , , etc. Fil’s idea is that we can represent each node in that hierarchy with a tuple of integers (inclusive, exclusive) that represent the pre- and post-order traversals of the tree. 
Or, if tree traversals are not engraved into your bones, they represent the range of all the nested objects within them. Then the “does this write interfere with this read” check—the aliasing check—is a range overlap query. Here’s a perhaps over-engineered Python implementation of the range and heap hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore: Where kicks off the tree-numbering scheme. Fil’s implementation also covers a bunch of abstract heaps such as SSAState and Control because his is used for code motion and whatnot. That can be added on later but we will not do so in this post. So there you have it: a type representation. Now we need to use it in our load-store forwarding. Recall that our load-store optimization pass looks like this: At its core, it iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate the state of compile-time information about fields that may alias. In this case, our may alias asks only if the offsets overlap. This means that the following unit test will fail: This test is expecting the write to to still remain cached even though we wrote to the same offset in —because we have annotated as being an and as being a . If we account for type information in our alias analysis, we can get this test to pass. After doing a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff: If we don’t have any type/alias information, we default to “I know nothing” ( ) for each object. Then we check range overlap. The boolean logic in looks a little weird, maybe. But we can also rewrite (via DeMorgan’s law) as: So, keeping all the cached field state about fields that are known by offset and by type not to alias. Maybe that is clearer (but not as nice a diff). Note that the type representation is not so important here! 
You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and check overlap between them. Nice, now our test passes! We can differentiate between memory accesses on objects of different types. But what if we knew more? Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object’s allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage. For example, in the following made up IR snippet: We know that (among other facts) doesn’t alias or because we have seen its allocation site. I saw this in the old V8 IR Hydrogen’s lightweight alias analysis 1 : There is plenty of other useful information such as: If you have other fun ones, please write in. We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking? The conservative approach is to invalidate all cached information on a function call. This is definitely correct, but it’s a bummer for the optimizer. Can we do anything? Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write, or only writes to some heaps, we can at least only partially invalidate our heap. However, if the function is unknown or otherwise opaque, we need at least more advanced alias information and perhaps even (partial) escape analysis. Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to any object A, we cannot safely cache information about any other object B unless we know for sure that A and B do not alias. 
And we don’t know what the instruction writes to. So we may only know we can cache information about B because it was allocated locally and has not escaped. Some runtimes such as ART pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass, with no need to make a matrix. As usual, this is a toy IR and a toy optimizer, so it’s hard to say how much faster it makes its toy programs. In general, though, there is a dial for analysis and optimization that goes between precision and speed. This is a happy point on that dial, only a tiny incremental analysis cost bump above offset-only invalidation, but for higher precision. I like that tradeoff. Also, it is very useful in JIT compilers where generally the managed language is a little better-behaved than a C-like language. Somewhere in your IR there will be a lot of duplicate loads and stores from a strength reduction pass, and this can clean up the mess. Thanks for joining as I work through a small use of type-based alias analysis for myself. I hope you enjoyed. Thank you to Chris Gregory for helpful feedback.

- If we know at compile-time that object A has 5 at offset 0 and object B has 7 at offset 0, then A and B don’t alias (thanks, CF)
  - In the RPython JIT in PyPy, this is used to determine if two user (Python) objects don’t alias because we know the contents of the user (Python) class field
- Object size (though perhaps that is a special case of the above bullet)
- Field size/type
- Deferring alias checks to run-time
  - Have a branch

I made a fork of V8 to go spelunk around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted it in favor of their new Sea of Nodes based IR called TurboFan. ↩
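The post above mentions a bitset encoding of type information and effect annotations that let a known call invalidate only part of the cached heap. The following is a minimal sketch of those two ideas together, not the post's actual optimizer: the heap names (`ARRAY`, `STRING`, `OTHER`), the `LoadCache` class, and its methods are all hypothetical and exist only for illustration.

```python
from enum import IntFlag


class Heap(IntFlag):
    """Abstract heaps encoded as bits (names are made up for this sketch)."""
    NONE = 0
    ARRAY = 1
    STRING = 2
    OTHER = 4
    ANY = ARRAY | STRING | OTHER


def may_alias(a, b):
    # Two accesses can touch the same memory only if their heaps overlap.
    return bool(a & b)


class LoadCache:
    """Maps (object name, offset) to a known value, tagged with its heap."""

    def __init__(self):
        self.known = {}

    def record(self, obj, offset, heap, value):
        self.known[(obj, offset)] = (heap, value)

    def lookup(self, obj, offset):
        entry = self.known.get((obj, offset))
        return entry[1] if entry else None

    def invalidate(self, writes):
        # Keep only the entries whose heap cannot overlap what was written.
        self.known = {
            key: (heap, value)
            for key, (heap, value) in self.known.items()
            if not may_alias(heap, writes)
        }


cache = LoadCache()
cache.record("a", 0, Heap.ARRAY, "v1")
cache.record("s", 0, Heap.STRING, "v2")

# A well-known call annotated as writing only to arrays: partial invalidation.
cache.invalidate(Heap.ARRAY)
array_entry = cache.lookup("a", 0)    # dropped
string_entry = cache.lookup("s", 0)   # still cached

# An opaque call is treated as writing to every heap: nothing survives.
cache.invalidate(Heap.ANY)
```

Treating an unknown call as `Heap.ANY` reproduces the conservative behaviour described in the post, while a narrower annotation keeps unrelated entries alive.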

Rik Huijzer 3 weeks ago

Running `deezer/spleeter`

Here are up-to-date installation instructions for running Deezer's Spleeter on `Ubuntu 24.04`. Minimum requirements are around 16 GB of RAM. (During processing, it uses around 11 GB at the peak.) I ran this on a temporary Hetzner server because my Apple Silicon system, after lots of fiddling with versions, ran into AVX issues.

Install Conda.

```
conda create -n spleeter_env python=3.8 -y
```

```
conda activate spleeter_env
```

```
conda install -c conda-forge ffmpeg libsndfile numpy=1.19 -y
```

```
pip install spleeter
```

```
spleeter separate -o audio_output input.mp3
```

If your a...

Ankur Sethi 4 weeks ago

I used a local LLM to analyze my journal entries

In 2025, I wrote 162 journal entries totaling 193,761 words. In December, as the year came to a close and I found myself in a reflective mood, I wondered if I could use an LLM to comb through these entries and extract useful insights. I’d had good luck extracting structured data from web pages using Claude, so I knew this was a task LLMs were good at. But there was a problem: I write about sensitive topics in my journal entries, and I don’t want to share them with the big LLM providers. Most of them have at least a thirty-day data retention policy, even if you call their models using their APIs, and that makes me uncomfortable. Worse, all of them have safety and abuse detection systems that get triggered if you talk about certain mental health issues. This can lead to account bans or human review of your conversations. I didn’t want my account to get banned, and the very idea of a stranger across the world reading my journal mortifies me. So I decided to use a local LLM running on my MacBook for this experiment. Writing the code was surprisingly easy. It took me a few evenings of work—and a lot of yelling at Claude Code—to build a pipeline of Python scripts that would extract structured JSON from my journal entries. I then turned that data into boring-but-serviceable visualizations. This was a fun side-project, but the data I extracted didn’t quite lead me to any new insights. That’s why I consider this a failed experiment. The output of my pipeline only confirmed what I already knew about my year. Besides, I didn’t have the hardware to run the larger models, so some of the more interesting analyses I wanted to run were plagued with hallucinations. Despite how it turned out, I’m writing about this experiment because I want to try it again in December 2026. I’m hoping I won’t repeat my mistakes again. Selfishly, I’m also hoping that somebody who knows how to use LLMs for data extraction tasks will find this article and suggest improvements to my workflow. 
I’ve pushed my data extraction and visualization scripts to GitHub. It’s mostly LLM-generated slop, but it works. The most interesting and useful parts are probably the prompts . Now let’s look at some graphs. I ran 12 different analyses on my journal, but I’m only including the output from 6 of them here. Most of the others produced nonsensical results or were difficult to visualize. For privacy, I’m not using any real names in these graphs. Here’s how I divided time between my hobbies through the year: Here are my most mentioned hobbies: This one is media I engaged with. There isn’t a lot of data for this one: How many mental health issues I complained about each day across the year: How many physical health issues I complained about each day across the year: The big events of 2025: The communities I spent most of my time with: Top mentioned people throughout the year: I ran all these analyses on my MacBook Pro with an M4 Pro and 48GB RAM. This hardware can just barely manage to run some of the more useful open-weights models, as long as I don’t run anything else. For running the models, I used Apple’s package . Picking a model took me longer than putting together the data extraction scripts. People on /r/LocalLlama had a lot of strong opinions, but there was no clear “best” model when I ran this experiment. I just had to try out a bunch of them and evaluate their outputs myself. If I had more time and faster hardware, I might have looked into building a small-scale LLM eval for this task. But for this scenario, I picked a few popular models, ran them on a subset of my journal entries, and picked one based on vibes. This project finally gave me an excuse to learn all the technical terms around LLMs. What’s quantization ? What does the number of parameters do? What does it mean when a model has , , , or in its name? What is a reasoning model ? What’s MoE ? What are active parameters? This was fun, even if my knowledge will be obsolete in six months. 
In the beginning, I ran all my scripts with Qwen 2.5 Instruct 32b at 8-bit quantization as the model. This fit in my RAM with just enough room left over for a browser, text editor, and terminal. But Qwen 2.5 didn’t produce the best output and hallucinated quite a bit, so I ran my final analyses using Llama-3.3 70B Instruct at 3bit quantization. This could just about fit in my RAM if I quit every other app and increased the amount of GPU RAM a process was allowed to use . While quickly iterating on my Python code, I used a tiny model: Qwen 3 4b Instruct quantized to 4bits. A major reason this experiment didn’t yield useful insights was that I didn’t know what questions to ask the LLM. I couldn’t do a qualitative analysis of my writing—the kind of analysis a therapist might be able to do—because I’m not a trained psychologist. Even if I could figure out the right prompts, I wouldn’t want to do this kind of work with an LLM. The potential for harm is too great, and the cost of mistakes is too high. With a few exceptions, I limited myself to extracting quantitative data only. From each journal entry, I extracted the following information: None of the models was as accurate as I had hoped at extracting this data. In many cases, I noticed hallucinations and examples from my system prompt leaking into the output, which I had to clean up afterwards. Qwen 2.5 was particularly susceptible to this. Some of the analyses (e.g. list of new people I met) produced nonsensical results, but that wasn’t really the fault of the models. They were all operating on a single journal entry at a time, so they had no sense of the larger context of my life. I couldn’t run all my journal entries through the LLM at once. I didn’t have that kind of RAM and the models didn’t have that kind of context window. I had to run the analysis one journal entry at a time. 
Even then, my computer choked on some of the larger entries, and I had to write my scripts in a way that I could run partial analyses or continue failed analyses. Trying to extract all the information listed above in one pass produced low-quality output. I had to split my analysis into multiple prompts and run them one at a time. Surprisingly, none of the models I tried had an issue with the instruction . Even the really tiny models had no problems following the instruction. Some of them occasionally threw in a Markdown fenced code block, but it was easy enough to strip using a regex. My prompts were divided into two parts: The task-specific prompts included detailed instructions and examples that made the structure of the JSON output clear. Every model followed the JSON schema mentioned in the prompt, and I rarely ever ran into JSON parsing issues. But the one issue I never managed to fix was the examples from the prompts leaking into the extracted output. Every model insisted that I had “dinner with Sarah” several times last year, even though I don’t know anybody by that name. This name came from an example that formed part of one of my prompts. I just had to make sure the examples I used stood out—e.g., using names of people I didn’t know at all or movies I hadn’t watched—so I could filter them out using plain old Python code afterwards. Here’s what my prompt looked like: To this prompt, I appended task-specific prompts. Here’s the prompt for extracting health issues mentioned in an entry: You can find all the prompts in the GitHub repository . The collected output from all the entries looked something like this: Since my model could only look at one journal entry at a time, it would sometimes refer to the same health issue, gratitude item, location, or travel destination using different synonyms. For example, “exhaustion” and “fatigue” should refer to the same health issue, but they would appear in the output as two different issues. 
My first attempt at de-duplicating these synonyms was to keep a running tally of unique terms discovered during each analysis and append them to the end of the prompt for each subsequent entry. Something like this: But this quickly led to some really strange hallucinations. I still don’t understand why. This list of terms wasn’t even that long, maybe 15-20 unique terms for each analysis. My second attempt at solving this was a separate normalization pass for each analysis. After an analysis finished running, I extracted a unique list of terms from its output file and collected them into a prompt. Then asked the LLM to produce a mapping to de-duplicate the terms. This is what the prompt looked like: There were better ways to do this than using an LLM. But you know what happens when all you have is a hammer? Yep, exactly. The normalization step was inefficient, but it did its job. This was the last piece of the puzzle. With all the extraction scripts and their normalization passes working correctly, I left my MacBook running the pipeline of scripts all day. I’ve never seen an M-series MacBook get this hot. I was worried that I’d damage my hardware somehow, but it all worked out fine. There was nothing special about this step. I just decided on a list of visualizations for the data I’d extracted, then asked Claude to write some code to generate them for me. Tweak, rinse, repeat until done. I’m underwhelmed by the results of this experiment. I didn’t quite learn anything new or interesting from the output, at least nothing I didn’t already know. This was only partly because of LLM limitations. I believe I didn’t quite know what questions to ask in the first place. What was I hoping to discover? What kinds of patterns was I looking for? What was the goal of the experiment besides producing pretty graphs? 
I went into the project with a cool new piece of tech to try out, but skipped the important up-front human-powered thinking work required to extract good insights from data. I neglected to sit down and design a set of initial questions I wanted to answer and assumptions I wanted to test before writing the code. Just goes to show that no amount of generative AI magic will produce good results unless you can define what success looks like. Maybe this year I’ll learn more about data analysis and visualization and run this experiment again in December to see if I can go any further. I did learn one thing from all of this: if you have access to state-of-the-art language models and know the right set of questions to ask, you can process your unstructured data to find needles in some truly massive haystacks. This allows you to analyze datasets that would take human reviewers months to comb through. A great example is how the NYT monitors hundreds of podcasts every day using LLMs. For now, I’m putting a pin in this experiment. Let’s try again in December.

- List of things I was grateful for, if any
- List of hobbies or side-projects mentioned
- List of locations mentioned
- List of media mentioned (including books, movies, games, or music)
- A boolean answer to whether it was a good or bad day for my mental health
- List of mental health issues mentioned, if any
- A boolean answer to whether it was a good or bad day for my physical health
- List of physical health issues mentioned, if any
- List of things I was proud of, if any
- List of social activities mentioned
- Travel destinations mentioned, if any
- List of friends, family members, or acquaintances mentioned
- List of new people I met that day, if any

- A “core” prompt that was common across analyses
- Task-specific prompts for each analysis
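The clean-up steps described in this post (stripping an occasional Markdown fence with a regex, filtering out names that leaked from prompt examples, and applying a synonym mapping from the normalization pass) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the author's repository: the function names, the `PROMPT_EXAMPLE_NAMES` set, the sample name "Maya", and the shape of the mapping are all hypothetical.

```python
import json
import re

# Hypothetical sentinel set: names that appear only in prompt examples.
PROMPT_EXAMPLE_NAMES = {"Sarah"}

# Matches a whole response wrapped in a fence, with an optional language tag.
FENCE_RE = re.compile(r"^\s*```[a-zA-Z]*\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def parse_model_json(raw):
    """Strip an optional Markdown fence, then parse the JSON payload."""
    match = FENCE_RE.match(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)


def drop_leaked_examples(people):
    """Remove names that can only have come from the prompt's examples."""
    return [name for name in people if name not in PROMPT_EXAMPLE_NAMES]


def normalize_terms(terms, mapping):
    """Map synonyms onto one canonical term, keeping first-seen order."""
    seen = []
    for term in terms:
        canonical = mapping.get(term, term)
        if canonical not in seen:
            seen.append(canonical)
    return seen


raw_output = '```json\n{"people": ["Sarah", "Maya"]}\n```'
entry = parse_model_json(raw_output)
people = drop_leaked_examples(entry["people"])

# The mapping stands in for what the normalization prompt might return.
issues = normalize_terms(
    ["exhaustion", "fatigue", "headache"],
    {"exhaustion": "fatigue"},
)
```

Filtering by a fixed set only works if the example names never occur in real entries, which is why the post recommends examples that stand out.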

Pete Warden 4 weeks ago

Announcing Moonshine Voice

Today we’re launching Moonshine Voice, a new family of on-device speech-to-text models designed for live voice applications, and an open source library to run them. They support streaming, doing a lot of the compute while the user is still talking so your app can respond to user speech an order of magnitude faster than alternatives, while continuously supplying partial text updates. Our largest model has only 245 million parameters, but achieves a 6.65% word error rate on HuggingFace’s OpenASR Leaderboard, compared to Whisper Large v3, which has 1.5 billion parameters and a 7.44% word error rate. We are optimized for easy integration with applications, with prebuilt packages and examples for iOS, Android, Python, MacOS, Windows, Linux, and Raspberry Pis. Everything runs on the CPU with no NPU or GPU dependencies, and the code and streaming models are released under an MIT License. We’ve designed the framework to be “batteries included”, with microphone capture, voice activity detection, speaker identification (though our diarization has room for improvement), speech to text, and even intent recognition built in and available through a common API on all platforms. As you might be able to tell, I’m pretty excited to share this with you all! We’ve been working on this for the last 18 months and have been dogfooding it in our own products, and I can’t wait to see what you all build with it. Please join our Discord if you have questions, and if you do find it useful, please consider giving the repository a star on GitHub; that helps us a lot.
