Posts in Programming (20 found)

My Biggest Gripe With YouTube

Three years ago, I started a YouTube channel called JSLegendDev where I uploaded tutorials teaching the JavaScript programming language through the development of 2D games. The state of the space around the time I started was as follows: Tutorials under an hour long were not in demand and got very few views. Tutorials divided into multiple parts were dead on arrival; you were guaranteed dwindling views on every new upload. To adapt, other content creators started uploading longer, multi-hour, often project-based tutorials, which translated to more views. Seeing the shift, I followed suit and uploaded tutorials reaching the 4-10 hour mark. I saw some success doing this, so I kept at it for a while. However, as time passed, I got tired of recording extremely long tutorials and they, in general, started to get fewer views. There are many hypotheses as to why YouTube’s algorithm started serving tutorial content less. The advent of AI is a likely cause, but so is a general shift in YouTube toward becoming an entertainment-focused platform, something you now put on TV to relax, to the detriment of educational content. In the programming space, channels producing content that can be watched passively, like tech news, tech drama, tech history and high-level discussions, continued to thrive. Seeing this new shift and because I was genuinely tired of making YouTube tutorials, I published my first scripted video titled “How do Devs Make Levels Without Game Engines”, which was first published as an article. In that piece, I told the story of how I discovered a convenient way to design levels for my games using an external editor called Tiled in conjunction with my editor-less game framework. At the end of that video, I promoted a paid tutorial I made teaching the exact steps needed to achieve what was presented. The video ended up accumulating over 30k views, which was pretty great!
It took far less effort to make than my multi-hour tutorials, and I was able to make a few sales of the paid tutorial I mentioned in it. Previously, I had been very unsuccessful in selling paid courses and I didn’t quite understand why. However, the answer now hit me like a truck. Why would anyone still have the appetite for a paid course after having invested the time to follow a free multi-hour course? Even if the subject of the paid offering was different, they would probably be too tired to commit to another one. Anyway, following this first breakthrough, I uploaded another scripted video titled “You Can Now Make PS2 Games in JavaScript”, which was again first published as an article. In that video, I told the story of how I discovered that you could make PS2 games in JavaScript and provided an overview of how the viewer could get started. Despite including very practical knowledge, the viewer was never expected to follow along and therefore could watch it passively. It was a resounding success, with over 100k views! Unfortunately, I didn’t sell any courses through that video because I simply didn’t have the energy to make both the video and a course. The best business decision would have been to wait before uploading. I’ll go into more detail later, but my biggest gripe with YouTube is that it’s no longer a great platform to build an audience; it’s only good for reach, and here I had wasted a lot of reach. After having made so many game development tutorials, I wanted to try my hand at creating an original game that I would sell on Steam. Once the project started to take shape, I had the idea of making a video about it to gauge interest, as I wasn’t sure it would find an audience. I decided to use the same format as my two previous successful videos.
However, rather than focusing on technical details, I would instead tell the story of how I came up with my game’s design, covering the various iterations and challenges I faced while working on it. I ended up uploading a video titled “Making a Small RPG” which, again, was originally an article. It was also a resounding success, reaching just under 100k views! However, it came with a hidden cost. That cost was the tipping point that made me realize that YouTube is no longer a good platform to build an audience on. I naively thought that if the video performed well, this would translate to subscribers and an audience eager to hear more about the project, but this wasn’t the case. I had made a big mistake by not setting up a Steam page to direct viewers to before publishing the video. On my next upload concerning the project, the fall-off in views was brutal. I went from 98k views to below 10k. It became clear that YouTube was acting as a gatekeeper between me and the audience I thought I had built. After reflecting on the situation, I came to the following conclusion. The reason my 3 previous videos had performed well was that they had certain characteristics that aligned with YouTube’s goal as a platform, which is to make people watch videos for as long as possible so it can serve more ads. I list them below: First, the subjects of all three videos were remarkable, which led to people clicking on them. Something is remarkable when it obviously stands out as being interesting or noteworthy. For example, the subject of my video titled “You Can Now Make PS2 Games in JavaScript” is remarkable because the PS2 is a very popular but now old console, and you had to use a hard programming language called C++ to make games for it. Being able to now use JavaScript, a simpler language and, most importantly, one originally designed for making websites and not games, makes the subject come across as immediately noteworthy. Therefore, remarkable.
Second, the use of storytelling made people eager to watch more of the video. This can be explained by the fact that we instinctively want to know what happens next in a compelling story. Finally, the videos were all longer than 10 minutes, and the 2 more successful ones were in the 15+ min range. This resulted in more absolute watch time compared to shorter content. For example, if 2 videos are both watched fully by the same audience, the shorter one will translate to less total time spent on the platform than the longer one. Therefore, YouTube will recommend the longer one instead because there’s an opportunity cost to doing otherwise. To understand the fall-off, it’s important to first mention that series usually don’t work on YouTube. The second video of a series ends up getting fewer views than the first because it requires prior context before clicking, which reduces its appeal and limits its reach. However, I knew this going in. I tried making the second video as independent as possible, but in the end, a second video talking about the same subject was bound to be less remarkable. It didn’t help that, because I summarized the content of the first video in the second one, a familiar viewer would have found it less engaging, moving the video further away from meeting criteria 2 and 3 outlined above. Consequently, I realized I had wasted my biggest marketing ammunition for my small RPG, as I had no way to contact the audience reached by the first video. As with the one on making PS2 games in JavaScript, I had wasted tremendous reach. At this point, I realized my biggest gripe with YouTube was simply that I could not access my audience reliably. Was it really my audience, then? On one hand, YouTube allows someone without a following to reach millions, but on the other, the link to those reached is fickle.
I thought I was building an audience by gaining subscribers, but instead I was building a sand castle that could easily be carried away by the slightest algorithm wave. YouTube wasn’t always like this. People used to subscribe to channels and seek their content in their Subscriptions tab. However, the platform effectively buried this model by conditioning users to seek recommended videos on the home page and deprioritizing the Subscriptions tab to the point that it barely looks like a clickable section. You have to click on the “Subscriptions” text to access your sub feed. Doesn’t look very clickable, does it? I think that we’re now entering an era where YouTube is starting to treat content creators as interchangeable, much like TikTok. They saw the success TikTok had, tried to replicate it with Shorts, and now YouTube long form is getting affected as well. I fear that in the future, uploading to YouTube will look no different from making posts on Reddit. You might get views, you might get comments, but they’re confined to a specific post, with no following building up and no guarantee of your next posts having the same reach. The conclusion to all of this is that it’s not worth it to be a YouTuber. Relying on YouTube AdSense and sponsorships (sponsors use views as a metric to determine how much to pay you) for your livelihood is simply not sustainable because views on the platform are so fickle. Therefore, focusing so much on making YouTube content will most likely lead to your exploitation. That said, is quitting really the answer? Considering that YouTube can give you incredible reach even if you’re a nobody, as long as you make content that is remarkable, engaging (for example, through storytelling) and long enough, it would be stupid to completely walk away, at least in my case. Therefore, a new strategy appears on the horizon.
It consists of building your audience outside of YouTube through a mailing list (Substack conveniently allows you to do so) and strategically making occasional compelling YouTube content to tap into the platform’s reach potential. However, the key is to always direct viewers to the mailing list. Why is building an audience through email so important? Because it allows you to have a direct and long-lasting link with your audience. It also gives you independence from social media platforms. Even in the case of Substack, where this article is currently hosted, I can export my email list and move to another platform or email sending service without my subscribers even noticing. This shift means that I no longer need to worry about pumping out frequent content for YouTube, because I’m not making money through it, nor worrying about doing so. By making YouTube content rarely, I get to keep most of my energy to build something compelling outside the platform, like an actual game, interesting articles, an in-depth course or other kinds of art/products. This plan seems more sustainable and healthier to me long term. That’s about all I’ve got to share. Hope this article was insightful. If you’re curious to see where this journey will lead, I recommend subscribing! I usually write about programming, game development and game design.

0 views
Langur Monkey 2 days ago

Langur Agent

Langur Agent is a simple, open, hackable CLI AI agent for Linux and macOS. It connects to any service providing an OpenAI-compatible endpoint. It features: The source is available in this repository . Langur Agent has been tested on Linux and macOS only. Install the agent with: Run the agent with the default session: If you need an API key to access the endpoint, put it in the file. Langur Agent looks for the file in the following locations, in order: Create the file with the API key: The agent uses to load at startup. The package reads from the environment automatically. You can also set in your shell profile. On first run, the configuration is created in . You can configure the agent interactively with the slash command. The agent works with any OpenAI-compatible endpoint, so LM Studio, Ollama, OpenWebUI, or any other service you configure. Here are the default values: Run the agent, and then you can enter your prompt. You can use the following key bindings during input: During inference, you can cancel the turn and return to the input prompt with Ctrl + c . Use to print information about the available commands, and to configure the agent interactively. Internally, Langur Agent uses sessions to separate different memory histories. Sessions are named by the user. By default, the agent uses the session. You can start in a different session (either create a new one, or restore it if it exists) with the argument: The default session’s name is , so the following two commands are equivalent: You can also list the existing sessions with : Sessions contain: For now, the configuration file is the same for all sessions. Sessions are matched by the directory name in the sessions location ( ). You can rename a session by just renaming the directory! You can enable mode for the current session with the command , or permanently in the configuration . 
External editor —In mode, exit INSERT mode ( Esc ), then press v to edit your prompt in an external editor (uses your or variable). There are a few commands available to use in the agent loop. You can list them with . Also, use (e.g. ) to show additional help for a command. Persistent memory follows XDG Base Directory spec in : In addition to persistent memory, the agent maintains a chat history of recent user input and assistant output pairs. This provides context that survives beyond the LLM’s context window. Here is how it works: Persistence: Configuration: Langur Agent can be easily customized and extended by adding new tools, commands, and skills. If you create a cool new tool, skill, or slash command, consider contributing it via a pull request! Create a file in or use one of the existing ones. To create a tool, create a method and decorate it with : Tools are auto-discovered on startup. The process is very similar to tools. You need to create your method, preferably in , and decorate it with . A slash command must return, in that order, , , , : Decorated commands are automatically registered, and auto-completed in the input prompt. Add a file in with YAML front matter, following the agentskills.io standard: The front matter and are parsed and shown in the skills list. The body is injected into the system prompt. 
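The decorator-and-auto-discovery approach described for tools can be illustrated with a small, generic sketch. The decorator name `tool`, the `TOOLS` registry, and the `word_count` example are hypothetical and are not taken from Langur Agent's source; the point is only to show how decorating a function can register it so that nothing else needs to be wired up by hand.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to the function implementing it.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register func under its own name and return it unchanged."""
    TOOLS[func.__name__] = func
    return func


@tool
def word_count(text: str) -> str:
    """Count whitespace-separated words in text."""
    return str(len(text.split()))


def call_tool(name: str, **kwargs) -> str:
    """Look up a registered tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    print(sorted(TOOLS))
    print(call_tool("word_count", text="simple open hackable"))
```

Registration happens as a side effect of importing the module that contains the decorated function, which is one common way to get "auto-discovered on startup" behaviour: the host program only has to import every file in a known directory.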
Features:
session management
memory management
visual candy
autocompletion
interactive configuration

Python 3.13+ for dependency management

File locations, in order:
Current directory,
Home directory,

Key bindings during input:
Alt + Enter : add a new line
Enter : submit the prompt
Ctrl + q : quit

Sessions contain:
The input history
Chat memory (see chat memory )
Notes (see session memory )
User profile (see session memory )

Persistent memory:
— user information
— persistent notes (added via tool)
Memory is loaded into the system prompt each turn
tool adds notes during a session
tool explicitly persists memory to disk
Memory is auto-saved when the agent exits (interactive mode)

Chat memory:
Each user message and assistant response is stored in memory
Reasoning is omitted from chat memory
Automatically compacted when exceeding the configured character limit
The user can trigger the compaction any time with
Chat memory is attached to the system prompt on each turn
The agent displays the last 10 exchanges, with long messages truncated

Persistence:
Chat history is persisted to
Automatically loaded on startup
Saved after every exchange (user input or assistant response)
Compacted history is also persisted to disk

Slash command return values, in order:
: a indicating if the command succeeded or failed.
: an optional short status message. It is printed with or .
: an optional with the Python Rich-formatted content, it is printed to the output.
: an optional formatted in Markdown, it is printed to the output.
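The chat memory behaviour listed above (store each exchange, compact once a character limit is exceeded) could look roughly like the following sketch. Everything here is an assumption made for illustration: the `ChatMemory` class, the `max_chars` and `keep_last` parameters, and above all the compaction step, which simply keeps the first 20 characters of each older user message as a stand-in for whatever summarisation the agent actually performs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChatMemory:
    """Hypothetical chat history with size-triggered compaction."""

    max_chars: int = 200  # assumed limit, not the agent's real default
    summary: str = ""
    exchanges: List[Tuple[str, str]] = field(default_factory=list)

    def size(self) -> int:
        """Total characters held in the summary and all stored exchanges."""
        return len(self.summary) + sum(len(u) + len(a) for u, a in self.exchanges)

    def add(self, user: str, assistant: str) -> None:
        """Store one user/assistant pair, compacting if the limit is exceeded."""
        self.exchanges.append((user, assistant))
        if self.size() > self.max_chars:
            self.compact()

    def compact(self, keep_last: int = 2) -> None:
        """Fold older exchanges into the summary; keep the newest verbatim."""
        if keep_last < 1 or len(self.exchanges) <= keep_last:
            return
        old = self.exchanges[:-keep_last]
        # Placeholder for real summarisation: keep a 20-character stub per message.
        self.summary += "".join(f"[{user[:20]}]" for user, _ in old)
        self.exchanges = self.exchanges[-keep_last:]


if __name__ == "__main__":
    memory = ChatMemory(max_chars=50)
    for letter in "ace":
        memory.add(letter * 20, letter.upper() * 20)
    print(memory.summary, len(memory.exchanges))
```

Note that this sketch does not guarantee the history ends up under the limit after compaction; it only bounds how many exchanges are kept verbatim. A real implementation would also need to persist the result to disk after each exchange, as the list above describes.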

0 views
daniel.haxx.se 2 days ago

curl up 2026 summary

Getting curl developers and related enthusiasts into a single room to hang out in the real world for a whole weekend once a year is awesome. We find inspiration, we share experiences, we learn from each other and we dream about and plan future endeavors and things to work on. Seeing faces, hearing voices and watching body language helps us communicate better virtually and on video calls during the rest of the year. We have gathered curl people like this annually since 2017, even if some years during Covid were “different”. To me, this is one of the best events of the year. I get to hang out and talk curl with good friends a whole weekend! The 2026 edition was held in Prague in late May and kept the general style of past events. About 25 people got into the room. We had five curl maintainers present and quite a lot of local curious minds. The curl up format is easy, casual and friendly. We do topical presentations, followed up with Q&A and discussions around the topics brought up – of course usually with reflections about curl’s role, both past and future. We live-stream and record the presentations to allow our friends who could not attend to keep up, both in real time and after the fact. Unfortunately the tech is not always on our side so the quality sometimes is a little lacking. This year I brought an HDMI-splitter and an HDMI-to-USB device to allow us to get better recordings, but they were not working as smoothly as intended so we had to use inferior backup solutions for most of the meetup. The “keynote” was the introduction talk to the event. We then also recorded another nine sessions that are all available in the curl up 2026 playlist on YouTube. To give you all a little glimpse of what curl up is about, here’s a gallery showing some of the speakers and some scenery.
Pictured in the gallery: Daniel Stenberg, Alexandr Nedvedicky, Jim Fuller, Carlos Henrique Lima Melara, Jim Klimov, Moritz Buhl, Stanislav Fort, Igor Chubin, and Daniel Stenberg with Frank Gevaerts. All photos taken by and donated to us by an anonymous curl fan present in the room.

0 views
Martin Fowler 3 days ago

Fragments: May 27

At the GOTO Conference in Copenhagen in 2025, Kent Beck and I spent some time on stage talking and answering questions from the audience - a format I refer to as “two old geezers on a park bench”. We talk about our experiences with LLM-augmented programming (at that point - October 2025), we show our frustration that things we’ve been saying for thirty years still need to be said, we say how anything like a manifesto reunion needs to be led by a younger generation, and opine on what junior developers should be focusing on in their career. ❄                ❄                ❄                ❄                ❄ Ian Johnson has written a series of posts about restructuring a gnarly codebase The story follows a real Laravel + React codebase over ~3 months and ~258 commits from a legacy monolith with no tests to a well-structured application with automated quality gates, a React SPA migration in progress, and an AI agent that reliably ships production code with minimal supervision. The series covers the steps in decent detail, and his approach follows the kinds of steps I’d use. First get everything under the control of decent characterization tests, add static analysis, introduce the right patterns to make things flow easily. With all of this, is his use of AI, which changed during the exercise: For the first two months of this project, I used Claude Code with auto-approve turned off. Every file edit, every terminal command, every change… I reviewed it before it executed. […] The results were good. The code was clean. But I was doing most of the thinking and half the typing. The agent was a fancy autocomplete with better suggestions. I wasn’t getting the leverage I’d hoped for. I read an article about “on-the-loop” versus “in-the-loop” human-AI collaboration. The framing clicked immediately […] I was micromanaging because I didn’t trust the agent to do the right thing. And I didn’t trust the agent because there was nothing forcing it to do the right thing. 
His early steps put in tests, static analysis, and the right architectural patterns. With those in place, he could let the agent do more work. My role shifted from writer to curator. I don’t write most of the code anymore. I Define the patterns […] Review the test specs […] Review the output […] Update the harness […] Make strategic decisions […] He finishes the series with conclusions about how he’d generalize his experience to other circumstances. ❄                ❄                ❄                ❄                ❄ Back in the land of my birth, there were some notable groans when the National Health Service decided to close nearly all of their Open Source repositories, supposedly due to the security threat of LLMs. Closing repos like this isn’t an effective counter to LLM-augmented attackers. I suspect it’s no coincidence to see GDS (Government Digital Service), the highly-regarded IT enablers in the UK government, publish their position: Moving code from public to private as a substitute for investment in secure-by-design delivery, ownership and remediation is a warning sign because it reduces sharing and scrutiny, can slow coordinated improvement across government and suppliers, and does not remove the underlying weaknesses in a running service. Terence Eden memorably sums up his view on this: Within the UK’s Civil Service you occasionally hear the expression “being invited to a meeting without biscuits”. It implies a rather frosty discussion without any of the polite niceties of a normal meeting. ❄                ❄                ❄                ❄                ❄ I’ve seen a few cases where those developers who are most involved in working with LLMs find they are running into a problem with cognitive endurance. Adam Tornhill has joined this group: One of the big wins with agents is that they let us stay with the higher-level problem for longer. We get less sidetracked by details, dependency cleanup, and similar secondary tasks that used to break concentration.
But there is a cost we are still underestimating. Agentic coding is mentally expensive. I can usually sustain the pace for a couple of hours. Then I need a break. The pace is simply too intense. And based on conversations with other engineers, I do not think I am alone in that. He explains that working with The Genie means we are making more decisions in less time; this increase in decision density is hard on the brain. He responds by keeping agent tasks small, automating everything he can, and accepting that he won’t know every line of code as long as he has good verification mechanisms in place. Notably, he has not gone in the direction of doing his work with swarms of agents that he coordinates. Instead he has one long-running task that he babysits and one focus task. That last point is important given the running-twenty-agents-in-parallel hype. I cannot even think about twenty meaningful things to build, and even less so about the resulting cognitive tax of the likely interruptions. It’s exactly the wrong thing to even consider. At least for humans. (And yes, I understand sub-agents and machine parallelisation. That is not what I’m objecting to. It is the parallelisation of human attention that does not scale). I liked that he included some thoughts about what folks can do in time outside this intense programming time. Not just “have a coffee” (although he includes that) but also about learning about the domain that the software supports. ❄                ❄                ❄                ❄                ❄ A couple of pithy quotes from social media. Lorin Hochstein: “Metaphor debt” is when all of your metaphors involve the concept of “debt” because you can’t think of any other metaphors anymore. ❄                ❄ Daniel Terhorst-North: If a vegan crossfit fan is using Claude to write Rust, which thing do they tell you first?
❄                ❄                ❄                ❄                ❄ Karl Bode reacts to speakers getting booed when mentioning AI during commencement addresses. He points out that younger folks are increasingly unhappy with the tech oligarchy and their fruits . The thing is the kids aren’t stupid. They see the field clearly. They see the difference between what’s being sold to them by tech companies, the press, and commencement speakers, and what they have repeatedly seen with their own eyes. They’ve watched tech oligarchs spend the last decade mired in scandal after scandal, hype cycle after hype cycle, steadily enshittifying everything they touch along the way. The percentage of Gen Z that think AI’s benefits don’t counterbalance the risks now sits around fifty percent, up 11 percentage points in just the last year. Eight out of every ten believe that using AI makes the process of actual learning more difficult. He sees young people saddled with the perception of entering a worsening world - which leads them to rage against this latest fruit of the tech oligarchy. A rage that is easy for folks like me - with a comfortable retirement off-ramp - to properly appreciate. A rage that could have marked political and social consequences. ❄                ❄                ❄                ❄                ❄ Relevant to these concerns are a couple of items in last week’s Economist newspaper. The newspaper argues that historically major technological advances haven’t led to significant unemployment or drops in wages ( paywalled article ). The closest was the original industrial revolution in 19th Century Britain. There was a stagnation in wages during this period, but there was also a massive increase in population, from 4½ million to 12 million. It also points out that we’ll probably only understand the full consequences of all this when a recession hits, as this is when most unproductive jobs tend to be flushed out of the system. 
A second article ( also paywalled ) indicates that AI is having some effect on graduate hiring. They did an analysis of surveys of recent graduates, looking to see if employment varied depending on a job’s exposure to AI. The least exposed quintile of subjects saw employment rate fall by 1.5% over the last couple of years, while the most exposed quintile’s drop was 6.6%. ❄                ❄                ❄                ❄                ❄ Lawfare isn’t impressed with the latest efforts by the US Government to regulate AI. On [last] Wednesday, the White House invited leaders of OpenAI, Google, Anthropic, Meta, and Microsoft to the Oval Office for a signing ceremony the following afternoon. President Trump was to sign an executive order on AI and cybersecurity—the administration’s most formal effort yet to establish a voluntary process for reviewing frontier models before their release. But roughly three hours before the ceremony, when some company executives were already in the air to Washington, the White House called it off. They see the proposed regulations as mild, and including some valuable measures to harden defenses against cyber threats. But it’s worth underscoring the implications of postponing (if not outright canceling) this order, which, by its own terms, was about as modest a frontier-AI intervention as the federal government could put on paper: voluntary, focused on the government’s own defenses, and explicitly barred from becoming a licensing regime. The objection isn’t so much about government coercion as about the government having any settled role at all. Voluntary, in other words, isn’t the floor of frontier AI policy in this administration; it’s the ceiling. This is a questionable position given that the concerns animating this draft order will likely grow in the near future. It is also self-defeating for those who applauded the order’s delay or demise. 
Far from resolving the risk of government meddling in AI, killing the order just leaves in place what Ball has described as the “opaque and essentially lawless” alternative: government access happening through back channels, on terms set case by case, with no stable rules at all. One of the problems here is a distinct lack of governmental expertise, either in AI or in software in general. Too much is being decided at the whims of the tech oligarchy, there isn’t any attempt to engage in the broader issues at hand. That’s not entirely a bad thing, trying to regulate something that’s still evolving so fast is usually a fool’s errand - but the problem here is the impact of AI is so big that there’s real danger in being too far behind. ❄                ❄ Which leads me to a rare thing, an endorsement of a candidate for political office. If you are voting in congressional district MA-06 (North Shore of Massachusetts), I’d seriously look at Beth Anders-Beck , who is running for congress in that district. Beth has a long background in software development (including developing the notion of Forest and Desert ), so would introduce expertise that Congress desperately needs. I’ve known Beth for decades, and have a high opinion of their intelligence, judgment, and ability to work with others. Congress doesn’t deserve Beth, but it does need her.

0 views
Heather Burns 3 days ago

Born Crotchety

I spoke with The National about the proposed UK social media ban for teenagers.  That’s an archive link due to their unfortunate adwall. There’s nothing I offered in my delightfully crotchety comments that I wasn’t already saying four, five, six, and seven years ago, but if anyone had listened to me four, five, six, and […]

0 views
iDiallo 4 days ago

How Many Tokens Did You Burn Today

Early in my career, a manager at one of the big firms where I worked made a request so absurd it remains etched in my memory. I walked back to the team, repeated what he had asked, and couldn't finish the story without laughing. He wanted me to create a pie chart, of lines of code, per developer, per week. We all lost it. Our lead developer asked if, by any chance, the manager's eyes looked glassy. We laughed even harder. Because yes. Yes, they did. He was always high. That was twenty years ago. I've repeated that story countless times, and it always drew chuckles as we discussed the disconnect between software teams and management. Any software engineer could relate. We all knew that lines of code were a meaningless metric. A junior could write a thousand lines of spaghetti. A senior could fix the same problem with forty elegant ones. But then, last week, I found my name at the top of a leaderboard. My employer had been exploring productivity tools and trialed one they thought would be useful. After the trial, they were quoted $500k a year. The tool tracked developer productivity and integrated with Atlassian products, Microsoft, and many other services we used. The price was too steep, so it was dropped. A couple of months later, the same company came back with a discount. The exact same tool for just $50k a year. My employer jumped at the opportunity. How many bytes did you use today? I'm looking at this dashboard right now and I see my name at the top of the leaderboard. I click on the widget, and a pie chart appears. There it is: a breakdown of the total lines of code my team has produced using AI, by individual. This isn't limited to my employer. Every company is putting something together to track AI usage and justify the investment. Instead of tracking project completions, we're tracking how many lines of code each developer generated with AI. And the joke's on me, because nobody is laughing. 
The whole industry is applauding and encouraging employees to use more of it. I didn't become the champion because I have some neat agentic workflow. It was done by complete accident. While using an LLM, I accidentally selected "planning mode" for a request that had already been planned. The agent ran for several minutes, burning tokens to resolve a problem that didn't exist. Just like that, I made it to the top, without ever writing a single line of code. If this widget is taken at face value, it won't be long before developers start gaming it deliberately. Just let the agent run overnight, and your employer can claim a 10x improvement in productivity. We didn't use line count as a productivity metric in the past because it never made sense. Whenever we refactor code, we often end up with less than we started with. In fact, much of the time I spend modifying AI-generated code is spent deleting unnecessary things it created. Should we track negative lines of code? The better you are at programming, the worse your numbers look. We are assessing developers by the lines of code. I've watched AI evangelists ask "how many tokens did you burn today?" They were trying to convince an audience that productivity is directly proportional to token usage. It reminds me of the transition from paper to computers. A computer evangelist of that era might have asked: "how many bytes did you use today?" Token counts, lines of code, bytes, none of these have anything to do with actual productivity. Metrics are often entirely disconnected from what they're meant to measure. I've seen companies rely on story points only to watch employees point every ticket as high as possible. Choose lines of code as your metric, and lines of code will increase. Reward the highest contributor, and watch everyone double or triple their output by the next performance review. It's a silly metric but it serves a purpose, just not yours. 
AI companies promote token usage and associate it with productivity because they directly benefit from it. Imagine an internet service provider that charges by the byte. What would their recommendation for productivity be? "Use more bytes!" The best engineers I've ever known wrote less code, not more. They deleted things. They simplified. They understood that the goal was never the code itself. They solved problems, they made the system reliable, and they served the user. Measuring developers by output volume, whether that's lines, commits, or tokens, mistakes the exhaust for the engine. Every era of tooling brings a new class of metric that mistakes activity for value. The spreadsheet didn't make accountants more productive just because they could fill more cells. AI won't make developers more productive just because it can generate more code. We aren't even tracking if the right problems are being solved, and solved well. If the productivity dashboard can't answer that, it's not measuring productivity. It's measuring the subscription.

0 views

Pipeline Parallel Decompression

This isn’t a paper summary, but rather a description of a hobby experiment I’ve been hacking on ("research quality" code). This quote (attributed to either Anonymous or David Clark) originally referred to networking, but applies to parallel programming as well: There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed—you can’t bribe God. Standard "cured with money" parallelization techniques (e.g., shared-nothing architectures, data parallelism) try to minimize cross-core communication. These hammers are great for hitting nails labeled "improve throughput by throwing more cores at the problem". Not everything is a nail. Important problems which cannot be solved with this kind of approach include: parallel network packet processing in cases where load balancing schemes like RSS do not apply; parallel transaction processing when there is high contention between transactions; and parallel encryption of a single stream of data. Pipeline parallelism has the potential to provide "bribing God" solutions to some of these problems. A potential additional benefit that pipeline parallelism brings to the table is better usage of CPU caches because of a smaller working set. For example, if 8 cores cooperate to process 1 input file, the working set (input data, output data, intermediate data structures) is potentially 8 times smaller than the case where each core processes a separate input file. This caching advantage also applies to instruction caches, as pipeline parallelism distributes the computational steps of an algorithm across cores. Pipeline parallelism has some major drawbacks: fine-grain synchronization/communication and load imbalance. The purpose of this experiment is to put some numbers on the costs and benefits in a real-world application (DEFLATE decompression).
DEFLATE decompression is hard to parallelize because of two tight feedback loops. First, the position of an encoded token in the input stream is not known until the preceding token is decoded (because input data is encoded with a variable-length code). Second, the output generated by a match (i.e., a length & distance tuple) cannot be computed until some amount of previous output has been generated (because a match references previously generated output). A Negative Nancy might view these as problems, but a Positive Pipeliner views them as a guide for how to decompose the algorithm into pipeline stages. The general technique is to dedicate a pipeline stage to each of these feedback loops and whittle them down to be as tight as possible. The design I’ve landed on has three pipeline stages: chase, lookup, and output. The chase stage computes the length of each encoded token. It simply reads the next 13 bits from the input stream and uses them as an index into a lookup table. The inner loop looks like this: Note that in contrast to non-pipelined implementations, the only thing this code (and the lookup table) is concerned with is finding the length of each token; everything else is dealt with in another pipeline stage. Each iteration of this loop runs in about 8 clock cycles, and the lookup table fits in the L1 cache. The CPU cannot run multiple iterations of this loop in parallel due to the tight dependency chain. The input to the lookup stage is the encoded bits associated with each input token ( in the code above). These bits are used to perform another lookup (in a larger lookup table, stored in the L2 cache) which results in much more information about each token. Optimizing this stage is easy, because it doesn’t contain any tight feedback loops. The CPU can process multiple loop iterations in parallel, which enables it to hide the latency of accessing the L2. If necessary, it would be easy to split this pipeline stage into two.
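The split between the chase and lookup stages can be illustrated with a toy model. This is not the author's code: it is a minimal single-threaded sketch that assumes a made-up four-symbol prefix code, a 3-bit window instead of the 13 bits mentioned above, and bits written most-significant-first as a string (real DEFLATE packs bits differently). The names `chase`, `lookup`, `LENGTH_TABLE`, and `SYMBOL_TABLE` are hypothetical.

```python
# Toy prefix code, assumed purely for illustration (not DEFLATE's tables):
#   "0" -> 'a', "10" -> 'b', "110" -> 'c', "111" -> 'd'
CODES = {"0": "a", "10": "b", "110": "c", "111": "d"}
WINDOW_BITS = 3  # the post uses 13 bits; 3 keeps the tables tiny


def build_tables():
    """Build one entry per possible WINDOW_BITS-wide window of input bits."""
    size = 1 << WINDOW_BITS
    length_table = [0] * size     # chase stage: token length only
    symbol_table = [None] * size  # lookup stage: everything else
    for window in range(size):
        bits = format(window, f"0{WINDOW_BITS}b")
        for code, symbol in CODES.items():
            if bits.startswith(code):  # exactly one code matches (prefix-free)
                length_table[window] = len(code)
                symbol_table[window] = symbol
    return length_table, symbol_table


LENGTH_TABLE, SYMBOL_TABLE = build_tables()


def chase(bitstring):
    """Stage 1: find token boundaries. Each step needs the previous length."""
    padded = bitstring + "0" * WINDOW_BITS  # keep the final window full width
    pos = 0
    while pos < len(bitstring):
        window = int(padded[pos:pos + WINDOW_BITS], 2)
        yield window                  # pass the raw bits downstream
        pos += LENGTH_TABLE[window]   # the loop-carried dependency


def lookup(windows):
    """Stage 2: decode each window independently of its neighbours."""
    for window in windows:
        yield SYMBOL_TABLE[window]


def decode(bitstring):
    return "".join(lookup(chase(bitstring)))
```

In this sketch only `chase` carries state from one iteration to the next, which mirrors the claim that the first stage is latency bound while the second has no tight feedback loop. Python generators run on one thread, so this shows the decomposition only, not any speedup.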
The inner loop looks like this: The structure contains metadata about the input token (literal value and/or information about a match). This data structure does not contain the exact distance associated with the match; the variables named deal with that detail from the DEFLATE spec. The output stage writes literals and matches to the output buffer. This code leans on the CPU's store-to-load forwarding hardware to deal with match operations which must read data that was recently produced. Each iteration of the inner loop performs a word-sized write of literal data, plus a 32-byte read and write for match data. Actual store-to-load forwarding is rare, as most match distances are large. The Silesia Corpus contains files commonly used to benchmark compression algorithms. has English text with short matches whereas contains data dumps with longer matches. is an optimized library which can decompress roughly 2-3x faster than the standard . The following chart shows baseline performance on in a shared-nothing architecture where each CPU core decompresses a separate input file. There is one data point for each core count (1, 2, …, 8). As you would expect, throwing more cores at the problem improves throughput, at the cost of a slight latency increase. If you want a more interesting tradeoff of throughput vs. latency, you have to bribe God. For example, say you are writing a decompression application. If the user requests a bulk decompression of 100 files, then the optimal choice may be to assign each file to a CPU core. But if the user requests to decompress a single file, then you would prefer to decompress using multiple CPU cores. And here is the same chart with the 3-stage pipeline implementation added in orange (compare it to the third blue dot from the left for a 3-core vs 3-core comparison): For a 37% cost in throughput, you get a 2x reduction in latency. Here is the chart for , which shows a similar story. Data-parallel throughput saturates at 6 cores.
Pipeline parallelism allows a 2.6x latency reduction at the cost of 14% throughput. Dangling Pointers I think there is room for language/runtime support to improve performance of pipeline parallel algorithms on multicore CPUs (by reducing load imbalance). is bound by the chase stage, whereas is bound by the output stage. The programmer could supply multiple implementations of the pipeline (with some compiler help to reduce code duplication), and the runtime could dynamically switch between them depending on which stage is the bottleneck. High level synthesis tools are capable of automatic pipelining. Such techniques could be used to automatically generate many pipeline implementations for the runtime to choose between. The description above leaves out a few implementation details regarding the lookup tables. Because the lookup table data is spread across two cores (i.e., pipeline stages), there is enough room to store data for 2 Huffman tokens (2 literals, or a full match). This provides a large speedup compared to traditional implementations that store all data in the caches of a single core. Because the lookup stage is throughput bound rather than latency bound, it can afford to access the lookup table via a layer of indirection. The 13 input bits are used to look up an index, and that index is used to access the final data in another lookup table. The second lookup table has fewer entries, but each entry is larger. This reduces the total working set. This design leans heavily on CPU branch prediction. The code snippets shown earlier are for the common cases, with branches used to implement uncommon cases (e.g., a single encoded token that is wider than 13 bits). As long as those cases are rare, branch prediction does a great job of keeping the inner loops humming. An interesting puzzle arose during this experiment. I found that performance could swing widely (~10%) based on where the operating system located the stacks of the various threads.
The stack address would change from run to run because of ASLR. A little to offset the stack by a small amount would resolve this issue. It seems to be an important consideration when trying to maximize usage of the L1 cache.
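The second feedback loop from earlier in the post (a match can only be produced once the output it refers to exists) can be made concrete with a small sketch of an output stage. This is an illustration under assumptions, not the implementation described in the post: the token tuples `("lit", byte)` and `("match", length, distance)` are a hypothetical format, and the copy is done one byte at a time rather than with the 32-byte reads and writes mentioned above.

```python
def apply_tokens(tokens):
    """Write literals and matches into an output buffer, in order.

    Hypothetical token format (an assumption for this sketch):
      ("lit", byte_value)          append one byte
      ("match", length, distance)  copy `length` bytes starting `distance`
                                   bytes back from the current end
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
            continue
        _, length, distance = token
        if not 1 <= distance <= len(out):
            raise ValueError("match refers to output that does not exist yet")
        start = len(out) - distance
        # Byte-by-byte copy: when distance < length the match overlaps the
        # bytes it is producing, so each read may see a byte written just
        # before it in this same loop.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)
```

For example, two literals followed by a match of length 6 at distance 2 expand to a repeating pattern, because the copy reads bytes it has only just written. That read-after-recent-write case is the one the post says relies on store-to-load forwarding.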

0 views
DHH 4 days ago

Basecamp Five

I've been working on Basecamp for half my life, and nearly my entire professional career in software. The first code was written in the summer of 2003 when I was just 23. Now I'm 46, and we've just released the fifth major version.  It's an incredible update to a service that continues to help about a million users a day avoid dropping the ball when working with others. It's AI accessible, but not agent hysteric. It's still famously easy to use, still executes the basics beautifully, and still focuses on the small to medium-sized teams we've been serving in the Fortune 5,000,000 for decades. Here are just three of my favorite new features in Basecamp 5: Lexxy editor: Our new text editor finally brings tables, markdown, and live syntax highlighting for code to Basecamp. Oh, and voice notes. It's built on Meta's Lexical editor toolkit, and it's going to ship as the default for Action Text in the next major version of Rails. Keyboard accessible: After moving to Linux, building Omarchy, and acquiring a taste for mechanical keyboards, I've come to love navigating the computer primarily through hotkeys. So with a lot of effort, Basecamp is now a delight to drive through the keys, and you don't have to be a brainiac to remember them all: just hold down SHIFT, and they're revealed in the interface. SHIFT + S opens the sidebar, ESC moves focus between it and the main page, SHIFT + C starts composing a comment/chat line/answer. The permanent sidebar: If you live in Basecamp, like I do, it's to stay on top of all the new things that are constantly happening in a busy account, and that's just gotten so much faster with the new permanent sidebar. Before, we had a Hey! menu in the top bar. You'd get a little dot when something was new, then you'd open it, click, and the menu would close. If you had five things that were new, it'd be open-click-close, open-click-close, five times. 
Being able to zoom through these now with just the return key, tap, tap, tap, and I've read three new things. So good. And there's so much more. Jason put together a great summary on the new marketing site, which in itself is brand new too. A back-to-basics design in many ways. As our entire industry is getting swept up in agent hysteria (and I love AI as much as anyone!), we thought it better to focus on the human communication that's the cornerstone of Basecamp. The new site just speaks plainly to that mission and shows you the software right at the top. Another thing that's back is color, specifically in the logo. Basecamp's clever but flat paperclip logo has been replaced with a modern take of our original rolling mountains. In full three dimensions, with depth and a gradient. Love it.  Overall, I'm really proud of what we've built with Basecamp Five. We're inching in on a quarter of a century in service! We still have customers who signed up back in early 2004! This is the kind of legacy that makes me beam, and the new version is just ace.  If you've tried Basecamp in the past, it's time to take another look. If you haven't tried it yet, you're in for a treat.

0 views
Unsung 6 days ago

FAIL_MAIL_OVER_500_MILES=TRUE

Here’s a 2002 story from a younger internet, by programmer Trey Harris (link to the original, and if you don’t like the classic Usenet formatting – my browser’s reader mode can’t even prettify it! – here’s a nicer-looking format): “We’re having a problem sending email out of the department.” “What’s the problem?” I asked. “We can’t send mail more than 500 miles,” the chairman explained. I choked on my latte. “Come again?” “We can’t send mail farther than 500 miles from here,” he repeated. “A little bit more, actually. Call it 520 miles. But no farther.” It would be easy to assume this is a classic case of pebkac, “problem exists between keyboard and chair,” the derisive term used (supposedly!) by support people to describe a naïve public with a tenuous grasp of technological reality. But the story goes to an unexpected place. This might be the most widely shared computer bug story I’ve seen – I just saw a comment from 2008 calling it “oldie but a goodie,” and it even has a FAQ page that’s actually a really great read. There’s quite a bit of chatter inside about something important to me: the balance between the needs of good storytelling and going deep into technical details: In the story, I make it sound like it took all of ten minutes from being made aware of the 500-mile email limit and determining a 3 ms light-speed issue. In fact, this took several hours, and quite a bit of detective work. The point is, eventually I came up with that figure, ran units, and gagged on my latte. You can sense the author’s frustration with every nerd trying to “gotcha” him instead of just enjoying the story. Even a younger internet wasn’t without faults. #bug deep dives #bugs #change management #storytelling #web
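The link between "3 ms" and "a little more than 500 miles" is a one-line calculation. The sketch below assumes the simplest possible model, which the excerpt does not spell out: light travelling one way in a vacuum for 3 milliseconds. The function name `reach_in_miles` is made up for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METRES_PER_MILE = 1609.344            # international mile


def reach_in_miles(seconds):
    """Distance light covers in a vacuum in the given time, in miles."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METRES_PER_MILE


miles = reach_in_miles(0.003)  # the 3 ms figure quoted in the story
print(f"{miles:.1f} miles")
```

Under that assumption the result is a little under 560 miles, the same ballpark as the 500 to 520 miles the chairman reported. Round trips, signals travelling slower than light in a vacuum, and processing delays are not modelled here.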

0 views
Armin Ronacher 1 week ago

Building Pi With Pi

Pi is now part of Earendil, but in the important sense it is still Mario’s project. He has been living with its issue tracker longer than I have, and he has been exposed to the weirdness of the new form of agent traffic in Open Source projects for longer too. This post is mostly a reflection of my own experience after spending more time in the tracker, using Pi to work on Pi, and watching what I have learned about it so far. Unsurprisingly, we are using Pi to build Pi. That sounds like a cute dogfooding thing but it really helps understand what we do. An interesting effect of building with agents is that it changes the role of the issue tracker a tiny bit. The issue descriptions are not just messages from a user to a maintainer because we also use them as inputs for prompts in Pi sessions. It is something I might hand to my clanker 1 and say: “understand this, reproduce it, inspect the code, and propose a fix.” That means the shape of the issue matters in a new way. A bad issue was always annoying, but at least a lot of issues were vague. Now we are also dealing with a class of issues that are 5% human and 95% clanker-generated and largely inaccurate shit. A bad issue that contains a plausible but wrong diagnosis creates extra work. The most frustrating failure mode right now is that people submit issues that are not in their own voice. They contain an observed problem somewhere, but it has been thrown into a clanker and the clanker reworded it and made a huge mess of it. Typically, it was prompted so badly that the conclusions produced are more often than not inaccurate but always full of confidence. The result is complete guesswork on root causes, fake-minimal repros, suggested implementation strategies, analogies to adjacent but often the wrong code, and long lists of error classes that might or might not matter. That is worse than no diagnosis. I don’t want to point to specific issues because I really do not want to bad mouth anyone, but it is frustrating. 
It is also frustrating because when I give that issue to Pi, Pi sees the wrong diagnosis too. It does not treat the issue body as a rumor. It treats it as evidence. It will happily go down the path that the issue already prepared for it, because the prose is confident and the code references look plausible. We use a custom slash command called , which specifically has this instruction in it: Do not trust analysis written in the issue. Independently verify behavior and derive your own analysis from the code and execution path. Unfortunately, it does not fully work, because when humans first throw their issue through the clanker wringer, their clanker expands scope almost immediately. What was once a very narrow and fact based bug observation, turns into a much expanded surface area full of hypotheses. So at least personally, I increasingly want issue reports to be condensed to what the human actually observed: That is enough. If you used an LLM to understand the problem, great, maybe leave it as a follow-up comment. But the issue and the issue text should be something you own. If you do not know the root cause, say that. I too can operate a clanker, and I would rather do this myself than use your slop. If your repro is a guess, say that. If the only hard fact is one stack trace, give me the stack trace and stop there. That we’re seeing issues full of slop is just a result of the present day quality of these machines. Sadly, their failures in creating good issues extend to a lot of code that is generated. Not all of it, but a lot of code. Over and over I keep running into them over-engineering the hell out of issues and implementations. If you tell them that “this malformed session log crashes the reader,” the clanker will often add a tolerant reader. Then it will add a fallback, then maybe a migration, then more debug output, then a test for all of this. None of this is necessarily wrong in isolation, but it can be the wrong move for the system. 
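The alternative to a tolerant reader is to refuse malformed data at write time, so the reader never has to cope with it. The sketch below is a hypothetical illustration, not Pi's session format: the `SessionLog` class, its two invariants (entry ids are unique, and a parent must already be in the log), and all method names are assumptions made for this example.

```python
class SessionLog:
    """Hypothetical append-only log that enforces its invariants on write."""

    def __init__(self):
        # entry_id -> (parent_id, payload); dicts keep insertion order
        self._entries = {}

    def append(self, entry_id, parent_id, payload):
        # Invariant 1 (assumed): ids are never reused.
        if entry_id in self._entries:
            raise ValueError(f"duplicate entry id: {entry_id!r}")
        # Invariant 2 (assumed): a parent must exist before its child.
        if parent_id is not None and parent_id not in self._entries:
            raise ValueError(f"unknown parent id: {parent_id!r}")
        self._entries[entry_id] = (parent_id, payload)

    def path_to_root(self, entry_id):
        """Reader with no fallbacks: the invariants rule out dangling
        parents and cycles, so this walk always terminates."""
        path = []
        current = entry_id
        while current is not None:
            parent_id, payload = self._entries[current]
            path.append(payload)
            current = parent_id
        return path
```

The design choice this shows is where the check lives. The writer rejects bad state once, and the reader stays a plain loop with no recovery branches to maintain.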
At Pi’s core is a rather well-designed session log with invariants that must be upheld. The clanker’s present-day behavior is to just assume that no such invariants exist, and instead to make the system work with all kinds of malformedness, blowing up the complexity in the process. Almost always, the correct fix is not to handle the bad state, but to make the bad state impossible. This matters a lot for persisted data such as Pi session logs. They are opened, branched, compacted, exported, shared, and analyzed. The goal here is to never write bad session data. Yet if you just let the clanker roam freely, it will attempt to handle every case of bad data in the session log with a more permissive reader. I have complained about this plenty, but working on Pi’s code base continues to reinforce the point. This is one of the ways LLM authored code grows so much needless complexity. All these models see a local failure and try to locally defend against it. As maintainers we have to keep pulling the conversation back to the global invariant, which is harder than it should be, and it’s laborious. Then there is the issue of volume. The tracker is receiving a lot of issues and PRs, and a significant fraction of them are clearly LLM-assisted. Some are good, none are excellent, and most are just bad. The total throughput is a maintenance problem by itself. As you might know, Pi’s issue tracker is automated to close all issues and pull requests from new contributors, and there is a manual process by which we might reopen some of them or approve individuals. So auto-close -> reopen -> close again is an interesting statistic for us to look at. I pulled the public GitHub tracker data while writing this over the last 90 days. Excluding Earendil members, that leaves 3,145 external issues and pull requests. Of those, 2,504 were auto-closed because they were from non-approved individuals. 17% were reopened. For pull requests the number is worse: less than 10% were merged. 
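The tracker figures above can be turned into rough proportions. The sketch assumes that the 17% reopen rate applies to the 2,504 auto-closed items rather than to all 3,145 external items; the text does not say which, so the reopened count is an estimate under that assumption only.

```python
external_total = 3145  # external issues and PRs over the 90 days
auto_closed = 2504     # auto-closed because the author was not approved
reopened_rate = 0.17   # assumed to be a share of the auto-closed items

auto_closed_share = auto_closed / external_total
reopened_estimate = round(auto_closed * reopened_rate)

print(f"auto-closed share: {auto_closed_share:.1%}")
print(f"estimated reopened: {reopened_estimate}")
```

Under that reading, roughly four in five external submissions were auto-closed, and a few hundred of those were later reopened by hand.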
Many of the issues and PRs are complete slop and in some cases the humans did not even realize that they created them. Sources of low-quality spam include OpenClaw instances, as well as some skills that people put into their context that seemingly encourage issue creation. GitHub clearly is not built to deal with this new form of Open Source, but I’m increasingly feeling the need to put the blame less on GitHub than on all the people involved who make that experience painful. If your clanker shits on someone else’s issue tracker then it’s not the fault of GitHub, it’s yours alone. Pi might be built with Pi, but we’re quite far off today from where Bun and OpenClaw already are: fully detached, automated software engineering. Maybe we will reach that point, I don’t know. Today it does not seem like we know how to pull off a dark factory and we also don’t yet have the desire. That said, there is quite a bit of parallelism going on, and it is mostly for reproducing issues. The small setup we use for this is three tiny pieces in Pi’s own committed folder. (for analyze is sue) is a prompt for analyzing GitHub issues: it labels and assigns the issue, reads the full thread and links, then explicitly tells the agent not to trust the analysis in the issue and to derive its own diagnosis from the code. Then an extension adds a which watches the prompt before the agent starts, recognizes the GitHub issue or PR URL that (or the PR equivalent) put into the prompt, fetches the title and author with , renders that in a little UI widget, and renames the session. It also rebuilds that state on session start or session switch, so if we reopen an older investigation the window still tells the developer which issue it belongs to. In practice this means it’s possible to have several Pi windows open, each running against a different issue, and the UI keeps the investigations visually distinct while the agents do their independent reproduction and code reading. 
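The URL-recognition step of such an extension can be sketched roughly as follows. This is a guess at the shape of the logic, not Pi's extension code: the function name `find_github_ref`, the returned dictionary, and the example repository are hypothetical. Only the `github.com/<owner>/<repo>/issues/<n>` and `/pull/<n>` URL layouts are GitHub's own.

```python
import re

# Matches GitHub issue and pull request URLs embedded in free text.
GITHUB_REF = re.compile(
    r"https://github\.com/"
    r"(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)/"
    r"(?P<kind>issues|pull)/(?P<number>\d+)"
)


def find_github_ref(prompt):
    """Return the first issue or PR reference in the prompt, or None."""
    match = GITHUB_REF.search(prompt)
    if match is None:
        return None
    return {
        "owner": match.group("owner"),
        "repo": match.group("repo"),
        "kind": "issue" if match.group("kind") == "issues" else "pr",
        "number": int(match.group("number")),
    }
```

A caller could use the result to fetch the title and author and to rename the session, as the post describes. Those steps depend on tooling not shown here and are left out.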
Once the investigations are done, one can work through them sequentially. To finish off everything, ( wr ap it up) is the matching wrap-up prompt: it infers the GitHub context from the session, updates the changelog, drafts or posts the final issue comment with a disclaimer, commits only the files changed in that session, adds the appropriate when there is exactly one issue, and pushes from . You will have noticed this already but Open Source in a post-AI world is under a strange new pressure. We are getting more code, more projects, and more issues. Projects appear with no real users, or a temporary audience of one, and even projects with thousands of stars can have a shelf life of weeks. For us, Pi’s harness layer is worth maintaining carefully because it solves hard coordination problems and creates a platform we and others can build on. We also know that coordination and cooperation lifts us all up. Many times the right answer is not to work around a problem locally, but to make the upstream behavior correct. Mario has been very good at refusing to make Pi paper over every misconfigured gateway, and we’re trying to preserve that discipline. When a gateway behaves correctly, everybody benefits. Sadly that type of thinking is quickly disappearing because these machines make local workarounds cheap, so code accumulates local defenses against every misbehavior. Instead of humans talking to humans about where a fix belongs, one human and one machine work around the problem in isolation. Keep in mind that AI has not increased the number of people who need software, or the number of maintainers who can review it. It has mostly increased the amount of code and the number of projects competing for attention. Some of that is healthy, but a lot of it fragments effort that should be shared. We need stronger foundations, not weaker ones. Open Source needs more collaboration, not more isolated work with a machine. 
Human communication is hard, and it is tempting to avoid it when you can sit alone with your clanker. But isolation is not where Open Source derives its value. The value is in the community and the structure that lets projects outlive their original creators. To me, clanker is a much preferable term for agent. Agency lies with humans, not with machines. Calling these things agents I still believe is a mistake, but alas. I ran this command. I expected this to happen. This happened instead. Here is the exact error or log.

0 views
Susam Pal 1 week ago

Childhood Computing

I recently stumbled upon a nice blog post titled Childhood Computing . It made me think about my own childhood computing experience. I am much older than the author of the aforementioned post but like them, I love computers too. I have for most of my life. When I was about eight years old, my parents decided to transfer me to a new school because of its curriculum. They did not know it then, and it probably did not even matter to them, but this new school had a computer lab. That was quite remarkable for its time. I grew up in a very tiny industrial town. The computers in the lab were hand-me-downs from the silica factory around which the town was built. We got only about two hours of time per month in the computer lab but the little time I got there opened up a whole new world for me. Before entering the lab, we had to leave our shoes at the door. 'These are expensive machines. We must keep them free of dust', our teacher would say. It was a ritual. The computers were very old IBM PC compatible machines, mostly with monochrome cathode-ray tube (CRT) monitors. They had no hard disks at all. They had a few hundred kilobytes of RAM. Every time, we performed the same ritual. Insert a 5¼-inch floppy disk to load MS-DOS into memory. Then insert another disk to load LOGO. Then write small LOGO programs and watch the turtle move. I have written more about that early LOGO programming experience here: FD 100 . Further, since there were no hard disks and storage was at a premium, nothing was ever saved. The moment you turned off the computer, all your work vanished. So saving a program meant literally writing the program down in a physical notebook. Since I got so little time with an actual computer, most of my Logo programming happened with pen and paper at home. I would 'test' my programs by tracing the results on graph paper. Eventually, I would get about thirty minutes of actual computer time in the lab to run them for real. 
One particular Logo program I still remember very well drew a house with animated dashed lines, where the dashes moved around the outline of the house. Everyone around me loved it, copied it and tweaked it to change the colours, alter the details and add their own little touches. That must have been my first 'free and open source software'. The 'licence' was 'do whatever you want but show me if you make any interesting modifications'. Occasionally, when we successfully completed the Logo programming exercises our teacher set us as challenges, he would let us play computer games too. The first computer game I ever played was Moon Bugs. Space Invaders, Bricks, Dangerous Dave and others were some of my other favourites. Space Invaders inspired me to write my own game but the little GW-BASIC programming I knew back then and the very limited access to computers I had then were insufficient to write anything more sophisticated than simple text-based input/output programs. But eventually, as an adult, I did manage to write an invaders-like game, which you can find here: Andromeda Invaders . Writing this game fulfilled a childhood dream! One of my buddies liked the game called Digger developed by Windmill Software. It soon became my favourite as well. The game came in a self-booting disk, so we did not have to go through the elaborate ritual of first inserting a floppy disk to load DOS. We could insert the Digger floppy disk directly and the computer would boot and start the game immediately. Another computer game I remember fondly was Grand Prix Circuit by Accolade. I really loved typing the command to launch the game, knowing that in a moment I will be greeted with its excellent opening music. Grand Prix Circuit blew my mind. 
As a child who only knew how to draw basic two-dimensional geometrical shapes with Logo and GW-BASIC, I found it astounding that a computer program could create a projection of a three-dimensional fictional world that you could navigate with keyboard inputs. How was it even possible, I wondered. It has been over 30 years since then, but the memories and the feelings still remain fresh in my mind. There are times when I can close my eyes and recall the buzzing sound of the dozen or so computers running in the lab, the beeps from the power-on self-tests (POST) and the distinctive, strangely pleasant smell of the closed, air-conditioned room. For some reason, that smell is one of the strongest memories I have from those days. I have never been able to describe it well, but once in a while I encounter it in very unexpected places, like a corridor somewhere, or a store, and it takes me right back to those early days of childhood computing. Those childhood computing experiences form some of my strongest and most vivid memories. They were such magical experiences, full of wonder and exploration.

Farid Zakaria 1 week ago

Leaving performance on the table

I have been working with LLVM at , and I have become familiar with the benefits of optimizing your workloads. I tend to think of optimizing my binaries as checking whether I have attached to my compiler flags; maybe if I’m particularly advanced that day I’ll sprinkle in some (link time optimization) and call it a day. Turns out, though, that’s leaving lots of performance on the table.

Compilers work under the assumption that every branch is equally likely to be taken, unless you provide hints like ( ref ). If we can feed the compilers more information about the likely path that our workloads often take, then they can produce much more performant code.

There are two primary ways to optimize a binary: instrumented or statistical. When we instrument our binary, we run our workload with an instrumented binary and capture the exact paths that are executed. We then produce an optimized binary tuned to that workload. If our workloads are varied, however, we can collect profiles via over a length of time and create an optimized binary based on the statistical occurrence of call graphs. Both approaches have their benefits; however, let’s start with the instrumented variant, as it’s a little easier to follow and understand.

Let’s look at a very simple benchmark. We will calculate Fibonacci numbers using SQL in sqlite3. This is an ideal workload because it’s purely CPU-bound and ripe for optimizing. We will compile from source by downloading it. We can compile a “traditional” optimized binary that merely has and also a version that has LTO enabled since I was also keen to see how much LTO itself adds. Ok, so it looks like our program takes roughly 14-15 seconds to run. Sounds ok? How much better can we do… 🤔

Next, we compile our program again but we instrument the binary , which effectively injects counters into the program to count invocations of functions. We get very accurate counts of our calls but the binary itself now runs much slower, which can be a problem if your workload was already very slow. Luckily for us, we are in a time domain (~15 seconds) where that is ok. After we have our instrumented binary, we run our workload again to generate the profile data and rebuild the binary with that data. The last step will be to optimize with BOLT, which is a post-link optimizer that requires us to keep relocations, so I’ve also added . When we run our workload with the final optimized binary, we see massive improvement already! 🤯 We’ve cut our workload time down to ~10 seconds, which is nearly a 1.5x improvement.

Now let’s optimize the final binary with LLVM’s BOLT. BOLT is a post-link optimizer designed for “large applications”. What this means is that it largely works by shuffling code around the binary to keep code paths that have high temporal locality near each other (spatial locality). This can have a positive impact on performance, for instance because it makes better use of the instruction cache. Looks like it was a little faster but not much. That makes sense since itself is a pretty small binary (~6MB), but nonetheless it was good to run through.

Running a more thorough benchmark with we can get a final tally of our results. Looks like the I got from the Fedora ecosystem was the slowest. When all the optimizations were applied I was able to get a maximum of 1.38x faster than what was available. These optimizations would be even more dramatic for code-bases that are a sprawl and can heavily vary. Also, don’t worry about getting the profile perfectly tuned to your workloads. I have a coworker who often cites that even poor profiles are still much better than no profile at all.
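To make the benchmark concrete, here is a minimal sketch of the kind of CPU-bound workload described above: computing Fibonacci numbers with a recursive SQL query. The post's actual query and compiler flags are not reproduced here. The recursive CTE, the `fib_sql` helper name, and the use of Python's built-in `sqlite3` module instead of the command-line shell are all assumptions made for illustration.

```python
import sqlite3


def fib_sql(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) = 0), computed inside SQLite.

    Hypothetical helper: the recursive CTE below is one plausible way to
    express the workload, not the query used in the post.
    """
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            """
            WITH RECURSIVE fib(i, a, b) AS (
                SELECT 0, 0, 1                  -- seed row: fib(0), fib(1)
                UNION ALL
                SELECT i + 1, b, a + b          -- advance one step
                FROM fib
                WHERE i < ?
            )
            SELECT a FROM fib WHERE i = ?
            """,
            (n, n),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(fib_sql(10))
```

All the work happens inside the SQLite engine, which is why a query of this shape is sensitive to how the `sqlite3` binary itself was compiled. Keep `n` below roughly 90 so the values stay within SQLite's 64-bit integers.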

flowtwo.io 1 week ago

Othello World

I was introduced to the board game Othello (also known as Reversi) on a recent trip to Japan. It's one of those games where you can learn the rules in 5 minutes, but the gameplay dynamics are surprisingly deep. When I saw it's played on an 8x8 board, like chess is, I immediately started thinking about how to program a game engine for it.

The 8x8 board is helpful because it allows you to represent the board state with 64-bit longs; each set bit in the number indicates the presence of a piece on that square. When you perform a bitwise operation on these numbers you're essentially computing multiple piece movements in parallel with a single CPU instruction. This computational efficiency enables deep searching of the move tree.

I purposely started out without reading too much about game strategies because I wanted to explore it through coding the engine logic. It didn't take long to create an algorithm that is significantly stronger than me, although that's not a high bar. There's a demo available here if you're interested in playing it.

The basic building blocks of the game engine are as follows:

- Board representation
- Move generation
- Position evaluation
- Game tree search

Once you have these four elements built and wired together, you have a functional game engine to play against. The first two pieces are fairly straightforward; the real strength of an engine comes from how the last two are implemented.

Like I mentioned above, we can represent the complete board state with just two 64-bit numbers. One number represents the black piece positions and the other the white piece positions. How you encode the 64 squares to the 64 bits is arbitrary, but I chose to represent each row as one byte (8 bits) and from left to right, top to bottom in terms of bit significance. In other words:

And that's all that's needed to represent the piece positions. I created an immutable data class to encapsulate this:

In Othello, if one player has no legal moves at any point in time, they skip their turn and the other player gets to go again. If both players have no legal moves, the game ends. Instead of computing both players' legal moves every time to check for those situations, I created a enum so that information is somewhat pre-computed. The combination of and provides everything needed to determine the state of the game for the other stages in the engine.

This is where things get tricky. Move generation requires codifying the rules of Othello in such a way that, given a board state, all the legal moves for either player can be computed, and ideally computed quickly. In Othello, you can only place a piece somewhere that will "sandwich" the other player's piece(s) between the piece you're placing and another "anchor" piece of yours. There can't be any blank spaces either. This rule applies to any of the 8 directions of the board (diagonals count). This screenshot illustrates the valid moves for black in this position:

This function will calculate all the eligible squares for a single direction of movement (up, down, up-left etc.). What's cool is that it calculates eligible squares for all 8 rows/columns/diagonals at the same time. It's invoked as follows.

For each of the 8 directions, you pass in a movement function and an ineligible square bitmask if required for that direction. For example, if shifting towards the left, you need to mask out the pieces on the leftmost column to prevent wrapping to the other side of the board (similarly for moving right). Moving up or down doesn't require a mask because shifting the bits "up" or "down" enough will just drop them from the number entirely. The function will return all valid moves for a given position for the "moving" pieces (the 1st argument). The moves are returned as a where each set bit is a valid square to place a piece.

This part was interesting to me as I don't know much about strategy in Othello besides that the corners are important. The corners are important because once you claim a corner it can't be unflipped by the other player. Also, simply maximizing for the most pieces isn't the best strategy either, apparently. I do have a "greedy" algorithm that you can select in the demo app if you want to see that strategy in action. But of course, closer to the end of the game, having more pieces is more important since that's how the winner is determined. I represented this in the eval function by linearly shifting the weighting towards piece score as you get closer to the end of the game.

I have two piece scores actually. The is a step function that only returns 1 or -1 depending on which piece colour has more pieces. But in the heuristic evaluation, I look at the actual piece differential score which returns between -100% and +100% depending on what "percentage" of the overall possible pieces the leading player has. That score is given 40% weighting in the heuristic evaluation function; the other 60% is a positional score based on the following square values I came up with:

This was my best guess at which squares matter most. My reasoning is that the more central the square is, the more likely it is to be flipped. The closer to the edge it is, the less likely it is to be flipped and the more likely it is to be used as an anchor piece. So putting this all together, the heuristic evaluation is computed as follows:

And that's it. The top-level function provides a relative score between -1.0 and +1.0 which represents the strength of a given position, relative to black. Since Othello is a zero-sum game, a good score for one player is an equivalently bad score for the other player. This is important in the next phase, the move search algorithm.

This part of the engine is fairly "textbook". There's lots of explanation for how these algorithms work on Wikipedia, and chessprogramming.org is an incredible knowledge base for this sort of thing too. For zero-sum games, you can use a variant of minimax search called Negamax. That's what's shown here:

For Othello specifically, the Negamax function needs to handle the case that the moving player has no legal moves and must pass to the opposing player. This is in the branch in the middle. We check if we're already in a position where the previous player had to pass, which means both players can't move and the game would be over in this branch. If not, we simply call again with the SAME and reverse the score returned from that call.

With those 4 components built, I now had a functional engine to play against. I created an class that accepts a move selection algorithm. It exposes 3 methods:

- for showing valid player moves in the UI
- which validates and then applies a specified player move
- which chooses and applies the best move using the

I exposed the via a stateless REST API. Each request needs to supply the current game state information in order to make a move. For example:

For the demo, it uses HTMX instead to return a rendered board component. The request format is the same but it returns HTML instead of JSON.

I read this article recently that took a contrarian view on agentic coding and its pitfalls. The author makes a lot of good points and it was thought-provoking. While I don't agree that using agentic coding will make you dumber per se ... I do think there's something to be said for regularly exercising the critical thinking and problem solving part of your brain if you want to be a good software engineer. Side projects like this are a great opportunity to do that.

The incredible rise in coding competency for AI agents over the last 12 months has made a project like this into a one-shot, one-prompt task for a recent LLM. I obviously didn't do that, because the point of this project was the act of doing it, not the end result. I learned a bit about Othello and refreshed myself on bitwise operations. The parts I wasn't interested in doing, the UI and the API wiring, I delegated to an agent to implement for me. To me, that's one of the best parts about coding with AI. I can now offload the tasks I'm not interested in or that aren't as critical, and focus on the parts of the system I want to work on. It's never been easier to build and bring ideas to life with software.
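The bitboard move generation described in this post can be sketched roughly as follows. This is not the author's code, which is not shown here: it is a common flood-fill approach written in Python, and the names (`bit`, `SHIFTS`, `moves_in_direction`, `legal_moves`) as well as the exact bit layout (top-left square in the most significant bit, one byte per row) are assumptions chosen to match the description.

```python
FULL = 0xFFFFFFFFFFFFFFFF           # Python ints are unbounded, so clamp to 64 bits
NOT_LEFT_COL = 0x7F7F7F7F7F7F7F7F   # every square except the leftmost column
NOT_RIGHT_COL = 0xFEFEFEFEFEFEFEFE  # every square except the rightmost column


def bit(row, col):
    """Bitmask for one square; row 0 is the top row, col 0 the leftmost column."""
    return 1 << (63 - (8 * row + col))


# One shift function per direction. Horizontal and diagonal shifts mask the
# edge column first so pieces cannot wrap around to the other side.
SHIFTS = [
    lambda b: (b << 8) & FULL,                    # up
    lambda b: b >> 8,                             # down
    lambda b: ((b & NOT_LEFT_COL) << 1) & FULL,   # left
    lambda b: (b & NOT_RIGHT_COL) >> 1,           # right
    lambda b: ((b & NOT_LEFT_COL) << 9) & FULL,   # up-left
    lambda b: ((b & NOT_RIGHT_COL) << 7) & FULL,  # up-right
    lambda b: (b & NOT_LEFT_COL) >> 7,            # down-left
    lambda b: (b & NOT_RIGHT_COL) >> 9,           # down-right
]


def moves_in_direction(mover, opponent, shift):
    """Squares where `mover` can play by sandwiching pieces along one direction."""
    empty = ~(mover | opponent) & FULL
    # Opponent pieces directly next to one of the mover's anchor pieces.
    run = shift(mover) & opponent
    # Extend each run of opponent pieces; at most 6 can sit between two ends.
    for _ in range(5):
        run |= shift(run) & opponent
    # The empty square just past each run is a legal placement.
    return shift(run) & empty


def legal_moves(mover, opponent):
    """Bitmask of all legal placements for `mover`."""
    moves = 0
    for shift in SHIFTS:
        moves |= moves_in_direction(mover, opponent, shift)
    return moves


if __name__ == "__main__":
    # Standard opening position in this layout: black to move.
    black = bit(3, 4) | bit(4, 3)
    white = bit(3, 3) | bit(4, 4)
    print(f"{legal_moves(black, white):064b}")
```

Each call to `moves_in_direction` handles every row, column, or diagonal of the board at once, which is where the parallelism mentioned above comes from. In a language with fixed-width 64-bit integers the `& FULL` clamps are unnecessary, because bits shifted past the top are dropped automatically.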

Martin Fowler 1 week ago

Bliki: Vibe Coding

Vibe coding is building a software application by prompting an LLM, telling it what to build, trying it out, prompting for changes - but without looking at any of the code that the LLM generates. This technique can be used by people without any knowledge of programming. However the resulting software often shows problems with maintainability, correctness, and security - so is best used for disposable software written for a limited audience. The term was coined in February 2025 by Andrej Karpathy, an experienced programmer, in a post on X: There's a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I'm too lazy to find it. I “Accept All” always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works. -- Andrej Karpathy The key point about vibe coding is “forget that the code even exists” . This is what gives it much of its usefulness, but also its limitations. Since the November Inflection many programmers are getting LLMs to write all their code, commenting that they may never write a line of code directly again. However they do care about this code, reviewing it, paying attention to its internal structure. 
In that case, they aren't forgetting the code exists, so it's really a different thing that I call Agentic Programming . Sadly the term “vibe coding” really caught on, so many people use it to mean agentic programming. However I feel that despite this rapid Semantic Diffusion , it's worth trying to keep the concepts of vibe coding and agentic programming separate, as they are both different to use and different in their consequences. Because a vibe coder doesn't look at the code, they don't need programming skills, so it's perfect for someone with no programming knowledge to build applications for their own use. Experienced programmers may also find it handy for rapid development of disposable software or prototypes. Vibe coding is still new, so we are exploring its limitations, and those limitations change as the sophistication of models and their harnesses change. These limitations do introduce considerable risks, particularly if the vibed software is used widely or has access to sensitive information. Perhaps the most serious risk is that of security. LLMs are inherently vulnerable as they provide a large attack surface for predators. Vibe coded applications can often expose sensitive information or worse, credentials to attack deeper into an organization's systems. Even non-programmers need to be aware of the Lethal Trifecta . With little attention to the code, vibed software can rapidly produce many lines of code of a very low quality. Such code makes it difficult, even for an LLM, to modify and enhance the software in the future. While it's possible that growing LLM capabilities will allow it to work with even the largest bowls of spaghetti software, thus far it seems clear that well-structured software makes life easier for LLMs too. LLMs are famous for their habit of hallucinating incorrect facts and presenting these with great confidence. This habit also leads them to create software that behaves incorrectly - and those errors may not be manifest to the user. 
Furthermore, the non-determinism of LLMs means that asking an LLM to enhance some software could easily lead it to introduce errors, even in parts of the code that shouldn't change due to the new request. We should thus treat LLM-generated software with skepticism: it can still be useful, but we need to be aware of the risks. On the whole, vibe coding is best used for disposable software that's only used by its author or a close group of collaborators who understand and accept the risks involved. Code that is more complex, more widely-used, and with more consequences to its risks should not be forgotten about.

Martin Fowler 1 week ago

Three more static code analysis sensors

Birgitta Böckeler adds discussion of three more sensors for static code analysis, focusing on checking and enforcing better modularity. Computational sensors for dependency checks were good at enforcing rules, but the rules were limited. Building a computational sensor for coupling data proved lackluster. Prompting an inferential sensor to review modularity was more effective.

(think) 1 week ago

nREPL Forever

Last week I announced Port, a small prepl client for Emacs. That post focused on Port itself, but writing it left me with the itch to do a follow-up on the bigger picture, because the socket REPL / prepl story is one I’ve been meaning to write up for years.

If you’ve been around Clojure long enough, you remember the chatter. Socket REPL landed in Clojure 1.8 (January 2016), prepl in Clojure 1.10 (December 2018), and for a couple of years there was a steady stream of posts, tweets, and Slack threads to the effect of “this is what we should be building tools on. nREPL is on the way out.” Some serious people put their weight behind that idea, and some of them went and built tools to prove it. Now it’s 2026 and we can take stock.

The pitch was good. Socket REPL is just the Clojure REPL exposed on a TCP port. prepl wraps it with a structured printer so the bytes coming back are EDN-tagged maps ( , , , ) instead of a human-readable prompt. Both ship with Clojure itself. No external server library, no middleware, no third-party namespaces. You start a JVM, you bind a port, you’re done.

The intellectual case for moving off nREPL had been made by Rich Hickey himself, most clearly in a March 2015 clojure-dev post that’s worth reading in full. Rich didn’t actually attack nREPL by name in that message. What he did was argue carefully for what a REPL is: a thing that reads characters, evaluates forms, prints results, and loops, with those streams available to user code so that things like nested REPLs and debuggers compose naturally. The money line:

“While framing and RPC orientation might make things easier for someone who just wants to implement an eval window, it makes the resulting service strictly less powerful than a REPL.”

His proposal, in the same post, was that tools should open multiple connections to the running program: one for the human-facing stream, and dedicated channels for IDE operations. The socket REPL (which landed in 1.8 the following January) and prepl (which arrived in 1.10) were the official implementation of that worldview.

A handful of editor projects took the cue and built clients:

- Tutkain for Sublime Text
- Chlorine for Atom
- Conjure for Neovim (in its early Rust incarnation)
- Clojure-Sublimed by Nikita Tonsky
- a steady drip of smaller experiments around , , and friends

It was real momentum. If you were following Clojure tooling in 2018-2020, it genuinely felt like nREPL might be the past, and the future would be some combination of socket REPL plus a thin self-installing protocol on top of it. You can find a fair number of “RIP nREPL” hot takes from that period if you go looking.

I went and surveyed each of those projects recently while working on Port. The pattern is depressingly consistent:

Tutkain started on prepl. In November 2021, its v0.11 release explicitly stopped using prepl message framing and switched to a hand-rolled EDN-RPC protocol that Tutkain boots onto the raw socket REPL by sending it a base64-encoded blob. The new protocol has request ids, op dispatch ( , , , , , , …), and server-managed thread bindings. In other words: Tutkain grew into nREPL, just spelled differently.

Chlorine never used prepl directly. It used socket REPL plus an -style upgrade blob. Its author’s successor project, Lazuli, abandoned the whole approach in favor of nREPL. The post-mortem is worth reading and is fairly blunt: tools that attempted prepl went back to nREPL because, honestly, it’s simply better.

Conjure had a prepl client in its early Rust days. The current Lua/Fennel rewrite ships only an nREPL client. The author’s reasoning in the release notes was that nREPL “has complete ecosystem adoption and brilliant ClojureScript support.”

Clojure-Sublimed technically still talks to a raw socket REPL, but only after sending it an EDN-printing prelude that upgrades the REPL to a structured protocol of tonsky’s own design. His post on the topic is one of the most thoughtful pieces I’ve read on Clojure REPL design, and his conclusion is roughly: the bare socket REPL is more useful than prepl because you can install your own protocol on top of it. Which is true. But notice that everyone who reached that conclusion ended up reinventing the same wheel: ids, ops, request/response correlation, completion support, lookup, interrupts. You know, the things nREPL has had since 2010.

So the trajectory looks roughly like this:

- Editor decides nREPL is too heavy or an undesirable external dependency and starts on prepl.
- Editor discovers prepl has no ids, no ops, no interrupts, no server-side completion, no namespace tracking, no test runner integration, etc.
- Editor rolls a custom protocol on top of socket REPL, or…
- Editor gives up and goes to nREPL.

Pure prepl clients are nearly extinct in the wild. The one I found that qualifies is propel by Oliver Caldwell (of Conjure fame), which is delightful, about 70 lines of Clojure, and explicitly synchronous (one outstanding eval at a time). That works! But it’s not a foundation for the kind of feature set people expect from an editor.

Here’s where I land. Rich isn’t wrong that prepl is closer to a “real” REPL in the strict sense. prepl genuinely is a more faithful encoding of read-eval-print: each form goes in, each result comes out, and the semantics match what you’d get at the standard REPL prompt. The thing is, “real REPL” is not the property you optimize for when you’re building editor tooling. The properties editor tooling actually needs are:

- A way to correlate a request with its response when output and results are interleaved.
- A way to multiplex: one connection, several logical conversations.
- Server-side hooks for the operations every IDE expects: completion, lookup, go-to-definition, find-references, test running, stacktrace structuring, interrupt.
- A protocol stable enough that ten different editors can target it without each one inventing its own dialect.

nREPL was explicitly designed for those properties. The ops, middleware, and transport abstractions exist precisely because the people building it knew the consumers are not humans typing at a prompt, they’re programs negotiating a session. Calling nREPL “not a real REPL” is technically defensible and practically beside the point. Nobody on the consuming end is confused about what nREPL is for.

I wrote about nREPL’s revival in 2018. At that point I had just finished migrating the project out of Clojure Contrib, and the goal was to give it a real home and a working development process. It was a lot of work, but in hindsight things played out pretty well. Looking at where things ended up:

- nREPL itself is healthier than it has ever been. Active maintainers, a proper manual, a steady release cadence, an actual ecosystem organization on GitHub.
- Most popular Clojure editors support it. CIDER, Calva, Cursive (via its own client), Conjure, vim-iced, you name it.
- babashka ships with nREPL built in. You boot a and you get an nREPL server, no extra dependencies. That’s how a lot of people use nREPL in scripting contexts today, and it’s been a hit.
- basilisp (the Clojure dialect on Python) has nREPL support. nREPL running on Python, talking to Emacs, evaluating Clojure. Nice.
- ClojureCLR has a working nREPL story now, and jank (the C++ Clojure) has nREPL on its roadmap too.
- The middleware ecosystem ( , , , , , …) is alive, well, and continues to add features.

Meanwhile prepl is, as best as I can tell, mostly a curiosity. It got me a side project I had fun with. It did not displace nREPL.

The history of tooling protocols is full of cases where “purer”, “simpler”, or “more elegant” lost to “shipped, documented, and battle-tested.” LSP beat fifteen ad-hoc language protocols. DAP beat the same fifteen debuggers. nREPL beat prepl in the (Clojure) editor space. It’s not that the simpler thing is bad. prepl is a fine, elegant little protocol, and there’s a real case for embedding it in CI scripts, ops automation, deployment pipelines, or anywhere you want to drive a Clojure VM programmatically without pulling in a server library. Use it there.

But for editor tooling? The Clojure community made an enormous, multi-year, multi-tool investment in nREPL. We have the protocol, the middleware, the manual, the books, the conference talks. nREPL works, it’s actively maintained, it’s increasingly portable across Clojure dialects, and the design decisions that Rich called out as un-REPL-like are the exact ones that make it a good substrate for editors. So I’ll say what I felt awkward saying back in 2018: nREPL forever. It’s the right abstraction for the job, and it’s not going anywhere.

One more thing. After finishing Port I got curious what a minimal nREPL client would look like by comparison, so I went and built one. As you can imagine, it turned out to be significantly simpler. If that sounds interesting, take a look at neat, a small, language-agnostic nREPL client for Emacs. Keep hacking!
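The first property in the list above, correlating requests with responses when output and results are interleaved, is easy to picture with a small sketch. The Python below is only an illustration and talks to no server: messages are plain dictionaries shaped like nREPL responses (an `id`, optional `out` or `value`, and a final `status` containing `done`), and the `collect_by_id` helper and the sample stream are assumptions made up for this example.

```python
from collections import defaultdict


def collect_by_id(messages):
    """Group interleaved response messages by the request id they answer.

    Returns (finished, pending): `finished` maps each id whose response
    stream ended with a "done" status to its messages in arrival order,
    and `pending` holds the messages of requests still in flight.
    """
    pending = defaultdict(list)
    finished = {}
    for msg in messages:
        request_id = msg["id"]
        pending[request_id].append(msg)
        # A message whose status includes "done" closes that request.
        if "done" in msg.get("status", []):
            finished[request_id] = pending.pop(request_id)
    return finished, dict(pending)


if __name__ == "__main__":
    # Hypothetical stream: three evaluations whose responses arrive interleaved.
    stream = [
        {"id": "1", "out": "hello\n"},
        {"id": "2", "value": "3", "ns": "user"},
        {"id": "2", "status": ["done"]},
        {"id": "1", "value": "nil", "ns": "user"},
        {"id": "1", "status": ["done"]},
        {"id": "3", "out": "still running"},
    ]
    finished, pending = collect_by_id(stream)
    for request_id, msgs in finished.items():
        print(request_id, msgs)
    print("in flight:", list(pending))
```

Without an id on every message, a client reading this stream could not tell which evaluation printed `hello` or which one is still running. That bookkeeping is what the custom protocols described above ended up rebuilding on top of the socket REPL.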

(think) 1 week ago

neat: a language-agnostic nREPL client for Emacs

I think I’ll take my REPL neat My parens black and my bed at three CIDER’s too sweet for me… Last week I announced Port , a small prepl client for Emacs. Today I’m following it up with another small Emacs package. Meet neat , a tiny, deliberately language-agnostic nREPL client. For years I’ve been hearing some version of the same request: “could CIDER work with my non-Clojure nREPL server?”. Babashka, Basilisp, nREPL-CLR, even some homegrown servers people built on top of nREPL for languages I’d never heard of. 1 The answer was always the same kind of squishy “sort of, in theory, with caveats”, because while bare nREPL is genuinely language-agnostic, CIDER is not. CIDER was built for Clojure and assumes Clojure pretty much everywhere. I always thought the right answer was “let’s gradually make CIDER more language-agnostic.” That’s the kind of plan that sounds reasonable until you actually try it. The thing that pushed me over the edge was, oddly enough, building Port. Port is small, focused, and doesn’t try to be CIDER. Working on it for a couple of weeks reminded me how (deceptively) productive it is to start from a clean slate when the new requirements don’t match the assumptions baked into a mature codebase. Trying to retrofit CIDER into a language-agnostic shape would have meant fighting with every helper that ever assumed exists, every middleware contract defines, every project-type heuristic that knows about and and nothing else. A whole lot of “is the server Clojure, or is it the other thing?” branches. The Port experience reaffirmed that the right move for a genuinely different client is a new project , not a thousand cuts to an existing one. So was born. The name is short, says what it does (it’s neat, both in the small-and-tidy sense and in the “no deps, no special assumptions, just the protocol” sense), and conveniently leaves room for puns I haven’t fully committed to yet. I might land on a backronym one day. For now it’s just “neat”. 
neat is a small Emacs nREPL client. The code is split across four files: It only uses Emacs builtins. There are no external runtime dependencies, not even on , because neat doesn’t assume Clojure on the other end. If you write , , , or anything else that talks nREPL, you turn on in that buffer and it just works. The connection routing is also intentionally library-friendly. There’s a buffer-local override so downstream packages can implement their own routing logic, plus a global default for the simple “one server at a time” case that most people will want. Capability discovery is done at connect time via the nREPL op. neat doesn’t hardcode “this server has completions, this one doesn’t” assumptions. If the server reports a op, the CAPF backend lights up (with type annotations next to each candidate, when the server provides them). If it reports , eldoc starts working and jumps to definitions via an xref backend. If neither is there, you still get a perfectly serviceable raw REPL. Start an nREPL server. Anything that speaks the protocol will do. For a Clojure server: Then in Emacs: A REPL buffer pops up, the prompt follows the server’s reported namespace, and you can type expressions at it. Multi-line input works because only submits when the form parses as balanced under (Emacs Lisp syntax by default, which is close enough for any Lisp). Input history is persisted across sessions. If there’s a file in the project, the prompt defaults to its contents, so is enough to connect. To evaluate from a source buffer, turn on the minor mode: The familiar bindings are there, intentionally compatible with what CIDER users expect: ships the buffer contents as an op; uses the standard op instead, so the server can attribute file and line numbers to errors. Use the latter when you’re actually loading a file from disk and care about good diagnostics. sets the buffer-local , which gets sent as the field on every op from that buffer. 
For languages where the namespace is declared in the source (Clojure’s , etc.), swap in a parser via . For juggling multiple connections, opens a tabulated-list buffer with one row per live connection, where you can set the default or disconnect interactively. That’s roughly the whole user-facing surface today. There’s no jack-in command, no inspector, no debugger, no test runner. Likely there will never be, but if you need those you should probably be using CIDER anyways… If you write Clojure and CIDER works for you, keep using CIDER. It’s mature, full-featured, and supported, and I’m going to keep working on it for as long as people use it. Nothing about neat changes that. But if you find yourself in one of these situations: then neat might be a better fit. It’s small enough that you can read the whole thing in an afternoon, and the library/UI split ( and are perfectly usable from other packages) is genuinely designed for downstream consumers. neat is part of a broader push I’ve been chewing on for a while now: making nREPL a healthy multi-language ecosystem rather than a Clojure-only protocol. That push has three legs: This is also why I keep teasing a “reference CLI client” in conversations. An editor client is one thing, but a small command-line nREPL client written in a non-Lisp language would be a much sharper test of how language-agnostic the protocol really is. neat is plausibly a precursor to that. Time will tell how far I push this; for now I just wanted to get the Emacs side moving. As always, big thanks to Clojurists Together and everyone supporting my open source work. You make it possible for me to keep tweaking and improving CIDER, nREPL, clj-refactor, and friends, and occasionally try something “neat” on the side. neat isn’t replacing any of the existing Clojure tooling for Emacs. It’s just another tool in the box for the people who want it. Feedback, ideas, and contributions are most welcome over at the issue tracker. Keep hacking!
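The connect-time capability discovery described in this post can be illustrated with a small sketch. This is Python rather than neat's own Emacs Lisp, and it is not neat's code: the function name, the feature labels, and the choice of `describe`, `completions`, and `lookup` as the relevant ops are assumptions based on nREPL's built-in ops, since the post's own op names did not survive in this excerpt. The idea is only that the client inspects the server's advertised ops and enables features accordingly, instead of hardcoding per-server assumptions.

```python
# Hypothetical mapping from an editor feature to the nREPL op it relies on.
# The op names are assumptions (nREPL's built-in ops), not taken from neat.
FEATURE_OPS = {
    "completion": "completions",
    "eldoc-and-xref": "lookup",
}


def enabled_features(describe_reply):
    """Return the features a client could enable for this server.

    `describe_reply` is assumed to be the decoded reply to a `describe`
    request, whose "ops" entry maps each supported op name to a dict.
    """
    supported = describe_reply.get("ops", {})
    features = ["repl"]  # a raw eval REPL is the baseline
    for feature, op in FEATURE_OPS.items():
        if op in supported:
            features.append(feature)
    return features


# Two made-up replies: a full-featured server and a bare one.
full = {"ops": {"eval": {}, "completions": {}, "lookup": {}}}
bare = {"ops": {"eval": {}, "clone": {}}}

print(enabled_features(full))  # ['repl', 'completion', 'eldoc-and-xref']
print(enabled_features(bare))  # ['repl']
```

A server that advertises neither optional op still yields a usable REPL, which matches the fallback behaviour the post describes.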
1. https://github.com/clojure-emacs/cider/issues/3905 ↩︎
2. For a long time I planned to extract CIDER’s nREPL client code into a reusable package, but now that we have I probably will finally abandon this idea. ↩︎

The four files:
: bencode encode/decode.
: TCP connections, request dispatch, the standard nREPL ops.
: a comint-derived REPL buffer.
: the entry point, customization group, and minor mode for source buffers.

The situations where neat might be a better fit:
you write a non-Clojure language whose runtime ships an nREPL server, and you’ve been muddling through with a half-supported CIDER setup,
you write Clojure but you value minimalism and don’t need the full CIDER feature set,
you’re building an Emacs package that needs to talk nREPL and you want a small, dependency-free library to build on, 2

The three legs of the push:
An actual nREPL specification. The spec.nrepl.org draft is (will be) the formal version of what today is “whatever nREPL the project does”.
Reference clients. neat is one. The point of building a deliberately Clojure-free client is that it stress-tests the spec. Anywhere neat ends up needing to special-case the server, the spec has a gap.
A compatibility test suite. The parameterised integration suite in neat already runs the same assertions against multiple servers and surfaces real divergences (Clojure batching into a single message where Basilisp emits two, for example). I’d like to grow this into a portable suite that any nREPL server can self-check against.
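For readers unfamiliar with the wire format behind the "bencode encode/decode" file mentioned above, here is a minimal sketch of bencode in Python. It is not neat's implementation (neat is Emacs Lisp); the function names `bencode` and `bdecode` are made up for illustration, and the decoder deliberately returns strings as raw bytes. nREPL messages are bencoded dictionaries, so an eval request is just a small map of strings.

```python
def bencode(value):
    """Encode ints, str/bytes, lists and dicts as bencode bytes."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data, pos=0):
    """Decode one value starting at `pos`; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key.decode("utf-8")] = val
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        start = colon + 1
        length = int(data[pos:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid or truncated bencode at offset %d" % pos)


request = {"op": "eval", "code": "(+ 1 2)", "id": "1"}
wire = bencode(request)
print(wire)  # b'd4:code7:(+ 1 2)2:id1:12:op4:evale'
print(bdecode(wire)[0])  # {'code': b'(+ 1 2)', 'id': b'1', 'op': b'eval'}
```

Because the decoder returns the position where the value ended, a client can call it repeatedly on a receive buffer to peel off messages as they arrive over TCP.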

0 views
Marc Brooker 1 weeks ago

Agentic software development hypothesis

This is the quality content you come here for, right? Agentic Software Development Hypothesis:
Weak form: Any coding task for which a complete specification is available will become trivial.
Strong form: Any coding task for which a deterministic oracle is available will become trivial.
Strongest form: Any coding task for which a non-adversarial (pythic?) oracle exists will become trivial.
First objection: Few meaningful tasks have a complete specification.
Second objection: Most oracles aren’t deterministic.

0 views
Xe Iaso 1 weeks ago

"No way to prevent this" say users of only language where this regularly happens

In the hours following the release of CVE-2026-45584 for the project Microsoft Windows , site reliability workers and systems administrators scrambled to desperately rebuild and patch all their systems to fix a memory safety vulnerability resulting in arbitrary code execution inside the virus scanner Windows Defender. This is due to the affected components being written in C++, the only programming language where these vulnerabilities regularly happen. "This was a terrible tragedy, but sometimes these things just happen and there's nothing anyone can do to stop them," said programmer Dr. Annabelle Connelly, echoing statements expressed by hundreds of thousands of programmers who use the only language where 90% of the world's memory safety vulnerabilities have occurred in the last 50 years, and whose projects are 20 times more likely to have security vulnerabilities. "It's a shame, but what can we do? There really isn't anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn't want to write their code in a robust manner." At press time, users of the only programming language in the world where these vulnerabilities regularly happen once or twice per quarter for the last eight years were referring to themselves and their situation as "helpless."

0 views
Sean Goedecke 1 weeks ago

Prompts are technical debt too

It’s common and correct to say that “all code is technical debt”. Adding code is a necessary evil for developing new features: you almost always have to do it, but each line of code adds to the complexity and maintenance burden of the system. All future changes to the system have to work with the existing code, or at least avoid breaking it. Once systems accumulate enough code, they become impossible for a single person to understand: instead of reading the code and understanding what it does, you must rely on guesses, theories and heuristics 1. Sensible engineers write as little code as possible. They write a lot of prompts, though! Many large projects now have a set of codebase-specific prompt files: AGENTS.md, CLAUDE.md, those same files in sub-directories, and skills. If you’re building a program that uses AI 2, you’ll have separate prompts for capabilities and for each tool, as well as a whole set of system prompts. Prompts are important. Minor tweaks to an LLM’s prompt can unlock significant performance improvements. If the same model feels different across Codex, Cursor, OpenCode, and Copilot, it’s almost certainly due to subtle differences in prompting. AI companies spend a lot of time testing and tweaking their prompts, so it makes sense that engineers would spend a lot of time tweaking their AGENTS.md files 3 for their projects. I’d even call switching tools or workflows a form of prompting. If I start wrapping my agents in a Ralph loop, pull in a new skill file, or install an MCP server, that’s still a change to my prompts even though I’m not the one who wrote it. I think it is a bad idea to spend a ton of time tweaking a bespoke agentic coding setup. Why is that, given that prompt adjustments can deliver a lot of value? Because prompt adjustments are model-specific. Earlier I said that AI companies spend a lot of time tweaking their prompts. In fact, they spend that amount of time for each new model release.
A prompt that worked great for GPT-5.4 won’t necessarily work as well for GPT-5.5. You have to “learn how to hold the model” each time. In other words, a set of prompts that you carefully crafted in January this year might be out of date or actively harmful by February. Worse still, you might not even notice. Model capabilities are already so hard to pin down (unless you’re running every problem through different models and tools), and even weak AI systems are surprisingly good at some problems. You might just think “huh, the new Anthropic model isn’t as impressive as the hype”, or “wow, Claude Code has gotten worse recently”. In this sense, prompts are a worse form of technical debt than code . When technical debt blows up, it usually causes errors or a tangible slowdown as you try to understand the code. Prompts will decay silently. Also, even janky code tends to be relatively stable when untouched, but every single model upgrade could turn a functional prompt into a non-functional one. Could you simply decide not to upgrade models? Some people are trying this, but the pace of improvement is fast enough that that isn’t really practical. A delicately-prompted agentic harness built around GPT-4.1 is always going to underperform a bare-bones harness built around Opus 4.7. This might be a sensible strategy at some point in the future, when the rate of model improvement slows down (or when models are so capable that you don’t need the extra intelligence for normal engineering tasks), but I don’t believe it’s a good strategy today. In my view, most people should just be picking an AI coding tool maintained by a third-party company (Claude Code, Codex, Cursor, Copilot, etc) and leaving it as unconfigured as possible, so they can piggyback on the work of teams of engineers who are evaluating and tweaking prompts with each new model. Avoid MCP and skills unless absolutely necessary, and keep them off by default. 
At least this way if one of those teams gets it badly wrong, users will notice eventually and complain about it. When you write AGENTS.md files, try to avoid behavior steering (like the now-outdated “think step by step”, “you are a skilled engineer”, or “if you get a task right I will tip you $200”). Keep them limited to specific, concrete facts about the project. Don’t let models fill your AGENTS.md with pages of barely-reviewed text, for the same reason that you wouldn’t let them fill your codebase with pages of barely-reviewed code. Write your prompts yourself, and delete them whenever you get the chance.
1. Almost every system you might get paid to work on is in this category (if not in the code of the system itself, then in its dependencies and libraries). ↩
2. Instead of just using AI to build a program. This distinction was a real pain when I was working on GitHub Models. ↩

0 views