Posts in Ruby (20 found)
iDiallo 2 days ago

You paid for it, you should be comfortable in it

A friend of mine bought a Tesla Roadster back in the early 2010s. At the time, spotting a Tesla on the road was a rare event. Maybe even occasion enough to stop and take a picture. I never got the chance to photograph one, let alone drive one, until I met this new friend recently. This was my chance to experience the car firsthand.

We walked to the parking structure to see it. As soon as he opened the door, something looked... off. On the outside, it was a pristine, six-figure roadster. But the inside looked completely custom. Not "custom" in the sense of a professional shop install, but more like the driver himself grabbed a hammer and chisel and made it his own.

First, the driver's seat had been altered. It was much lower than usual and didn't match the passenger seat. My friend stands 6'7", and the Roadster is a tiny car. He physically couldn't fit, so he modified the seat rails to lower it. But that fix created a new problem: the door armrest now dug into his hip. So, he took a file to the interior panel, shaved it down, and 3D printed a smaller, ergonomic armrest. He even 3D printed a cup holder for the passenger side so his coffee was within reach.

To me, the idea of taking a Dremel or a file to a $100,000+ car was unimaginable. You must be crazy to do it. He caught the look on my face and shrugged. "Hey, it's my car. I paid for it. I intend to be comfortable in it."

I never thought of it like this. That sentiment stuck with me. Recently, when I read an article by Kent Walters about filing the corners of his MacBook, those same feelings resurfaced. My work MacBook has edges so sharp that I've often felt like I was slicing my wrist on the chassis. I treated this as a design flaw I had to endure. But not Kent. He treated it as an obstacle to be removed. He literally filed down the corners of his laptop to ensure the machine he uses every day was comfortable.

I may not have the guts to file my work-issued MacBook, but I'm no stranger to customization... in software. I modify my tools constantly. I spend days tweaking my IDE, remapping keyboard shortcuts, and writing custom scripts until the software is unrecognizable to anyone else on my team. I don't think twice about rewriting a config file to make the tool fit my brain.

When I was a kid, I always had a screwdriver around, fixing a device that wasn't really broken. On the home computer, I modified everything. I once deleted all files to improve performance. It didn't work, but it led to a fruitful career.

But somehow, when it comes to expensive hardware now, I freeze. I treat the physical object as a museum piece to be preserved. I bought a docking station to banish the laptop to a shelf, using an external mouse and keyboard to avoid touching the sharp chassis. I built a complex workaround to accommodate the tool, rather than performing the simple, brutal act of modifying the tool to accommodate me.

We treat our physical tools as if they are on loan from the manufacturer. You'll see a musician buy a vintage guitar but refuse to adjust the action, terrified of ruining the "collector's value." Meanwhile, the working guitarist has sanded down the neck and covered it in stickers because it feels better in their hand. The software engineer accepts the default keybindings to avoid "bad habits," while the power user creates a layout that doubles their speed.

If you own a tool, whether it's a car, a computer, or a line of code, you own the right to change it. The manufacturer designed it for the "average" user, but you are a specific human with specific needs.

Remember grandma's couch in the living room? It had that plastic cover on it. It was so uncomfortable, but no one dared to remove it. The plastic was there to preserve the sofa. No one got to enjoy it; instead, everyone accommodated the couch only to preserve its value. A value that no one ever benefits from. Don't let the perceived value of an object stop you from making it truly yours.
A tool with battle scars is a tool that is loved.

Corrode 1 week ago

Cloudsmith

Rust adoption can be loud, like when companies such as Microsoft, Meta, and Google announce their use of Rust in high-profile projects. But there are countless smaller teams quietly using Rust to solve real-world problems, sometimes even without noticing. This episode tells one such story. Cian and his team at Cloudsmith have been adopting Rust in their Python monolith not because they wanted to rewrite everything in Rust, but because Rust extensions were simply best-in-class for the specific performance problems they were trying to solve in their Django application. As they had these initial successes, they gained more confidence in Rust and started using it in more and more areas of their codebase.

CodeCrafters helps you become proficient in Rust by building real-world, production-grade projects. Learn hands-on by creating your own shell, HTTP server, Redis, Kafka, Git, SQLite, or DNS service from scratch. Start for free today and enjoy 40% off any paid plan by using this link.

Made with love in Belfast and trusted around the world. Cloudsmith is the fully managed solution for controlling, securing, and distributing software artifacts. They analyze every package, container, and ML model in an organization’s supply chain, allow blocking bad packages before they reach developers, and build an ironclad chain of custody.

Cian is a Service Reliability Engineer located in Dublin, Ireland. He has been working with Rust for 10 years and has a history of helping companies build reliable and efficient software. He has a BA in Computer Programming from Dublin City University.
Lee Skillen’s blog - The blog of Lee Skillen, Cloudsmith’s co-founder and CTO
Django - Python on Rails
Django Mixins - Great for scaling up, not great for long-term maintenance
SBOM - Software Bill of Materials
Microservice vs Monolith - Martin Fowler’s canonical explanation
Jaeger - “Debugger” for microservices
PyO3 - Rust-to-Python and Python-to-Rust FFI crate
orjson - Pretty fast JSON handling in Python using Rust
drf-orjson-renderer - Simple orjson wrapper for Django REST Framework
Rust in Python cryptography - Parsing complex data formats is just safer in Rust!
jsonschema-py - jsonschema in Python with Rust, mentioned in the PyO3 docs
WSGI - Python’s standard for HTTP server interfaces
uWSGI - An application server providing a WSGI interface
rustimport - Simply import Rust files as modules in Python, great for prototyping
granian - WSGI application server written in Rust with tokio and hyper
hyper - HTTP parsing and serialization library for Rust
HAProxy - Feature-rich reverse proxy with good request queue support
nginx - Very common reverse proxy with very nice and readable config
locust - Fantastic load-test tool with configuration in Python
goose - Locust, but in Rust
Podman - Daemonless container engine
Docker - Container platform
buildx - Docker CLI plugin for extended build capabilities with BuildKit
OrbStack - Faster Docker for Desktop alternative
Rust in Production: curl with Daniel Stenberg - Talking about hyper’s strictness being at odds with curl’s permissive design
axum - Ergonomic and modular web framework for Rust
rocket - Web framework for Rust
Cloudsmith Website
Cian Butler’s Website
Cian’s E-Mail

André Arko 1 week ago

Towards an Amicable Resolution with Ruby Central

Last week, three members of Ruby Central’s board published a new statement about RubyGems and Bundler, and this week they published an incident report on the events last year. The first statement reports that Ruby Central has now completed a third audit of RubyGems.org’s infrastructure: the first by the sole remaining RubyGems.org maintainer, the second by Cloud Security Partners, and the third by Hogan Lovells. In all three cases, Ruby Central found no evidence of compromised end user data, accounts, gems, or infrastructure availability. I hope this can conclusively put to rest the idea that I have any remaining access to the RubyGems.org production systems, or that I caused any harm to the RubyGems.org service at any time. I also appreciate that Ruby Central is taking its share of responsibility, recognizing that its lack of communication with the former maintainers (including me) created confusion and frustration that contributed, in part, to how we ended up where we are today. Ruby Central board members Freedom, Brandon, and Ran state that their intent is now to work towards an amicable resolution. I salute their new commitment, and would like to do my part to help the RubyGems community move past these unfortunate events, with a resolution that puts the dispute fully behind us, and allows all of us to move forward. For my part, despite my claims against Ruby Central, and the threats they have directed against me, I am willing to completely settle all of my disputes with them, and pledge to take no legal action against Ruby Central regarding any of their actions prior to today. In exchange, I am requesting two things. First, I am asking Ruby Central to drop their legal threats, including releasing their claims against me and reimbursing my legal costs. Those costs arise from Ruby Central’s actions, including litigation threats, other escalations, and most recently contacting law enforcement.
In addition to forcing me to retain counsel, these actions caused considerable stress and disruption. I am willing to provide invoices to ensure the reimbursement precisely matches only my actual costs. Second, I am asking Ruby Central to lay our disagreement to rest with a public statement acknowledging that I did no harm to the RubyGems.org service. If Ruby Central fully drops their legal claims, and states I did not harm the RubyGems.org service, I would consider our disagreement amicably settled.

Chris Coyier 2 weeks ago

Hawai’i

I’m just back from the United States’ 50th state, a staggering 2,500 miles from the mainland. For the next week or two, I’ll pronounce it Ha-Vie-ee, like how it’s pronounced in the native Hawaiian language. A language, by the way, that only a few thousand people speak natively, no doubt due to the 91 years (1896-1987) where there was “strict physical punishment” for speaking it in schools. We humans are pretty damn uncool to each other sometimes.

Ruby and I travelled there (again!) with some wonderful family friends, Matt, Becky, and their kids, Monroe and Zoey. A nice reminder of how rare and lovely it is to have a situation where the kids are friends, and the adults are friends, and everyone travels together well.

We stayed in a villa at the Fairmont Kea Lani on Maui. I’ve been to Hawaii before, but this was my first time on Maui. It was a beautiful place to stay: a beautiful property and buildings right on the beach. The villa had two spacious rooms, a full kitchen, and a living room with a pull-out couch, on which all the kids slept together.

I’ve stayed at fancy resorts before where the staff uses special greetings with guests. But in Hawaii, naturally, it’s “Aloha.” Probably because, ya know, it’s a real word, and basically the whole brand of Hawaii. But I just can’t shake the feeling that it’s kinda cheesy. Like, do Hawaii long-timers say Aloha to each other? Like it’s 5:21 am and a local is getting a coffee at the gas station in a local neighborhood, do they say Aloha to the cashier? Do they get an Aloha back? I kept meaning to ask this of locals, but kept forgetting. Or not having the exact 1.5 beers in me it takes to reach that perfect level of fun and charm to ask strangers semi-intimate questions. If I were forced to guess, I’d guess Aloha is more of a thing they have to do at work with the tourists. Like your boss side-eyes you if you just say “Hello, good morning” instead. I never said it back, which felt weird.
My goal was kind of a winkwink, it’s cool, you don’t have to do the cheesy tourist thing with me, I very promise I don’t care.

The first night, we got checked in and beelined it to Monkeypod. We’d all been there before (at a different location) and have talked about it endlessly. It’s a micro-chain with 4 locations across two islands. It’s just: great. They make a Mai Tai with Honey Liliko‘i Foam on the top, of which I have fond memories, and it was every bit as good as I remembered. I had wings and mahi-mahi tacos. 10/10.

I never get the fish. I don’t like fish. I like specific little bits of seafood once in a while, but rarely cooked slabs of fish. So on that very first night, I decided I’d get fish every night on this trip. Maybe if I try enough of it, I’ll come around. It didn’t work. I struck out more times than I hit. But no big regrets. I tried.

Timing-wise, it wasn’t the absolute perfect time to be in Hawaii. But it was spring break for our school district, so c’est la vie. Unprecedented rain with some flooding. A rather ironic situation after the horrible fires just a few years back. We were watching the weather and reading the news weeks in advance, but things didn’t seem dire enough to cancel the trip. Honestly, some overcast weather isn’t the worst. None of us left with sunburns. It allows you to hang out outside all day, which you just can’t do on full-sun days, as it exhausts you.

The first full day turned out to be one of the rainiest, and we spent most of it at the pool anyway. I got us a cabana that turned out to be awfully useful. Being in the pool in the rain is no big deal, but lying out in chairs in the rain is annoying. And you certainly can’t crack open the laptop or read a paperback. I did both that day and was loving it.

We were trying to book an ATV tour for ourselves, and that was the one thing we just couldn’t get done. The rainstorms just weren’t letting it happen.
Apparently, there was too much debris and whatnot on the trails; the places that offered these tours didn’t reopen until after we left.

We started most mornings at the breakfast buffet, included in our fancy villa booking. It was pretty crowded as they couldn’t seat people outside in the wet. Then we’d hit the water without fail. A few days we did the ocean, but came to understand it really wasn’t a good time for that. Storms wash landcrap out to sea, making the water look muddy. Apparently, that’s worse than just looking ugly; it can harbor dangerous bacteria. The guy at 808 clothing told me that you’d have to be a real idiot to go out in it and that real Hawaiians would never. Last year, some lady had to have her legs sliced open to flush out the bacteria (or something? The guy was pretty weird). Also, later, our zip-line guide told us she loves to surf and wouldn’t go out because the “muddy” water is extra-attractive to sharks, since the low visibility helps them more than it helps their prey. Also later, we went to a surfing beach absolutely full of obviously local surfers. Turns out people don’t exactly speak for all people. We did some knee-deep ocean stuff because it’s hard to resist.

One day we drove up to Paia, a northern coastal city with extreme charm. Unfortunately, we got there when it was pouring pretty good, so we spent most of it hustling between store overhangs. You could really see how close to flooding everything can get, quickly. We mostly just did a little shopping, walking around, and snacking in Paia, and I didn’t take many photos there. It was super cute though, highly recommended. I sorta regret not buying a ukulele bass from the music shop there as I’ve been eyeing one up like forever, ever since going on a trip with Brad Frost where he brought his. Which reminds me: we had the kids take uke lessons at the Fairmont and it was kind of a mess. Probably skip that.
The hostess at the bar we stopped at told us how to get down to the turtle beach nearby (Ho’okipa). It was really pouring when we got there, so we just parked for a while and watched the surfers. Really amazing to watch. Huge waves. The turtle beach didn’t disappoint!

Hitting the pool was a daily event. The kids are old enough that we could shoo them out the door to the pool and not worry about it too much. Two of the kids had trackable wrist watches that could make calls, so that was extra convenient. There was a swim-up bar that I appreciated existing despite never getting around to using it. I did use the walk-up bar once, and the Zach Bryan impersonator bartender made me a cocktail despite it being almost an hour after it was supposed to close. He was being fawned over by two women who wanted to make sure he had their number for later. I enjoyed that, naturally.

Ruby’s favorite experience, and perhaps mine too, was the zip lining we did. We chose Haleakalā Zipline Tours as, well, it was open, and its location high up mid-island looked cool. It was. The two charming guides helped make it fun, showering us with bird facts and talking about their conservation efforts. Ruby had to get over some fears of zip lining at all, which she did, and of course she now loves it. I left thinking of other zip lining we could do back home and hoping to see an ʻAlalā (Hawaiian crow). We hit Black Rock Pizza on the way home, my only non-fish dinner.

The very last day, our friends moved on to another island, while we were hitting the redeye flight home. We had most of the day to kill, so we wandered around the property a bit, wandered some stores, then went to the local cinema to catch Project Hail Mary (fun!) and then off to the airport. Only a 5-hour flight back to Seattle compared to the 7-hour flight from Salt Lake City on the way there. We both slept a little and it went easy breezy.

Max Bernstein 2 weeks ago

Using Perfetto in ZJIT

Originally published on Rails At Scale. Look! A trace of slow events in a benchmark! Now read on to see what the slow events are and how we got this pretty picture.

The first rule of just-in-time compilers is: you stay in JIT code. The second rule of JIT is: you STAY in JIT code! When control leaves the compiled code to run in the interpreter (what the ZJIT team calls either a “side-exit” or a “deopt”, depending on who you talk to), things slow down. In a well-tuned system, this should happen pretty rarely. Right now, because we’re still bringing up the compiler and runtime system, it happens more than we would like. We’re reducing the number of exits over time.

We can track our side-exit reduction progress with , which, on process exit, prints out a tidy summary of the counters for all of the bad stuff we track. It’s got side-exits. It’s got calls to C code. It’s got calls to slow-path runtime helpers. It’s got everything. Here is a chopped-up sample of stats output for the Lobsters benchmark, which is a large Rails app: (I’ve cut out significant chunks of the stats output and replaced them with because it’s overwhelming the first time you see it.)

The first thing you might note is that the thing I just described as terrible for performance is happening over twelve million times. The second thing you might notice is that despite this, we’re staying in JIT code seemingly a high percentage of the time. Or are we? Is 80% high? Is a 4.5% class guard miss ratio high? What about 11% for shapes? It’s hard to say.

The counters are great because they’re quick and they’re reasonably stable proxies for performance. There’s no substitute for painstaking measurements on a quiet machine, but if the counter for Bad Slow Thing goes down (and others do not go up), we’re probably doing a good job. But they’re not great for building intuition. For intuition, we want more tangible feeling numbers. We want to see things.
The third thing is that you might ask yourself “self, where are these exits coming from?” Unfortunately, counters cannot tell you that. For that, we want stack traces. These let us know where in the guest (Ruby) code an exit is triggered. Ideally, we would also want some notion of time: we would want to know not just where these events happen but also when. Are the exits happening early, at application boot? At warmup? Even during what should be steady-state application time? Hard to say. So we need more tools.

Thankfully, Perfetto exists. Perfetto is a system for visualizing and analyzing traces and profiles that your application generates. It has both a web UI and a command-line UI. We can emit traces for Perfetto and visualize them there.

Take a look at this sample ZJIT Perfetto trace generated by running Ruby with . (This is also sampled/strobed, so not every exit is in there. This is just 1/K of them for some K that I don’t remember.) What do you see? I see a couple of arrows on the left. Arrows indicate “instant” point-in-time events. Then I see a mess of purple to the right of that until the end of the trace. Hover over an arrow. Find out that each arrow is a side-exit. Scream silently.

But it’s a friendly arrow. It tells you what the side-exit reason is. If you click it, it even tells you the stack trace in the pop-up panel on the bottom. If we click a couple of them, maybe we can learn more. We can also zoom by mousing over the track, holding Ctrl, and scrolling. That will let us look closer. But there are so many…

Fortunately, Perfetto also provides a SQL interface to the traces. We can write a query to aggregate all of the side exit events from the table and line them up with the topmost method from the backtrace arguments in the table: This pulls up a query box at the bottom showing us that there are a couple of big hotspots: It even has a helpful option to export the results as a Markdown table so I can paste (an edited version) into this blog post: Looks like we should figure out why we’re having shape misses so much and that will clear up a lot of exits. (Hint: it’s because once we make our first guess about what we think the object shape will be, we don’t re-assess… yet.)

This has been a taste of Perfetto. There’s probably a lot more to explore. Please join the ZJIT Zulip and let us know if you have any cool tracing or exploring tricks.

Now I’ll explain how you too can use Perfetto from your system. Adding support to ZJIT was pretty straightforward. The first thing is that you’ll need some way to get trace data out of your system. We write to a file with a well-known location ( ), but you could do any number of things. Perhaps you can stream events over a socket to another process, or to a server that aggregates them, or store them internally and expose a webserver that serves them over the internet, or… anything, really.

Once you have that, you need a couple of lines of code to emit the data. Perfetto accepts a number of formats. For example, in his excellent blog post, Tristan Hume opens with such a simple snippet of code for logging Chromium Trace JSON-formatted events (lightly modified by me): This snippet is great. It shows, end-to-end, writing a stream of one event. It is a complete (X) event, as opposed to either:

- two discrete timestamped begin (B) and end (E) events that book-end something, or
- an instant (i) event that has no duration, or
- a couple of other event types in the Chromium Trace Event Format doc

It was enough to get me started.

Since it’s JSON, and we have a lot of side exits, the trace quickly ballooned to 8GB for a several-second benchmark. Not great. Now, part of this is our fault (we should side exit less) and part of it is just the verbosity of JSON. Thankfully, Perfetto ingests more compact binary formats, such as the Fuchsia trace format. In addition to being more compact, FXT even supports string interning. After modifying the tracer to emit FXT, we ended up with closer to 100MB for the same benchmark.

We can reduce further by sampling: not writing every exit to the trace, but instead every K exits (for some (probably prime) K). This is why we provide the option. Check out the trace writer implementation from the point this article was written.

We could trace:

- When methods get compiled
- How big the generated code is
- How long each compile phase takes
- When (and where) invalidation events happen
- When (and where) allocations happen from JITed code
- Garbage collection events

Visualizations are awesome. Get your data in the right format so you can ask the right questions easily. Thanks for Perfetto! Also, looks like visualizations are now available in Perfetto canary. Time to go make some fun histograms…
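To make the JSON route and the side-exit aggregation a little more concrete, here is a rough Ruby sketch. It is not ZJIT's tracer: `SampledTraceWriter` and `top_exit_sites` are hypothetical names. The writer emits complete (X) and instant (i) events in the Chromium Trace Event Format's JSON array form, with timestamps and durations in microseconds, and keeps only every Kth instant event to mimic the 1/K sampling described above. The second method is a plain-Ruby stand-in for the Perfetto SQL query: it counts side-exit events by reason and topmost frame. The `side_exit` event name and the `reason`/`backtrace` argument keys are assumptions made for this sketch.

```ruby
require "json"

# Hypothetical writer for Chromium Trace Event Format JSON (array form).
# ts and dur are in microseconds.
class SampledTraceWriter
  def initialize(io, sample_every: 1, pid: Process.pid, tid: 1)
    @io = io
    @sample_every = sample_every # keep 1 out of every K instant events
    @pid = pid
    @tid = tid
    @instant_count = 0
    @first_event = true
    @io.write("[\n")
  end

  # A complete ("X") event: one record carrying a start time and a duration.
  def complete(name, ts_us:, dur_us:, args: {})
    write_event(name: name, ph: "X", ts: ts_us, dur: dur_us, pid: @pid, tid: @tid, args: args)
  end

  # An instant ("i") event with thread scope. Only every Kth call is written;
  # returns true when this call's event was kept.
  def instant(name, ts_us:, args: {})
    @instant_count += 1
    return false unless (@instant_count % @sample_every).zero?

    write_event(name: name, ph: "i", s: "t", ts: ts_us, pid: @pid, tid: @tid, args: args)
    true
  end

  def close
    @io.write("\n]\n")
  end

  private

  def write_event(event)
    @io.write(",\n") unless @first_event
    @first_event = false
    @io.write(JSON.generate(event))
  end
end

# Stand-in for the Perfetto SQL aggregation: count side-exit instant events
# by [reason, topmost frame], most frequent first.
def top_exit_sites(events, limit: 10)
  counts = Hash.new(0)
  events.each do |event|
    next unless event["ph"] == "i" && event["name"] == "side_exit"

    args = event.fetch("args", {})
    top_frame = Array(args["backtrace"]).first || "(unknown)"
    counts[[args["reason"], top_frame]] += 1
  end
  counts.sort_by { |_site, count| -count }.first(limit)
end
```

Pointing the writer at a `File` instead of a `StringIO` should give a file the Perfetto UI can open, and `top_exit_sites(JSON.parse(File.read(path)))` tallies it. As the post notes, JSON grows quickly, which is what pushed the ZJIT team to FXT and sampling.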

André Arko 3 weeks ago

How to Install a Gem

This post was originally given as a talk at SF Ruby Meetup. The slides are also available.

Hello, and welcome to How To Install A Gem. My name is André Arko, and I go by @indirect on all the internet services. You might know me from being 1/3 of the team that shipped Bundler 1.0, or perhaps the 10+ years I spent trying to keep RubyGems.org up and running for everyone to use. More recently, I’ve been working on new projects: , a CLI to install Ruby versions and gems at unprecedented speeds, and gem.coop, a community gem server designed from the ground up so Bundler and can install gems faster and more securely than ever before.

So, with that introduction out of the way, let’s get started: do you know how to install a gem? Okay, that’s great! You can come up and give this talk instead of me. I’ll just sit over here while you write the rest of this post.

Slightly more seriously, do you know how converts the name that you give it into a URL to download a .gem file? It’s called the “compact index”, and we’ll see how it works very soon. Next, who in the audience knows how to unpack a .gem file? Do you know what format .gem files use, and what’s inside them? We’ll look at gem structure and gemspec files as well. Then, do you know where to put the files from inside the gem? Where do all of these files and directories get put on disk so we can use them later? Does anyone know off the top of their head? Once those files have been unpacked into the correct places, the last thing we need to know is how to require them. How do these unpacked files on disk get found by Ruby, so you can and have that actually work?

This exercise was mostly to show that using gems every day actually skips over most of the way they work underneath. So let’s look at what a gem is, and examine how they work. By the end of this talk, you’ll know what’s inside a gem, how RubyGems figures out what to download, and where and how that download gets installed so you can use it.
And if you already know everything we just talked about, please feel free to go straight to rv.dev and start sending us pull requests!

First, we’re going to look at how the name of a gem becomes a URL for a .gem file. Let’s use as our example. Historically, there have been at least five or six different ways to look up information about a gem based on its name, but today there is one canonical way: the compact index. It’s so simple that you can do it yourself using curl. Just run , and you’ll be able to read the exact output that every tool uses to look up the versions of a gem that exist.

Each line in the file describes one version of the gem, so let’s look at one line. We can break down that line with , and tackle each part one at a time. First, . That’s the version of that this line is about. So we now know for sure that exists. Next, a list of dependencies. The gem (version ) declares dependencies on a bunch of other gems: , , , , , , , , , , , , and . Each dependency has a version requirement attached, and for almost every gem it is exactly version , and only version . For , Rails is a little bit more flexible, and allows any version and up.

The final section contains a checksum, a Ruby requirement, and a RubyGems requirement. The checksum is a sha256 hash of the .gem file that contains the gem, so after we download the gem we can check to make sure we have the right file by comparing that checksum. For this version of Rails, the required Ruby version is or greater, and the required RubyGems version is or greater. It’s up to the client to do something with that information, but hopefully you’ll see an error if you are using Ruby or RubyGems that’s too old.

Great! So now we know the important information: Rails version is real, and strong, and is our friend. We can download it, and check the checksum against the checksum we were given in the info file line.
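The line breakdown above can be sketched as a small Ruby parser. This is a minimal sketch under an assumption about the info-line layout: a version, a space, comma-separated `name:requirement` dependencies (with multiple requirements joined by `&`), then `|` and comma-separated `key:value` metadata such as `checksum`, `ruby`, and `rubygems`. That matches my understanding of how Bundler's compact index client splits these lines, but treat it as illustrative. `InfoLine`, `parse_info_line`, and `checksum_matches?` are hypothetical helpers, and platform suffixes on versions are simply left in the version string.

```ruby
require "digest"

# One parsed line of a compact index "info" file (hypothetical struct).
InfoLine = Struct.new(:version, :dependencies, :requirements, keyword_init: true) do
  def checksum
    requirements.fetch("checksum").first
  end
end

# Assumed layout:
#   VERSION[-PLATFORM] DEP:REQ[&REQ],DEP:REQ|checksum:SHA256,ruby:REQ,rubygems:REQ
def parse_info_line(line)
  deps_part, _, meta_part = line.chomp.partition("|")
  version, _, deps_str = deps_part.partition(" ")

  # "rack:>= 2.2.4&< 4" becomes ["rack", [">= 2.2.4", "< 4"]]
  dependencies = deps_str.split(",").to_h do |dep|
    name, reqs = dep.split(":", 2)
    [name, reqs.split("&")]
  end

  # "checksum:abc,ruby:>= 3.2.0" becomes {"checksum" => ["abc"], "ruby" => [">= 3.2.0"]}
  requirements = meta_part.split(",").to_h do |pair|
    key, value = pair.split(":", 2)
    [key, value.split("&")]
  end

  InfoLine.new(version: version, dependencies: dependencies, requirements: requirements)
end

# Compare a downloaded .gem file against the sha256 hex digest from the info line.
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end
```

For example, with a made-up, shortened line, `parse_info_line("8.1.3 actionpack:= 8.1.3|checksum:abc123").checksum` returns `"abc123"`. Once the .gem file is downloaded, `checksum_matches?` performs the same comparison the talk does by hand.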
Let’s do that now: Notice that the checksum produced by exactly matches the checksum we previously saw in our line from the info file: . That lets us know that we got the right file, and there were no network or disk errors. Now that we have the gem, we can investigate: what exactly is inside a gem? At this point, we’re going to pivot from the gem to the gem. There’s a good reason for that, and the reason is… the gem doesn’t actually have any files in it. So it’s a bad example. In order to show off what a gem looks like when it has files in it, we’ll use instead. So, we have our .gem file downloaded with curl. What do we do now? The first piece of secret knowledge that we need: gems are tarballs. That means we can open them up with regular old . Let’s try it. So what’s inside the .gem tarball is… another tarball. And also two gzipped files. Let’s look at the files first. As you might expect from its name, the file is a gzipped YAML file, containing checksums for the other two files. It’s maybe a bit silly to have multiple layers of checksumming here, but it does confirm that the outer layer of tarball and zip was removed without any errors. Okay, so what’s inside ? The answer is… Ruby, sort of. It’s a YAML-serialized instance of the class. We can see exactly what was put into this object at the time the gem was built. After snipping out the YAML that lists the dependencies (which we already looked at, because they are included in the info file), what’s left is some relatively simple information about the gem. Author, author’s email, description, homepage, license, various URLs. For the purposes of installing and using the gem, we care about exactly six pieces of information: , , , , , and . We’re going to combine those items with the files in the remaining file to get our unpacked and installed gem. Now that we know what’s in the gem specification, let’s look at what’s inside the data tarball. 
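Here is a rough Ruby sketch of the unpacking steps above. Instead of shelling out to command-line tools, it uses the tar reader that ships with RubyGems (`Gem::Package::TarReader`) together with `Zlib`. It reads the outer tarball, turns `metadata.gz` back into a `Gem::Specification` with `Gem::Specification.from_yaml`, and lists the paths inside `data.tar.gz`. The helper names are hypothetical, and for brevity the sketch skips verifying `checksums.yaml.gz`.

```ruby
require "rubygems/package"
require "stringio"
require "zlib"

# Read the outer .gem tarball into { entry_name => bytes }.
# A built gem normally holds metadata.gz, data.tar.gz, and checksums.yaml.gz.
def read_gem_entries(gem_io)
  entries = {}
  Gem::Package::TarReader.new(gem_io) do |tar|
    tar.each { |entry| entries[entry.full_name] = entry.read }
  end
  entries
end

# metadata.gz: gunzip it, then deserialize the YAML into a Gem::Specification.
def spec_from_metadata(metadata_gz)
  Gem::Specification.from_yaml(Zlib.gunzip(metadata_gz))
end

# data.tar.gz: a gzipped tarball of the gem's own files; return the paths inside.
def data_file_names(data_tar_gz)
  names = []
  Zlib::GzipReader.wrap(StringIO.new(data_tar_gz)) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each { |entry| names << entry.full_name }
    end
  end
  names
end
```

Against a real download, something like `File.open("railties-8.1.3.gem", "rb") { |f| read_gem_entries(f) }` should return the three entries described above. Comparing `data_file_names` with `spec.files` is the same check as eyeballing the data tarball listing against the gemspec.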
It matches up very closely with the long list of entries in the array in the gemspec. So now we have a bunch of files. Where are we going to put these files? Enter: the magic of RubyGems. The scheme that RubyGems has come up with is largely shaped by the constraints of how Ruby finds files to require, which we’re going to look at soon. For now, it is enough for us to know that RubyGems keeps track of a list of directories, a lot like the way works for your shell to find commands to run. To find the current directory, you can run . Here’s what that looks like: From this list, we can see that RubyGems organizes its own files into a few directories. To install a gem, we’re going to need to put the files we have into each of those directories, with specific paths and filenames. Just to recap, the files we need to place somewhere are: So let’s move the files into the directories we see RubyGems offers. First, cache the .gem file so RubyGems doesn’t need to download it again later: Then, add the gem specification so that RubyGems will be able to find it. There’s a small twist here, which is that the directory doesn’t contain YAML files, it contains Ruby files. So we also need to convert the YAML file back into a Ruby object, and then write out the Ruby code to create that object into a file that RubyGems can load later. Next, we need to put the files that make up the contents of the gem into the directory. One more thing we need to do: set up the executables provided by the gem. You can check out the files that RubyGems generates by looking in , but for our purposes we just need to tell RubyGems what gem and executable it needs to run, so we can do that: And with that, we’ve installed the gem! You can run the file that we just created to prove it: As we wrap up here, there are three aspects of gems that we haven’t touched on at all: docs, extensions, and plugins. We don’t have time to talk about them today in this meetup talk slot. 
Hopefully a future (longer) version of this talk will have space to include all of those things, because they are all super interesting, I promise. In the meantime, I will have to direct you to the docs for RDoc to learn more about docs, and to the source code of or RubyGems itself if you want to learn more about gem extensions and plugins. There’s one last thing to figure out before we wrap up: how does find a gem for us to be able to use it? To explain that, we’ll have to drop down to some basic Ruby, and then look at the ways that RubyGems monkeypatches Ruby’s basic to make it possible to have gems with versions. The first thing to know about is that it works exactly like does in your shell. There’s a global Ruby variable named , and it’s an array of paths on disk. When you try to require something, Ruby goes and looks inside each of those paths to see if the thing you asked for is there. You can test this out for yourself in just a few seconds! Let’s try it. The Ruby CLI flag lets you add directories to the variable, and then the function looks inside that directory to find a file with the name that you gave to require. No magic, just a list to check against for files on disk. Now that you understand how the variable makes work, how does RubyGems work? You can’t just put ten different versions of into the and expect to still work. RubyGems handles multiple versions of the same file by monkeypatching . Let’s look at what happens when we , which is a file located inside the gem that we just installed. RubyGems starts by looking at all of the gem specifications, including the one we saved earlier. In each specification, it combines the name and version with the values in to come up with a path on disk. So for our just-installed gem, that would mean a path of: . RubyGems knows that directory contains a file named , so it is a candidate to be “activated”, which is what RubyGems calls it when a gem is added to your .
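The lookup described above can be tried directly from Ruby. This sketch writes a throwaway file into a temporary directory and shows a simplified model of the search. That model is the hypothetical `find_feature` helper: it only checks for `.rb` files and ignores native extensions, absolute paths, and features that are already loaded. The sketch then lets the real `require` find the file once the directory is on `$LOAD_PATH`.

```ruby
require "tmpdir"

# Simplified model of the search: return the first "<dir>/<feature>.rb"
# that exists, checking directories in order.
def find_feature(feature, load_path)
  load_path.each do |dir|
    candidate = File.join(dir, "#{feature}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

found = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "load_path_demo.rb"), "LOAD_PATH_DEMO = :loaded\n")

  found = find_feature("load_path_demo", ["/nonexistent", dir])

  $LOAD_PATH.unshift(dir)  # put the directory at the front of the list
  require "load_path_demo" # the real require now finds it by searching $LOAD_PATH
  $LOAD_PATH.delete(dir)
end

puts found
puts LOAD_PATH_DEMO
```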
As long as internal bookkeeping shows that no other versions of have already been added to the , we’re good! RubyGems adds this specific directory to the , and delegates to the original implementation of . Require finds the file at , reads it, and evaluates it. With that, we’ve done it! We have found, downloaded, unpacked, and installed a gem so that Ruby is able to run a command and load Ruby files, without ever touching the command. If you’re interested in contributing to an open source project that works a lot with gems, we would love to work with you on , where we are working to create the fastest Ruby and gem manager in the world. And of course, if your company could use faster, easier, or more secure gems for developers, for CI, and for production deployments, we can help. We’d love to talk to you and you can find our contact information at spinel.coop.
railties-8.1.3.gem (the .gem file itself)
metadata.gz (the YAML Gem::Specification object from inside the gem)
the unpacked data.tar.gz files (the contents of the gem)
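The activation bookkeeping described in this talk can be modeled in a few dozen lines. This is a toy and not RubyGems' implementation. The `ToySpec` and `ToyActivator` names are hypothetical, the model only knows about `.rb` files listed in each spec, and it simply prefers the highest version. It does show the key rule: once one version of a gem has been added to the load path, a request that needs a different version of that same gem is refused.

```ruby
require "rubygems" # for Gem::Version

# Hypothetical, minimal stand-in for a gem specification.
ToySpec = Struct.new(:name, :version, :require_paths, :files) do
  def full_name
    "#{name}-#{version}"
  end
end

# Toy model of activation: find a spec whose require_paths contain the
# requested file, then add that gem's directories to a load path, refusing
# to activate a second version of a gem that is already active.
class ToyActivator
  attr_reader :load_path

  def initialize(specs, gems_root)
    @specs = specs.sort_by { |s| Gem::Version.new(s.version) }.reverse # newest first
    @gems_root = gems_root
    @activated = {} # gem name => the spec activated for it
    @load_path = []
  end

  # Returns the path a require of `feature` would load, or nil.
  def resolve(feature)
    @specs.each do |spec|
      spec.require_paths.each do |rp|
        relative = File.join(rp, "#{feature}.rb")
        next unless spec.files.include?(relative)

        activate(spec)
        return File.join(gem_dir(spec), relative)
      end
    end
    nil
  end

  private

  def gem_dir(spec)
    File.join(@gems_root, "gems", spec.full_name)
  end

  def activate(spec)
    current = @activated[spec.name]
    return if current.equal?(spec) # already active, nothing to do
    raise "can't activate #{spec.full_name}, already activated #{current.full_name}" if current

    @activated[spec.name] = spec
    spec.require_paths.each { |rp| @load_path << File.join(gem_dir(spec), rp) }
  end
end

specs = [
  ToySpec.new("demo", "1.0.0", ["lib"], ["lib/demo.rb", "lib/demo/legacy.rb"]),
  ToySpec.new("demo", "2.0.0", ["lib"], ["lib/demo.rb"])
]
activator = ToyActivator.new(specs, "/gems")
puts activator.resolve("demo") # newest version wins
puts activator.load_path.inspect
begin
  activator.resolve("demo/legacy") # only in 1.0.0, but 2.0.0 is already active
rescue RuntimeError => e
  puts e.message
end
```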

Giles's blog 1 months ago

Writing an LLM from scratch, part 32e -- Interventions: the learning rate

I'm still working on improving the test loss for a from-scratch GPT-2 small base model, trained on code based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". In my training code, I have this code to create the optimiser: The values in there -- for the learning rate, and for the weight decay -- were just copied from the tiny training run that we do in section 5.2 of the book. What do those values actually mean, and are those really the right values for them? I felt I had a good handle on the learning rate, at least -- it's one of the first things you learn when you start looking at machine learning of any kind -- but how would you go about working out what the correct value for it was? On top of that, when I was reading the Chinchilla paper a while back, I noticed they repeatedly referred to a "cosine cycle" for the learning rate, which didn't fit into anything I'd learned about before. The weight decay was pretty much an unknown for me -- I know it is a parameter controlling the behaviour of the optimiser, but I don't know how it does that. In this post I want to look into the learning rate, and these mysterious cosines; I'll write a follow-up about the weight decay later. If you're reading this blog, you almost certainly know what the learning rate is, but let's go over it briefly to build a solid foundation. The way it's normally explained, using simple gradient descent, goes something like this. Let's assume that we're training a model with just one parameter, and it starts off set to −5. We run some training data through, and get a loss, let's say 44.55: We don't know what shape our loss curve is (if we did, we might be able to find the lowest loss algebraically), but we do know the gradient of the loss with respect to the parameter at the point we've measured; it happens to be -13.
That is reasonably large and negative: We use that information to decide which direction to move our parameter in. In our case the gradient is negative, meaning there's a downhill slope towards the right, so we want to increase the parameter to move rightwards on that chart; if it were positive (an uphill slope), we'd want to decrease the parameter to move leftwards. Simply subtracting the gradient from the parameter would lead to an update in the right direction, but it would be a very large one in this case -- we'd move 13 units to the right -- so we multiply the gradient by a small positive number, the learning rate (often written as a lower-case eta, η), to move a small distance in that direction. Let's say η = 0.3. That means we update our parameter like this: −5 − 0.3 × (−13) = −1.1. So now we run that through and get a new loss -- let's say it's 9.06 -- and a new gradient, which happens to be -5.2. Now we can do another update, and our parameter will become 0.46, so we use that and work out another loss and gradient, which come to 3.3816 and -2.08. Let's plot that one, but this time we'll draw back the veil and show the actual loss curve. Now, it's worth reiterating that while we're training this model we don't know what that curve looks like -- we're just finding points on it, along with its gradient at those points, and using that information to work out which parameter value to explore next. But it's pretty clear that as we continue, we'll get to the minimum eventually if the learning rate is the right kind of size, because -- due to the nice smooth U-shape of the curve -- the gradient gets smaller the closer we get to the minimum 1.
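The arithmetic of those updates can be checked in a few lines of Ruby. This sketch assumes a loss curve whose gradient is 2x − 3. That is a hypothetical choice, but it happens to reproduce the three gradients quoted above, and it puts the minimum at x = 1.5. The sketch follows the same −5 → −1.1 → 0.46 path and then compares a few learning rates over ten steps.

```ruby
# Gradient of a hypothetical loss curve, chosen so that it reproduces the
# gradients quoted above: -13 at -5, -5.2 at -1.1 and -2.08 at 0.46.
# Its minimum is at x = 1.5.
def grad(x)
  2.0 * x - 3.0
end

# Plain gradient descent: x <- x - eta * gradient, repeated `steps` times.
def gradient_descent(start, eta, steps)
  xs = [start]
  steps.times { xs << xs.last - eta * grad(xs.last) }
  xs
end

gradient_descent(-5.0, 0.3, 2).each do |x|
  puts format("x = %5.2f   gradient = %6.2f", x, grad(x))
end

# Effect of the learning rate: distance from the minimum after 10 steps.
[0.05, 0.3, 1.1].each do |eta|
  distance = (gradient_descent(-5.0, eta, 10).last - 1.5).abs
  puts format("eta = %.2f   distance after 10 steps = %.4g", eta, distance)
end
```

The three rates behave very differently. With η = 0.05 the parameter is still about 2.3 away from the minimum after ten steps, and with η = 0.3 it is within 0.001. With η = 1.1, every step overshoots by more than the last, so the distance grows to around 40.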
It's also pretty clear that if the learning rate is smaller than an optimal value, in this simple case we will still find the right point, but it will take more steps because each one is smaller: And, of course, if the learning rate is too high, we might never converge -- we'd "bounce out of" the dip, and wind up with a parameter value that alternates between ever more negative and ever more positive values, zooming off to infinity: OK, that's the basics. Why might we want to change from something that seems so logical and simple? A few paragraphs back I said: "due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum". What if it doesn't? Imagine if we had something more like a V-shaped curve, like this: The gradient does not decrease as we get closer to the minimum, and so while we're in the downward-sloping part, each update is exactly the same distance: Now, eventually we'll jump over the minimum: In this example, I've used a gradient of −8.33 on the downward-sloping part of the curve, and +8.33 on the upward-sloping part, so that means that our next update just bounces us back to where we were before! Because the gradient isn't decreasing the closer we get to the minimum, we wind up just oscillating around it. That's not very helpful. That's a slightly contrived example (though not entirely -- intuitively, with functions like ReLU or GELU in our real LLMs, it's easy to imagine crazy loss landscapes). But it does show that perhaps we might want to add in our own "artificial" way to decrease the size of the steps we take over the course of training our model, rather than just relying on the gradients naturally flattening out for us. Another way of looking at things is that as the model gets trained, we don't want batches of very new-looking data to cause big updates, taking us away from what was a good part of the loss landscape in terms of what we've seen so far.
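Here's a small Ruby sketch of that V-shaped case. The gradient magnitude of 8.33 comes from the example above. Everything else is an illustrative assumption: the minimum sits at 0, the start is at −2, the fixed rate is η = 0.1, and the second run shrinks the rate by 10% per step. With the fixed rate, the parameter ends up hopping between the same two points on either side of the minimum. With the decaying rate, each hop is shorter than the last, so it settles in.

```ruby
SLOPE = 8.33 # gradient magnitude used in the example above

# Gradient of a hypothetical V-shaped loss with its minimum at x = 0:
# constant -8.33 on the left, +8.33 on the right.
def v_grad(x)
  return 0.0 if x.zero?

  x.negative? ? -SLOPE : SLOPE
end

# Gradient descent where the block supplies the learning rate for step t.
def descend(start, steps)
  x = start
  steps.times { |t| x -= yield(t) * v_grad(x) }
  x
end

fixed   = descend(-2.0, 60) { |_t| 0.1 }         # constant learning rate
decayed = descend(-2.0, 60) { |t| 0.1 * 0.9**t } # shrinks 10% every step

puts format("fixed:   ends %.4f away from the minimum", fixed.abs)
puts format("decayed: ends %.4f away from the minimum", decayed.abs)
```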
For example, imagine you've been training an LLM on a bunch of documents, which have so far been in English. Halfway through, it encounters a document in Byzantine Greek, the loss skyrockets, and you do a big update. That would be a problem! You might want it to learn a bit from it to push it slightly in a "the world is multi-lingual" direction, but you don't want it to lose a big chunk of the value from its previous training. You might also see a kind of connection to the way that people learn over the course of their lives -- for babies, everything is new and they "update their parameters" constantly as they try to understand the world. Children are still pretty flexible, but as we get older we tend to update our beliefs less and less. That's not always optimal, but as a heuristic it's pretty adaptive. Anyway, in general: for most training runs, we're going to want the learning rate to adjust over time. Most of the time this will be by reducing it, though there can be cases for increasing it again for periods. The general case of doing this is called "learning rate scheduling". There are a bunch of ways that people adjust the learning rate over the course of a train; here are a few that cropped up a lot while I was researching this. If we want the learning rate to go down over time, and we know how many steps we're training for, we can just set it to (say) 0.0004 for the first quarter of our train, then 0.0002 for the next, then 0.0001, then finish off with 0.00005, like this: That can work pretty well! But there is one obvious oddity -- the big step changes in learning rate mean that the exact placement of the drops and the training data before and after can matter. Why are we treating the data and the state of the model immediately before and immediately after so differently? It would make more sense to have a smoother schedule. What functions decay smoothly like that? 
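As a concrete version of that step schedule, here's a sketch assuming a hypothetical 1,000-step train. Each of the four rates from the text is used for a quarter of the steps.

```ruby
# Step schedule from the text: 0.0004 for the first quarter of the train,
# then 0.0002, 0.0001 and finally 0.00005.
LEVELS = [4e-4, 2e-4, 1e-4, 5e-5].freeze

def step_lr(step, total_steps, levels = LEVELS)
  phase = step * levels.size / total_steps # integer division: 0, 1, 2 or 3
  levels[[phase, levels.size - 1].min]
end

[0, 249, 250, 500, 999].each { |s| puts "step #{s}: #{step_lr(s, 1000)}" }
```

The abrupt jumps at steps 250, 500 and 750 are exactly the oddity the text points out.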
An exponential curve does: let's say we just multiply the learning rate by a number that is a little smaller than one every step, so that it drops smoothly like this: But there are lots of other curves like that, and one is particularly interesting: As you change θ from 0 to π, the value of cos θ goes smoothly from 1 to −1, so it's easy enough to rescale that so that our learning rate follows the same curve: This is called a "cosine annealing" or "cosine decay" schedule, and was apparently inspired by the algorithms used for simulated annealing (an optimisation algorithm that was in turn inspired by how atomic structures form in metals as they cool -- another one for the list of things to look into in the future...) That solves the mystery from earlier: the cosine that the Chinchilla paper was talking about was exactly this. As it turns out, the cosine decay scheduling curve is quite popular in deep learning, because it has what amounts to two well-defined phases -- an initial high learning rate where lots of exploration of the loss landscape can happen, followed by a smooth transition to something more like fine-tuning to optimise the location in whatever part of the loss landscape we've wound up in. Now, all of the above assume that we want the learning rate to start high and finish low, so that we can mimic the textbook gradient descent that we had at the start of this post. Intuitively that feels nice, but on further thought, the important thing is really that we have a low learning rate at the end of the train, so that we can get as close as possible to the minimum of whichever part of the loss landscape we've found ourselves in. But perhaps there's a case for having both high and low periods during the train, so that we don't get stuck in a local minimum -- something to jolt us out of where we were every now and then?
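Both smooth schedules are one-liners. In this sketch, the exponential factor of 0.999 and the 1,000-step train are illustrative assumptions. The cosine version rescales cos θ, which runs from 1 to −1 as θ goes from 0 to π, onto the range from the peak rate down to a minimum. That is the same closed form PyTorch documents for its CosineAnnealingLR scheduler.

```ruby
# Exponential decay: multiply by a factor slightly below one every step.
def exponential_lr(step, initial:, gamma:)
  initial * gamma**step
end

# Cosine decay: theta = pi * step / total sweeps 0..pi, so cos(theta)
# sweeps 1..-1; rescale that onto lr_max..lr_min.
def cosine_lr(step, total:, lr_max:, lr_min: 0.0)
  theta = Math::PI * [step, total].min / total
  lr_min + (lr_max - lr_min) * (1 + Math.cos(theta)) / 2
end

[0, 250, 500, 750, 1000].each do |step|
  puts format("step %4d  exponential %.3e  cosine %.3e",
              step,
              exponential_lr(step, initial: 4e-4, gamma: 0.999),
              cosine_lr(step, total: 1000, lr_max: 4e-4))
end
```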
2 With a step function, that's easy: you could, for example, do this: With an exponential, you could do something like this: With cosine decay, of course, things are even easier, because the cosine function is inherently cyclical, so we can just do this: However, at least for our purposes, training an LLM using a Chinchilla-optimal number of training tokens, it makes sense to be guided by what the authors of the Chinchilla paper did. Appendix B says: "We find that setting the cosine cycle length too much longer than the target number of training steps results in sub-optimally trained models, as shown in Figure A1. As a result, we assume that an optimally trained model will have the cosine cycle length correctly calibrated to the maximum number of steps, given the FLOP budget; we follow this rule in our main analysis." So, at this point, I think we have one important part of the intervention we want to make: we want to use a cosine learning rate scheduler, going from high near the start of the training run, down to low at the end over one cycle. Additionally, and also from appendix B in the paper: "we use a 10x learning rate decay in line with Rae et al. (2021)" ...which means that if our learning rate starts at η, then we want it to decay down to η/10 by the end. So, we just need to work out an initial value for η, and let it rip, right? Well, not so fast... When our model has just been initialised with random weights, right at the start of the train, gradients are going to be pretty wild. It's going to be making random errors all of the time, and we'll be making huge jumps across the loss landscape. That sounds bad. Additionally, those kinds of wild jumps can get the optimiser into a -- well, sub-optimal -- state. I haven't read enough about optimisers yet to have a solid handle on that, but that can wait -- intuitively it makes some kind of sense that erratic gradient updates might confuse it.
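To see why the cycle length matters, here's a sketch of a cosine schedule that restarts every `cycle_length` steps. Within each cycle it decays to one tenth of the peak, per the 10x decay quoted above. The function name, the 1,000-step train and the peak of 1e-3 are illustrative assumptions. With the cycle matched to the train, the rate ends at about a tenth of the peak. With a cycle twice as long, the train ends with the rate still at about 55% of the peak, which is the kind of mismatch the Chinchilla authors warn about.

```ruby
# Cosine schedule that falls from lr_max to lr_max / decay_ratio over each
# cycle of `cycle_length` steps, then jumps back up to lr_max and repeats.
def cyclic_cosine_lr(step, cycle_length:, lr_max:, decay_ratio: 10.0)
  lr_min = lr_max / decay_ratio
  position = (step % cycle_length).to_f / cycle_length # 0.0 up to (not incl.) 1.0
  lr_min + (lr_max - lr_min) * (1 + Math.cos(Math::PI * position)) / 2
end

total_steps = 1_000
peak = 1e-3
matched  = cyclic_cosine_lr(total_steps - 1, cycle_length: total_steps, lr_max: peak)
too_long = cyclic_cosine_lr(total_steps - 1, cycle_length: 2 * total_steps, lr_max: peak)
restart  = cyclic_cosine_lr(total_steps, cycle_length: total_steps, lr_max: peak)

puts format("final rate, cycle = train length:    %.2e", matched)
puts format("final rate, cycle = 2x train length: %.2e", too_long)
puts format("rate just after a restart:           %.2e", restart)
```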
So, it makes a certain amount of sense to start off with a low learning rate so that we don't do that, and then to increase it gradually to the peak, and only then to schedule the gradual cosine decay. According to this (rather nice looking) masterclass on LLM training, it's typical to do this over "a few thousand steps or a small percentage (e.g., 1-10%) of the total training steps, depending on the dataset size and batch size", and we would just use a linear increase over that period: I think we should do that; a simple linear warmup at the start -- let's relatively arbitrarily say 5% of our training steps going up to our desired peak learning rate. So our learning rate schedule should look something like this: So far I've written a lot about how we vary the learning rate over time, and that's all been very useful. But we still need to know what the peak value should be! In smaller-scale experiments you might just try a bunch of different numbers to see what worked well, but at more than US$30 per train, that's not practical here. Unfortunately it's really quite hard to find good suggestions published anywhere. The GPT-2 paper is (as usual) reticent: "The learning rate of each model was manually tuned for the best perplexity on a 5% held-out sample of WebText" ...and if you search for "learning rate training llm", you'll see lots of results for when people are fine-tuning existing LLMs (2 × 10⁻⁴ comes up a lot), but almost nothing about when you're training one from scratch. I eventually came across this (long!) post from Hugging Face, which I definitely need to spend time going through in the future, because it covers a lot of the ground I've been going over in this post series. But for this post, I think the most relevant part is in the section "Scaling Laws for Hyperparameters", where they include a figure from this DeepSeek paper.
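Putting the pieces so far together, here's a sketch of the whole schedule as a function of the step number. It uses a linear warmup over the first 5% of steps, then a single cosine decay down to a tenth of the peak. The function name and the ramp formula are assumptions for illustration; for example, the warmup here starts at peak/warmup_steps rather than exactly zero. The peak itself is left as a parameter.

```ruby
# Linear warmup for the first `warmup_fraction` of the train, then one
# cosine decay from lr_max down to lr_max / decay_ratio at the final step.
def warmup_cosine_lr(step, total_steps:, lr_max:, warmup_fraction: 0.05, decay_ratio: 10.0)
  warmup_steps = (total_steps * warmup_fraction).round
  lr_min = lr_max / decay_ratio
  if step < warmup_steps
    lr_max * (step + 1) / warmup_steps # ramps up to exactly lr_max
  else
    progress = (step - warmup_steps).to_f / [total_steps - warmup_steps, 1].max
    progress = [progress, 1.0].min
    lr_min + (lr_max - lr_min) * (1 + Math.cos(Math::PI * progress)) / 2
  end
end

[0, 25, 49, 50, 500, 1_000].each do |step|
  puts format("step %5d  lr %.3e", step, warmup_cosine_lr(step, total_steps: 1_000, lr_max: 1e-3))
end
```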
Here it is, with some of the (also relevant) surrounding text: In our trains we're using something like 5 × 10¹⁸ total FLOPs. Now, they are specifically charting things in terms of non-embedding FLOPs, but I'm going to play a little fast and loose here and ignore that, so reading off their chart, that looks like we should be using about 1.4 × 10⁻³ as our learning rate. We can double-check that against their formula, where C is the compute budget: Nice, a close match! However, it's definitely worth noting that we're using a simple GPT-2 architecture, and they are using something quite different -- RMSNorm instead of LayerNorm, SwiGLU as the activation function on the feed-forward networks, Rotary Position Embedding rather than the learned absolute position embeddings we're using, and so on. As a sanity check: you can see that they also give a formula for the optimal batch size in terms of tokens. For our FLOP budget, that comes in at 381,782, which is about 373 of our 1,024-token sequences. That is quite a lot higher than the 97-or-so sequences that appeared to be optimal in our earlier experiments. That is a little concerning, though of course the 97 number came out of a very ad-hoc bit of curve-fitting. For now, I'm going to hope that that doesn't matter too much for the learning rate. This may come back to bite me; if the results of a train with 1.4 × 10⁻³ are radically worse than the existing rate of 4 × 10⁻⁴, I'll have to do a bit more investigation. So, now I think we have all of the theoretical pieces in place to do a train. Let's move on to the practicalities. We started by looking at this: What should we change -- disregarding the until the next post? Based on the above, we want to do a linear warmup of about 5% of our steps, going up to a learning rate of 1.4 × 10⁻³, followed by a cosine decay down to one tenth of that, 1.4 × 10⁻⁴. What does that look like in code?
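The two numbers above are easy to reproduce. This sketch assumes the DeepSeek LLM paper's fitted power laws, η_opt = 0.3118 · C^−0.1250 and B_opt = 0.2920 · C^0.3271, with B in tokens and C in FLOPs. Treat the constants as something to double-check against the paper, although they do reproduce both the roughly 1.4 × 10⁻³ learning rate and the 381,782-token batch quoted in the text. Like the text, it plays fast and loose by feeding in total FLOPs rather than non-embedding FLOPs.

```ruby
# Fitted scaling laws assumed from the DeepSeek LLM paper (compute in FLOPs).
def optimal_lr(compute)
  0.3118 * compute**-0.1250
end

def optimal_batch_tokens(compute)
  0.2920 * compute**0.3271
end

compute = 5e18  # rough total FLOPs for these trains (not non-embedding FLOPs)
seq_len = 1_024 # tokens per training sequence

lr = optimal_lr(compute)
batch = optimal_batch_tokens(compute)
puts format("learning rate ~ %.2e", lr)
puts format("batch ~ %d tokens ~ %d sequences of %d tokens",
            batch.round, (batch / seq_len).round, seq_len)
```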
The relevant API for scheduling the learning rate in PyTorch is, logically enough, in the module, and there are a bunch of different scheduling classes. You create your optimiser, then create a scheduler for the shape you want, and then you can call on the scheduler (after the on the optimiser) to adjust the optimiser's learning rate over time. Let's make that more concrete; one of the schedulers is , which is what we'll need for our linear warmup period. It takes as its parameters: Let's say that we want to go from almost-zero to our optimiser's learning rate over 1,600 steps -- we'd create our scheduler like this: ...then in our training loop, after we've done the scaled step of the optimiser, we'd also step the scheduler: This confused me a little bit the first time I saw it; after all, if the scheduler hasn't been "triggered" when we step the optimiser, how does the optimiser know what learning rate to use? Surely it would just use whatever it was initialised with? The answer is that the optimiser's parameter groups end up holding the learning rate in two places -- an "initial learning rate" and a "current learning rate". The optimiser itself only stores the current one; when you create your first scheduler, it copies that value into the initial learning rate (unless one has already been recorded), uses the initial learning rate to work out the start and end values, and then sets the current one to the start value immediately. Just by creating a scheduler, you're changing the optimiser's current learning rate -- but not the initial one, which is important, as we'll see in a moment. So, we have a scheduler that handles our warmup period nicely. Another scheduler that's relevant to our interests is CosineAnnealingLR. This takes: On creation, this scheduler will read in the optimiser's initial learning rate -- note, not the current one -- and then the first time it's stepped, it will set the current learning rate to that value, and then for steps after that it will reduce it so that it follows a nice cosine decay, reaching after steps.
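To make the "initial versus current learning rate" bookkeeping concrete, here's a toy model in plain Ruby. It is not PyTorch. A hash stands in for an optimiser parameter group, and the two hypothetical classes only mimic the behaviour described above. The warmup records the base rate and overwrites the current rate as soon as it is created, while the cosine scheduler reads the recorded base rate. The 1,600-step warmup and the start factor of 0.001 are illustrative assumptions.

```ruby
# Toy stand-in for one optimiser parameter group (NOT PyTorch's classes):
# "lr" is the current learning rate, "initial_lr" the remembered base rate.
group = { "lr" => 1.4e-3 }

# Mimics a linear warmup scheduler: record the base rate if nothing has
# yet, then immediately overwrite the *current* rate with the start value.
class ToyLinearWarmup
  def initialize(group, start_factor:, end_factor:, total_iters:)
    group["initial_lr"] ||= group["lr"] # like Python's dict.setdefault
    @group = group
    @start_factor = start_factor
    @end_factor = end_factor
    @total_iters = total_iters
    @step = 0
    apply
  end

  def step
    @step += 1
    apply
  end

  private

  def apply
    fraction = [@step, @total_iters].min.to_f / @total_iters
    factor = @start_factor + (@end_factor - @start_factor) * fraction
    @group["lr"] = @group["initial_lr"] * factor
  end
end

# Mimics a cosine scheduler: it reads "initial_lr", never the current "lr".
class ToyCosineDecay
  def initialize(group, t_max:, eta_min:)
    group["initial_lr"] ||= group["lr"]
    @group = group
    @t_max = t_max
    @eta_min = eta_min
    @step = -1 # not started; the first step lands on step 0
  end

  def step
    @step += 1
    base = @group["initial_lr"]
    t = [@step, @t_max].min
    @group["lr"] = @eta_min + (base - @eta_min) * (1 + Math.cos(Math::PI * t / @t_max)) / 2
  end
end

ToyLinearWarmup.new(group, start_factor: 0.001, end_factor: 1.0, total_iters: 1_600)
puts "after creating the warmup: lr=#{group['lr']}, initial_lr=#{group['initial_lr']}"

cosine = ToyCosineDecay.new(group, t_max: 32_000, eta_min: 1.4e-4)
cosine.step
puts "after the cosine's first step: lr=#{group['lr']}"
```

Even though the warmup has already lowered the current rate to a tiny value, the cosine scheduler's first step lands on the full 1.4 × 10⁻³. That is the split between the two values doing its job.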
So those two cover the two regimes that we want -- the warmup and then the cosine decay. But now we need to put them together; we want to do one and then the other. There's a very useful class, , which allows you to chain schedulers and tell it when each one takes over from the previous one. Let's sketch out some code to use that to do a train with our new peak learning rate of 1.4 × 10⁻³, a warmup of 1,600 steps, followed by a cosine decay for the next 32,000 steps to one tenth of the peak learning rate: That actually works quite nicely! I wrote a dummy training loop to plot the current learning rate over a fake train using code like the above, and got this: ...with the output confirming that the values were good at the "milestone" point, the start and the end: I was initially a bit surprised by that, as at the time I ran it, I didn't realise that there was that split between the initial and the current learning rates on the optimiser, so I thought that the cosine scheduler would pick up whatever tiny starting value the warmup scheduler had overwritten the optimiser's learning rate with -- but that split saves the day. That means that now we have the outline of how to schedule our learning rate. But before we can put that into the code, we need to think about how it affects our checkpoints. Just like the scaler and the optimiser, the learning rate scheduler (or, indeed, each of our two schedulers here) contains information about the state of the train. That means that if we recover from a checkpoint, we need to provide them with the information they need. If we just created them afresh, they'd start from the beginning -- for example, if we restarted from step 20,000 in a train like the one above, we'd start a new warmup from pretty much zero, and then start a fresh cosine decay. That would be bad: (Dummy test code here.) Now, we could use the parameter to initialize them with the correct current global step.
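The chaining and the checkpoint problem can both be seen in a toy model. This is plain Ruby, not PyTorch's chaining class. The hypothetical `ToyChainedSchedule` computes the rate from its own step counter. It warms up linearly for 1,600 steps from an assumed start factor of 0.001, then decays over 32,000 steps to 1.4 × 10⁻⁴. The sketch checks the start, milestone and end values, then shows that a freshly created schedule at step 20,000 would restart the warmup.

```ruby
# Toy chained schedule: linear warmup for `milestone` steps, then a cosine
# decay over `decay_steps` down to lr_min.
class ToyChainedSchedule
  def initialize(peak:, milestone:, decay_steps:, lr_min:, start_factor: 0.001)
    @peak = peak
    @milestone = milestone
    @decay_steps = decay_steps
    @lr_min = lr_min
    @start_factor = start_factor
    @step_count = 0
  end

  def step
    @step_count += 1
  end

  def lr
    if @step_count < @milestone
      fraction = @step_count.to_f / @milestone
      @peak * (@start_factor + (1 - @start_factor) * fraction)
    else
      t = [@step_count - @milestone, @decay_steps].min
      @lr_min + (@peak - @lr_min) * (1 + Math.cos(Math::PI * t / @decay_steps)) / 2
    end
  end
end

def new_schedule
  ToyChainedSchedule.new(peak: 1.4e-3, milestone: 1_600, decay_steps: 32_000, lr_min: 1.4e-4)
end

schedule = new_schedule
lrs = [schedule.lr]
(1_600 + 32_000).times do
  schedule.step
  lrs << schedule.lr
end
puts "start #{lrs[0]}, milestone #{lrs[1_600]}, end #{lrs[-1]}"

# "Resuming" at step 20,000 with a brand-new schedule restarts the warmup.
fresh = new_schedule
puts "step 20,000 should use #{lrs[20_000]}, a fresh schedule gives #{fresh.lr}"
```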
But they have a state dict, like most other PyTorch objects, so the simplest thing to do is just to write that to another checkpoint file: ...and then load it likewise: (Dummy test code here.) Conveniently, if you save the state dict of a , it will also include the state of all of its component schedulers, and likewise if you reload it, it will load the components' states back in too. The one thing you have to be careful about is what they warn about in the PyTorch docs: "Initializing a scheduler overwrites its optimizer’s s. When restoring a checkpoint, initialize the scheduler before calling your optimizer's to avoid overwriting the loaded learning rates." Luckily enough, in our code as it stands, we create all of the things that are checkpointed -- the optimiser and the scaler so far, but shortly the scheduler as well -- before we load in the state dicts, so that drops out quite nicely. So, we have some sketched-out code -- it's time to put it in place for the real training run. I won't go through the details of the changes to my existing DDP training code, though you can see the diff here if you're interested. Much of the complexity was due to keeping backward compatibility so that we don't always have to use a learning rate scheduler; remember that in this mini-series, I'm making various changes ("interventions") to the training loop in isolation, seeing whether each one improves things. So it's important to be able to easily train with or without learning rate scheduling; I did that with a flag in the . Implementation-wise, initially I was thinking that it would be easiest to always have a scheduler, and in the "non-scheduled" case to just set it to a linear one that didn't change the value over the course of the train. But in the end it turned out to be easier to use as being the switch to tell the training loop which "mode" it was in.
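Here's a toy sketch of saving a scheduler's state alongside a checkpoint and restoring it. The `state_dict`/`load_state_dict` method names are borrowed from PyTorch for familiarity. `ToyScheduler`, its single `last_step` field, and the JSON file are hypothetical. The point is the ordering described above: build the scheduler first, then load its saved state over the top.

```ruby
require "json"
require "tmpdir"

# Toy scheduler with PyTorch-style method names (NOT PyTorch): the only
# state it needs to survive a restart is how many steps it has taken.
class ToyScheduler
  def initialize(peak:, warmup:, decay_steps:, lr_min:)
    @peak = peak
    @warmup = warmup
    @decay_steps = decay_steps
    @lr_min = lr_min
    @last_step = 0
  end

  def step
    @last_step += 1
  end

  def lr
    return @peak * (@last_step + 1) / @warmup if @last_step < @warmup

    t = [@last_step - @warmup, @decay_steps].min
    @lr_min + (@peak - @lr_min) * (1 + Math.cos(Math::PI * t / @decay_steps)) / 2
  end

  def state_dict
    { "last_step" => @last_step }
  end

  def load_state_dict(state)
    @last_step = Integer(state.fetch("last_step"))
  end
end

def build_scheduler
  ToyScheduler.new(peak: 1.4e-3, warmup: 1_600, decay_steps: 32_000, lr_min: 1.4e-4)
end

original = build_scheduler
20_000.times { original.step }

resumed = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "scheduler.json") # hypothetical checkpoint file name
  File.write(path, JSON.generate(original.state_dict))

  resumed = build_scheduler                            # create it first...
  resumed.load_state_dict(JSON.parse(File.read(path))) # ...then restore its state
end

puts "original #{original.lr}, resumed #{resumed.lr}, fresh #{build_scheduler.lr}"
```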
The placement of the code to create the schedulers was also a little tricky; the "natural" place was just after the optimiser is created, like it is in the example code above. However, at that point, we don't know how many global steps we're going to have in the train, because we don't have the dataset -- which means that working out the numbers to pass in to the schedulers for the warmup and decay steps would be impossible. It turned out to be easiest to put it in the function , just after the datasets are loaded, as at that point we have all of the information we need. Anyway, that's the code done, so let's see what happens! I wanted to do two trains: one with the learning rate scheduling, and one with just the new value for the learning rate, instead of . I was expecting the updated learning rate alone to be too high and to cause a very choppy train, but had high hopes for the train with the scheduling. Here's how it did; the scheduled learning rate train first: Here's what the training loss looked like over that: Quite a few loss spikes early on in the train when the learning rate is at its peak, but nothing unmanageable -- and, as you'd expect, things calmed down quite a lot later on. I also charted the learning rate, to make sure it really was doing what I thought it was doing: So, a pretty smooth train, and we definitely did the right learning rate scheduling. Time to upload it to Hugging Face, and see what the evals look like. Firstly, the smoke test: Reasonably coherent, at least, though it's not super-impressive. On to the loss on our test set: That's our best loss so far! Let's put it into the table: So, it definitely looked like it was worth it. But was it the scheduling of the learning rate that helped, or just the change from 0.0004 to 0.0014? I kicked off a second run with no scheduling, just a learning rate of 0.0014, to see what would happen. After about an hour, I noticed that the loss chart had stopped updating.
The last point had a maximum and minimum loss but no average -- but after that, nothing: However, the learning rate was still being charted, so the train was definitely running: Looking at the checkpoint metadata showed what had happened. At global step 1851, we had this 3 : ...and at the next checkpoint at step 2468, we had this: ...and the same for all checkpoints thereafter. Clearly the parameters had gone off the rails -- exactly what we'd expect with an excessive learning rate: There was no point in continuing the train, as it was pretty much certainly unrecoverable, so I stopped it. Out of interest, I downloaded the model, but I couldn't even run the smoke test on it: So it was pretty clear that just updating the learning rate to 0.0014 was actively harmful. No need to upload that one to HF! And time to wrap up this experiment. While this has been quite a long post, I've really only scratched the surface of how learning rates are set. If I were doing things in more detail, the best would probably be to do a "sweep" over multiple values to try to at least approximate the best possible rate for this model. That would be pretty expensive for me, though, so I decided to stick with the DeepSeek number. It might not be ideal for the specific architecture that I'm using, given how different that is to theirs, but given the results, it's a decent one compared to what I was using. 4 Something that I found interesting is that exactly how to schedule your learning rate is still an area being actively researched. Even in my relatively minimal research, I came across three alternatives to the mainstream warmup-cosine decay pattern: I'm sure there are many more. But for this train, I decided to stick to the mainstream, and the results were pretty good! To reiterate, this has been the most positive intervention so far: So I'll stick with that, and move on to the next thing: what is the parameter that we're passing in to the AdamW optimiser? 
Tune in next time :-)
Yes, I am foreshadowing here.  ↩
To make my earlier analogy about learning rate decaying over time in people as they age even more dubious, we can imagine this as being rather like someone middle-aged going on an ayahuasca retreat ;-)  ↩
If you're wondering how we had a valid maximum and minimum in that first checkpoint when the average was NaN, here's why:  ↩
You might wonder how large labs work out the right learning rate, given that their training runs cost millions of dollars. The answer is there in that DeepSeek paper, as that's one of the things they were doing. They scaled their model down from the billions of parameters that they wanted to train to various smaller models, and worked out the optimal learning rate for each of the smaller models by doing full trains on them. Once they had a mapping from model size to the ideal learning rate for their architecture, they could extrapolate that to the large ones that they wanted to train. The problem is that those "smaller" models are actually quite a lot larger than the one we're training here! And while we could potentially scale it down even further, I suspect that such truly tiny models (say, 1M parameters) wouldn't train well enough to give any meaningful results.  ↩
From the paper: "Specifically, the learning rate of the model reaches its maximum value after 2000 warmup steps, and then decreases to 31.6% of the maximum value after processing 80% of the training tokens. It further reduces to 10% of the maximum value after 90% of the tokens."  ↩
, which is the optimiser we're applying it to.
, which the optimiser's learning rate is multiplied by to work out the value we want to start at.
, which is likewise applied to the optimiser's learning rate to work out the value we're heading for.
, which is the number of steps over which it should go from the initial learning rate to the final one.
, which lets the scheduler know how many steps into its schedule it currently is -- this defaults to , meaning it hasn't started yet. This can be useful if you're resuming from a checkpoint, but for our purposes we can ignore it.
, which is the same as the 's.
, which is the number of steps before it reaches its minimum.
, the minimum learning rate we want to get to.
, again the same as the 's.
Per the Hugging Face post, some people do warmup, then hold at a set level for a while, then start the cosine decay (warmup-stable-decay).
DeepSeek use a relatively simple stepped function after a warmup. 5
I came across a 2025 paper "Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs" which says that a linear decay (after a warmup) outperforms cosine.

Alex White's Blog 1 month ago

Are Design Tools Relevant Anymore

I was a product designer for a few years. I had switched careers to design after suffering burnout as a software engineer. During those years, my entire day was spent in Figma, building high fidelity mockups, leading workshops and creating prototypes. While Figma helped me move quickly, rapidly iterating after receiving user feedback, the engineer part of me always felt it was a throwaway step. You build something, only to then have somebody else build it again in code. I recently had to put on my design hat again, putting together interactive prototypes around a few redesign ideas. At first, I reached for Figma, but after fiddling around for an hour, decided to go a different route. While prototyping in Figma used to be faster than building in code, that’s no longer true. With Claude Code, building out frontend components is fast. Much faster than messing with layers, frames and symbols in Figma. Let me explain. Enterprise apps have well defined brand guidelines. Colors, type, scale. They are often built off an existing component library (think Bootstrap, shadcn). This means you can use Claude in a way that follows the look and feel of your application, and is constrained to the components the development team leverages. The rails help keep Claude from going off the deep end. Design then becomes focused on solving the user’s problem through UX, with less fiddling around with UI. I can open Freeform on my iPad, sketch something out, and prompt Claude to leverage our foundation to make my sketch a reality. Then, I can dig into the code and tweak things to be just right. The result is a more interactive, true to life prototype that gives your engineering team a head start with coded components. You get better feedback from users and stakeholders as it’s easier to visualize what the final product looks like. You discover pitfalls that might not have shown up until an engineer was halfway into the card. 
On top of all that, you move a lot faster: you’re designing and building in one step rather than two, giving your engineering team a head start once designs are finalized. So then, what’s the point of Figma and Sketch? You can tell Figma is battling with this reality by pushing Figma Make. The issue is, it’s too constrained and produces poor results. You can’t link it to existing coded components, Tailwind configs, etc. On the other hand, using my approach requires a technical background. You need to guide with framework suggestions, foundational setup and be able to take over and tweak things yourself. That said, in the shorter term there’s likely still a place for Figma and Sketch at the table. Designing using the method I talked about requires a technical background, otherwise your results will be all over the place, and small tweaks will be next to impossible. As the technology gets better, though, I’ll be surprised if Figma and Sketch survive the next couple of years.

André Arko 1 month ago

Four months of Ruby Central moving Ruby backward

From the moment RubyGems was first created in 2004, Ruby Central provided governance without claiming ownership, to support the Ruby community. Providing governance meant creating processes to provide stability and predictability. Avoiding ownership meant allowing the community to contribute, to the point where unpaid volunteers created and controlled the entirety of RubyGems.org for many years. Last year, Ruby Central flipped that successful formula on its head. They now claim ownership of both Bundler and RubyGems, but refuse to provide governance. Ruby Central now claims sole control over all code and decisions, despite paying for only a few percent of the work required to create and sustain the projects across 22 years. Instead of providing stable and predictable processes, Ruby Central suddenly hijacked the Bundler and RubyGems codebases away from the existing maintainers, shut out the community, and started issuing threats to sue. When confronted by the former maintainers after the hijacking, Marty Haught of Ruby Central stated (in a recorded video call) on September 17 that “yeah, we shouldn’t have changed that”. On September 18, Marty went on to write: In the past, we’ve made the mistake of conflating ownership of the code with ownership of the infra, and vice versa, and we’d like to straighten this out so that we aren’t put in a legal bind that requires us to take control of the entire codebase when, we all agree, that is not proper or correct given the existing model. In the words of Ruby Central itself, “we all agree, [taking control of the entire codebase] is not proper or correct.” Since the beginning of this conflict, Ruby Central has privately admitted it was wrong to hijack the GitHub organization and steal the repos, but has refused to acknowledge this in public. Unfortunately, despite privately admitting their actions were wrong, Ruby Central has publicly continued to dig their hole deeper. 
Instead of owning up to their mistake, they secretly negotiated a deal with Matz for ruby-core to take over the stolen RubyGems and Bundler repository, further violating the project governance policies. If this situation were just about me personally, I could believe it sprang from individual disagreements. Ruby Central claims they had good reasons to unilaterally kick me out of the project, even though I don’t think their claims hold water. With that said, regardless of what you think about me personally, the other five long-term maintainers have never gotten any explanation of why they were suddenly kicked out or bypassed entirely, all in violation of existing project governance. In her only public interview about the situation, Ruby Central Executive Director Shan Cureton defended stealing Bundler from its team of fifteen years by saying the removed team “didn’t need to have the story, and it wasn’t their story to have”. Ruby Central has made their position clear: if they steal your project, you are not entitled to know their reasons, and neither is anyone else. There is nothing “community-oriented” about stealing the most-used gem in Ruby and refusing to share your reasons with the community. Despite Ruby Central’s unacceptable treatment of both projects and maintainers, the former RubyGems and Bundler team said we want to move Ruby forward. We offered Ruby Central a path to move past their illegitimate GitHub takeover, past their vicious personal attacks, and past their threats to sue us. It has been four months since we made that offer, and Ruby Central has not accepted. While declining to accept our offer, Ruby Central has nonetheless found the time to propose new governance documents for RubyGems. In those documents, they explicitly require that existing maintainers approve adding or removing team members. That rule was already present in the previous governance, and is the exact rule that Ruby Central violated to execute their takeover. 
When asked why they violated the previous governance, and why the new governance would be any more trustworthy, Ruby Central refused to respond substantively, and then the question itself was hidden by marking it “off topic”. Instead of working to resolve the situation, Ruby Central has spent 4 months rejecting requests for an explanation, while repeatedly threatening to sue me personally. After Ruby Central suddenly took over the Bundler repo, I sent them a standard trademark notice. They replied with a threat to sue me. When I later informed Ruby Central I had learned they violated state employment law, they simply replied with the same threat to sue me again. They are threatening to sue me for “hacking” them, despite their own analysis publicly concluding “no evidence that user data or production operations were harmed”. Without seeking common ground, or even looking for some sort of resolution we can just live with and move on from, Ruby Central has offered all of us — nothing. Ruby Central has made no offer in reply to outreach from the other five maintainers. To me, after four grueling months of private “negotiation”, their entire offer is nothing more than to refrain from suing. But only if I agree to everything that they want. They say I must agree that I have no claim on the name Bundler, despite helping create it and leading the Bundler team for the last 15 years. They say I must agree I was paid legally and fairly, when California law clearly states I was not. They say I must agree that Ruby Central can take over open source projects they host, any time they feel like it, with no explanation, and no consequences. I don’t agree. Letting this situation stay unaddressed sets a dangerous precedent for all open source projects written in Ruby. Ruby Central has resolved nothing. Don’t let their delaying tactics convince you otherwise. 
The Ruby community cannot trust Ruby Central with control over our gems until there is accountability for destroying the very governance they were supposed to be providing. Until accountability arrives, take action. Tell Ruby Central they owe everyone an explanation for violating the project governance around six long-term maintainers, not just me. Don’t sponsor, attend, or speak at RubyConf. Contribute to projects that aren’t controlled by Ruby Central. The exiled maintainers are working on new projects, with a focus on clear governance, long-term financial sustainability, and community input: Join the gem.coop beta, and stop using RubyGems.org. Use jwl instead of RubyGems. Use or Ruby Butler instead of Bundler. A better world is possible! Ruby Central might want to keep Ruby in the past, but we can work together to build Ruby a future.

Chris Coyier 1 month ago

FOREVERGREEN

In the first few minutes, Ruby says to me, “This is like The Giving Tree”, and by the end, I was like, “OK, you’re right.”

(think) 1 month ago

How to Vim: Build your .vimrc from Scratch

People often think that getting started with Vim means spending hours crafting an elaborate with dozens of plugins. In reality, modern Vim (9+) and Neovim ship with remarkably sane defaults, and you can get very far with a configuration that’s just a few lines long – or even no configuration at all. If you launch Vim 9 without a file, it automatically loads – a built-in configuration that provides a solid foundation. Here’s what you get for free:
– syntax highlighting
– filetype detection, language-specific plugins, and automatic indentation
– incremental search (results appear as you type)
– keeps 5 lines of context around the cursor
– shows instead of hiding truncated lines
– mouse support in all modes
remapped to (text formatting) instead of the mostly useless Ex mode
And several other quality-of-life improvements
That’s actually a pretty reasonable editing experience out of the box! You can read the full details with . Neovim goes even further with its defaults – it enables (copies indentation from the previous line), (highlights all search matches), (makes Tab smarter at the start of a line), (reloads files changed outside the editor), always shows the statusline, and sets the command history to 10000 entries, among many other things. If you’re on Neovim, the out-of-the-box experience is excellent. See for the full list. Here’s something that trips up a lot of people: the moment you create a file – even an empty one – Vim stops loading entirely. That means you lose all those nice defaults. The fix is simple. Start your with: This loads the defaults first, and then your own settings override or extend them as needed. This gotcha only applies to Vim. Neovim’s defaults are always active regardless of whether you have an or . Here’s a minimal that builds on the defaults and adds a few things most people want: That’s five settings on top of the defaults. You might not even need all of them – already handles the fundamentals. For Neovim, you don’t need the line – all the equivalents are already active. You also get , , and for free, so the only settings left to add are the ones that are genuinely personal preference: One of the most underappreciated aspects of Vim is how much built-in support it ships with for programming languages.
When is active (which it is via or Neovim’s defaults), you automatically get:
Syntax highlighting for hundreds of languages – Vim ships with around 770+ syntax definitions
Language-specific indentation rules for over 420 file types
Filetype plugins that set sensible options per language (e.g., , , )
This means that when you open a Python file, Vim already knows to use 4-space indentation. Open a Ruby file and it switches to 2 spaces. Open a Makefile and it uses tabs. All without a single plugin or line of configuration. You can check what’s available with for syntax files or for filetype plugins. The list is impressively long. At some point you’ll probably want more than the bare minimum. Here are a few things worth considering as your next steps:
A colorscheme – Vim ships with several built-in options (try followed by Tab to see them). Recent Vim builds even bundle Catppuccin – a beautiful pastel theme that I’m quite fond of. Another favorite of mine is Tokyo Night, which you’ll need to install as a plugin. Neovim’s default colorscheme has also been quite good since 0.10.
Persistent undo – lets you undo changes even after closing and reopening a file. A game changer.
Clipboard integration – makes yank and paste use the system clipboard by default.
vim-unimpaired – if you’re on classic Vim (not Neovim), I think Tim Pope’s vim-unimpaired is essential. It adds a consistent set of / mappings for navigating quickfix lists, buffers, adding blank lines, and much more. Neovim 0.11+ has adopted many of these as built-in defaults, but on Vim there’s no substitute.
And when you eventually want more plugins, you probably won’t need many. A fuzzy finder, maybe a Git integration, and perhaps a completion engine will cover most needs. But that’s a topic for another day. The key takeaway is this: don’t overthink your . Start with the defaults, add only what you actually need, and resist the urge to copy someone else’s 500-line configuration. A small, well-understood configuration beats a large, cargo-culted one every time. That’s part of the reason why, when I started to re-learn Vim, I opted to slowly build a Vim 9 configuration from scratch, instead of jumping to something like Neovim + Kickstart.nvim or LazyVim right away. Less is more. Understanding the foundations of your editor matters. 1 Right now my is just 100 lines and I don’t foresee it becoming much bigger in the long run. If you want to see just how far you can go without plugins, I highly recommend the Thoughtbot talk How to Do 90% of What Plugins Do (With Just Vim). It’s a great demonstration of Vim’s built-in capabilities for file finding, auto-completion, tag navigation, and more. That’s all I have for you today. Keep hacking!
I guess this sounds strange coming from the author of Emacs Prelude, right? ↩︎

DHH 1 month ago

Omacon comes to New York

The vibes around Linux are changing fast. Companies of all shapes and sizes are paying fresh attention. The hardware game on x86 is rapidly improving. And thanks to OpenCode and Claude Code, terminal user interfaces (TUIs) are suddenly everywhere. It's all this and Omarchy that we'll be celebrating in New York City on April 10 at the Shopify SoHo Space for the first OMACON! We've got an incredible lineup of speakers coming. The creator of Hyprland, Vaxry, will be there. Along with ThePrimeagen and TJ DeVries. You'll see OpenCode creator Dax Raad. Omarchy power contributors Ryan Hughes and Bjarne Øverli. As well as Chris Powers (Typecraft) and myself as Linux superfans. All packed into a single day of short sessions, plenty of mingle time, and some good food. Tickets go on sale tomorrow (February 19) at 10am EST. We only have room for 130 attendees total, so I imagine the offered-at-cost $299 tickets will go quickly. But if you can't manage to snatch a ticket in time, we'll also be recording everything, so you won't be left out entirely. But there is just something special about being together in person around a shared passion. I've felt the intensity of that three years in a row now with Rails World. There's an endless amount of information and instruction available online, but a sense of community and connection is far more scarce. We nerds need this. We also need people to JUST DO THINGS. Like kick off a fresh Linux distribution together with over three hundred contributors so far, all leaning boldly into aesthetics, ergonomics, and that omakase spirit. Omarchy only came about last summer; now we're seeing 50,000 ISO downloads a week, 30,000 people on the Discord, and our very first exclusive gathering in New York City. This is open source at its best. People from all over, coming together, making cool shit. (Oh, and thanks to Shopify and Tobi for hosting. 
You gotta love when a hundred-plus billion dollar company like this is run by an uber nerd who can just sign off on doing something fun and cool for the community without any direct plausible payback.)


Leading Without a Map

No one can deny that our industry is in a period of great change. This industry never stops; the rate goes up and down, but change is a constant. Like it or not, "change calls the tune we dance to." One of the biggest reasons people resist change, even people who joined the software business to "change the world", is that they feel it threatens their self-perception and identity. In the West, our job is often the primary piece of our identity. One sees it everywhere. Your LinkedIn profile has your name first, and some sort of job title or role description second. Heck, even contestants on Jeopardy are introduced as "A marketing consultant from Eyebrow, Saskatchewan." When completing the sentence "I am a..." most people pick their job. When change is high, that self-conception can quickly feel under threat. Even in the small, it can happen. If your company decides it would be better served writing new code in Java rather than Python or Ruby, you can expect a few "Pythonistas" or "Rubyists" to push back. In their heart of hearts they may agree with the decision on its merits, but they nevertheless feel that their very identity is under threat. This can also include their social group/community/tribe membership, something that humans are genetically programmed to value and protect. So it's understandable that change can bring out strange and unpredictable behaviour in people when they feel there's a risk to their identity, self-concept, or tribal membership. Well, first of all, we should acknowledge to ourselves that we are not immune to these phenomena either. Presumably most of us started out as software developers ourselves, and when we started managing the people who did the job, it was the job we used to do, so we got it. Over time, that's drifted. New frameworks and paradigms have emerged, new 'best' practices replaced the old 'best' practices, and we became less intimately familiar with the day-to-day things our people were doing. 
This is uncomfortable at times, but we adapt. We learn what we can to stay involved at the right level and to coach and guide the people we're responsible for. Now, the game is changing in a much more fundamental and profound way. And it's happening fast. I don't know what the job of software developer is going to look like a year from now (or even six months from now, for that matter) and, frankly, neither does anyone else. This makes the job of manager much, much harder. Your people are used to you having at least some concept of a map and sharing it with them, and you don't have one. Everyone's figuring it out together. A good friend and former colleague once described an aspect of leadership as "smiling while the sky is falling." I'm not sure if he came up with it or if I should attribute it to someone else, but I heard it from him first. My point here isn't that the sky is falling, but rather that when your people are worried, you need to appear steadfast or you make the problem worse. You don't owe them certainty, because that would be dishonest and they'll clock your dishonesty whether they admit it or not. But just like in incident response, panic serves no one. You owe them calm reassurance that you're going to navigate this new world together and that you've got their best interests at heart. You do this even though you might be feeling the same threat to your identity. You manage engineers, but they're becoming some kind of new thing: bot-wranglers. Some of your other responsibilities are being offloaded to LLMs and everyone's role is going to keep changing until things inevitably settle down again (relatively speaking). With no playbook, we need some kind of framework for decision making. This is where we can fall back to 'first principles'. For me, these are the things I hold important. Really, the basics:
Doing my best to take care of the people.
Doing what the business needs most at the given moment.
Providing value to customers.
It sounds simple, and really, it is. Taking care of the people right now means recognizing that they're feeling that identity risk.
The worst thing you can do is try to talk them out of it or convince them they're not feeling what they're feeling. Acknowledge that things are changing. Maintain 'esprit de corps' as best you can. Draw on your experience of navigating big changes in the past. If you've been around this industry for any amount of time, you've been through some big paradigm shifts and come out the other side. Tell some stories, but don't make it all about you. The business and customer angles come down to maintaining consistent principles around what software gets shipped to customers. I personally have the pleasing-to-nobody opinion that LLM coding tools are useful but not risk-free. Surely you have some skeptics in your midst who feel the same. Don't dismiss them either. Security, quality, maintainability, incident response, and the work-life balance of your people are still the responsibility of the humans running the company. That's the job right now, however the machinery of it changes. Keep taking care of your people and customers, like you always have. You already know how. "Statue of Captain George Vancouver, anchors and the Custom House, King's Lynn" by ell brown is licensed under CC BY 2.0.

Max Bernstein 1 month ago

Type-based alias analysis in the Toy Optimizer

Another entry in the Toy Optimizer series. Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time! We were careful to mind object aliasing: we separated our heap information into alias classes based on what offset the reads/writes referenced. This way, if we didn’t know if object and aliased, we could at least know that different offsets would never alias (assuming our objects don’t overlap and memory accesses are on word-sized slots). This is a coarse-grained heuristic. Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We’ll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post. We return once again to Fil Pizlo land, specifically How I implement SSA form. We’re going to be using the hierarchical heap effect representation from the post in our implementation, but you can use your own type representation if you have one already. This representation divides the heap into disjoint regions by type. Consider, for example, that objects and objects do not overlap. A pointer is never going to alias an pointer. They can therefore be reasoned about separately. But sometimes you don’t have perfect type information available. If you have in your language an base class of all objects, then the heap overlaps with, say, the heap. So you need some way to represent that too—just having an enum doesn’t work cleanly. Here is an example simplified type hierarchy: Where might represent different parts of the runtime’s data structures, and could be further segmented into , , etc. Fil’s idea is that we can represent each node in that hierarchy with a tuple of integers (inclusive, exclusive) that represent the pre- and post-order traversals of the tree. 
Or, if tree traversals are not engraved into your bones, they represent the range of all the nested objects within them. Then the “does this write interfere with this read” check—the aliasing check—is a range overlap query. Here’s a perhaps over-engineered Python implementation of the range and heap hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore: Where kicks off the tree-numbering scheme. Fil’s implementation also covers a bunch of abstract heaps such as SSAState and Control because his is used for code motion and whatnot. That can be added on later but we will not do so in this post. So there you have it: a type representation. Now we need to use it in our load-store forwarding. Recall that our load-store optimization pass looks like this: At its core, it iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate the state of compile-time information about fields that may alias. In this case, our may alias asks only if the offsets overlap. This means that the following unit test will fail: This test is expecting the write to to still remain cached even though we wrote to the same offset in —because we have annotated as being an and as being a . If we account for type information in our alias analysis, we can get this test to pass. After doing a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff: If we don’t have any type/alias information, we default to “I know nothing” ( ) for each object. Then we check range overlap. The boolean logic in looks a little weird, maybe. But we can also rewrite (via DeMorgan’s law) as: So, keeping all the cached field state about fields that are known by offset and by type not to alias. Maybe that is clearer (but not as nice a diff). Note that the type representation is not so important here! 
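The post's own Python listing for the heap hierarchy did not survive here. What follows is an independent minimal sketch of the same idea, not the author's code. It numbers a small, hypothetical type tree (`Any`, `Object`, `Array`, `String`, `Other`) so that each node's `[begin, end)` range contains exactly its descendants. It then answers "may these alias?" with an offset check plus a range-overlap query, and uses that answer in the invalidation step of a toy load-store forwarding pass. The tuple-based IR, `Heap`, `Obj`, `may_alias`, and `forward_loads_stores` are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality/hash: each node is a distinct heap
class Heap:
    name: str
    children: list[Heap] = field(default_factory=list)
    begin: int = 0  # inclusive
    end: int = 0    # exclusive

    def number(self, counter: int = 0) -> int:
        """Pre-order numbering: a node's [begin, end) covers all its descendants."""
        self.begin = counter
        counter += 1
        for child in self.children:
            counter = child.number(counter)
        self.end = counter
        return counter

    def overlaps(self, other: Heap) -> bool:
        return self.begin < other.end and other.begin < self.end


# Hypothetical hierarchy: Any -> {Object -> {Array, String}, Other}
ARRAY, STRING = Heap("Array"), Heap("String")
OBJECT = Heap("Object", [ARRAY, STRING])
OTHER = Heap("Other")
ANY = Heap("Any", [OBJECT, OTHER])
ANY.number()  # Any=[0,5) Object=[1,4) Array=[2,3) String=[3,4) Other=[4,5)


@dataclass(frozen=True)
class Obj:
    name: str
    heap: Heap  # use ANY when nothing is known ("I know nothing")


def may_alias(a: Obj, a_off: int, b: Obj, b_off: int) -> bool:
    # Different offsets never alias (word-sized, non-overlapping slots).
    if a_off != b_off:
        return False
    # Same offset: alias only if the abstract heaps can overlap.
    return a.heap.overlaps(b.heap)


def forward_loads_stores(ops: list[tuple]) -> list[tuple]:
    """ops: ("store", obj, offset, value) or ("load", obj, offset, dest)."""
    cache: dict[tuple[Obj, int], object] = {}
    out: list[tuple] = []
    for op in ops:
        if op[0] == "store":
            _, obj, off, value = op
            # Keep only fields known (by offset or by type) not to alias.
            cache = {(o, k): v for (o, k), v in cache.items()
                     if not may_alias(o, k, obj, off)}
            cache[(obj, off)] = value
            out.append(op)
        elif op[0] == "load":
            _, obj, off, dest = op
            if (obj, off) in cache:
                out.append(("copy", dest, cache[(obj, off)]))  # load removed
            else:
                cache[(obj, off)] = dest  # remember the loaded value
                out.append(op)
        else:
            cache = {}  # unknown instruction: forget everything
            out.append(op)
    return out
```

Take `a = Obj("a", ARRAY)` and `b = Obj("b", STRING)` and run the sequence store `a[0]`, store `b[0]`, load `a[0]`. The first store stays cached, so the load becomes a copy. If both objects are typed `ANY`, the second store invalidates that entry and the load stays.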
You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and check overlap between them. Nice, now our test passes! We can differentiate between memory accesses on objects of different types. But what if we knew more? Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object’s allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage. For example, in the following made-up IR snippet: We know that (among other facts) doesn’t alias or because we have seen its allocation site. I saw this in the old V8 IR Hydrogen’s lightweight alias analysis 1: There is plenty of other useful information such as:
If we know at compile-time that object A has 5 at offset 0 and object B has 7 at offset 0, then A and B don’t alias (thanks, CF)
In the RPython JIT in PyPy, this is used to determine if two user (Python) objects don’t alias because we know the contents of the user (Python) class field
Object size (though perhaps that is a special case of the above bullet)
Field size/type
Deferring alias checks to run-time
Have a branch
If you have other fun ones, please write in. We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking? The conservative approach is to invalidate all cached information on a function call. This is definitely correct, but it’s a bummer for the optimizer. Can we do anything? Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write, or only writes to some heaps, we can at least only partially invalidate our heap. However, if the function is unknown or otherwise opaque, we need at least more advanced alias information and perhaps even (partial) escape analysis. Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to any object A, we cannot safely cache information about any other object B unless we know for sure that A and B do not alias.
And we don’t know what the instruction writes to. So we may only know we can cache information about B because it was allocated locally and has not escaped. Some runtimes such as ART pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass—no need to make a matrix. As usual, this is a toy IR and a toy optimizer, so it’s hard to say how much faster it makes its toy programs. In general, though, there is a dial for analysis and optimization that goes between precision and speed. This is a happy point on that dial, only a tiny incremental analysis cost bump above offset-only invalidation, but for higher precision. I like that tradeoff. Also, it is very useful in JIT compilers where generally the managed language is a little better-behaved than a C-like language. Somewhere in your IR there will be a lot of duplicate loads and stores from a strength reduction pass, and this can clean up the mess. Thanks for joining as I work through a small use of type-based alias analysis for myself. I hope you enjoyed it. Thank you to Chris Gregory for helpful feedback.
I made a fork of V8 to go spelunk around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted it in favor of their new Sea of Nodes based IR called TurboFan. ↩
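The allocation-site and effects ideas above can be combined in a similarly small sketch. It is independent of the post's code. It also uses the bitset representation the post mentions as an alternative: each hypothetical leaf heap is one bit, so an overlap check is a bitwise AND. The sketch assumes a toy tuple IR in which `("call", args, writes)` carries either a heap bitmask of what the callee may write or `None` for an opaque call. The origin labels (`"param"`, `"alloc"`, `"unknown"`) and all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical leaf heaps, one bit each; a set of heaps is an int bitmask,
# so "may these heaps overlap?" is a single bitwise AND.
ARRAY, STRING, OTHER = 0b001, 0b010, 0b100
ANY = ARRAY | STRING | OTHER  # "I know nothing"


@dataclass(frozen=True)
class Obj:
    name: str
    heap: int = ANY
    origin: str = "unknown"  # "param", "alloc" (allocation seen in trace), "unknown"


def may_alias(a: Obj, a_off: int, b: Obj, b_off: int, escaped: set[Obj]) -> bool:
    if a_off != b_off:
        return False  # distinct word-sized slots never overlap
    if a == b:
        return True
    if "alloc" in (a.origin, b.origin):
        fresh, other = (a, b) if a.origin == "alloc" else (b, a)
        # A fresh allocation is never a parameter or another fresh allocation,
        # and nothing can reach it until it escapes.
        if other.origin in ("param", "alloc") or fresh not in escaped:
            return False
    return (a.heap & b.heap) != 0


def forward_with_effects(ops: list[tuple]) -> list[tuple]:
    """ops: ("store", obj, off, value), ("load", obj, off, dest),
    or ("call", args, writes) where writes is a heap bitmask or None (opaque)."""
    cache: dict[tuple[Obj, int], object] = {}
    escaped: set[Obj] = set()
    out: list[tuple] = []
    for op in ops:
        if op[0] == "store":
            _, obj, off, value = op
            cache = {(o, k): v for (o, k), v in cache.items()
                     if not may_alias(o, k, obj, off, escaped)}
            cache[(obj, off)] = value
            if isinstance(value, Obj):
                escaped.add(value)  # now reachable through obj
            out.append(op)
        elif op[0] == "load":
            _, obj, off, dest = op
            if (obj, off) in cache:
                out.append(("copy", dest, cache[(obj, off)]))
            else:
                cache[(obj, off)] = dest
                out.append(op)
        elif op[0] == "call":
            _, args, writes = op
            # Arguments become reachable by the callee.
            escaped.update(arg for arg in args if isinstance(arg, Obj))

            def survives(o: Obj) -> bool:
                if o.origin == "alloc" and o not in escaped:
                    return True  # the callee cannot reach it
                # Known effects: keep heaps the callee does not write.
                return writes is not None and (o.heap & writes) == 0

            cache = {key: v for key, v in cache.items() if survives(key[0])}
            out.append(op)
        else:
            cache = {}  # anything else: conservatively forget everything
            out.append(op)
    return out
```

In this toy, `("call", [n], None)` drops what we knew about a fresh object `n`, because passing it to an opaque call makes it escape. A call annotated as writing only `STRING` leaves cached `ARRAY` fields alone.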

ava's blog 2 months ago

when exercise started helping me

Nowadays, exercising really always saves me without fail. I realized that today, after again feeling absolutely terrible but then dragging myself out of bed to at least walk on my foldable treadmill. I started wondering when this change exactly happened and what led to it, because I used to hate exercise. I didn't understand people who said it helped with depression. When did it truly start being a reliable way to improve my mental state? What I struggled with back then were most definitely access, energy and health. I had neither a gym membership nor gym equipment at home. Wanting to exercise consisted of pulling out some yoga mat to do crunches like once a year, or going out for a run. Both suck when you haven't built it up over weeks or months! It was immediately difficult, painful and exhausting. My undiagnosed autoimmune diseases added more pain on top; I was just too inflamed to really work out well or even recover for days on end, and I dealt with a lot of fatigue on top of everything. That makes starting and keeping at it almost impossible, except for unexpected good phases. Without at least showing up semi-regularly, I made no progress, and every attempt I did make was immediately very exhausting with no reward. I felt like I couldn't last long enough in a session or exercise regimen to even reap the benefits. It didn't help at all that I immediately always chose something rather difficult or exhausting, as if I had to jump straight to the level I expected a "default" human being to be at. So what changed is: I was diagnosed and found a working treatment. This one is big; so much pain and fatigue gone. Training results finally showed and made getting motivated and back on track easier. Some exercise even started helping with the residual pain and symptoms. I searched for things to do that were easier on me. I shouldn't immediately run or do crunches. 
Instead, even just walking, yoga, and some easy Pilates are enough, and more manageable to someone in my position. They are easier to pick back up after a few weeks and allow great control over varying the difficulty. With running, for example, I had no room to vary anything; even just the act of running was so exhausting back then that adjusting speed made no difference. With other forms of movement, I could build something without feeling totally exhausted. I signed up for the gym and just made showing up and walking on the treadmill a goal, and I watched videos or listened to podcasts. This was needed, because when I started it, I was still recovering from a really bad flare up and couldn't be trusted to walk around unsupervised in the forest somewhere. At the gym while just walking, I could slowly build up my exercise tolerance and endurance while seeing it as a sort of "me time" with some enjoyable videos, and with people around in case I suddenly started feeling dizzy or anything, and with some rails to hold on to. By saving videos for this time, I made it more entertaining and had something to look forward to on it. I invested in a spinning bike, and later in a foldable treadmill for at home use. I sometimes feel too bad physically or mentally to make it to the gym (or it is closed), and this enables me to still work out without being discouraged by my issues, time or weather. It also takes away the calculation of "Is it even worth showing up?" if I might just feel like 20 minutes of treadmill that day. Better 20 minutes than nothing! With all that, I slowly built up enough of a a baseline fitness for me that wouldn't make training annoying and just exhausting. It was easier to get back in after a break, and every time I had to take one, I had lost less progress than before. I got better and better at finding my sweet spot, neither under- nor overexercising. 
The more times I actually pushed myself to exercise despite feeling awful mentally and left it happier, the more it didn't feel like an outlier, but a guaranteed outcome. That made it easier to show up despite everything. It's still hard, but I know now that it is basically like a button to improve my mood, and who doesn't want that? That behavior just keeps getting reinforced every time I can get myself out of a hole with this. It gets harder and harder to convincingly tell myself "No, this time will be different; you'll feel the same or worse when you do this. You should stay in bed instead." Lying down has a much worse track record: It never makes me feel better.

Published 12 Feb, 2026

Rewriting pycparser with the help of an LLM

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1]). It's a pure-Python parser for the C programming language, producing ASTs inspired by Python's own. Until very recently, it's been using PLY: Python Lex-Yacc for the core parsing. In this post, I'll describe how I collaborated with an LLM coding agent (Codex) to help me rewrite pycparser to use a hand-written recursive-descent parser and remove the dependency on PLY. This has been an interesting experience and the post contains lots of information and is therefore quite long; if you're just interested in the final result, check out the latest code of pycparser - the main branch already has the new implementation.

While pycparser has been working well overall, there were a number of nagging issues that persisted over the years. I began working on pycparser in 2008, and back then using a YACC-based approach for parsing a whole language like C seemed like a no-brainer to me. Isn't this what everyone does when writing a serious parser? Besides, the K&R2 book famously carries the entire grammar of ANSI C (C89) in an appendix - so it seemed like a simple matter of translating that to PLY-yacc syntax. And indeed, it wasn't too hard, though there definitely were some complications in building the ASTs for declarations (C's gnarliest part).

Shortly after completing pycparser, I got more and more interested in compilation and started learning about the different kinds of parsers more seriously. Over time, I grew convinced that recursive descent is the way to go - producing parsers that are easier to understand and maintain (and are often faster!). It all ties in to the benefits of dependencies in software projects as a function of effort. Using parser generators is a heavy conceptual dependency: it's really nice when you have to churn out many parsers for small languages.
But when you have to maintain a single, very complex parser as part of a large project, the benefits quickly dissipate and you're left with a substantial dependency that you constantly grapple with.

And then there are the usual problems with dependencies: dependencies get abandoned, and they may also develop security issues. Sometimes, both of these become true. Many years ago, pycparser forked and started vendoring its own version of PLY. This was part of transitioning pycparser to a dual Python 2/3 code base when PLY was slower to adapt. I believe this was the right decision, since PLY "just worked" and I didn't have to deal with active (and very tedious in the Python ecosystem, where packaging tools are replaced faster than dirty socks) dependency management. A couple of weeks ago this issue was opened for pycparser. It turns out that some old PLY code triggers security checks used by some Linux distributions; while this code was fixed in a later commit of PLY, PLY itself was apparently abandoned and archived in late 2025. And guess what? That happened in the middle of a large rewrite of the package, so re-vendoring the pre-archiving commit seemed like a risky proposition. On the issue it was suggested that "hopefully the dependent packages move on to a non-abandoned parser or implement their own"; I originally laughed this idea off, but then it got me thinking... which is what this post is all about.

The original K&R2 grammar for ANSI C had - famously - a single shift-reduce conflict having to do with dangling else clauses belonging to the most recent if statement. And indeed, other than the famous lexer hack used to deal with C's type name / ID ambiguity, pycparser only had this single shift-reduce conflict. But things got more complicated. Over the years, features were added that weren't strictly in the standard but were supported by all the industrial compilers.
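The dangling-else conflict mentioned above is a good illustration of why recursive descent needs no conflict resolution at all. Below is a minimal sketch for a made-up toy statement grammar (not C, and not pycparser's code; the whitespace tokenization and tuple-based AST are simplifications for this example). Because the innermost `statement` call checks for `else` right after parsing its own body, each `else` binds to the nearest unmatched `if`, which is the same result a YACC grammar gets from the generator's default of preferring shift over reduce.

```python
# Toy grammar, made up for this sketch (not C):
#   stmt := "if" "(" IDENT ")" stmt [ "else" stmt ]
#         | IDENT ";"
from typing import List, Optional, Tuple

Node = Tuple  # ("if", cond, then, else_or_None) or ("stmt", name)


class Parser:
    """Recursive descent: one method per grammar rule, one token of lookahead."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self) -> str:
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in ("if", "else"):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self) -> Node:
        if self.peek() == "if":
            self.expect("if")
            self.expect("(")
            cond = self.ident()
            self.expect(")")
            then = self.statement()
            otherwise = None
            # The innermost active call sees the "else" first and takes it,
            # so each else binds to the nearest unmatched if.
            if self.peek() == "else":
                self.expect("else")
                otherwise = self.statement()
            return ("if", cond, then, otherwise)
        name = self.ident()
        self.expect(";")
        return ("stmt", name)


def parse(source: str) -> Node:
    parser = Parser(source.split())  # whitespace-separated tokens, for brevity
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree


print(parse("if ( a ) if ( b ) x ; else y ;"))
# ('if', 'a', ('if', 'b', ('stmt', 'x'), ('stmt', 'y')), None)
```

The `else y ;` ends up inside the inner `if (b)`, and the outer `if (a)` has no else branch. The grammar's ambiguity is resolved by the order in which the code looks at tokens, which is also why such parsers tend to be easier to reason about than a table of tie-broken conflicts.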
The more advanced C11 and C23 standards weren't beholden to the promises of conflict-free YACC parsing (since almost no industrial-strength compilers use YACC at this point), so all caution went out of the window. The latest (PLY-based) release of pycparser has many reduce-reduce conflicts [2]; these are a severe maintenance hazard because they mean the parsing rules essentially have to be tie-broken by order of appearance in the code. This is very brittle; pycparser has only managed to maintain its stability and quality through its comprehensive test suite. Over time, it became harder and harder to extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance effects. The straw that broke the camel's back was this PR, which again proposed to increase the number of reduce-reduce conflicts [3]. This - again - prompted me to think "what if I just dump YACC and switch to a hand-written recursive descent parser", and here we are.

None of the challenges described above are new; I've been pondering them for many years now, and yet biting the bullet and rewriting the parser didn't feel like something I'd like to get into. By my private estimates it'd take at least a week of deep heads-down work to port the gritty 2000 lines of YACC grammar rules to a recursive descent parser [4]. Moreover, it wouldn't be a particularly fun project either - I didn't feel like I'd learn much new and my interests have shifted away from this project. In short, the potential well was just too deep.

I've definitely noticed the improvement in capabilities of LLM coding agents in the past few months, and many reputable people online rave about using them for increasingly larger projects. That said, would an LLM agent really be able to accomplish such a complex project on its own? This isn't just a toy, it's thousands of lines of dense parsing code. What gave me hope is the concept of conformance suites mentioned by Simon Willison.
Agents seem to do well when there's a very clear and rigid goal function - such as a large, high-coverage conformance test suite. And pycparser has a very extensive one. Over 2500 lines of test code parsing various C snippets to ASTs with expected results, grown over a decade and a half of real issues and bugs reported by users. I figured the LLM would either succeed or fail and throw its hands up in despair, but it was quite unlikely to produce a wrong port that would still pass all the tests. So I set it to run.

I fired up Codex in pycparser's repository, and wrote this prompt just to make sure it understands me and can run the tests: Codex figured it out (I gave it the exact command, after all!); my next prompt was the real thing [5]: Here Codex went to work and churned for over an hour. Having never observed an agent work for nearly this long, I kind of assumed it went off the rails and would fail sooner or later. So I was rather surprised and skeptical when it eventually came back with: It took me a while to poke around the code and run it until I was convinced - it had actually done it! It wrote a new recursive descent parser with only ancillary dependencies on PLY, and that parser passed the test suite. After a few more prompts, we removed the ancillary dependencies and made the structure clearer. I hadn't looked too deeply into code quality at this point, but at least on the functional level - it succeeded. This was very impressive!

A change like the one described above is impossible to code-review as one PR in any meaningful way, so I used a different strategy. Before embarking on this path, I created a new branch, and once Codex finished the initial rewrite, I committed this change, knowing that I would review it in detail, piece by piece, later on. Even though coding agents have their own notion of history and can "revert" certain changes, I felt much safer relying on Git.
In the worst case, if all of this goes south, I can nuke the branch and it's as if nothing ever happened. I was determined to only merge this branch onto main once I was fully satisfied with the code. In what follows, I had to git reset several times when I didn't like the direction in which Codex was going. In hindsight, doing this work in a branch was absolutely the right choice.

Once I'd sufficiently convinced myself that the new parser was actually working, I used Codex to similarly rewrite the lexer and get rid of the PLY dependency entirely, deleting it from the repository. Then, I started looking more deeply into code quality - reading the code created by Codex and trying to wrap my head around it. And - oh my - this was quite the journey. Much has been written about the code produced by agents, and much of it seems to be true. Maybe it's a setting I'm missing (I'm not using my own custom AGENTS.md yet, for instance), but Codex seems to be that eager programmer that wants to get from A to B whatever the cost. Readability, minimalism and code clarity are very much secondary goals. Using raise...except for control flow? Yep. Abusing Python's dynamic typing (like having None, False and other values all mean different things for a given variable)? For sure. Spreading the logic of a complex function all over the place instead of putting all the key parts in a single switch statement? You bet.

Moreover, the agent is hilariously lazy. More than once I had to convince it to do something it initially said was impossible, and it even insisted again in follow-up messages. The anthropomorphization here is mildly concerning, to be honest. I could never imagine I would be writing something like the following to a computer, and yet - here we are: "Remember how we moved X to Y before? You can do it again for Z, definitely. Just try". My process was to see how I can instruct Codex to fix things, and intervene myself (by rewriting code) as little as possible.
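The complaint about None, False and other values all meaning different things for one variable is easy to illustrate. The before/after below is hypothetical (the declarator function and all names are invented for this example, not taken from pycparser). In the first version a caller writing `if not result:` silently lumps two different states together; the second version spells the states out with an `Enum` and a small frozen dataclass, which a type checker can then hold callers to.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Union


# Before (hypothetical): one return value encodes three different states.
def declarator_name_overloaded(tokens: List[str]) -> Union[None, bool, str]:
    if not tokens:
        return None    # no declarator at all, e.g. "int;"
    if tokens[-1] == "*":
        return False   # abstract declarator, e.g. the "*" in "(int *)"
    return tokens[-1]  # named declarator, e.g. "p" in "int *p"


# After: each state is explicit, so `if not result:` can't conflate them.
class DeclKind(Enum):
    NONE = auto()
    ABSTRACT = auto()
    NAMED = auto()


@dataclass(frozen=True)
class Declarator:
    kind: DeclKind
    name: Optional[str] = None


def declarator_name(tokens: List[str]) -> Declarator:
    if not tokens:
        return Declarator(DeclKind.NONE)
    if tokens[-1] == "*":
        return Declarator(DeclKind.ABSTRACT)
    return Declarator(DeclKind.NAMED, tokens[-1])


for toks in ([], ["*"], ["*", "p"]):
    # Both None and False are falsy in the old version; that's the trap.
    print(toks, repr(declarator_name_overloaded(toks)), declarator_name(toks).kind.name)
```

The explicit version is a bit longer, but it is the kind of closed, checkable contract that the post later argues makes agents (and humans) less likely to overload values.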
I've mostly succeeded in keeping my own interventions small, and did maybe 20% of the work myself. My branch grew dozens of commits, falling into roughly these categories:

(1) The code in X is too complex; why can't we do Y instead?
(2) The use of X is needlessly convoluted; change Y to Z, and T to V in all instances.
(3) The code in X is unclear; please add a detailed comment - with examples - to explain what it does.

Interestingly, after doing (3), the agent was often more effective in giving the code a "fresh look" and succeeding in either (1) or (2). Eventually, after many hours spent in this process, I was reasonably pleased with the code. It's far from perfect, of course, but taking the essential complexities into account, it's something I could see myself maintaining (with or without the help of an agent). I'm sure I'll find more ways to improve it in the future, but I have a reasonable degree of confidence that this will be doable.

It passes all the tests, so I've been able to release a new version (3.00) without major issues so far. The only issue I've discovered is that some of CFFI's tests are overly precise about the phrasing of errors reported by pycparser; this was an easy fix. The new parser is also faster, by about 30% based on my benchmarks! This is typical of recursive descent when compared with YACC-generated parsers, in my experience. After reviewing the initial rewrite of the lexer, I spent a while instructing Codex on how to make it faster, and it worked reasonably well.

While working on this, it became quite obvious that static typing would make the process easier. LLM coding agents really benefit from closed loops with strict guardrails (e.g. a test suite to pass), and type annotations act as such. For example, had pycparser already been type annotated, Codex would probably not have overloaded values to multiple types (like None vs. False vs. others). In a followup, I asked Codex to type-annotate pycparser (running checks using ty), and this was also a back-and-forth because the process exposed some issues that needed to be refactored. Time will tell, but hopefully it will make further changes in the project simpler for the agent. Based on this experience, I'd bet that coding agents will be somewhat more effective in strongly typed languages like Go, TypeScript and especially Rust.

Overall, this project has been a really good experience, and I'm impressed with what modern LLM coding agents can do! While there's no reason to expect that progress in this domain will stop, even if it does - these are already very useful tools that can significantly improve programmer productivity. Could I have done this myself, without an agent's help? Sure. But it would have taken me much longer, assuming that I could even muster the will and concentration to engage in this project. I estimate it would take me at least a week of full-time work (so 30-40 hours) spread over who knows how long to accomplish. With Codex, I put an order of magnitude less work into this (around 4-5 hours, I'd estimate) and I'm happy with the result. It was also fun. At least in one sense, my professional life can be described as the pursuit of focus, deep work and flow. It's not easy for me to get into this state, but when I do I'm highly productive and find it very enjoyable. Agents really help me here. When I know I need to write some code and it's hard to get started, asking an agent to write a prototype is a great catalyst for my motivation. Hence the meme at the beginning of the post.

One can't avoid a nagging question - does the quality of the code produced by agents even matter? Clearly, the agents themselves can understand it (if not today's agent, then at least next year's). Why worry about future maintainability if the agent can maintain it? In other words, does it make sense to just go full vibe-coding? This is a fair question, and one I don't have an answer to. Right now, for projects I maintain and stand behind, it seems obvious to me that the code should be fully understandable and accepted by me, and the agent is just a tool helping me get to that state more efficiently. It's hard to say what the future holds here; it's going to be interesting, for sure.

There was also the lexer to consider, but this seemed like a much simpler job. My impression is that in the early days of computing, lex gained prominence because of strong regexp support which wasn't very common yet. These days, with excellent regexp libraries existing for pretty much every language, the added value of lex over a custom regexp-based lexer isn't very high. That said, it wouldn't make much sense to embark on a journey to rewrite just the lexer; the dependency on PLY would still remain, and besides, PLY's lexer and parser are designed to work well together. So it wouldn't help me much without tackling the parser beast.

Justin Duke 2 months ago

Brief notes on migrating to Postgres-backed jobs

It seems premature to talk about a migration that is only halfway done, even if it's the hard half that's done — but I think there's something useful in documenting the why and how of a transition while you're still in the thick of it, before the revisionist history of completion sets in. Early last year, we built out a system for running background jobs directly against Postgres within Django. This very quickly got abstracted out into a generic task runner — shout out to Brandur and many other people who have been beating this drum for a while. And as far as I can tell, this concept of shifting away from Redis and other less-durable caches for job infrastructure is regaining steam on the Rails side of the ecosystem, too. The reason we did it was mostly for ergonomics around graceful batch processing. It is significantly easier to write a poller in Django for stuff backed by the ORM than it is to try and extend RQ or any of the other task runner options that are Redis-friendly. Django gives you migrations, querysets, admin visibility, transactional guarantees — all for free, all without another moving part. And as we started using it and it proved stable, we slowly moved more and more things over to it. At the time of this writing, around half of our jobs by quantity — which represent around two-thirds by overall volume — have been migrated over from RQ onto this system. This is slightly ironic given that we also last year released django-rq-cron , a library that, if I have my druthers, we will no longer need. Fewer moving parts is the watchword. We're removing spindles from the system and getting closer and closer to a simple, portable, and legible stack of infrastructure.
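The post doesn't show its implementation, so the following is only a rough sketch of the general idea of a database-backed job queue, not the author's Django code. Jobs are durable rows, and a worker claims one pending job inside a transaction so that two workers never pick up the same job. On Postgres that claim step is commonly written with `FOR UPDATE SKIP LOCKED` (kept below as a reference string and not executed). To keep the example self-contained and runnable, the demo swaps in Python's built-in `sqlite3`, serializing claims with `BEGIN IMMEDIATE` instead of row locks. The table layout and function names are hypothetical.

```python
import json
import sqlite3
from typing import Optional, Tuple

# Reference only, not executed below: a common Postgres claim query.
# SKIP LOCKED lets concurrent workers pass over rows another worker holds.
POSTGRES_CLAIM_SQL = """
UPDATE jobs SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: autocommit mode; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn: sqlite3.Connection, payload: dict) -> int:
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, dict]]:
    """Atomically mark the oldest pending job as running and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], json.loads(row[1]))


def finish(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))


conn = connect()
enqueue(conn, {"task": "send_digest"})
enqueue(conn, {"task": "resize_image", "id": 7})
job_id, payload = claim_next(conn)
finish(conn, job_id)
print(job_id, payload)
```

Because job state lives in ordinary rows, it is visible to the same tooling as the rest of the data (the post mentions migrations, querysets and admin visibility in Django), which is much of the appeal over a separate Redis-backed queue.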

Steve Klabnik 2 months ago

The most important thing when working with LLMs

Okay, so you’ve got the basics of working with Claude going. But you’ve probably run into some problems: Claude doesn’t do what you want it to do, it gets confused about what’s happening and goes off the rails; all sorts of things can go wrong. Let’s talk about how to improve upon that.

The most important thing that you can do when working with an LLM is give it a way to quickly evaluate if it’s doing the right thing, and if it isn’t, point it in the right direction. This is incredibly simple, yet, like many simple things, also wildly complex. But if you can keep this idea in mind, you’ll be well equipped to become effective when working with agents.

A long time ago, I used to teach programming classes. Many of these were to adults, but some of them were to children. Teenaged children, but children nonetheless. We used to do an exercise to try and help them understand the difference between talking in English and talking in Ruby, or JavaScript, or whatever kind of programming language rather than human language. The exercise went like this: I would have a jar of peanut butter, a jar of jelly, a loaf of bread, a spoon, and a knife. I would ask the class to take a piece of paper and write down a series of steps to make a peanut butter and jelly sandwich. They’d all then give me their algorithms, and the fun part for me began: find one that’s innocently written that I could hilariously misinterpret. For example, I might find one like: I’d read this aloud to the class: “You all understand this is a recipe for a peanut butter and jelly sandwich, right?” I’d take the jar of peanut butter and place it upon the unopened bag of bread. I’d do the same with the jar of jelly. This would, of course, squish the bread, which feels slightly transgressive given that you’re messing up the bread, so the kids would love that.
I’d then say something like “the bread is already together, I do not understand this instruction.” After the inevitable laughter died down, I’d make my point: the computer will do exactly what you say, but not what you mean. So you have to get good at figuring out when you said something different from what you meant. Sort of ironically, LLMs are kind of the inverse of this: they’ll sometimes try to figure out what you mean, and then do that, rather than simply doing what you say. But the core thing here is the same: semantic drift between what we intended our program to do and what it actually does.

The second lesson is something I came up with at some point; I don’t even remember how exactly. But it’s something I told my students a lot. And that’s this: If your program did everything you wanted without problems, you wouldn’t be programming: you’d be using your program. The act of programming is itself perpetually to be in a state where something is either inadequate or broken, and the job is to fix that. I also think this is a bit simplistic, but it’s getting at something. I had originally come up with this in the context of trying to explain how you need to manage your frustration when programming; if you get easily upset by something not working, computer programming might not be for you. But I do think these two things combine into something that gets to the heart of what we do: we need to understand what it is we want our software to do, and then make it do that. Sometimes, our software doesn’t do something yet. Sometimes, it does something, but incorrectly. Both of these cases result in a divergence from the program’s intended behavior. So, how do we know if our program does what it should do? Well, what we’ve been doing so far is: This is our little mini software development lifecycle, or “SDLC.” This process works, but is slow. That’s great for getting the feel of things, but programmers are process optimizers by trade.
One of my favorite tools for optimization is called Amdahl’s law. The core idea is this, formulated in my own words: If you have a process that takes multiple steps, and you want to speed it up, if you optimize only one step, the maximum amount of speedup you’ll get is determined by the portion of the process that step takes. In other words, imagine we have a three-step process: This process takes a total of 13 minutes to complete. If we make step 3 twice as fast, it goes from two minutes to one minute, and now our process takes 12 minutes. However, if we were able to make step 2 twice as fast, we’d cut off five minutes, and our process would now take 8 minutes. We can use this style of analysis to guide our thinking in many ways, but the most common way, for me, is to decide where to put my effort. Given the process above, I’m going to look at step 2 first to try and figure out how to make it faster. That doesn’t mean we can achieve the 2x speedup, but heck, if we get a 10% decrease in step 2’s time, that saves the same one minute as a 2x speedup on step 3 would. So it’s at least the place where we should start.

I chose the above because, well, I think it properly models the proportion of time we’re taking when doing things with LLMs: we spend some time asking it to do something, and we spend a bit more time reviewing its output. But we spend a lot of time clicking “accept edit,” and a lot of time allowing Claude to execute tools. This will be our next step forward, as this will increase our velocity when working with the tools significantly. However, like with many optimization tasks, this is easier said than done. The actual mechanics of improving the speed of this step are simple at first: hit to auto-accept edits, and “Yes, and don’t ask again for commands” when you think the is safe for Claude to run. By doing this, once you have enough commands allowed, your input for step 2 of our development loop can drop to zero.
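The arithmetic in the Amdahl’s law example is easy to check in a few lines of Python. This sketch assumes step 1 takes one minute, which is what the 13-minute total implies given ten minutes for step 2 and two for step 3; the step names are labels made up for this example. `amdahl_speedup` is the standard closed form of Amdahl’s law, where `p` is the fraction of total time spent in the optimized step and `s` is how many times faster that step becomes. `Fraction` keeps the results exact.

```python
from fractions import Fraction

# Minutes per step of the three-step loop. Step 1 = 1 minute is inferred
# from the 13-minute total (10 and 2 minutes are given for steps 2 and 3).
STEPS = {"ask": Fraction(1), "implement": Fraction(10), "review": Fraction(2)}


def total(steps):
    """Wall-clock minutes for one pass through the loop."""
    return sum(steps.values())


def speed_up(steps, name, factor):
    """Return a copy of `steps` with one step made `factor` times faster."""
    return {**steps, name: steps[name] / factor}


def amdahl_speedup(p, s):
    """Amdahl's law: overall speedup when a fraction p of the time gets s times faster."""
    return 1 / ((1 - p) + p / s)


base = total(STEPS)
print(base)                                          # 13
print(total(speed_up(STEPS, "review", 2)))           # 12
print(total(speed_up(STEPS, "implement", 2)))        # 8
# A 10% cut on step 2 saves as much as doubling step 3's speed:
print(total({**STEPS, "implement": STEPS["implement"] * Fraction(9, 10)}))  # 12
# The closed form agrees with the direct computation: 13 min / 8 min.
print(amdahl_speedup(STEPS["implement"] / base, 2))  # 13/8
# Even an infinitely fast step 2 still leaves 1 + 2 = 3 minutes.
print(total({**STEPS, "implement": Fraction(0)}))    # 3
```

The last line is the ceiling Amdahl’s law predicts: with step 2 at zero, the loop can never get faster than the three minutes spent in steps 1 and 3.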
Of course, it takes time for Claude to actually implement what you’ve asked, so it’s not like our 13-minute process drops to three, but still, this is a major efficiency step. But we were actively monitoring Claude for a reason. Claude will sometimes do incorrect things, and we need to correct it. At some point, Claude will say “Hey I’ve finished doing what you asked of me!” and it doesn’t matter how fast it does step 2 if we get to step 3 and it’s just incorrect, and we need to throw everything out and try again. So, how do we get Claude to guide itself in the right direction? A useful technique for figuring out what you should do is to consider the ending: where do we want to go? That will inform what we need to do to get there. Well, the ending of step 2 is knowing when to transition to step 3. And that transition is gated by “does the software do what it is supposed to do?” That’s a huge question! But in practice, we can do what we always do: start simple, and iterate from there. Right now, the transition from step 2 to step 3 is left up to Claude. Claude will use its own judgement to decide when it thinks that the software is working. And it’ll be right. But why leave that up to chance? I expect that some of you are thinking that maybe I’m belaboring this point. “Why not just skip to ? That’s the idea, right? We need tests.” Well, on some level: yes. But on another level, no. I’m trying to teach you how to think here, not give you the answer. Because it might be broader than just “run the tests.” Maybe you are working on a project where the tests aren’t very good yet. Maybe you’re working on a behavior that’s hard to automatically test. Maybe the test suite takes a very long time, and so isn’t appropriate to be running over and over and over. Remember our plan from the last post?
Where Claude finished the plan with this: These aren’t “tests” in the traditional sense of a test suite, but they are objective measures that Claude can invoke itself to understand if it’s finished the task. Claude could run after every file edit if it wanted to, and as soon as it sees , it knows that it’s finished. You don’t need a comprehensive test suite. You just need some sort of way for Claude to detect if it’s done in some sort of objective fashion. Of course, we can do better. While giving Claude a way to know if it’s done working is important, there’s a second thing we need to pay attention to: when Claude isn’t done working, can we guide it towards doing the right thing, rather than the wrong thing? For example, those of you who are of a similar vintage as myself may remember the output of early compilers. It was often… not very helpful. Imagine that we told Claude that it should run to know if things are working, and the only output from it was the exit code: 0 if we succeeded, 1 if we failed. That would accomplish our objective of letting Claude know when things are done, but it wouldn’t help Claude know what went wrong when it returns 1. This is one reason why I think Rust works well with LLMs. Take this incorrect Rust program: The Rust compiler won’t just say “yeah this program is incorrect,” it’ll give you this (as of Rust 1.93.0): The compiler will point out the exact place in the code itself where there’s an issue, and even make suggestions as to how to fix it. This goes beyond simply saying “it doesn’t work” and instead nudges you toward what might fix the problem. Of course, this isn’t perfect, but if it’s helpful more often than not, that’s a win. Of course, too much verbosity isn’t helpful either. A lot of tooling has gotten much more verbose lately. Oftentimes, this is really nice as a human. Pleasant terminal output is, well… pleasant. But that doesn’t mean that it’s always good or useful.
For example, here’s the default output for : This is not bad output. It’s nice. But it’s also not useful for an LLM. We don’t need to read all of the tests that are passing; we really just want to see some sort of minimal output, and then what failed if something failed. In Cargo’s case, that’s for “quiet”: There is no point in giving a ton of verbose input to an LLM that it isn’t even going to need to use. If you’re feeding a tool’s output to an LLM, you should consider both what the tool does in the failure case and what it does in the success case. Maybe configure things to be a bit simpler for Claude. You’ll save some tokens and get better results. All of this has various implications for all sorts of things. For example, types are a great way to get quick feedback on what you’re doing. A comprehensive test suite that completes quickly is useful for giving feedback to the LLM. But that also doesn’t inherently mean that types must be better or that you need to be doing TDD; whatever gives you that underlying principle of “objective feedback for the success case and guidance for the failure case” will be golden, no matter what tech stack you use. This brings me to something that may be counter-intuitive, but I think is also true, and worth keeping in the back of your mind: what’s good for Claude is also probably good for humans working on your system. A good test suite was considered golden before LLMs. That it’s great for them is just a nice coincidence. At the end of the day, Claude is not a person, but it tackles programming problems in a similar fashion to how we do: take in the problem, attempt a solution, run the compiler/linter/tests, see what feedback it gets, then iterate. That core loop is the same, even if humans can exercise better judgement and can have more skill. And so even though I pitched fancy terminal output as an example of how humans and LLMs need different things, that’s really just a superficial kind of thing.
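The “minimal output on success, specifics on failure” idea can also be applied after the fact, for tools that lack a quiet mode. Below is a hypothetical filter (the function name and patterns are made up for this sketch): it takes verbose test-runner output in a Cargo-like line format, which only approximates what `cargo test` prints, drops passing tests and filler lines, and keeps failures, their details and the summary line.

```python
import re
from typing import List

# Patterns imitate `cargo test`-style lines; this is an approximation.
PASSING_LINE = re.compile(r"^test \S+ \.\.\. ok$")
FILLER_LINE = re.compile(r"^(?:running \d+ tests?|\s*)$")


def condense(output: str) -> str:
    """Drop passing tests and filler; keep failures, details and the summary line."""
    kept: List[str] = [
        line
        for line in output.splitlines()
        if not PASSING_LINE.match(line) and not FILLER_LINE.match(line)
    ]
    return "\n".join(kept)


verbose_failure = """\
running 3 tests
test lexer::tests::ints ... ok
test parser::tests::decls ... ok
test parser::tests::dangling_else ... FAILED

failures:
    parser::tests::dangling_else

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
"""

print(condense(verbose_failure))
```

On a fully passing run, only the one-line summary survives, which is roughly the token budget you want to spend on the success case.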
Good error messages are still critical for both. We’re just better at not letting terminal spinners take up space in our heads while we’re solving a problem, and can appreciate the aesthetics in a way that Claude does not. Incidentally, this is one of the things that makes me hopeful about the future of software development under agentic influence. Engineers always complain that management doesn’t give us time to do refactorings, to improve the test suite, to clean our code. Part of the reason for this is that we often didn’t do a good job of pitching how it would actually help accomplish business goals. But even if you’re on the fence about AI, and upset that management is all about AI: explain to management that this stuff is a force multiplier for your agents. Use the time you’ve saved by doing things the agentic way towards improving your test suite, or your documentation, or whatever else. I think there’s a chance that all of this stuff leads to higher quality codebases than ones filled with slop. But it also requires us to make the decisions that will lead us in that direction. That’s what I have for you today: consider how you can help Claude evaluate its own work. Give it explicit success criteria, and make evaluating those criteria as simple and objective as possible. In the next post, we’re gonna finally talk about . Can you believe that I’ve talked this much about how to use Claude and we haven’t talked about ? There’s good reason for that, as it turns out. We’re going to talk a bit more about understanding how interacting with LLMs works, and how it can help us both improve step 1 in our process and continue to make step 2 better and better.

Put the peanut butter on the bread
Put the jelly on the bread
Put the bread together

Asking the LLM to do something by typing up what we want it to do
Closely observing its behavior and course correcting it when it goes off of the rails
Eventually, after it says that it’s finished, reviewing its output

Ten minutes
Two minutes

daniel.haxx.se 2 months ago

The end of the curl bug-bounty

tldr: an attempt to reduce the terror reporting. There is no longer a curl bug-bounty program. It officially stops on January 31, 2026. After having had a few half-baked previous takes, in April 2019 we kicked off the first real curl bug-bounty with the help of Hackerone, and while it stumbled a bit at first, it has been quite successful, I think. We attracted skilled researchers who reported plenty of actual vulnerabilities for which we paid fine monetary rewards. We have certainly made curl better as a direct result of this: 87 confirmed vulnerabilities and over 100,000 USD paid as rewards to researchers. I’m quite happy and proud of this accomplishment. I would like to especially highlight the awesome Internet Bug Bounty project, which has paid the bounties for us for many years. We could not have done this without them. And of course Hackerone, who has graciously hosted us and been our partner through these years. Looking back, I think we can say that the downfall of the bug-bounty program started slowly in the second half of 2024 but accelerated badly in 2025. We saw an explosion in AI slop reports combined with a lower quality even in the reports that were not obvious slop – presumably because they too were actually misled by AI but with that fact just hidden better. Maybe the first five years made it possible for researchers to find and report the low-hanging fruit. In previous years we had a rate of somewhere north of 15% of the submissions ending up as confirmed vulnerabilities. Starting in 2025, the confirmed-rate plummeted to below 5%. Not even one in twenty was real. The never-ending slop submissions take a serious mental toll to manage and sometimes also a long time to debunk. Time and energy that is completely wasted while also hampering our will to live. I have also started to get the feeling that a lot of the security reporters submit reports with a bad faith attitude.
These “helpers” try too hard to twist whatever they find into something horribly bad and a critical vulnerability, but they rarely actively contribute to actually improve curl. They can go to extreme efforts to argue and insist on their specific current finding, but not to write a fix or work with the team on improving curl long-term etc. I don’t think we need more of that. There are these three bad trends combined that make us take this step: the mind-numbing AI slop, humans doing worse than ever and the apparent will to poke holes rather than to help. In an attempt to do something about the sorry state of curl security reports, this is what we do: We believe that we can maintain and continue to evolve curl security in spite of this change. Maybe even improve thanks to this, as hopefully this step helps prevent more people from pouring sand into the machine. Ideally we reduce the amount of wasted time and effort. I believe our best and most valued security reporters will still tell us when they find security vulnerabilities. If you suspect a security problem in curl going forward, we advise you to head over to GitHub and submit it there. Alternatively, you can send an email with the full report to . In both cases, the report is received and handled privately by the curl security team. But with no monetary reward offered. Hackerone was good to us and they have graciously allowed us to run our program on their platform for free for many years. We thank them for that service. As we now drop the rewards, we feel it makes a clear cut and displays a clearer message to everyone involved by also moving away from Hackerone as a platform for vulnerability reporting. It makes the change more visible. It is probably going to be harder for us to publicly disclose every incoming security report in the same way we have done it on Hackerone for the last year.
We need to work out something to make sure that we can keep doing it at least imperfectly, because I believe in the goodness of such transparency. Let me emphasize that this change does not impact our presence and mode of operation with the curl repository and its hosting on GitHub. We hear about projects having problems with low-quality AI slop submissions on GitHub as well, in the form of issues and pull-requests, but for curl we have not (yet) seen this – and frankly I don’t think switching to a GitHub alternative saves us from that. Compared to others, we seem to be affected by the sloppy security reports to a higher degree than the average Open Source project. With the help of Hackerone, we got numbers on how the curl bug-bounty has compared with other programs over the last year. It turns out curl’s program has seen more volume and noise than other public open source bug bounty programs in the same cohort. Over the past four quarters, curl’s inbound report volume has risen sharply, while other bounty-paying open source programs in the cohort, such as Ruby, Node, and Rails, have not seen a meaningful increase and have remained mostly flat or declined slightly. In the chart, the pink line represents curl’s report volume, and the gray line reflects the broader cohort. [Chart: Inbound Report Volume on Hackerone: curl compared to OSS peers] We suspect the idea of getting money for it is a big part of the explanation. It brings in real reports, but makes it too easy to be annoying with little to no penalty to the user. The reputation system and available program settings were not sufficient for us to prevent sand from getting into the machine. The exact reason why we suffer more of this abuse than others remains a subject for further speculation and research. There is a non-zero risk that our guesses are wrong and that the volume and security report frequency will keep up even after these changes go into effect.
If that happens, we will deal with it then and take further appropriate steps. I prefer not to overdo things or overplan right now for something that ideally does not happen. People keep suggesting that one way to deal with the report tsunami is to charge security researchers a small amount of money for the privilege of submitting a vulnerability report to us. A curl reporters security club with an entrance fee. I think that is a less good solution than just dropping the bounty. Some of the reasons include: Maybe we need to do this later anyway, but we stay away from it for now. We have seen other projects and repositories suffer similar AI-induced problems with pull requests, but this has not been a problem for the curl project. I believe that for PRs we have much better means to weed out the bad ones automatically, since we have tools, tests and scanners to verify such contributions. We don’t need to waste any human time on pull requests until the quality is good enough to get green check-marks from 200 CI jobs. I will give a talk at FOSDEM 2026 titled Open Source Security in spite of AI, which of course will touch on this subject. We never say never. This is now, and we might have reasons to reconsider and make a different decision in the future. If we do, we will let you know. These changes are applied now with the hope that they will have a positive effect for the project and its maintainers. If that turns out not to be the outcome, we will of course continue and apply further changes later. Since I created the pull request for updating the bug-bounty information for curl on January 14, almost two weeks before we merged it, various media picked up the news and published articles. Long before I posted this blog post. Also discussed (indirectly) on Hacker News.
We no longer offer any monetary rewards for security reports – no matter which severity. In an attempt to remove the incentives for submitting made-up lies.
We stop using Hackerone as the recommended channel to report security problems. To make the change immediately obvious and because without a bug-bounty program we don’t need it.
We refer everyone to submit suspected curl security problems on GitHub using their Private vulnerability reporting feature.
We continue to immediately ban and publicly ridicule everyone who submits AI slop to the project.

Charging people money in an international context is complicated and a maintenance burden.
Dealing with charge-backs, returns and other complaints and friction adds work.
It would limit who could or would submit issues. Even some who actually find legitimate issues.

The Register: Curl shutters bug bounty program to remove incentive for submitting AI slop
Elektroniktidningen: cURL removes bug bounties
Heise online: curl: Projekt beendet Bug-Bounty-Programm (curl: project ends bug bounty program)
Neowin: Beloved tool, cURL is shutting down its bug bounty over AI slop reports
Golem: Curl-Entwickler dreht dem “KI-Schrott” den Geldhahn zu (Curl developer cuts off the money to “AI junk”)
Linux Easy: cURL chiude il programma bug bounty: troppi report generati dall’AI (cURL closes its bug bounty program: too many AI-generated reports)
Bleeping Computer: Curl ending bug bounty program after flood of AI slop reports
The New Stack: Drowning in AI slop, cURL ends bug bounties
Ars Technica: Overrun with AI slop, cURL scraps bug bounties to ensure “intact mental health”
PressMind Labs: cURL konczy program bug bounty – czy to koniec jakosci zgloszen? (cURL ends its bug bounty program: is this the end of report quality?)
Socket: curl Shuts Down Bug Bounty Program After Flood of AI Slop Reports

Sean Goedecke 2 months ago

How I estimate work as a staff software engineer

There’s a kind of polite fiction at the heart of the software industry. It goes something like this: Estimating how long software projects will take is very hard, but not impossible. A skilled engineering team can, with time and effort, learn how long it will take for them to deliver work, which will in turn allow their organization to make good business plans. This is, of course, false. As every experienced software engineer knows, it is not possible to accurately estimate software projects. The tension between this polite fiction and its well-understood falseness causes a lot of strange activity in tech companies. For instance, many engineering teams estimate work in t-shirt sizes instead of time, because it just feels too obviously silly to the engineers in question to give direct time estimates. Naturally, these t-shirt sizes are immediately translated into hours and days when the estimates make their way up the management chain. Alternatively, software engineers who are genuinely trying to give good time estimates have ridiculous heuristics like “double your initial estimate and add 20%”. This is basically the same as giving up and saying “just estimate everything at a month”. Should tech companies just stop estimating? One of my guiding principles is that when a tech company is doing something silly, they’re probably doing it for a good reason. In other words, practices that appear to not make sense are often serving some more basic, illegible role in the organization. So what is the actual purpose of estimation, and how can you do it well as a software engineer? Before I get into that, I should justify my core assumption a little more. People have written a lot about this already, so I’ll keep it brief. I’m also going to concede that sometimes you can accurately estimate software work, when that work is very well-understood and very small in scope.
For instance, if I know it takes half an hour to deploy a service[1], and I’m being asked to update the text in a link, I can accurately estimate the work at something like 45 minutes: five minutes to push the change up, ten minutes to wait for CI, thirty minutes to deploy. For most of us, the majority of software work is not like this. We work on poorly-understood systems and cannot predict exactly what must be done in advance. Most programming in large systems is research: identifying prior art, mapping out enough of the system to understand the effects of changes, and so on. Even for fairly small changes, we simply do not know what’s involved in making the change until we go and look. The pro-estimation dogma says that these questions ought to be answered during the planning process, so that each individual piece of work being discussed is scoped small enough to be accurately estimated. I’m not impressed by this answer. It seems to me to be a throwback to the bad old days of software architecture, where one architect would map everything out in advance, so that individual programmers simply had to mechanically follow instructions. Nobody does that now, because it doesn’t work: programmers must be empowered to make architectural decisions, because they’re the ones who are actually in contact with the code[2]. Even if it did work, that would simply shift the impossible-to-estimate part of the process backwards, into the planning meeting (where of course you can’t write or run code, which makes it near-impossible to accurately answer the kind of questions involved). In short: software engineering projects are not dominated by the known work, but by the unknown work, which always takes 90% of the time. However, only the known work can be accurately estimated. It’s therefore impossible to accurately estimate software projects in advance. Estimates do not help engineering teams deliver work more efficiently.
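The link-change example can be written as a tiny Rust sketch: known steps simply add up (5 + 10 + 30 = 45 minutes), but a single step whose duration nobody knows yet makes the whole total unknown. The `Step` type and `estimate_minutes` function are illustrative names, not something from the post; the sketch relies on the standard library’s `Sum` implementation for `Option`, which returns `None` if any element is `None`.

```rust
/// One step of a change, with its duration in minutes.
/// `None` means nobody knows yet how long the step will take.
pub struct Step {
    pub name: &'static str,
    pub minutes: Option<u32>,
}

/// Known work can be summed; any unknown step makes the total unknown.
/// Summing an iterator of `Option<u32>` short-circuits to `None`.
pub fn estimate_minutes(steps: &[Step]) -> Option<u32> {
    steps.iter().map(|s| s.minutes).sum()
}
```

Usage: three known steps (push, wait for CI, deploy) give `Some(45)`, while adding a step like “figure out which services are affected” with `minutes: None` gives `None`, which is the situation most real work is in.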
Many of the most productive years of my career were spent on teams that did no estimation at all: we were either working on projects that had to be done no matter what, and so didn’t really need an estimate, or on projects that would deliver a constant drip of value as we went, so we could just keep going indefinitely[3]. In a very real sense, estimates aren’t even made by engineers at all. If an engineering team comes up with a long estimate for a project that some VP really wants, they will be pressured into lowering it (or some other, more compliant engineering team will be handed the work). If the estimate on an undesirable project - or a project that’s intended to “hold space” for future unplanned work - is too short, the team will often be encouraged to increase it, or their manager will just add a 30% buffer. One exception to this is projects that are technically impossible, or just genuinely prohibitively difficult. If a manager consistently fails to pressure their teams into giving the “right” estimates, that can send a signal up that maybe the work can’t be done after all. Smart VPs and directors will try to avoid taking on technically impossible projects. Another exception to this is areas of the organization that senior leadership doesn’t really care about. In a sleepy backwater, often the formal estimation process does actually get followed to the letter, because there’s no director or VP who wants to jump in and shape the estimates to their ends. This is one way that some parts of a tech company can have drastically different engineering cultures from other parts. I’ll let you imagine the consequences when the company is re-orged and these teams are pulled into the spotlight. Estimates are political tools for non-engineers in the organization. They help managers, VPs, directors, and C-staff decide on which projects get funded and which projects get cancelled.
The standard way of thinking about estimates is that you start with a proposed piece of software work, and you then go and figure out how long it will take. This is entirely backwards. Instead, teams will often start with the estimate, and then go and figure out what kind of software work they can do to meet it. Suppose you’re working on an LLM chatbot, and your director wants to implement “talk with a PDF”. If you have six months to do the work, you might implement a robust file upload system, some pipeline to chunk and embed the PDF content for semantic search, a way to extract PDF pages as image content to capture formatting and diagrams, and so on. If you have one day to do the work, you will naturally search for simpler approaches: for instance, converting the PDF to text client-side and sticking the entire thing in the LLM context, or offering a plain-text “grep the PDF” tool. This is true even at the level of individual lines of code. When you have weeks or months until your deadline, you might spend a lot of time thinking airily about how you could refactor the codebase to make your new feature fit in as elegantly as possible. When you have hours, you will typically be laser-focused on finding an approach that will actually work. There are always many different ways to solve software problems. Engineers thus have quite a lot of discretion about how to get it done. So how do I estimate, given all that? I gather as much political context as possible before I even look at the code. How much pressure is on this project? Is it a casual ask, or do we have to find a way to do this? What kind of estimate is my management chain looking for? There’s a huge difference between “the CTO really wants this in one week” and “we were looking for work for your team and this seemed like it could fit”. Ideally, I go to the code with an estimate already in hand.
Instead of asking myself “how long would it take to do this”, where “this” could be any one of a hundred different software designs, I ask myself “which approaches could be done in one week?”. I spend more time worrying about unknowns than knowns. As I said above, unknown work always dominates software projects. The more “dark forests” in the codebase this feature has to touch, the higher my estimate will be - or, more concretely, the tighter I need to constrain the set of approaches to the known work. Finally, I go back to my manager with a risk assessment, not with a concrete estimate. I don’t ever say “this is a four-week project”. I say something like “I don’t think we’ll get this done in one week, because X, Y and Z would all need to go right, and at least one of those things is bound to take a lot more work than we expect.” Ideally, I go back to my manager with a series of plans, not just one: In other words, I don’t “break down the work to determine how long it will take”. My management chain already knows how long they want it to take. My job is to figure out the set of software approaches that match that estimate. Sometimes that set is empty: the project is just impossible, no matter how you slice it. In that case, my management chain needs to get together and figure out some way to alter the requirements. But if I always said “this is impossible”, my managers would find someone else to do their estimates. When I do that, I’m drawing on a well of trust that I build up by making pragmatic estimates the rest of the time. Many engineers find this approach distasteful. One reason is that they don’t like estimating in conditions of uncertainty, so they insist on having all the unknown questions answered in advance. I have written a lot about this in Engineers who won’t commit and How I provide technical clarity to non-technical leaders, but suffice it to say that I think it’s cowardly.
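To illustrate the budget-first direction, here is a toy Rust model: the budget is fixed up front, and the question becomes which candidate approaches fit it, with unexplored areas of the codebase treated as the main risk. This is a hypothetical sketch, not the author’s method; the post describes a judgment process, not a formula, and the `Approach` fields, day counts, and `max_unknowns` threshold are all made up for illustration. An empty result corresponds to the case where the set of viable approaches is empty and the requirements have to change.

```rust
/// A hypothetical candidate design for the same feature.
pub struct Approach {
    pub name: &'static str,
    /// Work understood well enough to size, in days.
    pub known_days: f64,
    /// Number of unexplored areas ("dark forests") this approach touches.
    pub unknowns: u32,
}

/// Start from the budget: keep approaches whose known work fits it and that
/// touch at most `max_unknowns` unexplored areas, least risky first.
pub fn plans_within(budget_days: f64, max_unknowns: u32, approaches: &[Approach]) -> Vec<&'static str> {
    let mut fits: Vec<&Approach> = approaches
        .iter()
        .filter(|a| a.known_days <= budget_days && a.unknowns <= max_unknowns)
        .collect();
    // Stable sort: fewer unknowns means a plan we can describe with more confidence.
    fits.sort_by_key(|a| a.unknowns);
    fits.into_iter().map(|a| a.name).collect()
}
```

Returning several ranked plans rather than one number mirrors the idea of going back to a manager with a series of options, each with its own risks.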
If you refuse to estimate, you’re forcing someone less technical to estimate for you. Some engineers think that their job is to constantly push back against engineering management, and that helping their manager find technical compromises is betraying some kind of sacred engineering trust. I wrote about this in Software engineers should be a little bit cynical. If you want to spend your career doing that, that’s fine, but I personally find it more rewarding to find ways to work with my managers (who have almost exclusively been nice people). Other engineers might say that they rarely feel this kind of pressure from their directors or VPs to alter estimates, and that this is really just the sign of a dysfunctional engineering organization. Maybe! I can only speak for the engineering organizations I’ve worked in. But my suspicion is that these engineers are really just saying that they work “out of the spotlight”, where there’s not much pressure in general and teams can adopt whatever processes they want. There’s nothing wrong with that. But I don’t think it qualifies you to give helpful advice to engineers who do feel this kind of pressure. I think software engineering estimation is generally misunderstood. The common view is that a manager proposes some technical project, the team gets together to figure out how long it would take to build, and then the manager makes staffing and planning decisions with that information. In fact, it’s the reverse: a manager comes to the team with an estimate already in hand (though they might not come out and admit it), and then the team must figure out what kind of technical project might be possible within that estimate. This is because estimates are not by or for engineering teams. They are tools used by managers to negotiate with each other about planned work. Very occasionally, when a project is literally impossible, the estimate can serve as a way for the team to communicate that fact upwards. But that requires trust.
A team that is always pushing back on estimates will not be believed when they do encounter a genuinely impossible proposal. When I estimate, I extract the range my manager is looking for, and only then do I go through the code and figure out what can be done in that time. I never come back with a flat “two weeks” figure. Instead, I come back with a range of possibilities, each with its own risks, and let my manager make that tradeoff. It is not possible to accurately estimate software work. Software projects spend most of their time grappling with unknown problems, which by definition can’t be estimated in advance. To estimate well, you must therefore basically ignore all the known aspects of the work, and instead try to make educated guesses about how many unknowns there are, and how scary each unknown is. edit: I should thank one of my readers, Karthik, who emailed me to ask about estimates, thus revealing to me that I had many more opinions than I thought.

We tackle X, Y and Z directly, which might all go smoothly, but if it blows out we’ll be here for a month.
We bypass Y and Z entirely, which would introduce these other risks but possibly allow us to hit the deadline.
We bring in help from another team who’s more familiar with X and Y, so we just have to focus on Z.

[1] For anyone wincing at that time, I mean like three minutes of actual deployment and twenty-seven minutes of waiting for checks to pass or monitors to turn up green.
[2] I write a lot more about this in You can’t design software you don’t work on.
[3] For instance, imagine a mandate to improve the performance of some large Rails API, one piece at a time. I could happily do that kind of work forever.
