Posts in Ruby (20 found)
Giles's blog 4 days ago

Writing an LLM from scratch, part 32e -- Interventions: the learning rate

I'm still working on improving the test loss for a from-scratch GPT-2 small base model, trained on code based on Sebastian Raschka 's book " Build a Large Language Model (from Scratch) ". In my training code, I have this code to create the optimiser: The values in there -- for the learning rate, and for the weight decay -- were just copied from the tiny training run that we do in section 5.2 of the book. What do those values actually mean, and are those really the right values for them? I felt I had a good handle on the learning rate, at least -- it's one of the first things you learn when you start looking at machine learning of any kind -- but how would you go about working out what the correct value for it was? On top of that, when I was reading the Chinchilla paper a while back, I noticed they repeatedly referred to a "cosine cycle" for the learning rate, which didn't fit into anything I'd learned about before. The weight decay was pretty much an unknown for me -- I know it is a parameter controlling the behaviour of the optimiser, but I don't know how it does that. In this post I want to look into the learning rate, and these mysterious cosines; I'll write a follow-up about the weight decay later. If you're reading this blog, you almost certainly know what the learning rate is, but let's go over it briefly to build a solid foundation. The way it's normally explained, using simple gradient descent, goes something like this. Let's assume that we're training a model with just one parameter, and it starts off set to − 5 . We run some training data through, and get a loss, let's say 44.44: We don't know what shape our loss curve is (if we did, we might be able to find the lowest loss algebraically), but we do know the differential of the parameter versus the loss at the point we've measured; it happens to be -13. 
That is reasonably large and negative: We use that information to say that we want to move in the direction of a larger value for our parameter -- that is, in our case where the gradient is negative, so we have a downhill slope towards the right, we want to increase the parameter to move rightwards on that chart, whereas if it were positive (an uphill slope) we'd want to decrease the parameter to move leftwards.

Simply subtracting the gradient from the parameter would lead to an update in the right direction, but it would be a very large one in this case -- we'd move 13 units to the right -- so we multiply the gradient by a small positive number, the learning rate (often written as a lower-case eta, like this: η), to move a small distance in that direction. Let's say η = 0.3. That means we want to update our parameter to -5 - 0.3 × (-13) = -1.1. So now we run that through and get a new loss -- let's say it's 9.06 -- and a new gradient, which happens to be -5.2. Now we can do another update, and our parameter will become 0.46, so we use that and work out another loss and gradient, which come to 3.3816 and -2.08. Let's plot that one, but this time we'll draw back the veil and show the actual loss curve.

Now, it's worth reiterating that while we're training this model we don't know what that curve looks like -- we're just finding points on it, along with its gradient at those points, and using that information to work out which parameter value to explore next. But it's pretty clear that as we continue, if the learning rate is the right kind of size, we'll get to the minimum eventually, because -- due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum 1.
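To make the walkthrough above concrete, here is a minimal sketch of those updates in plain Python. The gradient function `2 * w - 3` is an assumption: it is simply the straight line that reproduces the three gradients quoted above (-13, -5.2 and -2.08), not something taken from a real model, and the names `gradient` and `descend` are made up for illustration.

```python
# Minimal sketch: plain gradient descent on a one-parameter "model".
# Assumption: the gradient is 2*w - 3, chosen only because it matches the
# gradients quoted in the text at w = -5, w = -1.1 and w = 0.46.

def gradient(w):
    """Slope of the (assumed) loss curve at parameter value w."""
    return 2.0 * w - 3.0


def descend(w, lr, steps):
    """Apply `steps` updates of w <- w - lr * gradient(w); return every value of w."""
    history = [w]
    for _ in range(steps):
        w = w - lr * gradient(w)
        history.append(w)
    return history


trajectory = descend(w=-5.0, lr=0.3, steps=20)
for step, w in enumerate(trajectory[:3]):
    print(f"step {step}: w = {w:.2f}, gradient = {gradient(w):.2f}")
print(f"after 20 updates: w = {trajectory[-1]:.4f}")
```

The first three printed rows line up with the parameter values and gradients in the text, and the later values settle near 1.5, which is where the assumed gradient reaches zero.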
It's also pretty clear that if the learning rate is smaller than an optimal value, in this simple case we will still find the right point, but it will take more steps because each one is smaller: And, of course, if the learning rate is too high, we might never converge -- we'd "bounce out of" the dip, and wind up with a parameter value that endlessly cycles between increasingly smaller and increasingly larger values, zooming off to infinity: OK, that's the basics. Why might we want to change from something that seems so logical and simple? A few paragraphs back I said: due to the nice smooth U-shape of the curve, the gradient gets smaller the closer we get to the minimum What if it doesn't? Imagine if we had something more like a V-shaped curve, like this: The gradient does not decrease as we get closer to the minimum, and so while we're in the downward-sloping part, each update is exactly the same distance: Now, eventually we'll jump over the minimum: In this example, I've used a gradient of − 8.33 on the downward-sloping part of the curve, and + 8.33 on the upward-sloping part, so that means that our next update just bounces us back to where we were before! Because the gradient isn't decreasing the closer we get to the minimum, we wind up just oscillating around it. That's not very helpful. That's a slightly contrived example (though not entirely -- intuitively, with functions like ReLU or GELU in our real LLMs, it's easy to imagine crazy loss landscapes). But it does show that perhaps we might want to add in our own "artificial" way to decrease the size of the steps we take over the course of training our model rather than just relying on the gradients naturally flattening out for us. Another way of looking at things is that as the model gets trained, we don't want batches of very new-looking data to cause big updates, taking us away from what was a good part of the loss landscape in terms of what we've seen so far. 
For example, imagine you've been training an LLM on a bunch of documents, which have so far been in English. Halfway through, it encounters a document in Byzantine Greek, the loss skyrockets, and you do a big update. That would be a problem! You might want it to learn a bit from it to push it slightly in a "the world is multi-lingual" direction, but you don't want it to lose a big chunk of the value from its previous training. You might also see a kind of connection to the way that people learn over the course of their lives -- for babies, everything is new and they "update their parameters" constantly as they try to understand the world. Children are still pretty flexible, but as we get older we tend to update our beliefs less and less. That's not always optimal, but as a heuristic it's pretty adaptive. Anyway, in general: for most training runs, we're going to want the learning rate to adjust over time. Most of the time this will be by reducing it, though there can be cases for increasing it again for periods. The general case of doing this is called "learning rate scheduling". There are a bunch of ways that people adjust the learning rate over the course of a train; here are a few that cropped up a lot while I was researching this. If we want the learning rate to go down over time, and we know how many steps we're training for, we can just set it to (say) 0.0004 for the first quarter of our train, then 0.0002 for the next, then 0.0001, then finish off with 0.00005, like this: That can work pretty well! But there is one obvious oddity -- the big step changes in learning rate mean that the exact placement of the drops and the training data before and after can matter. Why are we treating the data and the state of the model immediately before and immediately after so differently? It would make more sense to have a smoother schedule. What functions decay smoothly like that? 
An exponential curve does: let's say we just multiply the learning rate by a number that is a little smaller than one every step, so that it drops smoothly like this: But there are lots of other curves like that, and one is particularly interesting: As you change θ from 0 to π , the value of cos θ goes smoothly from 1 to − 1 , so it's easy enough to rescale that so that our learning rate follows the same curve: This is called a "cosine annealing" or "cosine decay" schedule, and was apparently inspired by the algorithms used for simulated annealing (an optimisation algorithm that was in turn inspired by how the atomic structures form in metals as they cool -- another one for the list of things to look into in the future...) That solves the mystery from earlier: the cosine that the Chinchilla paper was talking about was exactly this. As it turns out, the cosine decay scheduling curve is quite popular in deep learning, because it has what amounts to two well-defined phases -- an initial high learning rate where lots of exploration of the loss landscape can happen, followed by a smooth transition to something more like fine-tuning to optimise the location in whatever part of the loss landscape we've wound up in. Now, all of the above are assuming that we want the learning rate to start high and finish low, so that we can mimic the textbook gradient descent that we had at the start of this post. Intuitively that feels nice, but on further thought, the important thing is really that we have a low learning rate at the end of the train, so that we can find as close a point as possible for the minimum at the part of the loss landscape we've found ourselves in. But perhaps there's a case for having both high and low periods during the train, so that we don't get stuck in a local minimum -- something to jolt us out of where we were every now and then? 
2 With a step function, that's easy: you could, for example, do this: With an exponential, you could do something like this: With cosine decay, of course, things are even easier, because the cosine function is inherently cyclical, so we can just do this: However, at least for our purposes, training an LLM using a Chinchilla-optimal number of training tokens, it makes sense to be guided by what the authors of the Chinchilla paper did. Appendix B says: We find that setting the cosine cycle length too much longer than the target number of training steps results in sub-optimally trained models, as shown in Figure A1. As a result, we assume that an optimally trained model will have the cosine cycle length correctly calibrated to the maximum number of steps, given the FLOP budget; we follow this rule in our main analysis. So, at this point, I think we have one important part of the intervention we want to make: we want to use a cosine learning rate scheduler, going from high near the start of the training run, down to low at the end over one cycle. Additionally, and also from appendix B in the paper: we use a 10x learning rate decay in line with Rae et al. (2021) ...which means that if our learning rate starts at η , then we want it to decay down to η / 10 by the end. So, we just need to work out an initial value for η , and let it rip, right? Well, not so fast... When our model is uninitialised, right at the start of the train, gradients are going to be pretty wild. It's going to be making random errors all of the time, and we'll be making huge jumps across the loss landscape. That sounds bad. Additionally those kind of wild jumps can get the optimiser into a -- well, sub-optimal -- state. I haven't read enough about optimisers yet to have a solid handle on that, but that can wait -- intuitively it makes some kind of sense that erratic gradient updates might confuse it. 
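Stepping back to the decay shapes discussed above, here is a rough sketch of two of them as plain functions of the step number: the stepped schedule (0.0004, 0.0002, 0.0001, 0.00005 over four quarters) and a single cosine cycle that ends at one tenth of its starting value. The function names, the 1,000-step run length and the choice of 0.0004 as the cosine's starting value are all assumptions made for illustration.

```python
import math


def step_decay(step, total_steps, values=(4e-4, 2e-4, 1e-4, 5e-5)):
    """Piecewise-constant schedule: one value per equal slice of the run."""
    index = min(step * len(values) // total_steps, len(values) - 1)
    return values[index]


def cosine_decay(step, total_steps, peak_lr, min_lr):
    """Half a cosine cycle, rescaled to run from peak_lr down to min_lr."""
    progress = min(step / total_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + (peak_lr - min_lr) * cosine


total = 1000  # hypothetical run length
peak = 4e-4   # hypothetical starting learning rate
for step in (0, 250, 500, 750, 1000):
    stepped = step_decay(step, total)
    smooth = cosine_decay(step, total, peak, peak / 10)
    print(f"step {step:4d}: stepped = {stepped:.6f}, cosine = {smooth:.6f}")
```

Because `cosine_decay` covers exactly one half-cycle over `total_steps`, it matches the "cycle length calibrated to the number of steps" rule quoted from the Chinchilla paper; a cyclical variant would need the progress value to wrap around instead of being clamped at 1.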
So, it makes a certain amount of sense to start off with a low learning rate so that we don't do that, and then to increase it gradually to the peak, and only then to schedule the gradual cosine decay. According to this (rather nice looking) masterclass on LLM training, it's typical to do this over "a few thousand steps or a small percentage (e.g., 1-10%) of the total training steps, depending on the dataset size and batch size", and we would just use a linear increase over that period: I think we should do that; a simple linear warmup at the start -- let's relatively arbitrarily say 5% of our training steps going up to our desired peak learning rate. So our learning rate schedule should look something like this:

So far I've written a lot about how we vary the learning rate over time, and that's all been very useful. But we still need to know what the value should be initially! In smaller-scale experiments you might just try a bunch of different numbers to see what worked well, but at more than US$30 per train, that's not practical here. Unfortunately it's really quite hard to find good suggestions published anywhere. The GPT-2 paper is (as usual) reticent: "The learning rate of each model was manually tuned for the best perplexity on a 5% held-out sample of WebText" ...and if you search for "learning rate training llm", you'll see lots of results for when people are fine-tuning existing LLMs (2 × 10⁻⁴ comes up a lot), but almost nothing about when you're training one from scratch.

I eventually came across this (long!) post from Hugging Face, which I definitely need to spend time going through in the future, because it covers a lot of the ground I've been going over in this post series. But for this post, I think the most relevant part is in the section "Scaling Laws for Hyperparameters", where they include a figure from this DeepSeek paper.
Here it is, with some of the (also relevant) surrounding text: In our trains we're using something like 5 × 10¹⁸ total FLOPs. Now, they are specifically charting things in terms of non-embedding FLOPs, but I'm going to play a little fast and loose here and ignore that, so reading off their chart, that looks like we should be using about 1.4 × 10⁻³ as our learning rate. We can double-check that against their formula, where C is the compute budget: Nice, a close match! However, it's definitely worth noting that we're using a simple GPT-2 architecture, and they are using something quite different -- RMSNorm instead of LayerNorm, SwiGLU as the activation function on the feed-forward networks, Rotary Position Embedding rather than the fixed ones we're using, and so on.

As a sanity check: you can see that they also give a formula for the optimal batch size in terms of tokens. For our FLOP budget, that comes in at 381,782, which is about 373 of our 1,024-token sequences. That is quite a lot higher than the 97-or-so sequences that appeared to be optimal in our earlier experiments. That is a little concerning, though of course the 97 number came out of a very ad-hoc bit of curve-fitting. For now, I'm going to hope that that doesn't matter too much for the learning rate. This may come back to bite me; if the results of a train with 1.4 × 10⁻³ are radically worse than the existing rate of 4 × 10⁻⁴, I'll have to do a bit more investigation.

So, now I think we have all of the theoretical pieces in place to do a train. Let's move on to the practicalities. We started by looking at this: What should we change -- disregarding the weight decay until the next post? Based on the above, we want to do a linear warmup of about 5% of our steps, going up to a learning rate of 1.4 × 10⁻³, followed by a cosine decay down to one tenth of that, 1.4 × 10⁻⁴. What does that look like in code?
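Before getting into any particular library, the target schedule can be written down as a plain function of the step number. This is only a sketch under stated assumptions: the function name `lr_at`, its parameter names and the 32,000-step run length are hypothetical, and the warmup here starts one small increment above zero rather than at exactly zero.

```python
import math


def lr_at(step, total_steps, peak_lr=1.4e-3, warmup_fraction=0.05, decay_factor=10.0):
    """Linear warmup to peak_lr, then one cosine half-cycle down to peak_lr / decay_factor."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    min_lr = peak_lr / decay_factor
    if step < warmup_steps:
        # Linear ramp: a tiny value at step 0, the peak on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total = 32_000  # hypothetical run length; 5% of it is a 1,600-step warmup
for step in (0, 800, 1_600, 16_800, 32_000):
    print(f"step {step:6d}: lr = {lr_at(step, total):.7f}")
```

Writing it this way makes the three design choices explicit (peak value, warmup length, final decay factor), which is handy for checking whatever a library scheduler produces against the intended curve.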
The relevant API for scheduling the learning rate in PyTorch is, logically enough, in the module, and there are a bunch of different scheduling classes. You create your optimiser, then create a scheduler for the shape you want, and then you can call on the scheduler (after the on the optimiser) to adjust the optimiser's learning rate over time. Let's make that more concrete; one of the schedulers is , which is what we'll need for our linear warmup period. It takes as its parameters: Let's say that we want to go from almost-zero to our optimiser's learning rate over 1,600 steps -- we'd create our scheduler like this: ...then in our training loop, after we've done the scaled step of the optimiser, we'd also step the scheduler: This confused me a little bit the first time I saw it; after all, if the scheduler hasn't been "triggered" when we step the optimiser, how does the optimiser know what learning rate to use? Surely it would just use whatever it was initialised with? The answer is that when you create the optimiser, it stores away the learning rate that you give it in two places -- an "initial learning rate" and a "current learning rate". Next, when you create your scheduler, it uses the initial learning rate to work out the start and end values, and then sets the current one to the start value immediately. Just by creating a scheduler, you're changing the optimiser's current learning rate -- but not the initial one, which is important, as we'll see in a moment. So, we have a scheduler that handles our warmup period nicely. Another scheduler that's relevant to our interests is the CosineAnnealingLR . This takes: On creation, this scheduler will read in the optimiser's initial learning rate -- note, not the current one -- and then the first time it's stepped, it will set the current learning rate to that value, and then for steps after that it will reduce it so that it follows a nice cosine decay, reaching after steps. 
So those two cover the two regimes that we want -- the warmup and then the cosine decay. But now we need to put them together; we want to do one and then the other. There's a very useful class, , which allows you to chain schedulers and tell it when each one takes over from the previous one. Let's sketch out some code to use that to do a train with our new peak learning rate of 1.4 × 10⁻³, a warmup of 1,600 steps, followed by a cosine decay for the next 32,000 steps to one tenth of the peak learning rate: That actually works quite nicely! I wrote a dummy training loop to plot the current learning rate over a fake train using code like the above, and got this: ...with the output confirming that the values were good at the "milestone" point, the start and the end: I was initially a bit surprised by that, as at the time I ran it, I didn't realise that there was that split between the initial and the current learning rates on the optimiser, so I thought that the cosine scheduler would pick up whatever tiny starting value the warmup scheduler had overwritten the optimiser's learning rate with -- but that split saves the day.

That means that now we have the outline of how to schedule our learning rate. But before we can put that into the code, we need to think about how it affects our checkpoints. Just like the scaler and the optimiser, the learning rate scheduler -- or, indeed, our two schedulers here -- contains information about the state of the train. That means that if we recover from a checkpoint, we need to provide them with the information they need. If we just created them afresh, they'd start from the beginning -- for example, if we restarted from step 20,000 in a train like the one above, we'd start a new warmup from pretty much zero, and then start a fresh cosine decay. That would be bad: (Dummy test code here.) Now, we could use the parameter to initialize them with the correct current global step.
But they have a state dict, like most other PyTorch objects, so the simplest thing to do is just to write that to another checkpoint file: ...and then load it likewise: (Dummy test code here.) Conveniently, if you save the state dict of a , it will also include the state of all of its component schedulers, and likewise if you reload it, it will load the components' states back in too. The one thing you have to be careful about is what they warn about in the PyTorch docs: Initializing a scheduler overwrites its optimizer’s s. When restoring a checkpoint, initialize the scheduler before calling your optimizer's to avoid overwriting the loaded learning rates. Luckily enough, in our code as it stands, we create all of the things that are checkpointed -- the optimiser and the scaler so far, but shortly the scheduler as well -- before we load in the state dicts, so that drops out quite nicely.

So, we have some sketched-out code -- it's time to put it in place for the real training run. I won't go through the details of the changes to my existing DDP training code, though you can see the diff here if you're interested. Much of the complexity was due to keeping backward compatibility so that we don't have to always use a learning rate scheduler; remember that in this mini-series, I'm trying out various changes ("interventions") to the training loop in isolation, seeing whether each one improves things. So it's important to be able to easily train with or without learning rate scheduling; I did that with a flag in the Implementation-wise, initially I was thinking that it would be easiest to always have a scheduler, and in the "non-scheduled" case to just set it to a linear one that didn't change the value over the course of the train. But in the end it turned out to be easier to use as being the switch to tell the training loop which "mode" it was in.
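Going back to the checkpointing point from earlier in this section, here is a toy illustration of why the scheduler's position has to be saved and restored. The `WarmupCosine` class is a hypothetical pure-Python stand-in written for this sketch; it is not PyTorch's scheduler, and it only borrows the `state_dict` / `load_state_dict` naming to mirror the idea.

```python
import math


class WarmupCosine:
    """Toy scheduler (an assumption for illustration, not a PyTorch class)."""

    def __init__(self, peak_lr, warmup_steps, decay_steps):
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.decay_steps = decay_steps
        self.step_count = 0  # the only piece of state that changes during a train

    def current_lr(self):
        s = self.step_count
        if s < self.warmup_steps:
            return self.peak_lr * (s + 1) / self.warmup_steps
        progress = min((s - self.warmup_steps) / self.decay_steps, 1.0)
        min_lr = self.peak_lr / 10
        return min_lr + 0.5 * (self.peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

    def step(self):
        self.step_count += 1

    def state_dict(self):
        return {"step_count": self.step_count}

    def load_state_dict(self, state):
        self.step_count = state["step_count"]


def make_scheduler():
    return WarmupCosine(peak_lr=1.4e-3, warmup_steps=1_600, decay_steps=32_000)


original = make_scheduler()
for _ in range(20_000):
    original.step()
checkpoint = original.state_dict()  # what would be written to the checkpoint file

fresh = make_scheduler()            # recreated without restoring: back to the warmup
resumed = make_scheduler()
resumed.load_state_dict(checkpoint)

print(f"original: {original.current_lr():.7f}")
print(f"fresh:    {fresh.current_lr():.7f}")
print(f"resumed:  {resumed.current_lr():.7f}")
```

The freshly created scheduler reports a near-zero warmup value at what should be step 20,000, while the one that reloads the saved state carries on from the same learning rate as the original.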
The placement of the code to create the schedulers was also a little tricky; the "natural" place was just after the optimiser is created, like it is in the example code above. However, at that point, we don't know how many global steps we're going to have in the train, because we don't have the dataset -- which means that working out the numbers to pass in to the schedulers for the warmup and decay steps would be impossible. It turned out to be easiest to put it in the function , just after the datasets are loaded, as at that point we have all of the information we need. Anyway, that's the code done, so let's see what happens! I wanted to do two trains; one with the learning rate scheduling, and one with just the new value for the learning rate, instead of . I was expecting the updated learning rate alone to be too high and to cause a very choppy train, but had high hopes for the train with the scheduling. Here's how it did; the scheduled learning rate train first: Here's what the training loss looked like over that: Quite a few loss spikes early on in the train when the learning rate is at its peak, but nothing unmanageable -- and, as you'd expect, things calmed down quite a lot later on. I also charted the learning rate, to make sure it really was doing what I thought it was doing: So, a pretty smooth train, and we definitely did the right learning rate scheduling. Time to upload it to Hugging Face , and see what the evals look like. Firstly, the smoke test: Reasonably coherent, at least, though it's not super-impressive. On to the loss on our test set: That's our best loss so far! Let's put it into the table: So, it definitely looked like it was worth it. But was it the scheduling of the learning rate that helped, or just the change from 0.0004 to 0.0014? I kicked off a second run with no scheduling, just a learning rate of 0.0014, to see what would happen. After about an hour, I noticed that the loss chart had stopped updating. 
The last point had a maximum and minimum loss but no average -- but after that, nothing: However, the learning rate was still being charted, so the train was definitely running: Looking at the checkpoint metadata showed what had happened. At global step 1851, we had this 3 : ...and at the next checkpoint at step 2468, we had this: ...and the same for all checkpoints thereafter. Clearly the parameters had gone off the rails -- exactly what we'd expect with an excessive learning rate: There was no point in continuing the train, as it was pretty much certainly unrecoverable, so I stopped it. Out of interest, I downloaded the model, but I couldn't even run the smoke test on it: So it was pretty clear that just updating the learning rate to 0.0014 was actively harmful. No need to upload that one to HF! And time to wrap up this experiment. While this has been quite a long post, I've really only scratched the surface of how learning rates are set. If I were doing things in more detail, the best would probably be to do a "sweep" over multiple values to try to at least approximate the best possible rate for this model. That would be pretty expensive for me, though, so I decided to stick with the DeepSeek number. It might not be ideal for the specific architecture that I'm using, given how different that is to theirs, but given the results, it's a decent one compared to what I was using. 4 Something that I found interesting is that exactly how to schedule your learning rate is still an area being actively researched. Even in my relatively minimal research, I came across three alternatives to the mainstream warmup-cosine decay pattern: I'm sure there are many more. But for this train, I decided to stick to the mainstream, and the results were pretty good! To reiterate, this has been the most positive intervention so far: So I'll stick with that, and move on to the next thing: what is the parameter that we're passing in to the AdamW optimiser? 
Tune in next time :-) Yes, I am foreshadowing here.  ↩ To make my earlier analogy about learning rate decaying over time in people as they age even more dubious, we can imagine this as being rather like someone middle-aged going on an ayahuasca retreat ;-)  ↩ If you're wondering how we had a valid maximum and minimum in that first checkpoint when the average was NaN, here's why: You might wonder how large labs work out the right learning rate given their training runs run to millions of dollars. The answer is there in that DeepSeek paper, as that's one of the things they were doing. They scaled their model down from the billions of parameters that they wanted to train to various smaller models, and worked out the optimal learning rate for each of the smaller models by doing full trains on them. Once they had a mapping from model size to the ideal learning rate for their architecture, they could extrapolate that to the large ones that they wanted to train. The problem is that those "smaller" models are actually quite a lot larger than the one we're training here! And while we could potentially scale it down even further, I suspect that such truly tiny models (say, 1M parameters) wouldn't train well enough to give any meaningful results.  ↩ From the paper: Specifically, the learning rate of the model reaches its maximum value after 2000 warmup steps, and then decreases to 31.6% of the maximum value after processing 80% of the training tokens. It further reduces to 10% of the maximum value after 90% of the tokens. , which is the optimiser we're applying it to. , which the optimiser's learning rate is multiplied by to work out where we want to start up. , which is likewise applied to the optimiser's learning rate to work out the value we're heading for. , which is the number of steps over which it should go from the initial learning rate to the final one. 
, which lets the scheduler know how many steps into its schedule it currently is -- this defaults to , meaning it hasn't started yet. This can be useful if you're resuming from a checkpoint, but for our purposes we can ignore it. , which is the same as the 's. , which is the number of steps before it reaches its minimum , the minimum learning rate we want to get to. , again the same as the 's. Per the Hugging Face paper, some people do warmup, then pause at a set level for a while, then start the cosine decay (warmup-stable-decay). DeepSeek use a relatively simple stepped function after a warmup. 5 I came across a 2025 paper " Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs " which says that a linear decay (after a warmup) outperforms cosine.


Are Design Tools Relevant Anymore

I was a product designer for a few years. I had switched careers to design after suffering burnout as a software engineer. During those years, my entire day was spent in Figma, building high-fidelity mockups, leading workshops and creating prototypes. While Figma helped me move quickly, rapidly iterating after receiving user feedback, the engineer part of me always felt it was a throwaway step. You build something, only to then have somebody else build it again in code.

I recently had to put on my design hat again, putting together interactive prototypes around a few redesign ideas. At first, I reached for Figma, but after fiddling around for an hour, decided to go a different route. While prototyping in Figma used to be faster than building in code, that’s no longer true. With Claude Code, building out frontend components is fast. Much faster than messing with layers, frames and symbols in Figma. Let me explain.

Enterprise apps have well-defined brand guidelines. Colors, type, scale. They are often built off an existing component library (think Bootstrap, shadcn). This means you can use Claude in a way that follows the look and feel of your application, and is constrained to the components the development team leverages. The rails help keep Claude from going off into the deep end. Design then becomes focused on solving the user’s problem through UX, with less fiddling around with UI. I can open Freeform on my iPad, sketch something out, and prompt Claude to leverage our foundation to make my sketch a reality. Then, I can dig into the code and tweak things to be just right.

The result is a more interactive, true-to-life prototype that gives your engineering team a head start with coded components. You get better feedback from users and stakeholders as it’s easier to visualize what the final product looks like. You discover pitfalls that might not have shown up until an engineer was halfway into the card.
On top of all that, you move a lot faster: you’re designing and building in one step rather than two, giving your engineering team a head start once designs are finalized. So then, what’s the point of Figma and Sketch? You can tell Figma is battling with this reality by pushing Figma Make. The issue is, it’s too constrained and produces poor results. You can’t link it to existing coded components, Tailwind configs, etc. On the other hand, using my approach requires a technical background. You need to guide with framework suggestions and foundational setup, and be able to take over and tweak things yourself. That said, in the shorter term there’s likely still a place for Figma and Sketch at the table. Designing using the method I talked about requires a technical background, otherwise your results will be all over the place, and small tweaks will be next to impossible. As the technology gets better though, I’ll be surprised if Figma and Sketch survive the next couple of years.

0 views
André Arko 1 week ago

Four months of Ruby Central moving Ruby backward

From the moment RubyGems was first created in 2004, Ruby Central provided governance without claiming ownership, to support the Ruby community. Providing governance meant creating processes to provide stability and predictability. Avoiding ownership meant allowing the community to contribute, to the point where unpaid volunteers created and controlled the entirety of RubyGems.org for many years. Last year, Ruby Central flipped that successful formula on its head. They now claim ownership of both Bundler and RubyGems, but refuse to provide governance. Ruby Central now claims sole control over all code and decisions, despite paying for only a few percent of the work required to create and sustain the projects across 22 years. Instead of providing stable and predictable processes, Ruby Central suddenly hijacked the Bundler and RubyGems codebases away from the existing maintainers, shut out the community, and started issuing threats to sue. When confronted by the former maintainers after the hijacking, Marty Haught of Ruby Central stated (in a recorded video call) on September 17 that “yeah, we shouldn’t have changed that”. On September 18, Marty went on to write: “In the past, we’ve made the mistake of conflating ownership of the code with ownership of the infra, and vice versa, and we’d like to straighten this out so that we aren’t put in a legal bind that requires us to take control of the entire codebase when, we all agree, that is not proper or correct given the existing model.” In the words of Ruby Central itself, “we all agree, [taking control of the entire codebase] is not proper or correct.” Since the beginning of this conflict, Ruby Central has privately admitted it was wrong to hijack the GitHub organization and steal the repos, but has refused to acknowledge this in public. Unfortunately, despite privately admitting their actions were wrong, Ruby Central has publicly continued to dig their hole deeper.
Instead of owning up to their mistake, they secretly negotiated a deal with Matz for ruby-core to take over the stolen RubyGems and Bundler repository, further violating the project governance policies. If this situation were just about me personally, I could believe it sprang from individual disagreements. Ruby Central claims they had good reasons to unilaterally kick me out of the project, even though I don’t think their claims hold water. With that said, regardless of what you think about me personally, the other five long-term maintainers have never gotten any explanation of why they were suddenly kicked out or bypassed entirely, all in violation of existing project governance. In her only public interview about the situation, Ruby Central Executive Director Shan Cureton defended stealing Bundler from its team of fifteen years by saying the removed team “didn’t need to have the story, and it wasn’t their story to have”. Ruby Central has made their position clear: if they steal your project, you are not entitled to know their reasons, and neither is anyone else. There is nothing “community-oriented” about stealing the most-used gem in Ruby and refusing to share your reasons with the community. Despite Ruby Central’s unacceptable treatment of both projects and maintainers, the former RubyGems and Bundler team said we want to move Ruby forward. We offered Ruby Central a path to move past their illegitimate GitHub takeover, past their vicious personal attacks, and past their threats to sue us. It has been four months since we made that offer, and Ruby Central has not accepted. While declining to accept our offer, Ruby Central has nonetheless found the time to propose new governance documents for RubyGems. In those documents, they explicitly require that existing maintainers approve adding or removing team members. That rule was already present in the previous governance, and is the exact rule that Ruby Central violated to execute their takeover.
When asked why they violated the previous governance, and why the new governance would be any more trustworthy, Ruby Central refused to respond substantively, and then the question itself was hidden by marking it “off topic” . Instead of working to resolve the situation, Ruby Central has spent 4 months rejecting requests for an explanation, while repeatedly threatening to sue me personally. After Ruby Central suddenly took over the Bundler repo, I sent them a standard trademark notice. They replied with a threat to sue me. When I later informed Ruby Central I had learned they violated state employment law, they simply replied with the same threat to sue me again. They are threatening to sue me for “hacking” them, despite their own analysis publicly concluding “no evidence that user data or production operations were harmed” . Without seeking common ground, or even looking for some sort of resolution we can just live with and move on from, Ruby Central has offered all of us — nothing . Ruby Central has made no offer in reply to outreach from the other five maintainers. To me, after four grueling months of private “negotiation”, their entire offer is nothing more than to refrain from suing. But only if I agree to everything that they want. They say I must agree that I have no claim on the name Bundler, despite helping create it and leading the Bundler team for the last 15 years. They say I must agree I was paid legally and fairly, when California law clearly states I was not. They say I must agree that Ruby Central can take over open source projects they host, any time they feel like it, with no explanation, and no consequences. I don’t agree. Letting this situation stay unaddressed sets a dangerous precedent for all open source projects written in Ruby. Ruby Central has resolved nothing. Don’t let their delaying tactics convince you otherwise. 
The Ruby community cannot trust Ruby Central with control over our gems until there is accountability for destroying the very governance they were supposed to be providing . Until accountability arrives, take action . Tell Ruby Central they owe everyone an explanation for violating the project governance around six long-term maintainers, not just me. Don’t sponsor, attend, or speak at RubyConf. Contribute to projects that aren’t controlled by Ruby Central. The exiled maintainers are working on new projects, with a focus on clear governance, long-term financial sustainability, and community input: Join the gem.coop beta, and stop using RubyGems.org. Use jwl instead of RubyGems. Use or Ruby Butler instead of Bundler. A better world is possible! Ruby Central might want to keep Ruby in the past, but we can work together to build Ruby a future .

0 views
Chris Coyier 1 week ago

FOREVERGREEN

In the first few minutes, Ruby says to me, “This is like The Giving Tree”, and by the end, I was like, “OK, you’re right.”

0 views
(think) 2 weeks ago

How to Vim: Build your .vimrc from Scratch

People often think that getting started with Vim means spending hours crafting an elaborate with dozens of plugins. In reality, modern Vim (9+) and Neovim ship with remarkably sane defaults, and you can get very far with a configuration that’s just a few lines long – or even no configuration at all. If you launch Vim 9 without a file, it automatically loads – a built-in configuration that provides a solid foundation. Here’s what you get for free: That’s actually a pretty reasonable editing experience out of the box! You can read the full details with . Neovim goes even further with its defaults – it enables (copies indentation from the previous line), (highlights all search matches), (makes Tab smarter at the start of a line), (reloads files changed outside the editor), always shows the statusline, and sets the command history to 10000 entries, among many other things. If you’re on Neovim, the out-of-the-box experience is excellent. See for the full list. Here’s something that trips up a lot of people: the moment you create a file – even an empty one – Vim stops loading entirely. That means you lose all those nice defaults. The fix is simple. Start your with: This loads the defaults first, and then your own settings override or extend them as needed. This gotcha only applies to Vim. Neovim’s defaults are always active regardless of whether you have an or . Here’s a minimal that builds on the defaults and adds a few things most people want: That’s five settings on top of the defaults. You might not even need all of them – already handles the fundamentals. For Neovim, you don’t need the line – all the equivalents are already active. You also get , , and for free, so the only settings left to add are the ones that are genuinely personal preference: One of the most underappreciated aspects of Vim is how much built-in support it ships for programming languages. 
When is active (which it is via or Neovim’s defaults), you automatically get: This means that when you open a Python file, Vim already knows to use 4-space indentation. Open a Ruby file and it switches to 2 spaces. Open a Makefile and it uses tabs. All without a single plugin or line of configuration. You can check what’s available with for syntax files or for filetype plugins. The list is impressively long. At some point you’ll probably want more than the bare minimum. Here are a few things worth considering as your next steps: And when you eventually want more plugins, you probably won’t need many. A fuzzy finder, maybe a Git integration, and perhaps a completion engine will cover most needs. But that’s a topic for another day. The key takeaway is this: don’t overthink your . Start with the defaults, add only what you actually need, and resist the urge to copy someone else’s 500-line configuration. A small, well-understood configuration beats a large, cargo-culted one every time. That’s part of the reason why when I started to re-learn Vim I’ve opted to slowly build a Vim 9 configuration from scratch, instead of jumping to something like Neovim + Kickstart.nvim or LazyVim right away. Less is more. Understanding the foundations of your editor matters. 1 Right now my is just 100 lines and I don’t foresee it becoming much bigger in the long run. If you want to see just how far you can go without plugins, I highly recommend the Thoughtbot talk How to Do 90% of What Plugins Do (With Just Vim) . It’s a great demonstration of Vim’s built-in capabilities for file finding, auto-completion, tag navigation, and more. That’s all I have for you today. Keep hacking! I guess this sounds strange coming from the author of Emacs Prelude, right?  
↩︎ – syntax highlighting – filetype detection, language-specific plugins, and automatic indentation – incremental search (results appear as you type) – keeps 5 lines of context around the cursor – shows instead of hiding truncated lines – mouse support in all modes remapped to (text formatting) instead of the mostly useless Ex mode And several other quality-of-life improvements Syntax highlighting for hundreds of languages – Vim ships with around 770+ syntax definitions Language-specific indentation rules for over 420 file types Filetype plugins that set sensible options per language (e.g., , , ) A colorscheme – Vim ships with several built-in options (try followed by Tab to see them). Recent Vim builds even bundle Catppuccin – a beautiful pastel theme that I’m quite fond of. Another favorite of mine is Tokyo Night , which you’ll need to install as a plugin. Neovim’s default colorscheme has also been quite good since 0.10. Persistent undo – lets you undo changes even after closing and reopening a file. A game changer. Clipboard integration – makes yank and paste use the system clipboard by default. vim-unimpaired – if you’re on classic Vim (not Neovim), I think Tim Pope’s vim-unimpaired is essential. It adds a consistent set of / mappings for navigating quickfix lists, buffers, adding blank lines, and much more. Neovim 0.11+ has adopted many of these as built-in defaults, but on Vim there’s no substitute.

0 views
DHH 3 weeks ago

Omacon comes to New York

The vibes around Linux are changing fast. Companies of all shapes and sizes are paying fresh attention. The hardware game on x86 is rapidly improving. And thanks to OpenCode and Claude Code, terminal user interfaces (TUIs) are suddenly everywhere. It's all this and Omarchy that we'll be celebrating in New York City on April 10 at the Shopify SoHo Space for the first OMACON! We've got an incredible lineup of speakers coming. The creator of Hyprland, Vaxry, will be there. Along with ThePrimeagen and TJ DeVries. You'll see OpenCode creator Dax Raad. Omarchy power contributors Ryan Hughes and Bjarne Øverli. As well as Chris Powers (Typecraft) and myself as Linux superfans. All packed into a single day of short sessions, plenty of mingle time, and some good food. Tickets go on sale tomorrow (February 19) at 10am EST. We only have room for 130 attendees total, so I imagine the offered-at-cost $299 tickets will go quickly. But if you can't manage to snatch a ticket in time, we'll also be recording everything, so you won't be left out entirely. But there is just something special about being together in person about a shared passion. I've felt the intensity of that three years in a row now with Rails World. There's an endless amount of information and instruction available online, but a sense of community and connection is far more scarce. We nerds need this. We also need people to JUST DO THINGS. Like kick off a fresh Linux distribution together with over three hundred contributors so far all leaning boldly into aesthetics, ergonomics, and that omakase spirit.  Omarchy only came about last summer, now we're seeing 50,000 ISO downloads a week, 30,000 people on the Discord, and now our very first exclusive gathering in New York City. This is open source at its best. People from all over, coming together, making cool shit. (Oh, and thanks to Shopify and Tobi for hosting. 
You gotta love when a hundred-plus billion dollar company like this is run by an uber nerd who can just sign off on doing something fun and cool for the community without any direct plausible payback.)

0 views

Leading Without a Map

No one can deny that our industry is in a period of great change. This industry never stops, and the rate goes up and down but change is a constant. Like it or not " change calls the tune we dance to ." One of the biggest reasons people resist change, even people who joined the software business to "change the world" is when they feel it threatens their self-perception and identity. In the west our job is often the primary piece of our identity. One sees it everywhere. Your LinkedIn profile has your name first, and some sort of job title or role description second. Heck even contestants on Jeopardy are introduced as "A marketing consultant from Eyebrow, Saskatchewan ." When completing the sentence "I am a..." most people pick their job. When change is high, that self-conception can quickly feel under threat. Even in the small it can happen. Your company decides they'd be better served writing new code in Java rather than Python or Ruby, you can expect a few "Pythonistas" or "Rubyists" to push back. In their heart of hearts they may agree with the decision on its merits but they nevertheless feel that their very identity is under threat. This can also include their social group/community/tribe membership, something that humans are genetically programmed to value and protect. So it's no doubt understandable that change can bring out strange and unpredictable behaviour in people when they feel like there's risk to their identity, self concept, or tribal membership. Well, first of all, acknowledge to ourselves that we are not immune from these phenomena either. Presumably most of us started out as software developers ourselves and when we started managing the people who did the job, it was the job we used to do so we got it. Over time, that's drifted. New frameworks and paradigms have emerged, new 'best' practices replaced the old 'best' practices and we became less intimately familiar with the day-to-day things our people were doing. 
This is uncomfortable at times, but we adapt. We learn what we can to stay involved at the right level and to coach and guide the people we're responsible for. Now, the game is changing in a much more fundamental and profound way. And it's happening fast. I don't know what the job of software developer is going to look like a year from now (or even 6 months for that matter) and, frankly, neither does anyone else. This makes the job of manager much, much harder. Your people are used to you having at least some concept of a map and sharing it with them, and you don't have one. Everyone's figuring it out together. A good friend and former colleague once described an aspect of leadership as "smiling while the sky is falling." I'm not sure if he came up with it or if I should attribute it to someone else but I heard it from him first. My point here isn't that the sky is falling but rather, when your people are worried, you need to appear steadfast or you make the problem worse. You don't owe them certainty, because that would be dishonest and they'll clock your dishonesty whether they admit it or not. But just like in incident response, panic serves no one. You owe them calm reassurance that you're going to navigate this new world together and that you've got their best interests at heart. You do this even though you might be feeling the same threat to your identity. You manage engineers but they're becoming some kind of new thing: bot-wranglers. Some of your other responsibilities are being offloaded to LLMs and everyone's role is going to keep changing until things inevitably settle down again (relatively speaking). With no playbook, we need some kind of framework for decision making. This is where we can fall back to 'first principles'. For me these are the things I hold important. Really, the basics: doing my best to take care of the people; doing what the business needs most at the given moment; providing value to customers. It sounds simple, and really, it is. Taking care of the people right now means recognizing that they're feeling that identity risk. The worst thing you can do is try to talk them out of it or convince them they're not feeling what they're feeling. Acknowledge that things are changing. Maintain 'esprit de corps' as best you can. Draw on your experience navigating big changes before. If you've been around this industry for any amount of time, you've been through some big paradigm shifts and come out the other side. Tell some stories, but don't make it all about you. The business and customer angles come down to maintaining consistent principles around what software gets shipped to customers. I personally have the pleasing-to-nobody opinion that LLM coding tools are useful but not risk-free. Surely you have some skeptics in your midst who feel the same. Don't dismiss them either. Security, quality, maintainability, incident response, and the work-life balance of your people are still the responsibility of the humans running the company. That's the job right now, however the machinery of it changes. Keep taking care of your people and customers, like you always have. You already know how. "Statue of Captain George Vancouver, anchors and the Custom House, King's Lynn" by ell brown is licensed under CC BY 2.0.

1 views
Max Bernstein 3 weeks ago

Type-based alias analysis in the Toy Optimizer

Another entry in the Toy Optimizer series . Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time! We were careful to mind object aliasing: we separated our heap information into alias classes based on what offset the reads/writes referenced. This way, if we didn’t know if object and aliased, we could at least know that different offsets would never alias (assuming our objects don’t overlap and memory accesses are on word-sized slots). This is a coarse-grained heuristic. Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We’ll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post. We return once again to Fil Pizlo land, specifically How I implement SSA form . We’re going to be using the hierarchical heap effect representation from the post in our implementation, but you can use your own type representation if you have one already. This representation divides the heap into disjoint regions by type. Consider, for example, that objects and objects do not overlap. A pointer is never going to alias an pointer. They can therefore be reasoned about separately. But sometimes you don’t have perfect type information available. If you have in your language an base class of all objects, then the heap overlaps with, say, the heap. So you need some way to represent that too—just having an enum doesn’t work cleanly. Here is an example simplified type hierarchy: Where might represent different parts of the runtime’s data structures, and could be further segmented into , , etc. Fil’s idea is that we can represent each node in that hierarchy with a tuple of integers (inclusive, exclusive) that represent the pre- and post-order traversals of the tree. 
Or, if tree traversals are not engraved into your bones, they represent the range of all the nested objects within them. Then the “does this write interfere with this read” check—the aliasing check—is a range overlap query. Here’s a perhaps over-engineered Python implementation of the range and heap hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore: Where kicks off the tree-numbering scheme. Fil’s implementation also covers a bunch of abstract heaps such as SSAState and Control because his is used for code motion and whatnot. That can be added on later but we will not do so in this post. So there you have it: a type representation. Now we need to use it in our load-store forwarding. Recall that our load-store optimization pass looks like this: At its core, it iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate the state of compile-time information about fields that may alias. In this case, our may alias asks only if the offsets overlap. This means that the following unit test will fail: This test is expecting the write to to still remain cached even though we wrote to the same offset in —because we have annotated as being an and as being a . If we account for type information in our alias analysis, we can get this test to pass. After doing a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff: If we don’t have any type/alias information, we default to “I know nothing” ( ) for each object. Then we check range overlap. The boolean logic in looks a little weird, maybe. But we can also rewrite (via DeMorgan’s law) as: So, keeping all the cached field state about fields that are known by offset and by type not to alias. Maybe that is clearer (but not as nice a diff). Note that the type representation is not so important here! 
You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and check overlap between them. Nice, now our test passes! We can differentiate between memory accesses on objects of different types. But what if we knew more? Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object’s allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage. For example, in the following made up IR snippet: We know that (among other facts) doesn’t alias or because we have seen its allocation site. I saw this in the old V8 IR Hydrogen’s lightweight alias analysis 1 : There is plenty of other useful information such as: If you have other fun ones, please write in. We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking? The conservative approach is to invalidate all cached information on a function call. This is definitely correct, but it’s a bummer for the optimizer. Can we do anything? Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write, or only writes to some heaps, we can at least only partially invalidate our heap. However, if the function is unknown or otherwise opaque, we need at least more advanced alias information and perhaps even (partial) escape analysis. Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to any object A, we cannot safely cache information about any other object B unless we know for sure that A and B do not alias. 
And we don’t know what the instruction writes to. So we may only know we can cache information about B because it was allocated locally and has not escaped. Some runtimes such as ART pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass—no need to make a matrix. As usual, this is a toy IR and a toy optimizer, so it’s hard to say how much faster it makes its toy programs. In general, though, there is a dial for analysis and optimization that goes between precision and speed. This is a happy point on that dial, only a tiny incremental analysis cost bump above offset-only invalidation, but for higher precision. I like that tradeoff. Also, it is very useful in JIT compilers where generally the managed language is a little better-behaved than a C-like language . Somewhere in your IR there will be a lot of duplicate loads and stores from a strength reduction pass, and this can clean up the mess. Thanks for joining as I work through a small use of type-based alias analysis for myself. I hope you enjoyed. Thank you to Chris Gregory for helpful feedback. I made a fork of V8 to go spelunk around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted it in favor of their new Sea of Nodes based IR called TurboFan.  ↩ If we know at compile-time that object A has 5 at offset 0 and object B has 7 at offset 0, then A and B don’t alias (thanks, CF) In the RPython JIT in PyPy, this is used to determine if two user (Python) objects don’t alias because we know the contents of the user (Python) class field Object size (though perhaps that is a special case of the above bullet) Field size/type Deferring alias checks to run-time Have a branch
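The range-numbering idea described in the post can be sketched roughly as follows. This is a minimal illustration and not the post's actual code: the names `HeapRange`, `AbstractHeap`, `compute`, and `invalidate_on_store`, as well as the `Any`/`Object`/`Array`/`String`/`Other` hierarchy, are all assumptions made up for this example. Each leaf heap takes one slot, each parent spans its children, and "may alias" becomes a half-open interval overlap test combined with the existing offset check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapRange:
    """Half-open interval [start, end) assigned by numbering the heap tree."""
    start: int
    end: int

    def overlaps(self, other):
        # Standard overlap test for half-open intervals.
        return self.start < other.end and other.start < self.end


class AbstractHeap:
    """A node in a (hypothetical) type hierarchy of heaps."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.range = None
        if parent is not None:
            parent.children.append(self)

    def compute(self, start=0):
        """Number this subtree from `start`; return the next free number."""
        current = start
        if not self.children:
            current += 1  # a leaf occupies exactly one slot
        else:
            for child in self.children:
                current = child.compute(current)
        self.range = HeapRange(start, current)  # a parent spans its children
        return current


# Hypothetical hierarchy, for illustration only.
ANY = AbstractHeap("Any")
OBJECT = AbstractHeap("Object", ANY)
ARRAY = AbstractHeap("Array", OBJECT)
STRING = AbstractHeap("String", OBJECT)
OTHER = AbstractHeap("Other", ANY)
ANY.compute()  # kicks off the numbering from the root


def invalidate_on_store(cache, heap_of, obj, offset):
    """Return the cache entries that survive a store to (obj, offset).

    `cache` maps (object, offset) to a known value and `heap_of` maps an
    object to its HeapRange. Objects with no type information default to
    the root range, i.e. "I know nothing".
    """
    written = heap_of.get(obj, ANY.range)
    return {
        (o, off): value
        for (o, off), value in cache.items()
        # Keep entries known not to alias, either by offset or by type.
        if off != offset or not written.overlaps(heap_of.get(o, ANY.range))
    }
```

With this numbering, a store to a `String`-typed object at offset 0 would leave a cached offset-0 field of an `Array`-typed object alone, while still dropping offset-0 entries for any object whose type is unknown. A bitset per type would work just as well; the range form is only one cheap way to get the overlap check.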

0 views
ava's blog 1 month ago

when exercise started helping me

Nowadays, exercising really always saves me without fail. I realized that today, after again feeling absolutely terrible but then dragging myself out of bed to at least walk on my foldable treadmill. I started wondering when this change exactly happened and what led to it, because I used to hate exercise. I didn't understand people who said it helped with depression. When did it truly start being a reliable way to improve my mental state? What I struggled with back then were most definitely access, energy and health . I neither had a gym membership, nor did I have gym equipment at home. Wanting to exercise consisted of pulling out some yoga mat to do crunches like once a year, or going out for a run. Both suck when you haven't built it up over weeks or months! It was immediately difficult, painful and exhausting. My undiagnosed autoimmune diseases added more pain on top; I was just too inflamed to really work out well or even recover for days on end, and I dealt with a lot of fatigue on top of everything. That makes starting and keeping at it almost impossible, except for unexpected good phases. Without at least showing up semi-regularly, I made no progress, and every attempt I did make was immediately very exhausting with no reward. I felt like I couldn't last long enough in a session or exercise regimen to even reap the benefits. It didn't help at all that I immediately always chose something rather difficult or exhausting, as if I had to jump onto a level at which I expected a "default" human being to be at. So what changed is: I was diagnosed and found a working treatment. This one is big; so much pain and fatigue gone. Training results finally showed and made getting motivated and back on track easier. Some exercise even started helping with the residual pain and symptoms. I searched for things to do that were easier on me. I shouldn't immediately run or do crunches. 
Instead, even just walking, yoga, and some easy Pilates are enough, and more manageable for someone in my position. They are easier to pick back up after a few weeks and allow great control over varying the difficulty. With running, for example, I had no room to vary anything; even just the act of running was so exhausting back then that adjusting speed made no difference. With other forms of movement, I could build something without feeling totally exhausted. I signed up for the gym and just made showing up and walking on the treadmill a goal, and I watched videos or listened to podcasts. This was needed, because when I started it, I was still recovering from a really bad flare up and couldn't be trusted to walk around unsupervised in the forest somewhere. At the gym while just walking, I could slowly build up my exercise tolerance and endurance while seeing it as a sort of "me time" with some enjoyable videos, and with people around in case I suddenly started feeling dizzy or anything, and with some rails to hold on to. By saving videos for this time, I made it more entertaining and had something to look forward to on it. I invested in a spinning bike, and later in a foldable treadmill for at-home use. I sometimes feel too bad physically or mentally to make it to the gym (or it is closed), and this enables me to still work out without being discouraged by my issues, time or weather. It also takes away the calculation of "Is it even worth showing up?" if I might just feel like 20 minutes of treadmill that day. Better 20 minutes than nothing! With all that, I slowly built up enough of a baseline fitness for me that wouldn't make training annoying and just exhausting. It was easier to get back in after a break, and every time I had to take one, I had lost less progress than before. I got better and better at finding my sweet spot, neither under- nor overexercising.
The more times I actually pushed myself to exercise despite feeling awful mentally and left it happier, the more it didn't feel like an outlier, but a guaranteed outcome. That made it easier to show up despite everything. It's still hard, but I know now that it is basically like a button to improve my mood, and who doesn't want that? That behavior just keeps getting reinforced every time I can get myself out of a hole with this. It gets harder and harder to convincingly tell myself " No, this time will be different; you'll feel the same or worse when you do this. You should stay in bed instead. " Lying down has a much worse track record: It never makes me feel better. Reply via email Published 12 Feb, 2026 I was diagnosed and found a working treatment. This one is big; so much pain and fatigue gone. Training results finally showed and made getting motivated and back on track easier. Some exercise even started helping with the residual pain and symptoms. I searched for things to do that were easier on me. I shouldn't immediately run or do crunches. Instead, even just walking, yoga, and some easy Pilates are enough, and more manageable to someone in my position. They are easier to pick back up after a few weeks and allow great control over varying the difficulty. With running, for example, I had no room to vary anything; even just the act of running was so exhausting back then that adjusting speed made no difference. With other forms of movement, I could build something without feeling totally exhausted. I signed up for the gym and just made showing up and walking on the treadmill a goal, and I watched videos or listened to podcasts. This was needed, because when I started it, I was still recovering from a really bad flare up and couldn't be trusted to walk around unsupervised in the forest somewhere. 


Rewriting pycparser with the help of an LLM

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1] ). It's a pure-Python parser for the C programming language, producing ASTs inspired by Python's own . Until very recently, it's been using PLY: Python Lex-Yacc for the core parsing. In this post, I'll describe how I collaborated with an LLM coding agent (Codex) to help me rewrite pycparser to use a hand-written recursive-descent parser and remove the dependency on PLY. This has been an interesting experience and the post contains lots of information and is therefore quite long; if you're just interested in the final result, check out the latest code of pycparser - the main branch already has the new implementation. While pycparser has been working well overall, there were a number of nagging issues that persisted over years. I began working on pycparser in 2008, and back then using a YACC-based approach for parsing a whole language like C seemed like a no-brainer to me. Isn't this what everyone does when writing a serious parser? Besides, the K&R2 book famously carries the entire grammar of the C99 language in an appendix - so it seemed like a simple matter of translating that to PLY-yacc syntax. And indeed, it wasn't too hard, though there definitely were some complications in building the ASTs for declarations (C's gnarliest part ). Shortly after completing pycparser, I got more and more interested in compilation and started learning about the different kinds of parsers more seriously. Over time, I grew convinced that recursive descent is the way to go - producing parsers that are easier to understand and maintain (and are often faster!). It all ties in to the benefits of dependencies in software projects as a function of effort . Using parser generators is a heavy conceptual dependency: it's really nice when you have to churn out many parsers for small languages. 
But when you have to maintain a single, very complex parser, as part of a large project - the benefits quickly dissipate and you're left with a substantial dependency that you constantly grapple with. And then there are the usual problems with dependencies; dependencies get abandoned, and they may also develop security issues. Sometimes, both of these become true. Many years ago, pycparser forked and started vendoring its own version of PLY. This was part of transitioning pycparser to a dual Python 2/3 code base when PLY was slower to adapt. I believe this was the right decision, since PLY "just worked" and I didn't have to deal with active (and very tedious in the Python ecosystem, where packaging tools are replaced faster than dirty socks) dependency management. A couple of weeks ago this issue was opened for pycparser. It turns out that some old PLY code triggers security checks used by some Linux distributions; while this code was fixed in a later commit of PLY, PLY itself was apparently abandoned and archived in late 2025. And guess what? That happened in the middle of a large rewrite of the package, so re-vendoring the pre-archiving commit seemed like a risky proposition. On the issue it was suggested that "hopefully the dependent packages move on to a non-abandoned parser or implement their own"; I originally laughed this idea off, but then it got me thinking... which is what this post is all about. The original K&R2 grammar for C99 had - famously - a single shift-reduce conflict having to do with dangling elses belonging to the most recent if statement. And indeed, other than the famous lexer hack used to deal with C's type name / ID ambiguity, pycparser only had this single shift-reduce conflict. But things got more complicated. Over the years, features were added that weren't strictly in the standard but were supported by all the industrial compilers. 
The more advanced C11 and C23 standards weren't beholden to the promises of conflict-free YACC parsing (since almost no industrial-strength compilers use YACC at this point), so all caution went out of the window. The latest (PLY-based) release of pycparser has many reduce-reduce conflicts [2] ; these are a severe maintenance hazard because it means the parsing rules essentially have to be tie-broken by order of appearance in the code. This is very brittle; pycparser has only managed to maintain its stability and quality through its comprehensive test suite. Over time, it became harder and harder to extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance effects. The straw that broke the camel's back was this PR which again proposed to increase the number of reduce-reduce conflicts [3] . This - again - prompted me to think "what if I just dump YACC and switch to a hand-written recursive descent parser", and here we are. None of the challenges described above are new; I've been pondering them for many years now, and yet biting the bullet and rewriting the parser didn't feel like something I'd like to get into. By my private estimates it'd take at least a week of deep heads-down work to port the gritty 2000 lines of YACC grammar rules to a recursive descent parser [4] . Moreover, it wouldn't be a particularly fun project either - I didn't feel like I'd learn much new and my interests have shifted away from this project. In short, the Potential well was just too deep. I've definitely noticed the improvement in capabilities of LLM coding agents in the past few months, and many reputable people online rave about using them for increasingly larger projects. That said, would an LLM agent really be able to accomplish such a complex project on its own? This isn't just a toy, it's thousands of lines of dense parsing code. What gave me hope is the concept of conformance suites mentioned by Simon Willison . 
Agents seem to do well when there's a very clear and rigid goal function - such as a large, high-coverage conformance test suite. And pycparser has a very extensive one. Over 2500 lines of test code parsing various C snippets to ASTs with expected results, grown over a decade and a half of real issues and bugs reported by users. I figured the LLM can either succeed or fail and throw its hands up in despair, but it's quite unlikely to produce a wrong port that would still pass all the tests. So I set it to run. I fired up Codex in pycparser's repository, and wrote this prompt just to make sure it understands me and can run the tests: Codex figured it out (I gave it the exact command, after all!); my next prompt was the real thing [5] : Here Codex went to work and churned for over an hour. Having never observed an agent work for nearly this long, I kind of assumed it had gone off the rails and would fail sooner or later. So I was rather surprised and skeptical when it eventually came back with: It took me a while to poke around the code and run it until I was convinced - it had actually done it! It wrote a new recursive descent parser with only ancillary dependencies on PLY, and that parser passed the test suite. After a few more prompts, we've removed the ancillary dependencies and made the structure clearer. I hadn't looked too deeply into code quality at this point, but at least on the functional level - it succeeded. This was very impressive! A change like the one described above is impossible to code-review as one PR in any meaningful way; so I used a different strategy. Before embarking on this path, I created a new branch and once Codex finished the initial rewrite, I committed this change, knowing that I will review it in detail, piece-by-piece later on. Even though coding agents have their own notion of history and can "revert" certain changes, I felt much safer relying on Git. 
In the worst case if all of this goes south, I can nuke the branch and it's as if nothing ever happened. I was determined to only merge this branch onto main once I was fully satisfied with the code. In what follows, I had to git reset several times when I didn't like the direction in which Codex was going. In hindsight, doing this work in a branch was absolutely the right choice. Once I've sufficiently convinced myself that the new parser is actually working, I used Codex to similarly rewrite the lexer and get rid of the PLY dependency entirely, deleting it from the repository. Then, I started looking more deeply into code quality - reading the code created by Codex and trying to wrap my head around it. And - oh my - this was quite the journey. Much has been written about the code produced by agents, and much of it seems to be true. Maybe it's a setting I'm missing (I'm not using my own custom AGENTS.md yet, for instance), but Codex seems to be that eager programmer that wants to get from A to B whatever the cost. Readability, minimalism and code clarity are very much secondary goals. Using raise...except for control flow? Yep. Abusing Python's weak typing (like having None , false and other values all mean different things for a given variable)? For sure. Spreading the logic of a complex function all over the place instead of putting all the key parts in a single switch statement? You bet. Moreover, the agent is hilariously lazy . More than once I had to convince it to do something it initially said is impossible, and even insisted again in follow-up messages. The anthropomorphization here is mildly concerning, to be honest. I could never imagine I would be writing something like the following to a computer, and yet - here we are: "Remember how we moved X to Y before? You can do it again for Z, definitely. Just try". My process was to see how I can instruct Codex to fix things, and intervene myself (by rewriting code) as little as possible. 
I've mostly succeeded in this, and did maybe 20% of the work myself. My branch grew dozens of commits, falling into roughly these categories: Interestingly, after doing (3), the agent was often more effective in giving the code a "fresh look" and succeeding in either (1) or (2). Eventually, after many hours spent in this process, I was reasonably pleased with the code. It's far from perfect, of course, but taking the essential complexities into account, it's something I could see myself maintaining (with or without the help of an agent). I'm sure I'll find more ways to improve it in the future, but I have a reasonable degree of confidence that this will be doable. It passes all the tests, so I've been able to release a new version (3.00) without major issues so far. The only issue I've discovered is that some of CFFI's tests are overly precise about the phrasing of errors reported by pycparser; this was an easy fix . The new parser is also faster, by about 30% based on my benchmarks! This is typical of recursive descent when compared with YACC-generated parsers, in my experience. After reviewing the initial rewrite of the lexer, I've spent a while instructing Codex on how to make it faster, and it worked reasonably well. While working on this, it became quite obvious that static typing would make the process easier. LLM coding agents really benefit from closed loops with strict guardrails (e.g. a test suite to pass), and type-annotations act as such. For example, had pycparser already been type annotated, Codex would probably not have overloaded values to multiple types (like None vs. False vs. others). In a followup, I asked Codex to type-annotate pycparser (running checks using ty ), and this was also a back-and-forth because the process exposed some issues that needed to be refactored. Time will tell, but hopefully it will make further changes in the project simpler for the agent. 
Based on this experience, I'd bet that coding agents will be somewhat more effective in strongly typed languages like Go, TypeScript and especially Rust. Overall, this project has been a really good experience, and I'm impressed with what modern LLM coding agents can do! While there's no reason to expect that progress in this domain will stop, even if it does - these are already very useful tools that can significantly improve programmer productivity. Could I have done this myself, without an agent's help? Sure. But it would have taken me much longer, assuming that I could even muster the will and concentration to engage in this project. I estimate it would take me at least a week of full-time work (so 30-40 hours) spread over who knows how long to accomplish. With Codex, I put in an order of magnitude less work into this (around 4-5 hours, I'd estimate) and I'm happy with the result. It was also fun . At least in one sense, my professional life can be described as the pursuit of focus, deep work and flow . It's not easy for me to get into this state, but when I do I'm highly productive and find it very enjoyable. Agents really help me here. When I know I need to write some code and it's hard to get started, asking an agent to write a prototype is a great catalyst for my motivation. Hence the meme at the beginning of the post. One can't avoid a nagging question - does the quality of the code produced by agents even matter? Clearly, the agents themselves can understand it (if not today's agent, then at least next year's). Why worry about future maintainability if the agent can maintain it? In other words, does it make sense to just go full vibe-coding? This is a fair question, and one I don't have an answer to. Right now, for projects I maintain and stand behind , it seems obvious to me that the code should be fully understandable and accepted by me, and the agent is just a tool helping me get to that state more efficiently. 
It's hard to say what the future holds here; it's going to be interesting, for sure. There was also the lexer to consider, but this seemed like a much simpler job. My impression is that in the early days of computing, lex gained prominence because of strong regexp support which wasn't very common yet. These days, with excellent regexp libraries existing for pretty much every language, the added value of lex over a custom regexp-based lexer isn't very high. That said, it wouldn't make much sense to embark on a journey to rewrite just the lexer; the dependency on PLY would still remain, and besides, PLY's lexer and parser are designed to work well together. So it wouldn't help me much without tackling the parser beast.
(1) The code in X is too complex; why can't we do Y instead?
(2) The use of X is needlessly convoluted; change Y to Z, and T to V in all instances.
(3) The code in X is unclear; please add a detailed comment - with examples - to explain what it does.
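As an illustration of the dangling-else point in this post, here is a minimal sketch of how a hand-written recursive descent parser settles that ambiguity without any conflict table: the innermost call simply consumes the "else" first. The toy grammar and the names parse and parse_stmt are hypothetical and have nothing to do with pycparser's actual code.

```python
# Toy grammar (hypothetical, far smaller than C):
#   stmt := "if" NAME stmt ["else" stmt] | NAME
# Tokens are whitespace-separated words.


def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (ast, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "else":
        raise SyntaxError("unexpected 'else'")
    if tok == "if":
        if pos + 1 >= len(tokens):
            raise SyntaxError("missing condition after 'if'")
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        else_branch = None
        # Greedy match: the innermost active 'if' sees the 'else' first,
        # so the 'else' binds to the most recent 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    # Anything else is treated as a simple statement.
    return ("stmt", tok), pos + 1


def parse(source):
    """Parse a whole source string and reject trailing tokens."""
    tokens = source.split()
    ast, pos = parse_stmt(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected token: " + tokens[pos])
    return ast


nested = parse("if a if b x else y")
```

In this sketch the "else" ends up attached to the inner "if b", which is the behaviour the C grammar asks for; a YACC-style grammar has to get the same result by resolving a shift-reduce conflict.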

Justin Duke 1 months ago

Brief notes on migrating to Postgres-backed jobs

It seems premature to talk about a migration that is only halfway done, even if it's the hard half that's done — but I think there's something useful in documenting the why and how of a transition while you're still in the thick of it, before the revisionist history of completion sets in. Early last year, we built out a system for running background jobs directly against Postgres within Django. This very quickly got abstracted out into a generic task runner — shout out to Brandur and many other people who have been beating this drum for a while. And as far as I can tell, this concept of shifting away from Redis and other less-durable caches for job infrastructure is regaining steam on the Rails side of the ecosystem, too. The reason we did it was mostly for ergonomics around graceful batch processing. It is significantly easier to write a poller in Django for stuff backed by the ORM than it is to try and extend RQ or any of the other task runner options that are Redis-friendly. Django gives you migrations, querysets, admin visibility, transactional guarantees — all for free, all without another moving part. And as we started using it and it proved stable, we slowly moved more and more things over to it. At the time of this writing, around half of our jobs by quantity — which represent around two-thirds by overall volume — have been migrated over from RQ onto this system. This is slightly ironic given that we also last year released django-rq-cron , a library that, if I have my druthers, we will no longer need. Fewer moving parts is the watchword. We're removing spindles from the system and getting closer and closer to a simple, portable, and legible stack of infrastructure.

Steve Klabnik 1 months ago

The most important thing when working with LLMs

Okay, so you’ve got the basics of working with Claude going. But you’ve probably run into some problems: Claude doesn’t do what you want it to do, it gets confused about what’s happening and goes off the rails, all sorts of things can go wrong. Let’s talk about how to improve upon that. The most important thing that you can do when working with an LLM is give it a way to quickly evaluate if it’s doing the right thing, and if it isn’t, point it in the right direction. This is incredibly simple, yet, like many simple things, also wildly complex. But if you can keep this idea in mind, you’ll be well equipped to become effective when working with agents. A long time ago, I used to teach programming classes. Many of these were to adults, but some of them were to children. Teenaged children, but children nonetheless. We used to do an exercise to try and help them understand the difference between talking in English and talking in Ruby, or JavaScript, or whatever kind of programming language rather than human language. The exercise went like this: I would have a jar of peanut butter, a jar of jelly, a loaf of bread, a spoon, and a knife. I would ask the class to take a piece of paper and write down a series of steps to make a peanut butter and jelly sandwich. They’d all then give me their algorithms, and the fun part for me began: find one that’s innocently written that I could hilariously misinterpret. For example, I might find one like: I’d read this aloud to the class, you all understand this is a recipe for a peanut butter and jelly sandwich, right? I’d take the jar of peanut butter and place it upon the unopened bag of bread. I’d do the same with the jar of jelly. This would of course, squish the bread, which feels slightly transgressive given that you’re messing up the bread, so the kids would love that. 
I’d then say something like “the bread is already together, I do not understand this instruction.” After the inevitable laughter died down, I’d make my point: the computer will do exactly what you say, but not what you mean. So you have to get good at figuring out when you said something different than what you mean. Sort of ironically, LLMs are kind of the inverse of this: they’ll sometimes try to figure out what you mean, and then do that, rather than simply doing what you say. But the core thing here is the same: semantic drift from what we intended our program to do, and what it actually does. The second lesson is something I came up with sometime, I don’t even remember how exactly. But it’s something I told my students a lot. And that’s this: If your program did everything you wanted without problems, you wouldn’t be programming: you’d be using your program. The act of programming is itself perpetually to be in a state where something is either inadequate or broken, and the job is to fix that. I also think this is a bit simplistic but also getting at something. I had originally come up with this in the context of trying to explain how you need to manage your frustration when programming; if you get easily upset by something not working, doing computer programming might not be for you. But I do think these two things combine into something that gets to the heart of what we do: we need to understand what it is we want our software to do, and then make it do that. Sometimes, our software doesn’t do something yet. Sometimes, it does something, but incorrectly. Both of these cases result in a divergence from the program’s intended behavior. So, how do we know if our program does what it should do? Well, what we’ve been doing so far is: This is our little mini software development lifecycle, or “SDLC.” This process works, but is slow. That’s great for getting the feel of things, but programmers are process optimizers by trade. 
One of my favorite tools for optimization is called Amdahl’s law. The core idea is this, formulated in my own words: If you have a process that takes multiple steps, and you want to speed it up, if you optimize only one step, the maximum amount of speedup you’ll get is determined by the portion of the process that step takes. In other words, imagine we have a three-step process whose steps take one minute, ten minutes, and two minutes. This process takes a total of 13 minutes to complete. If we speed up step 3 by double, it goes from two minutes to one minute, and now our process takes 12 minutes. However, if we were able to speed up step 2 by double, we’d cut off five minutes, and our process would now take 8 minutes. We can use this style of analysis to guide our thinking in many ways, but the most common way, for me, is to decide where to put my effort. Given the process above, I’m going to look at step 2 first to try and figure out how to make it faster. That doesn’t mean we can achieve the 2x speedup, but heck, if we get a 10% decrease in time, that’s the same time as if we did get a 2x on step 3. So it’s at least the place where we should start. I chose the above because, well, I think it properly models the proportion of time we’re taking when doing things with LLMs: we spend some time asking it to do something, and we spend a bit more time reviewing its output. But we spend a lot of time clicking “accept edit,” and a lot of time allowing Claude to execute tools. This will be our next step forward, as this will increase our velocity when working with the tools significantly. However, like with many optimization tasks, this is easier said than done. The actual mechanics of improving the speed of this step are simple at first: turn on auto-accept for edits, and choose “Yes, and don’t ask again for commands” when you think the command is safe for Claude to run. By doing this, once you have enough commands allowed, your input for step 2 of our development loop can drop to zero. 
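The Amdahl's law arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name total_time is made up for illustration, and the one-minute duration of step 1 is inferred from the stated 13-minute total rather than quoted from the post.

```python
def total_time(step_minutes, speedups=None):
    """Total duration when some steps (1-indexed) are sped up by a factor."""
    speedups = speedups or {}
    return sum(
        minutes / speedups.get(index, 1.0)
        for index, minutes in enumerate(step_minutes, start=1)
    )


# Step 1 is assumed to be one minute (13 - 10 - 2).
steps = [1, 10, 2]

baseline = total_time(steps)              # 13.0 minutes
step3_doubled = total_time(steps, {3: 2})  # 12.0 minutes
step2_doubled = total_time(steps, {2: 2})  # 8.0 minutes

# A 10% reduction of step 2 is one minute saved, the same saving as
# doubling the speed of step 3.
step2_ten_percent = total_time([1, 10 * 0.9, 2])
```

The comparison between step2_ten_percent and step3_doubled is the reason to look at the biggest step first: a small improvement there is worth as much as a large improvement on a small step.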
Of course, it takes time for Claude to actually implement what you’ve asked, so it’s not like our 13 minute process drops to three, but still, this is a major efficiency step. But we were actively monitoring Claude for a reason. Claude will sometimes do incorrect things, and we need to correct it. At some point, Claude will say “Hey I’ve finished doing what you asked of me!” and it doesn’t matter how fast it does step 2 if we get to step 3 and it’s just incorrect, and we need to throw everything out and try again. So, how do we get Claude to guide itself in the right direction? A useful technique for figuring out what you should do is to consider the ending: where do we want to go? That will inform what we need to do to get there. Well, the ending of step 2 is knowing when to transition to step 3. And that transition is gated by “does the software do what it is supposed to do?” That’s a huge question! But in practice, we can do what we always do: start simple, and iterate from there. Right now, the transition from step 2 to step 3 is left up to Claude. Claude will use its own judgement to decide when it thinks that the software is working. And it’ll be right. But why leave that up to chance? I expect that some of you are thinking that maybe I’m belaboring this point. “Why not just skip to ? That’s the idea, right? We need tests.” Well on some level: yes. But on another level, no. I’m trying to teach you how to think here, not give you the answer. Because it might be broader than just “run the tests.” Maybe you are working on a project where the tests aren’t very good yet. Maybe you’re working on a behavior that’s hard to automatically test. Maybe the test suite takes a very long time, and so isn’t appropriate to be running over and over and over. Remember our plan from the last post? 
Where Claude finished the plan with this: These aren’t “tests” in the traditional sense of a test suite, but they are objective measures that Claude can invoke itself to understand if it’s finished the task. Claude could run after every file edit if it wanted to, and as soon as it sees , it knows that it’s finished. You don’t need a comprehensive test suite. You just need some sort of way for Claude to detect if it’s done in some sort of objective fashion. Of course, we can do better. While giving Claude a way to know if it’s done working is important, there’s a second thing we need to pay attention to: when Claude isn’t done working, can we guide it towards doing the right thing, rather than the wrong thing? For example, those of you who are of a similar vintage as myself may remember the output of early compilers. It was often… not very helpful. Imagine that we told Claude that it should run to know if things are working, and the only output from it was the exit code: 0 if we succeeded, 1 if we failed. That would accomplish our objective of letting Claude know when things are done, but it wouldn’t help Claude know what went wrong when it returns 1. This is one reason why I think Rust works well with LLMs. Take this incorrect Rust program: The Rust compiler won’t just say “yeah this program is incorrect,” it’ll give you this (as of Rust 1.93.0): The compiler will point out the exact place in the code itself of where there’s an issue, and even make suggestions as to how to fix it. This goes beyond just simply saying “it doesn’t work” and instead nudges you to what might fix the problem. Of course, this isn’t perfect, but if it’s helpful more than not, that’s a win. Of course, too much verbosity isn’t helpful either. A lot of tooling has gotten much more verbose lately. Often times, this is really nice as a human. Pleasant terminal output is, well… pleasant. But that doesn’t mean that it’s always good or useful. 
For example, here’s the default output for : This is not bad output. It’s nice. But it’s also not useful for an LLM. We don’t need to read all of the tests that are passing, we really just want to see some sort of minimal output, and then what failed if something failed. In Cargo’s case, that’s for “quiet”: There is no point in giving a ton of verbose input to an LLM that it isn’t even going to need to use. If you’re feeding a tools’ output to an LLM, you should consider both what the tool does in the failure case, but also the success case. Maybe configure things to be a bit simpler for Claude. You’ll save some tokens and get better results. All of this has various implications for all sorts of things. For example, types are a great way to get quick feedback on what you’re doing. A comprehensive test suite that completes quickly is useful for giving feedback to the LLM. But that also doesn’t inherently mean that types must be better or that you need to be doing TDD; whatever gives you that underlying principle of “objective feedback for the success case and guidance for the failure case” will be golden, no matter what tech stack you use. This brings me to something that may be counter-intuitive, but I think is also true, and worth keeping in the back of your mind: what’s good for Claude is also probably good for humans working on your system. A good test suite was considered golden before LLMs. That it’s great for them is just a nice coincidence. At the end of the day, Claude is not a person, but it tackles programming problems in a similar fashion to how we do: take in the problem, attempt a solution, run the compiler/linter/tests, and then see what feedback it gets, then iterate. That core loop is the same, even if humans can exercise better judgement and can have more skill. And so even though I pitched fancy terminal output as an example of how humans and LLMs need different things, that’s really just a superficial kind of thing. 
Good error messages are still critical for both. We’re just better at having terminal spinners not take up space in our heads while we’re solving a problem, and can appreciate the aesthetics in a way that Claude does not. Incidentally, this is one of the things that makes me hopeful about the future of software development under agentic influence. Engineers always complain that management doesn’t give us time to do refactorings, to improve the test suite, to clean our code. Part of the reason for this is that we often didn’t do a good job of pitching how it would actually help accomplish business goals. But even if you’re on the fence about AI, and upset that management is all about AI: explain to management that this stuff is a force multiplier for your agents. Use the time you’ve saved by doing things the agentic way towards improving your test suite, or your documentation, or whatever else. I think there’s a chance that all of this stuff leads to higher quality codebases than ones filled with slop. But it also requires us to make the decisions that will lead us in that direction. That’s what I have for you today: consider how you can help Claude evaluate its own work. Give it explicit success criteria, and make evaluating those criteria as simple and objective as possible. In the next post, we’re gonna finally talk about . Can you believe that I’ve talked this much about how to use Claude and we haven’t talked about ? There’s good reason for that, as it turns out. We’re going to talk a bit more about understanding how interacting with LLMs works, and how it can help us both improve step 1 in our process, but also continue to make step 2 better and better.
Put the peanut butter on the bread; Put the jelly on the bread; Put the bread together
Asking the LLM to do something by typing up what we want it to do; Closely observing its behavior and course correcting it when it goes off of the rails; Eventually, after it says that it’s finished, reviewing its output
Ten minutes; Two minutes
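The principle of “objective feedback for the success case and guidance for the failure case” can be sketched as a small wrapper around whatever check command a project uses. This is an illustration under assumptions, not something from the post: the function name run_check and the max_lines cut-off are invented, and the commands below just invoke the current Python interpreter so the example is self-contained.

```python
import subprocess
import sys


def run_check(command, max_lines=20):
    """Run a check command and return (ok, feedback).

    On success the feedback is a single word, so passing runs cost almost
    nothing to read. On failure only the last max_lines lines of output
    are kept, which is usually where the error message is.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return True, "ok"
    lines = (result.stdout + result.stderr).splitlines()
    return False, "\n".join(lines[-max_lines:])


# Stand-in commands: a passing check and a failing one with a useful message.
passing = run_check([sys.executable, "-c", "print('lots of passing tests')"])
failing = run_check(
    [sys.executable, "-c",
     "import sys; print('test_add failed: expected 2, got 3'); sys.exit(1)"]
)
```

The exit code answers “am I done?”, and the trimmed output answers “what went wrong?”. In a real project the command list would be the project's own test or build command, ideally run in its quiet mode.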

daniel.haxx.se 1 months ago

The end of the curl bug-bounty

tldr: an attempt to reduce the terror reporting . There is no longer a curl bug-bounty program. It officially stops on January 31, 2026. After having had a few half-baked previous takes, in April 2019 we kicked off the first real curl bug-bounty with the help of Hackerone, and while it stumbled a bit at first it has been quite successful I think. We attracted skilled researchers who reported plenty of actual vulnerabilities for which we paid fine monetary rewards. We have certainly made curl better as a direct result of this: 87 confirmed vulnerabilities and over 100,000 USD paid as rewards to researchers. I’m quite happy and proud of this accomplishment. I would like to especially highlight the awesome Internet Bug Bounty project, which has paid the bounties for us for many years. We could not have done this without them. Also of course Hackerone, who has graciously hosted us and been our partner through these years. Looking back, I think we can say that the downfall of the bug-bounty program started slowly in the second half of 2024 but accelerated badly in 2025. We saw an explosion in AI slop reports combined with a lower quality even in the reports that were not obvious slop – presumably because they too were actually misled by AI but with that fact just hidden better. Maybe the first five years made it possible for researchers to find and report the low hanging fruit. Previous years we have had a rate of somewhere north of 15% of the submissions ending up confirmed vulnerabilities. Starting 2025, the confirmed-rate plummeted to below 5%. Not even one in twenty was real . The never-ending slop submissions take a serious mental toll to manage and sometimes also a long time to debunk. Time and energy that is completely wasted while also hampering our will to live. I have also started to get the feeling that a lot of the security reporters submit reports with a bad faith attitude. 
These “helpers” try too hard to twist whatever they find into something horribly bad and a critical vulnerability, but they rarely actively contribute to actually improve curl. They can go to extreme efforts to argue and insist on their specific current finding, but not to write a fix or work with the team on improving curl long-term etc. I don’t think we need more of that. There are these three bad trends combined that make us take this step: the mind-numbing AI slop, humans doing worse than ever and the apparent will to poke holes rather than to help. In an attempt to do something about the sorry state of curl security reports, this is what we do:

- We no longer offer any monetary rewards for security reports – no matter which severity. In an attempt to remove the incentives for submitting made-up lies.
- We stop using Hackerone as the recommended channel to report security problems. To make the change immediately obvious and because without a bug-bounty program we don’t need it.
- We refer everyone to submit suspected curl security problems on GitHub using their Private vulnerability reporting feature.
- We continue to immediately ban and publicly ridicule everyone who submits AI slop to the project.

We believe that we can maintain and continue to evolve curl security in spite of this change. Maybe even improve thanks to this, as hopefully this step helps prevent more people pouring sand into the machine. Ideally we reduce the amount of wasted time and effort. I believe the best and our most valued security reporters still will tell us when they find security vulnerabilities. If you suspect a security problem in curl going forward, we advise you to head over to GitHub and submit it there. Alternatively, you send an email with the full report to . In both cases, the report is received and handled privately by the curl security team. But with no monetary reward offered. Hackerone was good to us and they have graciously allowed us to run our program on their platform for free for many years. We thank them for that service. As we now drop the rewards, we feel it makes a clear cut and displays a clearer message to everyone involved by also moving away from Hackerone as a platform for vulnerability reporting. It makes the change more visible. It is probably going to be harder for us to publicly disclose every incoming security report in the same way we have done it on Hackerone for the last year. We need to work out something to make sure that we can keep doing it at least imperfectly, because I believe in the goodness of such transparency. Let me emphasize that this change does not impact our presence and mode of operation with the curl repository and its hosting on GitHub. We hear about projects having problems with low-quality AI slop submissions on GitHub as well, in the form of issues and pull-requests, but for curl we have not (yet) seen this – and frankly I don’t think switching to a GitHub alternative saves us from that. Compared to others, we seem to be affected by the sloppy security reports to a higher degree than the average Open Source project. With the help of Hackerone, we got numbers of how the curl bug-bounty has compared with other programs over the last year. It turns out curl’s program has seen more volume and noise than other public open source bug bounty programs in the same cohort. Over the past four quarters, curl’s inbound report volume has risen sharply, while other bounty-paying open source programs in the cohort, such as Ruby, Node, and Rails, have not seen a meaningful increase and have remained mostly flat or declined slightly. In the chart, the pink line represents curl’s report volume, and the gray line reflects the broader cohort.

[Chart: Inbound Report Volume on Hackerone: curl compared to OSS peers]

We suspect the idea of getting money for it is a big part of the explanation. It brings in real reports, but makes it too easy to be annoying with little to no penalty to the user. The reputation system and available program settings were not sufficient for us to prevent sand from getting into the machine. The exact reason why we suffer more of this abuse than others remains a subject for further speculation and research. There is a non-zero risk that our guesses are wrong and that the volume and security report frequency will keep up even after these changes go into effect. If that happens, we will deal with it then and take further appropriate steps. I prefer not to overdo things or overplan already now for something that ideally does not happen. People keep suggesting that one way to deal with the report tsunami is to charge security researchers a small amount of money for the privilege of submitting a vulnerability report to us. A curl reporters security club with an entrance fee. I think that is a less good solution than just dropping the bounty. Some of the reasons include:

- Charging people money in an international context is complicated and a maintenance burden.
- Dealing with charge-backs, returns and other complaints and friction adds work.
- It would limit who could or would submit issues. Even some who actually find legitimate issues.

Maybe we need to do this later anyway, but we stay away from it for now. We have seen other projects and repositories see similar AI-induced problems for pull requests, but this has not been a problem for the curl project. I believe that for PRs we have much better means to sort out the weed with automatic means, since we have tools, tests and scanners to verify such contributions. We don’t need to waste any human time on pull requests until the quality is good enough to get green check-marks from 200 CI jobs. I will do a talk at FOSDEM 2026 titled Open Source Security in spite of AI that of course will touch on this subject. We never say never. This is now and we might have reasons to reconsider and make a different decision in the future. If we do, we will let you know. These changes are applied now with the hope that they will have a positive effect for the project and its maintainers. If that turns out to not be the outcome, we will of course continue and apply further changes later. Since I created the pull request for updating the bug-bounty information for curl on January 14, almost two weeks before we merged it, various media picked up the news and published articles. Long before I posted this blog post.

- The Register: Curl shutters bug bounty program to remove incentive for submitting AI slop
- Elektroniktidningen: cURL removes bug bounties
- Heise online: curl: Projekt beendet Bug-Bounty-Programm
- Neowin: Beloved tool, cURL is shutting down its bug bounty over AI slop reports
- Golem: Curl-Entwickler dreht dem “KI-Schrott” den Geldhahn zu
- Linux Easy: cURL chiude il programma bug bounty: troppi report generati dall’AI
- Bleeping Computer: Curl ending bug bounty program after flood of AI slop reports
- The New Stack: Drowning in AI slop, cURL ends bug bounties
- Ars Technica: Overrun with AI slop, cURL scraps bug bounties to ensure “intact mental health”
- PressMind Labs: cURL konczy program bug bounty – czy to koniec jakosci zgloszen?
- Socket: curl Shuts Down Bug Bounty Program After Flood of AI Slop Reports

Also discussed (indirectly) on Hacker News.

0 views
Sean Goedecke 1 month ago

How I estimate work as a staff software engineer

There’s a kind of polite fiction at the heart of the software industry. It goes something like this: Estimating how long software projects will take is very hard, but not impossible. A skilled engineering team can, with time and effort, learn how long it will take for them to deliver work, which will in turn allow their organization to make good business plans. This is, of course, false. As every experienced software engineer knows, it is not possible to accurately estimate software projects . The tension between this polite fiction and its well-understood falseness causes a lot of strange activity in tech companies. For instance, many engineering teams estimate work in t-shirt sizes instead of time, because it just feels too obviously silly to the engineers in question to give direct time estimates. Naturally, these t-shirt sizes are immediately translated into hours and days when the estimates make their way up the management chain. Alternatively, software engineers who are genuinely trying to give good time estimates have ridiculous heuristics like “double your initial estimate and add 20%“. This is basically the same as giving up and saying “just estimate everything at a month”. Should tech companies just stop estimating? One of my guiding principles is that when a tech company is doing something silly, they’re probably doing it for a good reason . In other words, practices that appear to not make sense are often serving some more basic, illegible role in the organization. So what is the actual purpose of estimation, and how can you do it well as a software engineer? Before I get into that, I should justify my core assumption a little more. People have written a lot about this already, so I’ll keep it brief. I’m also going to concede that sometimes you can accurately estimate software work , when that work is very well-understood and very small in scope. 
For instance, if I know it takes half an hour to deploy a service 1 , and I’m being asked to update the text in a link, I can accurately estimate the work at something like 45 minutes: five minutes to push the change up, ten minutes to wait for CI, thirty minutes to deploy. For most of us, the majority of software work is not like this. We work on poorly-understood systems and cannot predict exactly what must be done in advance. Most programming in large systems is research : identifying prior art, mapping out enough of the system to understand the effects of changes, and so on. Even for fairly small changes, we simply do not know what’s involved in making the change until we go and look. The pro-estimation dogma says that these questions ought to be answered during the planning process, so that each individual piece of work being discussed is scoped small enough to be accurately estimated. I’m not impressed by this answer. It seems to me to be a throwback to the bad old days of software architecture , where one architect would map everything out in advance, so that individual programmers simply had to mechanically follow instructions. Nobody does that now, because it doesn’t work: programmers must be empowered to make architectural decisions, because they’re the ones who are actually in contact with the code 2 . Even if it did work, that would simply shift the impossible-to-estimate part of the process backwards, into the planning meeting (where of course you can’t write or run code, which makes it near-impossible to accurately answer the kind of questions involved). In short: software engineering projects are not dominated by the known work, but by the unknown work, which always takes 90% of the time. However, only the known work can be accurately estimated. It’s therefore impossible to accurately estimate software projects in advance. Estimates do not help engineering teams deliver work more efficiently. 
Many of the most productive years of my career were spent on teams that did no estimation at all: we were either working on projects that had to be done no matter what, and so didn’t really need an estimate, or on projects that would deliver a constant drip of value as we went, so we could just keep going indefinitely 3 . In a very real sense, estimates aren’t even made by engineers at all . If an engineering team comes up with a long estimate for a project that some VP really wants, they will be pressured into lowering it (or some other, more compliant engineering team will be handed the work). If the estimate on an undesirable project - or a project that’s intended to “hold space” for future unplanned work - is too short, the team will often be encouraged to increase it, or their manager will just add a 30% buffer. One exception to this is projects that are technically impossible, or just genuinely prohibitively difficult. If a manager consistently fails to pressure their teams into giving the “right” estimates, that can send a signal up that maybe the work can’t be done after all. Smart VPs and directors will try to avoid taking on technically impossible projects. Another exception to this is areas of the organization that senior leadership doesn’t really care about. In a sleepy backwater, often the formal estimation process does actually get followed to the letter, because there’s no director or VP who wants to jump in and shape the estimates to their ends. This is one way that some parts of a tech company can have drastically different engineering cultures to other parts. I’ll let you imagine the consequences when the company is re-orged and these teams are pulled into the spotlight. Estimates are political tools for non-engineers in the organization . They help managers, VPs, directors, and C-staff decide on which projects get funded and which projects get cancelled. 
The standard way of thinking about estimates is that you start with a proposed piece of software work, and you then go and figure out how long it will take. This is entirely backwards. Instead, teams will often start with the estimate, and then go and figure out what kind of software work they can do to meet it. Suppose you’re working on an LLM chatbot, and your director wants to implement “talk with a PDF”. If you have six months to do the work, you might implement a robust file upload system, some pipeline to chunk and embed the PDF content for semantic search, a way to extract PDF pages as image content to capture formatting and diagrams, and so on. If you have one day to do the work, you will naturally search for simpler approaches: for instance, converting the PDF to text client-side and sticking the entire thing in the LLM context, or offering a plain-text “grep the PDF” tool. This is true even at the level of individual lines of code. When you have weeks or months until your deadline, you might spend a lot of time thinking airily about how you could refactor the codebase to make your new feature fit in as elegantly as possible. When you have hours, you will typically be laser-focused on finding an approach that will actually work. There are always many different ways to solve software problems. Engineers thus have quite a lot of discretion about how to get it done. So how do I estimate, given all that? I gather as much political context as possible before I even look at the code. How much pressure is on this project? Is it a casual ask, or do we have to find a way to do this? What kind of estimate is my management chain looking for? There’s a huge difference between “the CTO really wants this in one week” and “we were looking for work for your team and this seemed like it could fit”. Ideally, I go to the code with an estimate already in hand.
Instead of asking myself “how long would it take to do this”, where “this” could be any one of a hundred different software designs, I ask myself “which approaches could be done in one week?”. I spend more time worrying about unknowns than knowns. As I said above, unknown work always dominates software projects. The more “dark forests” in the codebase this feature has to touch, the higher my estimate will be - or, more concretely, the tighter I need to constrain the set of approaches to the known work. Finally, I go back to my manager with a risk assessment, not with a concrete estimate. I don’t ever say “this is a four-week project”. I say something like “I don’t think we’ll get this done in one week, because X Y Z would need to all go right, and at least one of those things is bound to take a lot more work than we expect”. Ideally, I go back to my manager with a series of plans, not just one:

- We tackle X Y Z directly, which might all go smoothly but if it blows out we’ll be here for a month
- We bypass Y and Z entirely, which would introduce these other risks but possibly allow us to hit the deadline
- We bring in help from another team who’s more familiar with X and Y, so we just have to focus on Z

In other words, I don’t “break down the work to determine how long it will take”. My management chain already knows how long they want it to take. My job is to figure out the set of software approaches that match that estimate. Sometimes that set is empty: the project is just impossible, no matter how you slice it. In that case, my management chain needs to get together and figure out some way to alter the requirements. But if I always said “this is impossible”, my managers would find someone else to do their estimates. When I do that, I’m drawing on a well of trust that I build up by making pragmatic estimates the rest of the time. Many engineers find this approach distasteful. One reason is that they don’t like estimating in conditions of uncertainty, so they insist on having all the unknown questions answered in advance. I have written a lot about this in Engineers who won’t commit and How I provide technical clarity to non-technical leaders, but suffice to say that I think it’s cowardly. If you refuse to estimate, you’re forcing someone less technical to estimate for you. Some engineers think that their job is to constantly push back against engineering management, and that helping their manager find technical compromises is betraying some kind of sacred engineering trust. I wrote about this in Software engineers should be a little bit cynical. If you want to spend your career doing that, that’s fine, but I personally find it more rewarding to find ways to work with my managers (who have almost exclusively been nice people). Other engineers might say that they rarely feel this kind of pressure from their directors or VPs to alter estimates, and that this is really just the sign of a dysfunctional engineering organization. Maybe! I can only speak for the engineering organizations I’ve worked in. But my suspicion is that these engineers are really just saying that they work “out of the spotlight”, where there’s not much pressure in general and teams can adopt whatever processes they want. There’s nothing wrong with that. But I don’t think it qualifies you to give helpful advice to engineers who do feel this kind of pressure. I think software engineering estimation is generally misunderstood. The common view is that a manager proposes some technical project, the team gets together to figure out how long it would take to build, and then the manager makes staffing and planning decisions with that information. In fact, it’s the reverse: a manager comes to the team with an estimate already in hand (though they might not come out and admit it), and then the team must figure out what kind of technical project might be possible within that estimate. This is because estimates are not by or for engineering teams. They are tools used by managers to negotiate with each other about planned work. Very occasionally, when a project is literally impossible, the estimate can serve as a way for the team to communicate that fact upwards. But that requires trust. A team that is always pushing back on estimates will not be believed when they do encounter a genuinely impossible proposal. When I estimate, I extract the range my manager is looking for, and only then do I go through the code and figure out what can be done in that time. I never come back with a flat “two weeks” figure. Instead, I come back with a range of possibilities, each with their own risks, and let my manager make that tradeoff. It is not possible to accurately estimate software work. Software projects spend most of their time grappling with unknown problems, which by definition can’t be estimated in advance. To estimate well, you must therefore basically ignore all the known aspects of the work, and instead try and make educated guesses about how many unknowns there are, and how scary each unknown is. edit: I should thank one of my readers, Karthik, who emailed me to ask about estimates, thus revealing to me that I had many more opinions than I thought.

1. For anyone wincing at that time, I mean like three minutes of actual deployment and twenty-seven minutes of waiting for checks to pass or monitors to turn up green. ↩
2. I write a lot more about this in You can’t design software you don’t work on. ↩
3. For instance, imagine a mandate to improve the performance of some large Rails API, one piece at a time. I could happily do that kind of work forever. ↩

1 views
Max Bernstein 1 month ago

A multi-entry CFG design conundrum

The ZJIT compiler compiles Ruby bytecode (YARV) to machine code. It starts by transforming the stack machine bytecode into a high-level graph-based intermediate representation called HIR. We use a more or less typical 1 control-flow graph (CFG) in HIR. We have a compilation unit which has multiple basic blocks. Each block contains multiple instructions. HIR is always in SSA form, and we use the variant of SSA with block parameters instead of phi nodes. Where it gets weird, though, is our handling of multiple entrypoints. See, YARV handles default positional parameters (but not default keyword parameters) by embedding the code to compute the defaults inside the callee bytecode. Then callers are responsible for figuring out what offset in the bytecode they should start running the callee, depending on the number of arguments the caller provides. 2 Consider a function that takes two optional positional parameters. If neither is provided, we start at the first offset, which computes both defaults. If just the first is provided, we start at a later offset that computes only the second default. If both are provided, we can start at the offset past all of the default code. (The jump table in the bytecode debug output lists these offsets.) Unlike in Python, where default arguments are evaluated at function creation time, Ruby computes the default values at function call time. For this reason, embedding the default code inside the callee makes a lot of sense; we have a full call frame already set up, so any exception handling machinery or profiling or … doesn’t need special treatment. Since the caller knows what arguments it is passing, and often to what function, we can efficiently support this in the JIT. We just need to know what offset in the compiled callee to call into. The interpreter can also call into the compiled function, which just has a stub to do dispatch to the appropriate entry block. This has led us to design the HIR to support multiple function entrypoints.
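To make the call-time behaviour concrete, here is a minimal Ruby sketch. The names `make_point`, `next_id`, `ENTRY_LABELS` and `entry_for` are made up for illustration and are not taken from ZJIT or YARV; the lookup table is only a toy stand-in for the real jump table, which maps argument counts to bytecode offsets.

```ruby
# Default values are ordinary expressions that run inside the callee,
# and only for the parameters the caller left out.
$calls = 0

def next_id
  $calls += 1
end

def make_point(x = next_id, y = x * 10)
  [x, y]
end

a = make_point        # both defaults run: x = 1, y = 10
b = make_point(7)     # only y's default runs: y = 70
c = make_point(7, 8)  # no default code runs

# Toy stand-in for the jump table (hypothetical labels, not real offsets):
# the number of positional arguments passed selects where the callee starts.
ENTRY_LABELS = %i[compute_x_and_y compute_y body].freeze

def entry_for(argc)
  ENTRY_LABELS.fetch(argc)
end

p a, b, c
p entry_for(1)
```

On CRuby, something like `puts RubyVM::InstructionSequence.of(method(:make_point)).disasm` should print the real bytecode for such a method, including its optional-argument jump table, though the exact output format differs between Ruby versions.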
Instead of having just a single entry block, as most control-flow graphs do, each of our functions now has an array of function entries: one for the interpreter, at least one for the JIT, and more for default parameter handling. Each of these entry blocks is separately callable from the outside world. Here is what the (slightly cleaned up) HIR looks like for the above example: [HIR listing omitted] If you’re not a fan of text HIR, there is also a clickable visualization of HIR, thanks to our former intern Aiden porting Firefox’s Iongraph: [embedded visualization omitted] Each entry block also comes with block parameters which mirror the function’s parameters. These get passed in (roughly) the System V ABI registers. This is kind of gross. We have to handle these blocks specially in reverse post-order (RPO) graph traversal. And, recently, I ran into an even worse case when trying to implement the Cooper-style “engineered” dominator algorithm: if we walk backwards in block dominators, the walk is not guaranteed to converge. All non-entry blocks are dominated by all entry blocks, which are only dominated by themselves. There is no one “start block”. So what is there to do? Approach 1 is to keep everything as-is, but handle entry blocks specially in the dominator algorithm too. I’m not exactly sure what would be needed, but it seems possible. Most of the existing block infra could be left alone, but it’s not clear how much this would “spread” within the compiler. What else in the future might need to be handled specially? Approach 2 is to synthesize a super-entry block and make it a predecessor of every interpreter and JIT entry block. Inside this approach there are two ways to do it: one ( 2.a ) is to fake it and report some non-existent block. Another ( 2.b ) is to actually make a block and a new instruction that is a quasi-jump instruction.
In this approach, we would either need to synthesize fake block arguments for the JIT entry block parameters or add some kind of new instruction that reads the argument i passed in. (suggested by Iain Ireland, as seen in the IBM COBOL compiler) Approach 3 is to duplicate the entire CFG per entrypoint. This would return us to having one entry block per CFG at the expense of code duplication. It handles the problem pretty cleanly but then forces code duplication. I think I want the duplication to be opt-in instead of having it be the only way we support multiple entrypoints. What if it increases memory too much? The specialization probably would make the generated code faster, though. (suggested by Ben Titzer) None of these approaches feel great to me. The probable candidate is 2.b where we have new argument-reading instructions. That gives us flexibility to also later add full specialization without forcing it. Cameron Zwarich also notes that this is an analogue to the common problem people have when implementing the reverse: postdominators. This is because often functions have multiple return IR instructions. He notes the usual solution is to transform them into branches to a single return instruction. Do you have this problem? What does your compiler do?

1. We use extended basic blocks (EBBs), but this doesn’t matter for this post. It makes dominators and predecessors slightly more complicated (now you have dominating instructions), but that’s about it as far as I can tell. We’ll see how they fare in the face of more complicated analysis later. ↩
2. Keyword parameters have some mix of caller/callee presence checks in the callee because they are passed in un-ordered. The caller handles simple constant defaults whereas the callee handles anything that may raise. Check out Kevin Newton’s awesome overview. ↩
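As a rough illustration of why a synthetic super-entry block (Approach 2) gives the dominator walk a single place to end, here is a small Ruby sketch. It is not ZJIT's code, and it is not the Cooper-style algorithm mentioned above; it swaps in the simpler set-based iterative formulation. The block names and the CFG shape are hypothetical, loosely modelled on a function with two optional positional parameters.

```ruby
require "set"

# Hypothetical CFG: block name => successor block names.
# :super_entry is the synthetic block that precedes every real entry block.
CFG = {
  super_entry: %i[entry_argc0 entry_argc1 entry_argc2],
  entry_argc0: %i[default_x],
  entry_argc1: %i[default_y],
  entry_argc2: %i[body],
  default_x: %i[default_y],
  default_y: %i[body],
  body: %i[exit],
  exit: []
}.freeze

# Set-based iterative dominators:
# dom(root) = {root}; dom(n) = {n} | intersection of dom(p) over preds p.
def dominators(succs, root)
  nodes = succs.keys
  preds = Hash.new { |h, k| h[k] = [] }
  succs.each { |from, tos| tos.each { |to| preds[to] << from } }

  dom = nodes.to_h { |n| [n, Set.new(nodes)] }
  dom[root] = Set[root]

  changed = true
  while changed
    changed = false
    nodes.each do |n|
      next if n == root
      incoming = preds[n].map { |p| dom[p] }.reduce { |a, b| a & b } || Set.new
      updated = incoming | Set[n]
      next if updated == dom[n]
      dom[n] = updated
      changed = true
    end
  end
  dom
end

# The immediate dominator is the strict dominator with the largest dominator set.
def immediate_dominators(dom, root)
  dom.each_with_object({}) do |(n, ds), idom|
    next if n == root
    idom[n] = (ds - Set[n]).max_by { |d| dom[d].size }
  end
end

# Walk backwards through immediate dominators until the root is reached.
def idom_chain(idom, block)
  chain = [block]
  chain << idom[chain.last] while idom.key?(chain.last)
  chain
end

dom = dominators(CFG, :super_entry)
idom = immediate_dominators(dom, :super_entry)
p idom_chain(idom, :exit)
p idom_chain(idom, :default_x)
```

In this toy graph no single real entry block dominates `body`, because it is reachable from all three entries; only the synthetic block does, so every backwards walk ends there.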

0 views

3D printing my laptop ergonomic setup

Apparently, one of my hobbies is making updates to my ergonomic setup, then blogging about it from an Amtrak train. I've gone and done it again. My setup stayed static for some time, but my most recent iteration ended up letting me down and I had to change it again. It gave me a lot of useful information and strongly shaped how I approached this iteration. This new one is closest to the first one I wrote about in 2024, but with some major improvements and reproducibility. First things first, though. Why am I making yet more changes to this setup? Besides my constant neurodivergent drive to make things perfect, my setups all kept causing me some problems. In chronological order, here are the problems and neat benefits of each setup I used for at least a few months. So my immediate previous version was heavy and tedious to set up. I had a trip coming up to Brooklyn, so I had to either make something more portable or leave my laptop at home. I decided to take my laptop, and did a design sprint to see if I could make my dream setup. At this point I'll probably be working on this setup forever, but I hope I can stop if I am able to satisfy all my goals at some point. My dream setup has these characteristics: So, you know, it's not like I want a lot out of this setup. It's not like these are kind of a lot to all fit into one thing. I'm sure it'll be a piece of cake. I use OpenSCAD for 3D modeling. It's pretty pleasant, though some things are hard in general (like roundovers and fillets on any more complicated shapes). My design to start is basically one of my previous versions: my split keyboard at adjustable width on a base, and a slot to hold my laptop vertically. I started by measuring important dimensions, like how far apart I wanted my keyboard halves and the dimensions of my laptop. Then I compared these to my 3D printer's print volume, and started working out how I'd have to print it. The rig is wider than my 3D printer, so I had to split it up into parts.
The slot would fit as a separate piece if I oriented it diagonally. The base itself would have to be split into two separate halves. To join the halves and the slot, I decided to use dovetail joints. I'm familiar with them from woodworking, and I figured they'd give a strong join here as well. I added the library BOSL2 to generate the dovetails, and these were pretty easy to model in. Then I also made some keyboard mounts, which I attach using a camera tripod mount (the Keyboardio Model 100 has threading for this). This is where I ended up for my initial design. When I printed the first pieces, I ran into a problem. The pieces came out alright, mostly, but there was this wavy defect on the top of it! It ended up being (I think) that the print was not adhering well to the printbed. This was easily solved by washing it with some water and dish soap, then prints started coming out beautifully. The other problem was that the sliders and rails worked too smoothly, and I realized that I'd need to have some way to lock the keyboard in place or it would slide around in a difficult to use way. I punted on this, and printed the whole thing. I knew I'd need another iteration on it for material reasons: I am printing the prototype from PLA, since it's easy to work with, but I wanted to print the final one from PETG for slightly better heat resistance. So, onwards, and with a clean printbed, I was able to make the full first prototype! It was 3 parts which took 2-3.5 hours each to print, for a total print time of under 12 hours. I assembled the pieces and glued them together. At this point I was able to use the setup to work on itself, which was really satisfying. I did need to make the keyboard lock in place for carrying it, but it was fairly stable on my desk at least. Now it was time to make a few tweaks, and print the whole thing in PETG for its heat resistance. 
I did a few things this iteration: I carved out a honeycomb pattern on the base to reduce weight and filament; I added a nubbin and detents to the keyboard slider to lock it in place where I want (in 10mm increments); I lengthened the keyboard rails to go further in; and I widened the keyboard slot for a less snug fit. This time is when I met the challenge that is printing with PETG! I dried my filament and started doing some prototyping. I sliced apart chunks of my model to see if things fit together still, since that can change with materials. I also printed a test of my locking clicky mechanism for the keyboard, and good thing: it needed design changes, but the second print worked great (I modified the first with a knife until it fit, then measured the remaining material, and modeled that). Then I printed it. And it came out pretty well! I mean, I had major stringing and bed adherence issues the first time I tried it, but with thorough bed cleaning and a nozzle wipe, it came out cleanly. I had one spot with a minor quality issue, but it's on the bottom and not visible. And it's working out really well! Mostly! The good things here are what make it usable. It is lightweight (about 280 grams), which is comparable to my lightest previous setup, but that one fell apart promptly. It seems durable; we'll see over time, but it did survive multiple backpack loadings and a trip to Brooklyn today, where I hauled it around the city with me. And it's pretty fast to deploy: I can put it together in 15 seconds. The keyboard width is very easy to adjust, and it's solidly in place where it won't slide by accident. The laptop screen is at a good height. It's reproducible: others could print it as well, with access to the files. (I'm considering making them open source, but I don't think they're quite ready to share. It needs some iteration first.) And I quite like the way it looks. However, it's not all good.
I want to make some changes to it soon, after a break from the long print times and iterations. Here's the list to address:

- Replacements for the camera z-mounts: I'd like to 3D print something for this, and it will be the first iteration I make. The z-mounts are over a pound of metal together, so I could bring down the weight a bit more this way. However, it may not be worth it.
- Add non-slip feet and extra rails on the bottom: I'd like to raise it off the surface it's on a little bit and add some rails on the bottom for a little more rigidity.
- Make it more rigid: it is a little bit floppy, but not to the point of being distracting when using it. I'd like it to feel a little sturdier, especially if anyone else were going to use it.
- Add attachment points for accessories: on Friday, someone at Recurse Center saw my coffee perched in the middle and he suggested a cupholder. I'd like that, or mounts for my mic or USB hub or myriad other things. I can use the honeycomb grid for attachment points, if I add those rails/feet on the bottom to raise it all up a little bit.
- Make it modular and customizable: it only works today if you have a split keyboard with a tripod mount on the bottom of it. So, that's not great for people who don't have the exact same keyboard I do! And if you have other laptops, well, it would need to be adjusted for that. I want to address this before releasing the files. (If you do have the hardware that makes this useful for you today, let me know. I'm happy to help people out with that, I just don't want to do a big public release.)

I don't know if addressing those is all feasible, or if it will satisfy my dream setup. But I do know by now that I'll not be done with this for a long, long time. Everyone needs a hobby; apparently this is one of mine.

It's been surprisingly rewarding to work on my own ergonomic setup like this. I have made this setup specifically for health reasons: without it, I cannot use a laptop without severe nerve pain, and I rather like being able to work from anywhere. I have a very uncommon setup in that I'm able to use my Keyboardio Model 100 from a train; I've not seen that before. The amazing thing about 3D printers is enabling this kind of solution. I made my previous versions in my workshop out of mostly wood. It took time and iteration was a big challenge. With a 3D printer, it's doable to design it and even send it off to someone else to print. And we can make exactly what we need, at relatively low cost. It's a technology that truly changes things in making custom tailored solutions far more accessible.

- My first one was difficult to adjust the keyboard width. You had to flip it over and loosen hardware from the bottom. It was also a little heavy. There's a limit to how far I can reduce weight when using a Keyboardio Model 100, but we can get closer. However, this rig was very fast to set up. It also did keep my keyboard at a good width.
- My second one used hinges made from fabric and hook-and-loop fasteners, which was neat but ultimately it fell apart, it was tedious to adjust, and it took a long time to set up. The big benefit of this setup was that it was extremely light. This was helpful when I was suffering from a lot of fatigue and POTS.
- My third one had a neat hinging mechanism which was useful for smaller spaces but wasn't much faster to set up. It used a smaller lighter keyboard, but ultimately that keyboard ended up relapsing my nerve pain.
- My fourth one, not previously written about, was... way too heavy. It was also a little tedious to set up, but the weight was its biggest problem. I made that one from off-the-shelf parts (mostly), with the goal of making something reproducible for others. And it worked with any laptop, not just ones with a 180 degree hinge like mine [1]. But, with how heavy and annoying it was, it's not worth reproducing.

- relatively lightweight: it's not going to get super light with both a laptop and my keyboard, but I want to minimize the weight beyond those
- solid mount for my Keyboardio Model 100: this keyboard is, vexingly [2], the only keyboard that keeps my nerve pain in remission. I need to use it.
- good laptop screen height: another problem with laptop use generally is that the screen is usually too low or the keyboard is too high. I want to make sure the screen is at a reasonable height so that I don't wreck my body through poor posture.
- durability: it needs to be pretty durable since I'm going to use this rig for travel. I don't abuse my laptop or my setup, but it has to stand up to regularly being taken in and out of a bag and being used in random places. It has to stand up to a variety of environmental conditions, too.
- as easy as opening my laptop: a lot of ergonomic problems stem from ergonomic setups being inconvenient, so if I can reduce that inconvenience, I can reduce the problems
- easily adjustable keyboard width: I shift around my keyboard position as my body asks for it, and having dynamic positioning helps me feel comfortable. I'd like to be able to do this with little fuss, or else I won't do it (see the previous point).
- mounting points for accessories: I use an eink tablet to take notes, and would love to be able to put it on a little mount on the rig. I also want to be able to mount USB hubs or the mic I use for Talon. Having options for attaching accessories would make it not just equivalent to a laptop, but far more flexible.
- reproducible: This setup gets a lot of comments from people, and it solves real problems for me that other people have as well. I want more people to be able to use it.
- interesting: whenever I take this thing out, I get comments on it. It's how I find other engineers and software folks: most people are all "ignore the lady with the weird rig" but y'all actually strike up conversations with me about it. (If you ever run into me in public, please do talk to me! Even if it looks like I'm working!) I don't want this social benefit to go away!
- attractive aesthetic: I've been fine using my homebrew wood setups, but they're so obviously homemade and don't look good. My dream is that it would look like it's not homemade, and would simply look like it's how the computer is intended to be used.

[1] As far as I know, the main laptops that do this are the Framework 13 and some Lenovo Thinkpads. No Apple laptop does this. It's a big constraint and I haven't been able to design it out of my setup. I'm starting to wonder if the ticket is a headless small form factor computer with a portable monitor.
[2] I am annoyed at this, because it limits my keyboard options and I would love something lighter. Don't get me wrong, I love my Model 100. But I'm uncomfortable relying only on one keyboard from one company.

Phil Eaton 1 month ago

LLMs and your career

The most conservative way to build a career as a software developer is 1) to be practical and effective at problem solving but 2) not to treat all existing code as a black box. 1 means that as a conservative developer you should generally use PostgreSQL or MySQL (or whatever existing database), Rails or .NET (or whatever existing framework), and adapt code from Stack Overflow or LLMs. 2 means that you're curious and work over time to better understand how web servers and databases and operating systems and the browser actually work so that you can make better decisions for your own problems as you adapt other people's code and ideas. Zooming out, coding via LLM is not fundamentally different from coding with Rails or coding by perusing Stack Overflow. It's faster and more direct but it's still potentially just a human mindlessly adapting existing code. The people who were only willing to look at existing frameworks and libraries and applications as black boxes were already not the most competitive when it came to finding and retaining work. And on the other hand, the most technically interesting companies always wanted to hire developers who understood fundamentals because they're 1) operating at such a scale that the way the application is written matters or they're 2) building PostgreSQL or MySQL or Rails or .NET or Stack Overflow or LLMs, etc. The march of software has always been to reduce the need for (ever larger sizes of) SMBs (and teams within non-SMBs) to hire developers to solve problems or increase productivity. LLMs are part of that march. That doesn't change that at some point companies (or teams) need to hire developers because the business or its customer base has become too complex or too large. The jobs that were dependent on fundamentals of software aren't going to stop being dependent on fundamentals of software. 
And if more non-developers are using LLMs it's going to mean all the more stress on tools and applications and systems that rely on fundamentals of software. All of this is to say that if you like doing software development, I don't think interesting software development jobs are going to go away. So keep learning and keep building compilers and databases and operating systems and keep looking for companies that have compiler and database and operating system products, or companies with other sorts of interesting problems where fundamentals matter due to their scale.

Manuel Moreale 1 month ago

Web, Social Networks, Social Web

The other day, a podcast episode caught my attention. It was titled “Can We Build a Better Social Network”, and it was a collaboration between Hard Fork and Search Engine. I thought it was just a discussion about the state of social networks, but then I read the description of the episode: «Over the past year, we've been working with the podcast "Search Engine" on a project that reimagines what the internet can be. What if instead of rage-baiting, a social platform incentivized friendly interaction and good-faith discussion? Today, we're bringing "Hard Fork" listeners an episode we made with the "Search Engine" team called "The Fediverse Experiment", where we end up creating our own social media platform.» A year of work? Creating a social media platform? Reimagining the internet? Sounds ambitious, and also very interesting. As you probably know, calling me a skeptic of social media would be an understatement, but I’m still very much intrigued by people who want to try different approaches, and so I started listening. Not even 5 minutes in, the conversation was already off the rails, and they were saying things that made absolutely no sense. «So the fediverse is a way for people to take back the internet for themselves.» I’m sorry what? «It's a way to have a identity and connect to other things that are important to you online and just not worry about having to fight through a Google algorithm or a Facebook algorithm. In fact, you could bring your own algorithm if you want to. I'm already doing such a bad job of explaining what the Fedverse is.» Ok at least they were aware that it was an awful explanation. The first interesting bit of the podcast is at around 7 minutes, where they say something I find so infuriatingly wrong that I was about to stop listening. «The story these people told me went like this. Basically all of them, as different as they were from one another, had a shared view of what had gone wrong with our internet.
The way they saw it in the nineties, even in the early two thousands, our internet had truly been an open place. Infinite websites, infinite message boards populated by all sorts of people with all sorts of values, free to live how they wanted in the little neighborhoods they'd made. If you wanted to move homes on that internet, say switch your email from Yahoo to Gmail, it was mildly annoying, but not a huge deal.» So far, so good. «But then social media arrived. To access those platforms, you usually needed a dedicated account. Once you started posting on that account, you were now in a game to build as large a following as possible.» Already, the fuck? First, even to access earlier platforms, you needed a dedicated account. Heck, you needed accounts for everything. Forums, message boards, you name it. Also, «Once you started posting on that account, you were now in a game to build as large a following as possible»? Says who? This is what social media became over time, sure, but social media didn’t start this way, and in the early days, it sure wasn’t only a matter of amassing an audience. «But the architects of the Fediverse, they had a more radical idea. The vision they held was that they could take control of social media out of the hands of the Musks and Zuckerbergs and reroute it back towards more open internet where no mogul would ever have the same kind of power they do now.» Did you spot the shift? We started with “our internet had truly been an open place”, and now we’re trying to take back control of social media. I don’t know about you, but to me, the internet ≠ social media. Wild take, I know. Anyway, they then embark on this journey of, their words not mine, «finish building the fediverse» and I can only hope it was said jokingly. The whole episode is a wild ride if you know anything about these topics, and the very underwhelming outcome of all this is that what they built was…a Mastodon instance. And they’re not even self-hosting it.
What they “built” is a Mastodon instance hosted by masto.host and, of course, since this is 2026, they had to use AI somehow to do it. Sigh… If the episode was titled “We have set up a Mastodon server”, I’d not have bothered listening to it. That said, listening to the episode made me realize how some people have a very narrow view of what the internet is and can be from a social interaction standpoint. Imagine a social platform that’s not controlled by a single billionaire. A platform that’s not powered by a closed-source algorithm. Usernames are unique, the underlying protocol powering it is flexible and very robust. Your profile page is infinitely customizable, and no two profiles need to look the same. It supports DMs and chats. A platform where you can post videos, photos, audio, 3D content, you name it, and where you can follow other people’s pages and be sure that no algorithm will hide that content from you. A platform that's not censored or moderated by arbitrary rules set by a Silicon Valley billionaire. How good does that sound to you? Because to me, a platform like that looks like a dream, if only we could figure out a way to build it.

André Arko 2 months ago

Announcing rv clean-install

Originally posted on the Spinel blog. As part of our quest to build a fast Ruby project tool, we’ve been hard at work on the next step of project management: installing gems. As we’ve learned over the last 15 years of working on Bundler and RubyGems, package managers are really complicated! It’s too much to try to copy all of rbenv, and ruby-build, and RubyGems, and Bundler, all at the same time. Since we can’t ship everything at once, we spent some time discussing the first project management feature we should add after Ruby versions. Inspired by and , we decided to build rv clean-install. Today, we’re releasing the rv clean-install command as part of version 0.4. So, what is a clean install? In this case, clean means “from a clean slate”. You can use rv clean-install to install the packages your project needs after a fresh checkout, or before running your tests in CI. It’s useful by itself, and it’s also a concrete step towards managing a project and its dependencies. Even better, it lays a lot of the groundwork for future gem management functionality, including downloading, caching, and unpacking gems, compiling native gem extensions, and providing libraries that can be loaded by Bundler at runtime. While we don’t (yet!) handle adding, removing, or updating gem versions, we’re extremely proud of the progress that we’ve made, and we’re looking forward to improving based on your feedback. Try running rv clean-install today, and see how it goes. Is it fast? Slow? Are there errors? What do you want to see next? Let us know what you think.

Tenderlove Making 2 months ago

Pixoo64 Ruby Client

I bought a Pixoo64 LED Display to play around with, and I love it! It connects to WiFi and has an on-board HTTP API so you can program it. I made a Ruby client for it that even includes code to convert PNG files to the binary format the sign wants. One cool thing is that the display can be configured to fetch data from a remote server, so I configured mine to fetch PM2.5 and CO2 data for my office. Here’s what it’s looking like so far: Yes, this is how I discovered I need to open a window 😂
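To give a feel for what "the binary format the sign wants" might involve, here is a rough sketch. It is not the author's client. It assumes the display accepts a JSON body whose image data is a base64 string of raw RGB bytes, one byte per channel, row by row; the command name `Draw/SendHttpGif` and the `Pic*` field names are my recollection of the device's HTTP API and should be checked against the vendor's documentation. The helper names `pack_rgb` and `frame_payload` are hypothetical.

```ruby
require "json"

SIZE = 64 # the Pixoo64 is a 64x64 pixel panel

# Flatten SIZE rows of [r, g, b] triples into one packed byte string,
# row by row, one unsigned byte per colour channel.
def pack_rgb(pixels)
  raise ArgumentError, "expected #{SIZE} rows" unless pixels.length == SIZE

  bytes = pixels.flat_map do |row|
    raise ArgumentError, "expected #{SIZE} columns" unless row.length == SIZE

    row.flatten
  end
  bytes.pack("C*")
end

# Build the JSON body for a single still frame.
# ASSUMPTION: the command and field names below are not taken from the
# post; verify them against the device documentation before relying on them.
def frame_payload(pixels, pic_id: 1)
  body = {
    "Command"   => "Draw/SendHttpGif",
    "PicNum"    => 1,          # one frame in this "animation"
    "PicWidth"  => SIZE,
    "PicOffset" => 0,          # index of this frame
    "PicID"     => pic_id,
    "PicSpeed"  => 1000,       # frame duration in ms (assumed)
    "PicData"   => [pack_rgb(pixels)].pack("m0") # strict base64, no newlines
  }
  JSON.generate(body)
end

# A solid red test image.
red = Array.new(SIZE) { Array.new(SIZE) { [255, 0, 0] } }
payload = frame_payload(red)
```

Sending it would then be a plain HTTP POST of `payload` with a JSON content type to the display's address on the local network, for example with `Net::HTTP.post`. Converting a PNG means decoding it to the same 64x64 grid of RGB triples first, which needs an image library and is left out here.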
