(think) 1 week ago

How to Vim: Build your .vimrc from Scratch

People often think that getting started with Vim means spending hours crafting an elaborate configuration with dozens of plugins. In reality, modern Vim (9+) and Neovim ship with remarkably sane defaults, and you can get very far with a configuration that's just a few lines long – or even no configuration at all.

If you launch Vim 9 without a vimrc, it automatically loads defaults.vim – a built-in configuration that provides a solid foundation. Here's what you get for free:

- syntax highlighting
- filetype detection, language-specific plugins, and automatic indentation
- incremental search (results appear as you type)
- 5 lines of context kept around the cursor
- truncated lines shown instead of hidden
- mouse support in all modes
- Q remapped to gq (text formatting) instead of the mostly useless Ex mode
- and several other quality-of-life improvements

That's actually a pretty reasonable editing experience out of the box! You can read the full details with :help defaults.vim.

Neovim goes even further with its defaults – it enables autoindent (copies indentation from the previous line), hlsearch (highlights all search matches), smarttab (makes Tab smarter at the start of a line), and autoread (reloads files changed outside the editor), always shows the statusline, and sets the command history to 10000 entries, among many other things. If you're on Neovim, the out-of-the-box experience is excellent. See :help nvim-defaults for the full list.

Here's something that trips up a lot of people: the moment you create a vimrc – even an empty one – Vim stops loading defaults.vim entirely. That means you lose all those nice defaults. The fix is simple. Start your vimrc by sourcing the defaults file ($VIMRUNTIME/defaults.vim). This loads the defaults first, and then your own settings override or extend them as needed. This gotcha only applies to Vim. Neovim's defaults are always active regardless of whether you have an init.vim or init.lua.

Here's a minimal vimrc that builds on the defaults and adds a few things most people want – just five settings on top of the defaults. You might not even need all of them – defaults.vim already handles the fundamentals. For Neovim, you don't need the sourcing line – all the equivalents are already active. Several of the other settings also come for free, so the only ones left to add are the ones that are genuinely personal preference.

One of the most underappreciated aspects of Vim is how much built-in support it ships for programming languages. When filetype detection is active (which it is via defaults.vim or Neovim's defaults), you automatically get:

- Syntax highlighting for hundreds of languages – Vim ships with around 770+ syntax definitions
- Language-specific indentation rules for over 420 file types
- Filetype plugins that set sensible options per language

This means that when you open a Python file, Vim already knows to use 4-space indentation. Open a Ruby file and it switches to 2 spaces. Open a Makefile and it uses tabs. All without a single plugin or line of configuration. You can check what's available by browsing the syntax and ftplugin directories under $VIMRUNTIME. The list is impressively long.

At some point you'll probably want more than the bare minimum. Here are a few things worth considering as your next steps:

- A colorscheme – Vim ships with several built-in options (try :colorscheme followed by Tab to see them). Recent Vim builds even bundle Catppuccin – a beautiful pastel theme that I'm quite fond of. Another favorite of mine is Tokyo Night, which you'll need to install as a plugin. Neovim's default colorscheme has also been quite good since 0.10.
- Persistent undo – lets you undo changes even after closing and reopening a file. A game changer.
- Clipboard integration – makes yank and paste use the system clipboard by default.
- vim-unimpaired – if you're on classic Vim (not Neovim), I think Tim Pope's vim-unimpaired is essential. It adds a consistent set of [ / ] mappings for navigating quickfix lists, buffers, adding blank lines, and much more. Neovim 0.11+ has adopted many of these as built-in defaults, but on Vim there's no substitute.

And when you eventually want more plugins, you probably won't need many. A fuzzy finder, maybe a Git integration, and perhaps a completion engine will cover most needs. But that's a topic for another day.

The key takeaway is this: don't overthink your vimrc. Start with the defaults, add only what you actually need, and resist the urge to copy someone else's 500-line configuration. A small, well-understood configuration beats a large, cargo-culted one every time. That's part of the reason why, when I started to re-learn Vim, I opted to slowly build a Vim 9 configuration from scratch instead of jumping to something like Neovim + Kickstart.nvim or LazyVim right away. Less is more. Understanding the foundations of your editor matters. [1] Right now my vimrc is just 100 lines and I don't foresee it becoming much bigger in the long run.

If you want to see just how far you can go without plugins, I highly recommend the Thoughtbot talk How to Do 90% of What Plugins Do (With Just Vim). It's a great demonstration of Vim's built-in capabilities for file finding, auto-completion, tag navigation, and more.

That's all I have for you today. Keep hacking!

[1] I guess this sounds strange coming from the author of Emacs Prelude, right? ↩︎
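In practice, the defaults-loading fix looks like this – a minimal sketch, where the personal settings at the bottom are illustrative preferences rather than the post's exact list:

```vim
" Minimal vimrc sketch for classic Vim 9 (Neovim users can skip the
" first two lines -- its defaults are always active).
unlet! skip_defaults_vim          " make sure defaults.vim isn't skipped
source $VIMRUNTIME/defaults.vim   " load the built-in defaults first

" Personal preferences layered on top (illustrative, pick your own):
set number         " absolute line numbers
set ignorecase     " case-insensitive searching...
set smartcase      " ...unless the pattern contains uppercase
set hidden         " switch buffers without saving first
set undofile       " persistent undo across sessions
```

The ordering matters: because your own lines come after the source command, they override anything defaults.vim set.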

DHH 1 week ago

Omacon comes to New York

The vibes around Linux are changing fast. Companies of all shapes and sizes are paying fresh attention. The hardware game on x86 is rapidly improving. And thanks to OpenCode and Claude Code, terminal user interfaces (TUIs) are suddenly everywhere. It's all this and Omarchy that we'll be celebrating in New York City on April 10 at the Shopify SoHo Space for the first OMACON!

We've got an incredible lineup of speakers coming. The creator of Hyprland, Vaxry, will be there, along with ThePrimeagen and TJ DeVries. You'll see OpenCode creator Dax Raad, Omarchy power contributors Ryan Hughes and Bjarne Øverli, as well as Chris Powers (Typecraft) and myself as Linux superfans. All packed into a single day of short sessions, plenty of mingle time, and some good food.

Tickets go on sale tomorrow (February 19) at 10am EST. We only have room for 130 attendees total, so I imagine the offered-at-cost $299 tickets will go quickly. But if you can't manage to snatch a ticket in time, we'll also be recording everything, so you won't be left out entirely.

Still, there is just something special about being together in person around a shared passion. I've felt the intensity of that three years in a row now with Rails World. There's an endless amount of information and instruction available online, but a sense of community and connection is far more scarce. We nerds need this. We also need people to JUST DO THINGS. Like kick off a fresh Linux distribution together, with over three hundred contributors so far, all leaning boldly into aesthetics, ergonomics, and that omakase spirit.

Omarchy only came about last summer; now we're seeing 50,000 ISO downloads a week, 30,000 people on the Discord, and now our very first exclusive gathering in New York City. This is open source at its best. People from all over, coming together, making cool shit.

(Oh, and thanks to Shopify and Tobi for hosting. You gotta love when a hundred-plus-billion-dollar company like this is run by an uber nerd who can just sign off on doing something fun and cool for the community without any direct plausible payback.)


Leading Without a Map

No one can deny that our industry is in a period of great change. This industry never stops; the rate goes up and down, but change is a constant. Like it or not, "change calls the tune we dance to."

One of the biggest reasons people resist change – even people who joined the software business to "change the world" – is when they feel it threatens their self-perception and identity. In the West, our job is often the primary piece of our identity. One sees it everywhere. Your LinkedIn profile has your name first, and some sort of job title or role description second. Heck, even contestants on Jeopardy are introduced as "A marketing consultant from Eyebrow, Saskatchewan." When completing the sentence "I am a..." most people pick their job.

When change is high, that self-conception can quickly feel under threat. Even in the small it can happen. If your company decides it would be better served writing new code in Java rather than Python or Ruby, you can expect a few "Pythonistas" or "Rubyists" to push back. In their heart of hearts they may agree with the decision on its merits, but they nevertheless feel that their very identity is under threat. This can also include their social group/community/tribe membership, something that humans are genetically programmed to value and protect. So it's no doubt understandable that change can bring out strange and unpredictable behaviour in people when they feel like there's risk to their identity, self-concept, or tribal membership.

So what can we do about it? Well, first of all, acknowledge to ourselves that we are not immune from these phenomena either. Presumably most of us started out as software developers ourselves, and when we started managing the people who did the job, it was the job we used to do, so we got it. Over time, that's drifted. New frameworks and paradigms have emerged, new 'best' practices replaced the old 'best' practices, and we became less intimately familiar with the day-to-day things our people were doing.

This is uncomfortable at times, but we adapt. We learn what we can to stay involved at the right level and to coach and guide the people we're responsible for. Now, the game is changing in a much more fundamental and profound way. And it's happening fast. I don't know what the job of software developer is going to look like a year from now (or even 6 months, for that matter) and, frankly, neither does anyone else. This makes the job of manager much, much harder. Your people are used to you having at least some concept of a map and sharing it with them, and you don't have one. Everyone's figuring it out together.

A good friend and former colleague once described an aspect of leadership as "smiling while the sky is falling." I'm not sure if he came up with it or if I should attribute it to someone else, but I heard it from him first. My point here isn't that the sky is falling but rather that when your people are worried, you need to appear steadfast or you make the problem worse. You don't owe them certainty, because that would be dishonest and they'll clock your dishonesty whether they admit it or not. But just like in incident response, panic serves no one. You owe them calm reassurance that you're going to navigate this new world together and that you've got their best interests at heart. You do this even though you might be feeling the same threat to your identity. You manage engineers, but they're becoming some kind of new thing: bot-wranglers. Some of your other responsibilities are being offloaded to LLMs, and everyone's role is going to keep changing until things inevitably settle down again (relatively speaking).

With no playbook, we need some kind of framework for decision making. This is where we can fall back to 'first principles'. For me these are the things I hold important. Really, the basics:

- Doing my best to take care of the people.
- Doing what the business needs most at the given moment.
- Providing value to customers.

It sounds simple, and really, it is. Taking care of the people right now means recognizing that they're feeling that identity risk. The worst thing you can do is try to talk them out of it or convince them they're not feeling what they're feeling. Acknowledge that things are changing. Maintain 'esprit de corps' as best you can. Draw on your experience navigating big changes before. If you've been around this industry for any amount of time, you've been through some big paradigm shifts and come out the other side. Tell some stories, but don't make it all about you.

The business and customer angles come down to maintaining consistent principles around what software gets shipped to customers. I personally have the pleasing-to-nobody opinion that LLM coding tools are useful but not risk-free. Surely you have some skeptics in your midst who feel the same. Don't dismiss them either. Security, quality, maintainability, incident response, and the work-life balance of your people are still the responsibility of the humans running the company. That's the job right now, however the machinery of it changes. Keep taking care of your people and customers, like you always have. You already know how.

"Statue of Captain George Vancouver, anchors and the Custom House, King's Lynn" by ell brown is licensed under CC BY 2.0.

Like this? Please feel free to share it on your favourite social media or link site! Share it with friends! Hit subscribe to get new posts delivered to your inbox automatically. Feedback? Get in touch!

Max Bernstein 1 week ago

Type-based alias analysis in the Toy Optimizer

Another entry in the Toy Optimizer series. Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap – at compile-time! We were careful to mind object aliasing: we separated our heap information into alias classes based on what offset the reads/writes referenced. This way, even if we didn't know whether two objects aliased, we could at least know that different offsets would never alias (assuming our objects don't overlap and memory accesses are on word-sized slots). This is a coarse-grained heuristic. Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. We'll add a lightweight form of type-based alias analysis (TBAA) (PDF) in this post.

We return once again to Fil Pizlo land, specifically How I implement SSA form. We're going to be using the hierarchical heap effect representation from that post in our implementation, but you can use your own type representation if you have one already. This representation divides the heap into disjoint regions by type. Consider, for example, that objects of two unrelated types do not overlap: a pointer to one is never going to alias a pointer to the other. They can therefore be reasoned about separately. But sometimes you don't have perfect type information available. If your language has a base class of all objects, then that top heap overlaps with every more specific heap. So you need some way to represent that too – just having an enum doesn't work cleanly. Here is an example simplified type hierarchy, where some nodes might represent different parts of the runtime's data structures, and others could be further segmented into more specific heaps. Fil's idea is that we can represent each node in that hierarchy with a tuple of integers (inclusive, exclusive) that represents the pre- and post-order traversals of the tree.
Or, if tree traversals are not engraved into your bones, the tuple represents the range of all the nested objects within a node. Then the "does this write interfere with this read" check – the aliasing check – is a range overlap query. Here's a perhaps over-engineered Python implementation of the range and heap hierarchy, based on the Ruby generator and C++ runtime code from JavaScriptCore, where a top-level call kicks off the tree-numbering scheme. Fil's implementation also covers a bunch of abstract heaps such as SSAState and Control because his is used for code motion and whatnot. That can be added on later, but we will not do so in this post.

So there you have it: a type representation. Now we need to use it in our load-store forwarding. At its core, our load-store optimization pass iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate the state of compile-time information about fields that may alias. As it stands, our may-alias check asks only if the offsets overlap. This means that a unit test which writes to the same offset in two objects annotated with disjoint types will fail: it expects the first cached write to remain cached, because the two objects can never be the same. If we account for type information in our alias analysis, we can get this test to pass.

After doing a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff. If we don't have any type/alias information, we default to "I know nothing" (the top of the hierarchy) for each object. Then we check range overlap. The boolean logic in the check looks a little weird, maybe. But we can also rewrite it (via De Morgan's law) to keep all the cached field state about fields that are known – by offset and by type – not to alias. Maybe that is clearer (but not as nice a diff). Note that the exact type representation is not so important here!
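To make the numbering scheme concrete, here is a minimal sketch of the idea – illustrative only: the hierarchy and names are made up, and this is not the post's actual implementation. A pre-order traversal assigns each node a half-open (begin, end) range covering all its descendants, and two heaps may alias exactly when their ranges overlap.

```python
class AbstractHeap:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.begin = self.end = None

    def number(self, start=0):
        """Assign a half-open (begin, end) range covering this node
        and all of its descendants."""
        self.begin = start
        next_id = start + 1
        for child in self.children:
            next_id = child.number(next_id)
        self.end = next_id
        return next_id

    def overlaps(self, other):
        # Two heaps may alias iff their ranges intersect: an ancestor
        # overlaps all its descendants; disjoint leaves never overlap.
        return self.begin < other.end and other.begin < self.end

# Made-up hierarchy: Top covers everything; Int and Str are disjoint.
int_heap = AbstractHeap("Int")
str_heap = AbstractHeap("Str")
obj_heap = AbstractHeap("Object", [int_heap, str_heap])
top = AbstractHeap("Top", [obj_heap])
top.number()

print(int_heap.overlaps(str_heap))   # False: disjoint leaves
print(obj_heap.overlaps(int_heap))   # True: ancestor covers descendant
```

Defaulting an object to Top (the root's range) recovers exactly the "I know nothing" behavior: Top overlaps everything, so every write invalidates it.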
You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and check overlap between them. Nice, now our test passes! We can differentiate between memory accesses on objects of different types.

But what if we knew more? Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object's allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage: in a made-up IR snippet, we know that a freshly allocated object doesn't alias any of the function's parameters (among other facts), because we have seen its allocation site. I saw this in the old V8 IR Hydrogen's lightweight alias analysis [1]. There is plenty of other useful information available too; if you have other fun ones, please write in.

We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking? The conservative approach is to invalidate all cached information on a function call. This is definitely correct, but it's a bummer for the optimizer. Can we do anything? Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write, or only writes to some heaps, we can at least only partially invalidate our heap. However, if the function is unknown or otherwise opaque, we need at least more advanced alias information and perhaps even (partial) escape analysis. Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to any object A, we cannot safely cache information about any other object B unless we know for sure that A and B do not alias.
And we don't know what the instruction writes to. So we may only know we can cache information about B because it was allocated locally and has not escaped.

Some runtimes such as ART pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass – no need to make a matrix.

As usual, this is a toy IR and a toy optimizer, so it's hard to say how much faster it makes its toy programs. In general, though, there is a dial for analysis and optimization that goes between precision and speed. This is a happy point on that dial: only a tiny incremental analysis cost bump above offset-only invalidation, but with higher precision. I like that tradeoff. Also, it is very useful in JIT compilers, where the managed language is generally a little better-behaved than a C-like language. Somewhere in your IR there will be a lot of duplicate loads and stores from a strength reduction pass, and this can clean up the mess.

Other useful sources of alias information:

- If we know at compile-time that object A has 5 at offset 0 and object B has 7 at offset 0, then A and B don't alias (thanks, CF)
- In the RPython JIT in PyPy, this is used to determine if two user (Python) objects don't alias, because we know the contents of the user (Python) class field
- Object size (though perhaps that is a special case of the above bullet)
- Field size/type
- Deferring alias checks to run-time (have a branch)

Thanks for joining as I work through a small use of type-based alias analysis for myself. I hope you enjoyed. Thank you to Chris Gregory for helpful feedback.

[1] I made a fork of V8 to go spelunk around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted it in favor of their new Sea of Nodes based IR called TurboFan. ↩
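To make the earlier allocation-site point concrete – an object whose allocation we observed in the trace cannot alias anything that existed before it – here is a toy sketch with assumed class names, not the post's IR:

```python
class Value:
    """Any value flowing through the trace."""

class Param(Value):
    """An object that flowed in from outside the trace (e.g. an argument)."""

class New(Value):
    """An allocation we observed inside the trace."""

def must_not_alias(a, b):
    # A freshly observed allocation is a brand-new object, so it is
    # distinct from every other value: another fresh allocation, or
    # anything that existed before the trace started.
    return (a is not b) and (isinstance(a, New) or isinstance(b, New))

p, q = Param(), Param()
fresh = New()
print(must_not_alias(fresh, p))   # True: we saw fresh's allocation site
print(must_not_alias(p, q))       # False: two params might be the same object
```

In a real pass this predicate would feed the same invalidation decision as the type-range check: cached facts about `fresh` survive writes to `p`, and vice versa.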

ava's blog 2 weeks ago

when exercise started helping me

Nowadays, exercising really always saves me, without fail. I realized that today, after again feeling absolutely terrible but then dragging myself out of bed to at least walk on my foldable treadmill. I started wondering when exactly this change happened and what led to it, because I used to hate exercise. I didn't understand people who said it helped with depression. When did it truly start being a reliable way to improve my mental state?

What I struggled with back then were most definitely access, energy and health. I neither had a gym membership, nor did I have gym equipment at home. Wanting to exercise consisted of pulling out some yoga mat to do crunches like once a year, or going out for a run. Both suck when you haven't built it up over weeks or months! It was immediately difficult, painful and exhausting. My undiagnosed autoimmune diseases added more pain on top; I was just too inflamed to really work out well or even recover for days on end, and I dealt with a lot of fatigue on top of everything. That makes starting and keeping at it almost impossible, except for unexpected good phases. Without at least showing up semi-regularly, I made no progress, and every attempt I did make was immediately very exhausting with no reward. I felt like I couldn't last long enough in a session or exercise regimen to even reap the benefits. It didn't help at all that I always immediately chose something rather difficult or exhausting, as if I had to jump onto the level at which I expected a "default" human being to be. So what changed is:

- I was diagnosed and found a working treatment. This one is big; so much pain and fatigue gone. Training results finally showed and made getting motivated and back on track easier. Some exercise even started helping with the residual pain and symptoms.
- I searched for things to do that were easier on me. I shouldn't immediately run or do crunches. Instead, even just walking, yoga, and some easy Pilates are enough, and more manageable for someone in my position. They are easier to pick back up after a few weeks and allow great control over varying the difficulty. With running, for example, I had no room to vary anything; even just the act of running was so exhausting back then that adjusting speed made no difference. With other forms of movement, I could build something without feeling totally exhausted.
- I signed up for the gym and just made showing up and walking on the treadmill a goal, and I watched videos or listened to podcasts. This was needed, because when I started it, I was still recovering from a really bad flare-up and couldn't be trusted to walk around unsupervised in the forest somewhere. At the gym while just walking, I could slowly build up my exercise tolerance and endurance while seeing it as a sort of "me time" with some enjoyable videos, with people around in case I suddenly started feeling dizzy or anything, and with some rails to hold on to. By saving videos for this time, I made it more entertaining and had something to look forward to.
- I invested in a spinning bike, and later in a foldable treadmill for at-home use. I sometimes feel too bad physically or mentally to make it to the gym (or it is closed), and this enables me to still work out without being discouraged by my issues, time or weather. It also takes away the calculation of "Is it even worth showing up?" if I might just feel like 20 minutes of treadmill that day. Better 20 minutes than nothing!

With all that, I slowly built up enough of a baseline fitness for me that wouldn't make training annoying and just exhausting. It was easier to get back in after a break, and every time I had to take one, I had lost less progress than before. I got better and better at finding my sweet spot, neither under- nor overexercising.
The more times I actually pushed myself to exercise despite feeling awful mentally and left it happier, the more it didn't feel like an outlier, but a guaranteed outcome. That made it easier to show up despite everything. It's still hard, but I know now that it is basically like a button to improve my mood, and who doesn't want that? That behavior just keeps getting reinforced every time I can get myself out of a hole with this. It gets harder and harder to convincingly tell myself "No, this time will be different; you'll feel the same or worse when you do this. You should stay in bed instead." Lying down has a much worse track record: it never makes me feel better.

Reply via email · Published 12 Feb, 2026


Rewriting pycparser with the help of an LLM

pycparser is my most widely used open source project (with ~20M daily downloads from PyPI [1]). It's a pure-Python parser for the C programming language, producing ASTs inspired by Python's own ast module. Until very recently, it's been using PLY: Python Lex-Yacc for the core parsing. In this post, I'll describe how I collaborated with an LLM coding agent (Codex) to help me rewrite pycparser to use a hand-written recursive-descent parser and remove the dependency on PLY. This has been an interesting experience, and the post contains lots of information and is therefore quite long; if you're just interested in the final result, check out the latest code of pycparser - the main branch already has the new implementation.

While pycparser has been working well overall, there were a number of nagging issues that persisted over the years. I began working on pycparser in 2008, and back then using a YACC-based approach for parsing a whole language like C seemed like a no-brainer to me. Isn't this what everyone does when writing a serious parser? Besides, the K&R2 book famously carries the entire grammar of the C language in an appendix - so it seemed like a simple matter of translating that to PLY-yacc syntax. And indeed, it wasn't too hard, though there definitely were some complications in building the ASTs for declarations (C's gnarliest part).

Shortly after completing pycparser, I got more and more interested in compilation and started learning about the different kinds of parsers more seriously. Over time, I grew convinced that recursive descent is the way to go - producing parsers that are easier to understand and maintain (and are often faster!). It all ties in to the benefits of dependencies in software projects as a function of effort. Using parser generators is a heavy conceptual dependency: it's really nice when you have to churn out many parsers for small languages.
But when you have to maintain a single, very complex parser as part of a large project, the benefits quickly dissipate and you're left with a substantial dependency that you constantly grapple with. And then there are the usual problems with dependencies: dependencies get abandoned, and they may also develop security issues. Sometimes both of these become true.

Many years ago, pycparser forked and started vendoring its own version of PLY. This was part of transitioning pycparser to a dual Python 2/3 code base when PLY was slower to adapt. I believe this was the right decision, since PLY "just worked" and I didn't have to deal with active dependency management (very tedious in the Python ecosystem, where packaging tools are replaced faster than dirty socks). A couple of weeks ago this issue was opened for pycparser. It turns out that some old PLY code triggers security checks used by some Linux distributions; while this code was fixed in a later commit of PLY, PLY itself was apparently abandoned and archived in late 2025. And guess what? That happened in the middle of a large rewrite of the package, so re-vendoring the pre-archiving commit seemed like a risky proposition. On the issue it was suggested that "hopefully the dependent packages move on to a non-abandoned parser or implement their own"; I originally laughed this idea off, but then it got me thinking... which is what this post is all about.

The original K&R2 grammar for C had - famously - a single shift-reduce conflict, having to do with dangling elses belonging to the most recent if statement. And indeed, other than the famous lexer hack used to deal with C's type name / ID ambiguity, pycparser only had this single shift-reduce conflict. But things got more complicated. Over the years, features were added that weren't strictly in the standard but were supported by all the industrial compilers.
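As an aside, the dangling-else conflict simply evaporates in recursive descent: after parsing the then-branch, the parser greedily checks for an else, which naturally binds it to the nearest if. A toy sketch of this (an illustrative mini-grammar, not pycparser's actual code):

```python
# Toy recursive-descent parser for a mini statement grammar. Tokens are
# pre-split strings; "s" stands in for a simple statement, and conditions
# are single tokens for brevity. The greedy check for "else" after the
# then-branch binds each "else" to the nearest "if" -- nothing to tie-break.

def parse_stmt(toks, i):
    if toks[i] == "if":
        cond = toks[i + 1]
        then, i = parse_stmt(toks, i + 2)
        if i < len(toks) and toks[i] == "else":   # greedy: nearest if wins
            other, i = parse_stmt(toks, i + 1)
        else:
            other = None
        return ("if", cond, then, other), i
    return toks[i], i + 1                          # a simple statement

# "if a if b s else s": the else attaches to the inner if.
tree, _ = parse_stmt("if a if b s else s".split(), 0)
print(tree)   # ('if', 'a', ('if', 'b', 's', 's'), None)
```

In a YACC grammar the same ambiguity surfaces as a shift-reduce conflict that must be resolved by a precedence declaration or by trusting the default shift.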
The more advanced C11 and C23 standards weren't beholden to the promises of conflict-free YACC parsing (since almost no industrial-strength compilers use YACC at this point), so all caution went out of the window. The latest (PLY-based) release of pycparser has many reduce-reduce conflicts [2]; these are a severe maintenance hazard, because it means the parsing rules essentially have to be tie-broken by order of appearance in the code. This is very brittle; pycparser has only managed to maintain its stability and quality through its comprehensive test suite. Over time, it became harder and harder to extend, because YACC parsing rules have all kinds of spooky-action-at-a-distance effects. The straw that broke the camel's back was this PR, which again proposed to increase the number of reduce-reduce conflicts [3]. This - again - prompted me to think "what if I just dump YACC and switch to a hand-written recursive descent parser", and here we are.

None of the challenges described above are new; I've been pondering them for many years now, and yet biting the bullet and rewriting the parser didn't feel like something I'd like to get into. By my private estimates, it'd take at least a week of deep heads-down work to port the gritty 2000 lines of YACC grammar rules to a recursive descent parser [4]. Moreover, it wouldn't be a particularly fun project either - I didn't feel like I'd learn much new, and my interests have shifted away from this project. In short, the potential well was just too deep.

I've definitely noticed the improvement in capabilities of LLM coding agents in the past few months, and many reputable people online rave about using them for increasingly larger projects. That said, would an LLM agent really be able to accomplish such a complex project on its own? This isn't just a toy; it's thousands of lines of dense parsing code. What gave me hope is the concept of conformance suites mentioned by Simon Willison.
Agents seem to do well when there's a very clear and rigid goal function - such as a large, high-coverage conformance test suite. And pycparser has a very extensive one. Over 2500 lines of test code parsing various C snippets to ASTs with expected results, grown over a decade and a half of real issues and bugs reported by users. I figured the LLM could either succeed, or fail and throw its hands up in despair, but it was quite unlikely to produce a wrong port that would still pass all the tests. So I set it to run. I fired up Codex in pycparser's repository, and wrote this prompt just to make sure it understood me and could run the tests: Codex figured it out (I gave it the exact command, after all!); my next prompt was the real thing [5]: Here Codex went to work and churned for over an hour. Having never observed an agent work for nearly this long, I kind of assumed it had gone off the rails and would fail sooner or later. So I was rather surprised and skeptical when it eventually came back with: It took me a while to poke around the code and run it until I was convinced - it had actually done it! It wrote a new recursive descent parser with only ancillary dependencies on PLY, and that parser passed the test suite. After a few more prompts, we removed the ancillary dependencies and made the structure clearer. I hadn't looked too deeply into code quality at this point, but at least on the functional level - it succeeded. This was very impressive! A change like the one described above is impossible to code-review as one PR in any meaningful way, so I used a different strategy. Before embarking on this path, I created a new branch, and once Codex finished the initial rewrite, I committed the change, knowing that I would review it in detail, piece by piece, later on. Even though coding agents have their own notion of history and can "revert" certain changes, I felt much safer relying on Git.
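The conformance-suite idea - snippets paired with exact expected results, acting as the agent's goal function - has a simple shape. Here is a toy sketch of such a harness (not pycparser's actual test code; the "parser" is a stub that merely splits tokens, where the real suite compares full ASTs):

```python
# Toy golden-test harness: each snippet maps to the exact expected
# result; the suite passes only if every case matches. A wrong port
# of the parser is very unlikely to reproduce all golden outputs.
def parse(snippet):
    # Stub "parser" for illustration: split into tokens.
    return snippet.replace(";", " ;").split()

GOLDEN = {
    "int x;": ["int", "x", ";"],
    "x = y;": ["x", "=", "y", ";"],
}

def run_suite(parse_fn):
    failures = []
    for snippet, expected in GOLDEN.items():
        got = parse_fn(snippet)
        if got != expected:
            failures.append((snippet, expected, got))
    return failures

print(run_suite(parse))  # [] means every golden case passed
```

The key property for an agent is the binary, objective outcome: the suite either returns an empty failure list or pinpoints exactly which snippets diverged.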
In the worst case, if all of this went south, I could nuke the branch and it would be as if nothing ever happened. I was determined to only merge this branch onto main once I was fully satisfied with the code. In what follows, I had to git reset several times when I didn't like the direction in which Codex was going. In hindsight, doing this work in a branch was absolutely the right choice. Once I had sufficiently convinced myself that the new parser was actually working, I used Codex to similarly rewrite the lexer and get rid of the PLY dependency entirely, deleting it from the repository. Then I started looking more deeply into code quality - reading the code created by Codex and trying to wrap my head around it. And - oh my - this was quite the journey. Much has been written about the code produced by agents, and much of it seems to be true. Maybe it's a setting I'm missing (I'm not using my own custom AGENTS.md yet, for instance), but Codex seems to be that eager programmer that wants to get from A to B whatever the cost. Readability, minimalism and code clarity are very much secondary goals. Using raise...except for control flow? Yep. Abusing Python's weak typing (like having None, False and other values all mean different things for a given variable)? For sure. Spreading the logic of a complex function all over the place instead of putting all the key parts in a single switch statement? You bet. Moreover, the agent is hilariously lazy. More than once I had to convince it to do something it initially said was impossible, and even insisted on in follow-up messages. The anthropomorphization here is mildly concerning, to be honest. I could never have imagined I would be writing something like the following to a computer, and yet - here we are: "Remember how we moved X to Y before? You can do it again for Z, definitely. Just try". My process was to see how I could instruct Codex to fix things, and intervene myself (by rewriting code) as little as possible.
I've mostly succeeded in this, and did maybe 20% of the work myself. My branch grew dozens of commits, falling into roughly these categories: Interestingly, after doing (3), the agent was often more effective in giving the code a "fresh look" and succeeding in either (1) or (2). Eventually, after many hours spent in this process, I was reasonably pleased with the code. It's far from perfect, of course, but taking the essential complexities into account, it's something I could see myself maintaining (with or without the help of an agent). I'm sure I'll find more ways to improve it in the future, but I have a reasonable degree of confidence that this will be doable. It passes all the tests, so I've been able to release a new version (3.00) without major issues so far. The only issue I've discovered is that some of CFFI's tests are overly precise about the phrasing of errors reported by pycparser; this was an easy fix. The new parser is also faster, by about 30% based on my benchmarks! This is typical of recursive descent when compared with YACC-generated parsers, in my experience. After reviewing the initial rewrite of the lexer, I spent a while instructing Codex on how to make it faster, and it worked reasonably well. While working on this, it became quite obvious that static typing would make the process easier. LLM coding agents really benefit from closed loops with strict guardrails (e.g. a test suite to pass), and type annotations act as such. For example, had pycparser already been type-annotated, Codex would probably not have overloaded values to multiple types (like None vs. False vs. others). In a followup, I asked Codex to type-annotate pycparser (running checks using ty), and this was also a back-and-forth, because the process exposed some issues that needed to be refactored. Time will tell, but hopefully it will make further changes in the project simpler for the agent.
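The sentinel-overloading problem described above - one variable where None, False, and other values all mean different things - is exactly the kind of thing annotations surface. A hypothetical before/after sketch of my own (not pycparser's code, and the function names are invented):

```python
from enum import Enum, auto

# The overloaded style: None means "not tried", False means "tried
# and absent", a string means "found". A type checker can't tell the
# three cases apart, and neither can the next reader.
def lookup_untyped(table, name):
    if name not in table:
        return False if table else None
    return table[name]

# Typed alternative: the three outcomes are explicit, so a checker
# (mypy, ty, ...) rejects callers that forget to handle a case.
class Missing(Enum):
    NOT_TRIED = auto()
    ABSENT = auto()

def lookup_typed(table: dict[str, str], name: str) -> "str | Missing":
    if not table:
        return Missing.NOT_TRIED
    return table.get(name, Missing.ABSENT)

print(lookup_typed({"x": "int"}, "y"))  # Missing.ABSENT
```

With the enum version, `if result:` stops being a plausible-but-wrong test, which is precisely the guardrail effect the post describes.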
Based on this experience, I'd bet that coding agents will be somewhat more effective in strongly typed languages like Go, TypeScript and especially Rust. Overall, this project has been a really good experience, and I'm impressed with what modern LLM coding agents can do! While there's no reason to expect that progress in this domain will stop, even if it does - these are already very useful tools that can significantly improve programmer productivity. Could I have done this myself, without an agent's help? Sure. But it would have taken me much longer, assuming that I could even muster the will and concentration to engage in this project. I estimate it would take me at least a week of full-time work (so 30-40 hours) spread over who knows how long to accomplish. With Codex, I put an order of magnitude less work into this (around 4-5 hours, I'd estimate) and I'm happy with the result. It was also fun. At least in one sense, my professional life can be described as the pursuit of focus, deep work and flow. It's not easy for me to get into this state, but when I do I'm highly productive and find it very enjoyable. Agents really help me here. When I know I need to write some code and it's hard to get started, asking an agent to write a prototype is a great catalyst for my motivation. Hence the meme at the beginning of the post. One can't avoid a nagging question - does the quality of the code produced by agents even matter? Clearly, the agents themselves can understand it (if not today's agent, then at least next year's). Why worry about future maintainability if the agent can maintain it? In other words, does it make sense to just go full vibe-coding? This is a fair question, and one I don't have an answer to. Right now, for projects I maintain and stand behind, it seems obvious to me that the code should be fully understandable and accepted by me, and the agent is just a tool helping me get to that state more efficiently.
It's hard to say what the future holds here; it's going to be interesting, for sure. There was also the lexer to consider, but this seemed like a much simpler job. My impression is that in the early days of computing, lex gained prominence because of its strong regexp support, which wasn't very common yet. These days, with excellent regexp libraries existing for pretty much every language, the added value of lex over a custom regexp-based lexer isn't very high. That said, it wouldn't make much sense to embark on a journey to rewrite just the lexer; the dependency on PLY would still remain, and besides, PLY's lexer and parser are designed to work well together. So it wouldn't help me much without tackling the parser beast.

- The code in X is too complex; why can't we do Y instead?
- The use of X is needlessly convoluted; change Y to Z, and T to V in all instances.
- The code in X is unclear; please add a detailed comment - with examples - to explain what it does.
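The point about lex's diminished value is easy to demonstrate: a serviceable lexer is a few lines of stdlib re. The token kinds and the C-ish sample below are my own illustration, not pycparser's actual token set:

```python
import re

# A miniature lexer in the style lex would generate, written directly
# against the re module: one named group per token kind, combined into
# a single master pattern.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=;]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(src):
    # m.lastgroup names which alternative matched, i.e. the token kind.
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(lex("x = 42;")))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', ';')]
```

Because alternation in re tries alternatives left to right, keyword-vs-identifier style ordering concerns are handled simply by the order of TOKEN_SPEC.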

0 views
Justin Duke 3 weeks ago

Brief notes on migrating to Postgres-backed jobs

It seems premature to talk about a migration that is only halfway done, even if it's the hard half that's done — but I think there's something useful in documenting the why and how of a transition while you're still in the thick of it, before the revisionist history of completion sets in. Early last year, we built out a system for running background jobs directly against Postgres within Django. This very quickly got abstracted out into a generic task runner — shout out to Brandur and many other people who have been beating this drum for a while. And as far as I can tell, this concept of shifting away from Redis and other less-durable caches for job infrastructure is regaining steam on the Rails side of the ecosystem, too. The reason we did it was mostly for ergonomics around graceful batch processing. It is significantly easier to write a poller in Django for stuff backed by the ORM than it is to try and extend RQ or any of the other Redis-friendly task runner options. Django gives you migrations, querysets, admin visibility, transactional guarantees — all for free, all without another moving part. And as we started using it and it proved stable, we slowly moved more and more things over to it. At the time of this writing, around half of our jobs by quantity — which represent around two-thirds by overall volume — have been migrated over from RQ onto this system. This is slightly ironic given that we also released django-rq-cron last year, a library that, if I had my druthers, we would no longer need. Fewer moving parts is the watchword. We're removing spindles from the system and getting closer and closer to a simple, portable, and legible stack of infrastructure.
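The post doesn't show its implementation, but the general shape of a database-backed job queue is a jobs table plus an atomic claim step. Here is a minimal sketch using stdlib sqlite3 so it runs anywhere; the schema and function names are invented for illustration, and with Postgres the claim would typically use SELECT ... FOR UPDATE SKIP LOCKED so concurrent workers don't collide:

```python
import sqlite3

# Jobs live in an ordinary table, so they get migrations, admin
# visibility, and transactional guarantees for free.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)""")

def enqueue(payload):
    db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    db.commit()

def claim_one():
    """Atomically claim the oldest pending job, or return None."""
    with db:  # one transaction: select and mark in a single step
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        db.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                   (row[0],))
        return row

enqueue("send-email:42")
enqueue("rebuild-index")
print(claim_one())  # (1, 'send-email:42')
```

A poller is then just a loop calling claim_one(), processing the payload, and updating the row's status - which is the "easier to write against the ORM" point the post makes, minus the ORM.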

1 views
Steve Klabnik 1 months ago

The most important thing when working with LLMs

Okay, so you’ve got the basics of working with Claude going. But you’ve probably run into some problems: Claude doesn’t do what you want it to do, it gets confused about what’s happening and goes off the rails, all sorts of things can go wrong. Let’s talk about how to improve upon that. The most important thing that you can do when working with an LLM is give it a way to quickly evaluate if it’s doing the right thing, and if it isn’t, point it in the right direction. This is incredibly simple, yet, like many simple things, also wildly complex. But if you can keep this idea in mind, you’ll be well equipped to become effective when working with agents. A long time ago, I used to teach programming classes. Many of these were to adults, but some of them were to children. Teenaged children, but children nonetheless. We used to do an exercise to try and help them understand the difference between talking in English and talking in Ruby, or JavaScript, or whatever kind of programming language, rather than human language. The exercise went like this: I would have a jar of peanut butter, a jar of jelly, a loaf of bread, a spoon, and a knife. I would ask the class to take a piece of paper and write down a series of steps to make a peanut butter and jelly sandwich. They’d all then give me their algorithms, and the fun part for me began: find one that’s innocently written that I could hilariously misinterpret. For example, I might find one like: I’d read this aloud to the class: you all understand this is a recipe for a peanut butter and jelly sandwich, right? I’d take the jar of peanut butter and place it upon the unopened bag of bread. I’d do the same with the jar of jelly. This would, of course, squish the bread, which feels slightly transgressive given that you’re messing up the bread, so the kids would love that.
I’d then say something like “the bread is already together, I do not understand this instruction.” After the inevitable laughter died down, I’d make my point: the computer will do exactly what you say, but not what you mean. So you have to get good at figuring out when you said something different than what you meant. Sort of ironically, LLMs are kind of the inverse of this: they’ll sometimes try to figure out what you mean, and then do that, rather than simply doing what you say. But the core thing here is the same: semantic drift between what we intended our program to do and what it actually does. The second lesson is something I came up with at some point, I don’t even remember how exactly. But it’s something I told my students a lot. And that’s this: If your program did everything you wanted without problems, you wouldn’t be programming: you’d be using your program. The act of programming is to be perpetually in a state where something is either inadequate or broken, and the job is to fix that. I think this is a bit simplistic too, but it’s getting at something. I had originally come up with this in the context of trying to explain how you need to manage your frustration when programming; if you get easily upset by something not working, computer programming might not be for you. But I do think these two things combine into something that gets to the heart of what we do: we need to understand what it is we want our software to do, and then make it do that. Sometimes, our software doesn’t do something yet. Sometimes, it does something, but incorrectly. Both of these cases result in a divergence from the program’s intended behavior. So, how do we know if our program does what it should do? Well, what we’ve been doing so far is: This is our little mini software development lifecycle, or “SDLC.” This process works, but is slow. That’s great for getting the feel of things, but programmers are process optimizers by trade.
One of my favorite tools for optimization is called Amdahl’s law. The core idea is this, formulated in my own words: If you have a process that takes multiple steps, and you want to speed it up by optimizing only one step, the maximum amount of speedup you’ll get is determined by the portion of the process that step takes. In other words, imagine we have a three step process: This process takes a total of 13 minutes to complete. If we speed up step 3 by double, it goes from two minutes to one minute, and now our process takes 12 minutes. However, if we were able to speed up step 2 by double, we’d cut off five minutes, and our process would now take 8 minutes. We can use this style of analysis to guide our thinking in many ways, but the most common way, for me, is to decide where to put my effort. Given the process above, I’m going to look at step 2 first to try and figure out how to make it faster. That doesn’t mean we can achieve the 2x speedup, but heck, if we get a 10% decrease in time, that’s the same time savings as if we did get a 2x on step 3. So it’s at least the place where we should start. I chose the above because, well, I think it properly models the proportion of time we’re taking when doing things with LLMs: we spend some time asking it to do something, and we spend a bit more time reviewing its output. But we spend a lot of time clicking “accept edit,” and a lot of time allowing Claude to execute tools. This will be our next step forward, as this will increase our velocity when working with the tools significantly. However, like with many optimization tasks, this is easier said than done. The actual mechanics of improving the speed of this step are simple at first: hit to auto-accept edits, and “Yes, and don’t ask again for commands” when you think the command is safe for Claude to run. By doing this, once you have enough commands allowed, your input for step 2 of our development loop can drop to zero.
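The arithmetic in the 13-minute example - step durations of one, ten, and two minutes, as implied by the totals in the text - can be checked directly:

```python
def total_after_speedup(steps, index, factor):
    """Total time when steps[index] is sped up by `factor`."""
    sped = list(steps)
    sped[index] = sped[index] / factor
    return sum(sped)

steps = [1, 10, 2]  # minutes: ask, supervise/execute, review
print(sum(steps))                        # 13
print(total_after_speedup(steps, 2, 2))  # 12.0  (halving the 2-minute step)
print(total_after_speedup(steps, 1, 2))  # 8.0   (halving the 10-minute step)
```

This also makes the 10%-beats-2x point concrete: shaving 10% off the 10-minute step saves one minute, the same as doubling the speed of the 2-minute step.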
Of course, it takes time for Claude to actually implement what you’ve asked, so it’s not like our 13 minute process drops to three, but still, this is a major efficiency step. But we were actively monitoring Claude for a reason. Claude will sometimes do incorrect things, and we need to correct it. At some point, Claude will say “Hey I’ve finished doing what you asked of me!” and it doesn’t matter how fast it does step 2 if we get to step 3 and it’s just incorrect, and we need to throw everything out and try again. So, how do we get Claude to guide itself in the right direction? A useful technique for figuring out what you should do is to consider the ending: where do we want to go? That will inform what we need to do to get there. Well, the ending of step 2 is knowing when to transition to step 3. And that transition is gated by “does the software do what it is supposed to do?” That’s a huge question! But in practice, we can do what we always do: start simple, and iterate from there. Right now, the transition from step 2 to step 3 is left up to Claude. Claude will use its own judgement to decide when it thinks that the software is working. And it’ll be right. But why leave that up to chance? I expect that some of you are thinking that maybe I’m belaboring this point. “Why not just skip to ? That’s the idea, right? We need tests.” Well on some level: yes. But on another level, no. I’m trying to teach you how to think here, not give you the answer. Because it might be broader than just “run the tests.” Maybe you are working on a project where the tests aren’t very good yet. Maybe you’re working on a behavior that’s hard to automatically test. Maybe the test suite takes a very long time, and so isn’t appropriate to be running over and over and over. Remember our plan from the last post? 
Where Claude finished the plan with this: These aren’t “tests” in the traditional sense of a test suite, but they are objective measures that Claude can invoke itself to understand if it’s finished the task. Claude could run after every file edit if it wanted to, and as soon as it sees , it knows that it’s finished. You don’t need a comprehensive test suite. You just need some sort of way for Claude to detect if it’s done in some sort of objective fashion. Of course, we can do better. While giving Claude a way to know if it’s done working is important, there’s a second thing we need to pay attention to: when Claude isn’t done working, can we guide it towards doing the right thing, rather than the wrong thing? For example, those of you who are of a similar vintage as myself may remember the output of early compilers. It was often… not very helpful. Imagine that we told Claude that it should run to know if things are working, and the only output from it was the exit code: 0 if we succeeded, 1 if we failed. That would accomplish our objective of letting Claude know when things are done, but it wouldn’t help Claude know what went wrong when it returns 1. This is one reason why I think Rust works well with LLMs. Take this incorrect Rust program: The Rust compiler won’t just say “yeah this program is incorrect,” it’ll give you this (as of Rust 1.93.0): The compiler will point out the exact place in the code itself of where there’s an issue, and even make suggestions as to how to fix it. This goes beyond just simply saying “it doesn’t work” and instead nudges you to what might fix the problem. Of course, this isn’t perfect, but if it’s helpful more than not, that’s a win. Of course, too much verbosity isn’t helpful either. A lot of tooling has gotten much more verbose lately. Often times, this is really nice as a human. Pleasant terminal output is, well… pleasant. But that doesn’t mean that it’s always good or useful. 
For example, here’s the default output for : This is not bad output. It’s nice. But it’s also not useful for an LLM. We don’t need to read all of the tests that are passing; we really just want to see some sort of minimal output, and then what failed if something failed. In Cargo’s case, that’s for “quiet”: There is no point in giving a ton of verbose input to an LLM that it isn’t even going to use. If you’re feeding a tool’s output to an LLM, you should consider what the tool does both in the failure case and in the success case. Maybe configure things to be a bit simpler for Claude. You’ll save some tokens and get better results. All of this has various implications for all sorts of things. For example, types are a great way to get quick feedback on what you’re doing. A comprehensive test suite that completes quickly is useful for giving feedback to the LLM. But that also doesn’t inherently mean that types must be better or that you need to be doing TDD; whatever gives you that underlying principle of “objective feedback for the success case and guidance for the failure case” will be golden, no matter what tech stack you use. This brings me to something that may be counter-intuitive, but I think is also true, and worth keeping in the back of your mind: what’s good for Claude is also probably good for humans working on your system. A good test suite was considered golden before LLMs. That it’s great for them is just a nice coincidence. At the end of the day, Claude is not a person, but it tackles programming problems in a similar fashion to how we do: take in the problem, attempt a solution, run the compiler/linter/tests, and then see what feedback it gets, then iterate. That core loop is the same, even if humans can exercise better judgement and can have more skill. And so even though I pitched fancy terminal output as an example of how humans and LLMs need different things, that’s really just a superficial kind of thing.
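The quiet-output point generalizes beyond Cargo: when feeding a tool's output to an agent, filter it down to the failure signal first. A small sketch, with a test-runner output format that is entirely made up for illustration:

```python
# Sample verbose output from a hypothetical test runner.
VERBOSE_OUTPUT = """\
test parse_empty ... ok
test parse_typedef ... ok
test parse_dangling_else ... FAILED
test parse_enum ... ok
4 tests run, 1 failed
"""

def quiet(output):
    # Keep only failing tests and the summary line; passing tests
    # are noise the agent doesn't need to spend tokens on.
    keep = [line for line in output.splitlines()
            if "FAILED" in line or "failed" in line]
    return "\n".join(keep)

print(quiet(VERBOSE_OUTPUT))
# test parse_dangling_else ... FAILED
# 4 tests run, 1 failed
```

Many runners have a flag for this already (Cargo's quiet mode being the example in the post), in which case a wrapper like this is unnecessary - the point is just that the success case should be near-silent.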
Good error messages are still critical for both. We’re just better at having terminal spinners not take up space in our heads while we’re solving a problem, and can appreciate the aesthetics in a way that Claude does not. Incidentally, this is one of the things that makes me hopeful about the future of software development under agentic influence. Engineers always complain that management doesn’t give us time to do refactorings, to improve the test suite, to clean up our code. Part of the reason for this is that we often didn’t do a good job of pitching how it would actually help accomplish business goals. But even if you’re on the fence about AI, and upset that management is all about AI: explain to management that this stuff is a force multiplier for your agents. Use the time you’ve saved by doing things the agentic way towards improving your test suite, or your documentation, or whatever else. I think there’s a chance that all of this stuff leads to higher quality codebases than ones filled with slop. But it also requires us to make the decisions that will lead us in that direction. That’s what I have for you today: consider how you can help Claude evaluate its own work. Give it explicit success criteria, and make evaluating those criteria as simple and objective as possible. In the next post, we’re gonna finally talk about . Can you believe that I’ve talked this much about how to use Claude and we haven’t talked about ? There’s good reason for that, as it turns out. We’re going to talk a bit more about understanding how interacting with LLMs works, and how it can help us both improve step 1 in our process, but also continue to make step 2 better and better. Here’s my post about this post on BlueSky:
- Put the peanut butter on the bread
- Put the jelly on the bread
- Put the bread together

1. Asking the LLM to do something by typing up what we want it to do
2. Closely observing its behavior and course correcting it when it goes off of the rails (ten minutes)
3. Eventually, after it says that it’s finished, reviewing its output (two minutes)

0 views
daniel.haxx.se 1 months ago

The end of the curl bug-bounty

tldr: an attempt to reduce the terror reporting. There is no longer a curl bug-bounty program. It officially stops on January 31, 2026. After having had a few half-baked previous takes, in April 2019 we kicked off the first real curl bug-bounty with the help of Hackerone, and while it stumbled a bit at first, it has been quite successful, I think. We attracted skilled researchers who reported plenty of actual vulnerabilities, for which we paid fine monetary rewards. We have certainly made curl better as a direct result of this: 87 confirmed vulnerabilities and over 100,000 USD paid as rewards to researchers. I’m quite happy and proud of this accomplishment. I would like to especially highlight the awesome Internet Bug Bounty project, which has paid the bounties for us for many years. We could not have done this without them. Also of course Hackerone, who has graciously hosted us and been our partner through these years. Looking back, I think we can say that the downfall of the bug-bounty program started slowly in the second half of 2024 but accelerated badly in 2025. We saw an explosion in AI slop reports, combined with a lower quality even in the reports that were not obvious slop – presumably because they too were actually misled by AI, but with that fact just hidden better. Maybe the first five years made it possible for researchers to find and report the low-hanging fruit. In previous years we had a rate of somewhere north of 15% of the submissions ending up confirmed vulnerabilities. Starting 2025, the confirmed-rate plummeted to below 5%. Not even one in twenty was real. The never-ending slop submissions take a serious mental toll to manage and sometimes also a long time to debunk. Time and energy that is completely wasted, while also hampering our will to live. I have also started to get the feeling that a lot of the security reporters submit reports with a bad faith attitude.
These “helpers” try too hard to twist whatever they find into something horribly bad and a critical vulnerability, but they rarely actively contribute to actually improving curl. They can go to extreme efforts to argue and insist on their specific current finding, but not to write a fix or work with the team on improving curl long-term, etc. I don’t think we need more of that. There are three bad trends combined that make us take this step: the mind-numbing AI slop, humans doing worse than ever, and the apparent will to poke holes rather than to help. In an attempt to do something about the sorry state of curl security reports, this is what we do: We believe that we can maintain and continue to evolve curl security in spite of this change. Maybe even improve thanks to this, as hopefully this step helps prevent more people pouring sand into the machine. Ideally, we reduce the amount of wasted time and effort. I believe the best and our most valued security reporters still will tell us when they find security vulnerabilities. If you suspect a security problem in curl going forward, we advise you to head over to GitHub and submit it there. Alternatively, you can send an email with the full report to . In both cases, the report is received and handled privately by the curl security team. But with no monetary reward offered. Hackerone was good to us and they have graciously allowed us to run our program on their platform for free for many years. We thank them for that service. As we now drop the rewards, we feel it makes a clear cut and displays a clearer message to everyone involved by also moving away from Hackerone as a platform for vulnerability reporting. It makes the change more visible. It is probably going to be harder for us to publicly disclose every incoming security report in the same way we have done it on Hackerone for the last year.
We need to work out something to make sure that we can keep doing it, at least imperfectly, because I believe in the goodness of such transparency. Let me emphasize that this change does not impact our presence and mode of operation with the curl repository and its hosting on GitHub. We hear about projects having problems with low-quality AI slop submissions on GitHub as well, in the form of issues and pull-requests, but for curl we have not (yet) seen this – and frankly, I don’t think switching to a GitHub alternative saves us from that. Compared to others, we seem to be affected by the sloppy security reports to a higher degree than the average Open Source project. With the help of Hackerone, we got numbers on how the curl bug-bounty has compared with other programs over the last year. It turns out curl’s program has seen more volume and noise than other public open source bug bounty programs in the same cohort. Over the past four quarters, curl’s inbound report volume has risen sharply, while other bounty-paying open source programs in the cohort, such as Ruby, Node, and Rails, have not seen a meaningful increase and have remained mostly flat or declined slightly. In the chart, the pink line represents curl’s report volume, and the gray line reflects the broader cohort.
Inbound Report Volume on Hackerone: curl compared to OSS peers
We suspect the idea of getting money for it is a big part of the explanation. It brings in real reports, but makes it too easy to be annoying with little to no penalty to the user. The reputation system and available program settings were not sufficient for us to prevent sand from getting into the machine. The exact reason why we suffer more of this abuse than others remains a subject for further speculation and research. There is a non-zero risk that our guesses are wrong and that the volume and security report frequency will keep up even after these changes go into effect.
If that happens, we will deal with it then and take further appropriate steps. I prefer not to overdo things or overplan now for something that ideally does not happen. People keep suggesting that one way to deal with the report tsunami is to charge security researchers a small amount of money for the privilege of submitting a vulnerability report to us. A curl security-reporters club with an entrance fee. I think that is a less good solution than just dropping the bounty. Some of the reasons include: Maybe we need to do this later anyway, but we stay away from it for now. We have seen other projects and repositories see similar AI-induced problems for pull requests, but this has not been a problem for the curl project. I believe that for PRs we have much better means to sort out the weeds with automatic means, since we have tools, tests and scanners to verify such contributions. We don’t need to waste any human time on pull requests until the quality is good enough to get green check-marks from 200 CI jobs. I will do a talk at FOSDEM 2026 titled Open Source Security in spite of AI that of course will touch on this subject. We never say never. This is now, and we might have reasons to reconsider and make a different decision in the future. If we do, we will let you know. These changes are applied now with the hope that they will have a positive effect for the project and its maintainers. If that turns out not to be the outcome, we will of course continue and apply further changes later. Since I created the pull request for updating the bug-bounty information for curl on January 14, almost two weeks before we merged it, various media picked up the news and published articles. Long before I posted this blog post. Also discussed (indirectly) on Hacker News.

- We no longer offer any monetary rewards for security reports – no matter which severity. In an attempt to remove the incentives for submitting made-up lies.
We stop using Hackerone as the recommended channel to report security problems. To make the change immediately obvious and because without a bug-bounty program we don’t need it. We refer everyone to submit suspected curl security problems on GitHub using their Private vulnerability reporting feature. We continue to immediately ban and publicly ridicule everyone who submits AI slop to the project. Charging people money in an International context is complicated and a maintenance burden. Dealing with charge-backs, returns and other complaints and friction add work. It would limit who could or would submit issues. Even some who actually find legitimate issues. The Register: Curl shutters bug bounty program to remove incentive for submitting AI slop Elektroniktidningen: cURL removes bug bounties Heise online: curl: Projekt beendet Bug-Bounty-Programm Neowin: Beloved tool, cURL is shutting down its bug bounty over AI slop reports Golem: Curl-Entwickler dreht dem “KI-Schrott” den Geldhahn zu Linux Easy: cURL chiude il programma bug bounty: troppi report generati dall’AI Bleeping Computer: Curl ending bug bounty program after flood of AI slop reports The New Stack: Drowning in AI slop, cURL ends bug bounties Ars Technica: Overrun with AI slop, cURL scraps bug bounties to ensure “intact mental health” PressMind Labs: cURL konczy program bug bounty – czy to koniec jakosci zgloszen? Socket: curl Shuts Down Bug Bounty Program After Flood of AI Slop Reports

Sean Goedecke, 1 month ago

How I estimate work as a staff software engineer

There’s a kind of polite fiction at the heart of the software industry. It goes something like this: Estimating how long software projects will take is very hard, but not impossible. A skilled engineering team can, with time and effort, learn how long it will take for them to deliver work, which will in turn allow their organization to make good business plans.

This is, of course, false. As every experienced software engineer knows, it is not possible to accurately estimate software projects. The tension between this polite fiction and its well-understood falseness causes a lot of strange activity in tech companies. For instance, many engineering teams estimate work in t-shirt sizes instead of time, because it just feels too obviously silly to the engineers in question to give direct time estimates. Naturally, these t-shirt sizes are immediately translated into hours and days when the estimates make their way up the management chain. Alternatively, software engineers who are genuinely trying to give good time estimates have ridiculous heuristics like “double your initial estimate and add 20%”. This is basically the same as giving up and saying “just estimate everything at a month”.

Should tech companies just stop estimating? One of my guiding principles is that when a tech company is doing something silly, they’re probably doing it for a good reason. In other words, practices that appear to not make sense are often serving some more basic, illegible role in the organization. So what is the actual purpose of estimation, and how can you do it well as a software engineer?

Before I get into that, I should justify my core assumption a little more. People have written a lot about this already, so I’ll keep it brief. I’m also going to concede that sometimes you can accurately estimate software work, when that work is very well-understood and very small in scope.
For instance, if I know it takes half an hour to deploy a service 1 , and I’m being asked to update the text in a link, I can accurately estimate the work at something like 45 minutes: five minutes to push the change up, ten minutes to wait for CI, thirty minutes to deploy. For most of us, the majority of software work is not like this. We work on poorly-understood systems and cannot predict exactly what must be done in advance. Most programming in large systems is research : identifying prior art, mapping out enough of the system to understand the effects of changes, and so on. Even for fairly small changes, we simply do not know what’s involved in making the change until we go and look. The pro-estimation dogma says that these questions ought to be answered during the planning process, so that each individual piece of work being discussed is scoped small enough to be accurately estimated. I’m not impressed by this answer. It seems to me to be a throwback to the bad old days of software architecture , where one architect would map everything out in advance, so that individual programmers simply had to mechanically follow instructions. Nobody does that now, because it doesn’t work: programmers must be empowered to make architectural decisions, because they’re the ones who are actually in contact with the code 2 . Even if it did work, that would simply shift the impossible-to-estimate part of the process backwards, into the planning meeting (where of course you can’t write or run code, which makes it near-impossible to accurately answer the kind of questions involved). In short: software engineering projects are not dominated by the known work, but by the unknown work, which always takes 90% of the time. However, only the known work can be accurately estimated. It’s therefore impossible to accurately estimate software projects in advance. Estimates do not help engineering teams deliver work more efficiently. 
Many of the most productive years of my career were spent on teams that did no estimation at all: we were either working on projects that had to be done no matter what, and so didn’t really need an estimate, or on projects that would deliver a constant drip of value as we went, so we could just keep going indefinitely 3 . In a very real sense, estimates aren’t even made by engineers at all . If an engineering team comes up with a long estimate for a project that some VP really wants, they will be pressured into lowering it (or some other, more compliant engineering team will be handed the work). If the estimate on an undesirable project - or a project that’s intended to “hold space” for future unplanned work - is too short, the team will often be encouraged to increase it, or their manager will just add a 30% buffer. One exception to this is projects that are technically impossible, or just genuinely prohibitively difficult. If a manager consistently fails to pressure their teams into giving the “right” estimates, that can send a signal up that maybe the work can’t be done after all. Smart VPs and directors will try to avoid taking on technically impossible projects. Another exception to this is areas of the organization that senior leadership doesn’t really care about. In a sleepy backwater, often the formal estimation process does actually get followed to the letter, because there’s no director or VP who wants to jump in and shape the estimates to their ends. This is one way that some parts of a tech company can have drastically different engineering cultures to other parts. I’ll let you imagine the consequences when the company is re-orged and these teams are pulled into the spotlight. Estimates are political tools for non-engineers in the organization . They help managers, VPs, directors, and C-staff decide on which projects get funded and which projects get cancelled. 
The standard way of thinking about estimates is that you start with a proposed piece of software work, and you then go and figure out how long it will take. This is entirely backwards. Instead, teams will often start with the estimate, and then go and figure out what kind of software work they can do to meet it.

Suppose you’re working on an LLM chatbot, and your director wants to implement “talk with a PDF”. If you have six months to do the work, you might implement a robust file upload system, some pipeline to chunk and embed the PDF content for semantic search, a way to extract PDF pages as image content to capture formatting and diagrams, and so on. If you have one day to do the work, you will naturally search for simpler approaches: for instance, converting the PDF to text client-side and sticking the entire thing in the LLM context, or offering a plain-text “grep the PDF” tool.

This is true even at the level of individual lines of code. When you have weeks or months until your deadline, you might spend a lot of time thinking airily about how you could refactor the codebase to make your new feature fit in as elegantly as possible. When you have hours, you will typically be laser-focused on finding an approach that will actually work. There are always many different ways to solve software problems. Engineers thus have quite a lot of discretion about how to get it done.

So how do I estimate, given all that? I gather as much political context as possible before I even look at the code. How much pressure is on this project? Is it a casual ask, or do we have to find a way to do this? What kind of estimate is my management chain looking for? There’s a huge difference between “the CTO really wants this in one week” and “we were looking for work for your team and this seemed like it could fit”. Ideally, I go to the code with an estimate already in hand.
Instead of asking myself “how long would it take to do this”, where “this” could be any one of a hundred different software designs, I ask myself “which approaches could be done in one week?”.

I spend more time worrying about unknowns than knowns. As I said above, unknown work always dominates software projects. The more “dark forests” in the codebase this feature has to touch, the higher my estimate will be - or, more concretely, the tighter I need to constrain the set of approaches to the known work.

Finally, I go back to my manager with a risk assessment, not with a concrete estimate. I don’t ever say “this is a four-week project”. I say something like “I don’t think we’ll get this done in one week, because X, Y, and Z would need to all go right, and at least one of those things is bound to take a lot more work than we expect.” Ideally, I go back to my manager with a series of plans, not just one:

- We tackle X, Y, and Z directly, which might all go smoothly, but if it blows out we’ll be here for a month
- We bypass Y and Z entirely, which would introduce these other risks but possibly allow us to hit the deadline
- We bring in help from another team who’s more familiar with X and Y, so we just have to focus on Z

In other words, I don’t “break down the work to determine how long it will take”. My management chain already knows how long they want it to take. My job is to figure out the set of software approaches that match that estimate. Sometimes that set is empty: the project is just impossible, no matter how you slice it. In that case, my management chain needs to get together and figure out some way to alter the requirements. But if I always said “this is impossible”, my managers would find someone else to do their estimates. When I do that, I’m drawing on a well of trust that I build up by making pragmatic estimates the rest of the time.

Many engineers find this approach distasteful. One reason is that they don’t like estimating in conditions of uncertainty, so they insist on having all the unknown questions answered in advance. I have written a lot about this in Engineers who won’t commit and How I provide technical clarity to non-technical leaders, but suffice to say that I think it’s cowardly. If you refuse to estimate, you’re forcing someone less technical to estimate for you.

Some engineers think that their job is to constantly push back against engineering management, and that helping their manager find technical compromises is betraying some kind of sacred engineering trust. I wrote about this in Software engineers should be a little bit cynical. If you want to spend your career doing that, that’s fine, but I personally find it more rewarding to find ways to work with my managers (who have almost exclusively been nice people).

Other engineers might say that they rarely feel this kind of pressure from their directors or VPs to alter estimates, and that this is really just the sign of a dysfunctional engineering organization. Maybe! I can only speak for the engineering organizations I’ve worked in. But my suspicion is that these engineers are really just saying that they work “out of the spotlight”, where there’s not much pressure in general and teams can adopt whatever processes they want. There’s nothing wrong with that. But I don’t think it qualifies you to give helpful advice to engineers who do feel this kind of pressure.

I think software engineering estimation is generally misunderstood. The common view is that a manager proposes some technical project, the team gets together to figure out how long it would take to build, and then the manager makes staffing and planning decisions with that information. In fact, it’s the reverse: a manager comes to the team with an estimate already in hand (though they might not come out and admit it), and then the team must figure out what kind of technical project might be possible within that estimate. This is because estimates are not by or for engineering teams. They are tools used for managers to negotiate with each other about planned work. Very occasionally, when a project is literally impossible, the estimate can serve as a way for the team to communicate that fact upwards. But that requires trust. A team that is always pushing back on estimates will not be believed when they do encounter a genuinely impossible proposal.

When I estimate, I extract the range my manager is looking for, and only then do I go through the code and figure out what can be done in that time. I never come back with a flat “two weeks” figure. Instead, I come back with a range of possibilities, each with their own risks, and let my manager make that tradeoff.

It is not possible to accurately estimate software work. Software projects spend most of their time grappling with unknown problems, which by definition can’t be estimated in advance. To estimate well, you must therefore basically ignore all the known aspects of the work, and instead try and make educated guesses about how many unknowns there are, and how scary each unknown is.

edit: I should thank one of my readers, Karthik, who emailed me to ask about estimates, thus revealing to me that I had many more opinions than I thought.

1. For anyone wincing at that time, I mean like three minutes of actual deployment and twenty-seven minutes of waiting for checks to pass or monitors to turn up green. ↩
2. I write a lot more about this in You can’t design software you don’t work on. ↩
3. For instance, imagine a mandate to improve the performance of some large Rails API, one piece at a time. I could happily do that kind of work forever. ↩

Max Bernstein, 1 month ago

A multi-entry CFG design conundrum

The ZJIT compiler compiles Ruby bytecode (YARV) to machine code. It starts by transforming the stack machine bytecode into a high-level graph-based intermediate representation called HIR. We use a more or less typical 1 control-flow graph (CFG) in HIR. We have a compilation unit, , which has multiple basic blocks, . Each block contains multiple instructions, . HIR is always in SSA form, and we use the variant of SSA with block parameters instead of phi nodes. Where it gets weird, though, is our handling of multiple entrypoints. See, YARV handles default positional parameters (but not default keyword parameters) by embedding the code to compute the defaults inside the callee bytecode. Then callers are responsible for figuring out what offset in the bytecode they should start running the callee, depending on the amount of arguments the caller provides. 2 In the following example, we have a function that takes two optional positional parameters and . If neither is provided, we start at offset . If just is provided, we start at offset . If both are provided, we can start at offset . (See the jump table debug output: ) Unlike in Python, where default arguments are evaluated at function creation time , Ruby computes the default values at function call time . For this reason, embedding the default code inside the callee makes a lot of sense; we have a full call frame already set up, so any exception handling machinery or profiling or … doesn’t need special treatment. Since the caller knows what arguments it is passing, and often to what function, we can efficiently support this in the JIT. We just need to know what offset in the compiled callee to call into. The interpreter can also call into the compiled function, which just has a stub to do dispatch to the appropriate entry block. This has led us to design the HIR to support multiple function entrypoints . 
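The call-time behavior is easy to observe from plain Ruby. A minimal sketch (the method names here are made up for illustration): the default expression runs again on every call that omits the argument, and `RubyVM::InstructionSequence#disasm` shows the default-handling code embedded in the callee's bytecode, exact offsets varying across Ruby versions.

```ruby
# Default positional parameters are evaluated at call time, every time
# the argument is omitted -- unlike Python, which evaluates defaults
# once at definition time.
COUNTER = Hash.new(0)

def make_default
  COUNTER[:evals] += 1
  []                      # a fresh array on every defaulted call
end

def push_item(item, list = make_default)
  list << item
end

push_item(1)       # default evaluated
push_item(2)       # default evaluated again, on a fresh array
push_item(3, [9])  # caller supplied the argument: default skipped

# COUNTER[:evals] is now 2: the default ran twice, never at definition.

# The code computing the default lives inside the callee's bytecode;
# the disassembly shows it, along with the offsets callers jump to.
puts RubyVM::InstructionSequence.of(method(:push_item)).disasm
```

Because the default is re-evaluated per call, a full call frame is already in place when it runs, which is exactly why embedding it in the callee works so cleanly for exceptions and profiling.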
Instead of having just a single entry block, as most control-flow graphs do, each of our functions now has an array of function entries: one for the interpreter, at least one for the JIT, and more for default parameter handling. Each of these entry blocks is separately callable from the outside world. Here is what the (slightly cleaned up) HIR looks like for the above example: If you’re not a fan of text HIR, here is an embedded clickable visualization of HIR thanks to our former intern Aiden porting Firefox’s Iongraph : (You might have to scroll sideways and down and zoom around. Or you can open it in its own window .) Each entry block also comes with block parameters which mirror the function’s parameters. These get passed in (roughly) the System V ABI registers. This is kind of gross. We have to handle these blocks specially in reverse post-order (RPO) graph traversal. And, recently, I ran into an even worse case when trying to implement the Cooper-style “engineered” dominator algorithm: if we walk backwards in block dominators, the walk is not guaranteed to converge. All non-entry blocks are dominated by all entry blocks, which are only dominated by themselves. There is no one “start block”. So what is there to do? Approach 1 is to keep everything as-is, but handle entry blocks specially in the dominator algorithm too. I’m not exactly sure what would be needed, but it seems possible. Most of the existing block infra could be left alone, but it’s not clear how much this would “spread” within the compiler. What else in the future might need to be handled specially? Approach 2 is to synthesize a super-entry block and make it a predecessor of every interpreter and JIT entry block. Inside this approach there are two ways to do it: one ( 2.a ) is to fake it and report some non-existent block. Another ( 2.b ) is to actually make a block and a new instruction that is a quasi-jump instruction. 
In this approach, we would either need to synthesize fake block arguments for the JIT entry block parameters or add some kind of new instruction that reads the argument i passed in. (suggested by Iain Ireland, as seen in the IBM COBOL compiler)

Approach 3 is to duplicate the entire CFG per entrypoint. This would return us to having one entry block per CFG at the expense of code duplication. It handles the problem pretty cleanly but then forces code duplication. I think I want the duplication to be opt-in instead of having it be the only way we support multiple entrypoints. What if it increases memory too much? The specialization probably would make the generated code faster, though. (suggested by Ben Titzer)

None of these approaches feel great to me. The probable candidate is 2.b where we have instructions. That gives us flexibility to also later add full specialization without forcing it. Cameron Zwarich also notes that this is an analogue to the common problem people have when implementing the reverse: postdominators. This is because often functions have multiple return IR instructions. He notes the usual solution is to transform them into branches to a single return instruction. Do you have this problem? What does your compiler do?

1. We use extended basic blocks (EBBs), but this doesn’t matter for this post. It makes dominators and predecessors slightly more complicated (now you have dominating instructions), but that’s about it as far as I can tell. We’ll see how they fare in the face of more complicated analysis later. ↩
2. Keyword parameters have some mix of caller/callee presence checks in the callee because they are passed in un-ordered. The caller handles simple constant defaults whereas the callee handles anything that may raise. Check out Kevin Newton’s awesome overview. ↩
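For what it's worth, the virtual super-entry of approach 2.a can be prototyped in a few lines. This is a hypothetical sketch (plain symbols and hashes, not ZJIT's actual block structures): once every real entry block gets the synthetic block as its sole extra predecessor, an ordinary single-entry algorithm like iterative dominators works unchanged.

```ruby
require "set"

# Iterative dominator computation over a multi-entry CFG, using a
# synthetic :super_entry block as a predecessor of every real entry
# block -- the only special-casing needed.
def dominators(blocks, entries, succs)
  preds = Hash.new { |h, k| h[k] = [] }
  blocks.each { |b| succs.fetch(b, []).each { |s| preds[s] << b } }
  entries.each { |e| preds[e] << :super_entry }

  all = Set.new(blocks) << :super_entry
  dom = blocks.to_h { |b| [b, all] }   # start everything at "all blocks"
  dom[:super_entry] = Set[:super_entry]

  changed = true
  while changed
    changed = false
    blocks.each do |b|
      # dom(b) = {b} ∪ intersection of dom(p) over all predecessors p
      new_dom = preds[b].map { |p| dom[p] }.reduce { |a, s| a & s } | Set[b]
      changed |= (new_dom != dom[b])
      dom[b] = new_dom
    end
  end
  dom
end

# Two separately callable entries (interpreter and JIT) joining at :body.
succs = { interp_entry: [:body], jit_entry: [:body], body: [] }
DOMS = dominators(%i[interp_entry jit_entry body],
                  %i[interp_entry jit_entry], succs)
# Neither real entry dominates :body; only the synthetic block does,
# so the walk up the dominator tree always terminates at :super_entry.
```

Reporting results back out is then just a matter of hiding `:super_entry` (or reporting it as the CFG's root), which is the "fake block" flavor of the idea.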


3D printing my laptop ergonomic setup

Apparently, one of my hobbies is making updates to my ergonomic setup, then blogging about it from an Amtrak train. I've gone and done it again. My setup stayed static for some time, but my most recent iteration ended up letting me down and I had to change it again. It gave me a lot of useful information and strongly shaped how I approached this iteration. This new one is closest to the first one I wrote about in 2024, but with some major improvements and reproducibility.

First things first, though. Why am I making yet more changes to this setup? Besides my constant neurodivergent drive to make things perfect, my setups all kept causing me some problems. In chronological order, here are the problems and neat benefits of each setup I used for at least a few months. So my immediate previous version was heavy and tedious to set up. I had a trip coming up to Brooklyn, so I had to either make something more portable or leave my laptop at home. I decided to take my laptop, and did a design sprint to see if I can make my dream setup.

At this point I'll probably be working on this setup forever, but I hope I can stop if I am able to satisfy all my goals at some point. My dream setup has these characteristics: So, you know, it's not like I want a lot out of this setup. It's not like these are kind of a lot to all fit into one thing. I'm sure it'll be a piece of cake.

I use OpenSCAD for 3D modeling. It's pretty pleasant, though some things are hard in general (like roundovers and fillets on any more complicated shapes). My design to start is basically one of my previous versions: my split keyboard at adjustable width on a base, and a slot to hold my laptop vertically. I started by measuring important dimensions, like how far apart I wanted my keyboard halves and the dimensions of my laptop. Then I compared these to my 3D printer's print volume, and started working out how I'd have to print it. The rig is wider than my 3D printer, so I had to split it up into parts.
The slot would fit as a separate piece if I oriented it diagonally. The base itself would have to be split into two separate halves. To join the halves and the slot, I decided to use dovetail joints. I'm familiar with them from woodworking, and I figured they'd give a strong join here as well. I added the library BOSL2 to generate the dovetails, and these were pretty easy to model in. Then I also made some keyboard mounts, which I attach using a camera tripod mount (the Keyboardio Model 100 has threading for this). This is where I ended up for my initial design. When I printed the first pieces, I ran into a problem. The pieces came out alright, mostly, but there was this wavy defect on the top of it! It ended up being (I think) that the print was not adhering well to the printbed. This was easily solved by washing it with some water and dish soap, then prints started coming out beautifully. The other problem was that the sliders and rails worked too smoothly, and I realized that I'd need to have some way to lock the keyboard in place or it would slide around in a difficult to use way. I punted on this, and printed the whole thing. I knew I'd need another iteration on it for material reasons: I am printing the prototype from PLA, since it's easy to work with, but I wanted to print the final one from PETG for slightly better heat resistance. So, onwards, and with a clean printbed, I was able to make the full first prototype! It was 3 parts which took 2-3.5 hours each to print, for a total print time of under 12 hours. I assembled the pieces and glued them together. At this point I was able to use the setup to work on itself, which was really satisfying. I did need to make the keyboard lock in place for carrying it, but it was fairly stable on my desk at least. Now it was time to make a few tweaks, and print the whole thing in PETG for its heat resistance. 
I did a few things this iteration: I carved out a honeycomb pattern on the base to reduce weight and filament; I added a nubbin and detents to the keyboard slider to lock it in place where I want (in 10mm increments); I lengthened the keyboard rails to go further in; and I widened the keyboard slot for a less snug fit.

This time is when I met the challenge that is printing with PETG! I dried my filament and started doing some prototyping. I sliced apart chunks of my model to see if things fit together still, since that can change with materials. I also printed a test of my locking clicky mechanism for the keyboard, and good thing: it needed design changes, but the second print worked great (I modified the first with a knife until it fit, then measured the remaining material, and modeled that). Then I printed it. And it came out pretty well! I mean, I had major stringing and bed adherence issues the first time I tried it, but with thorough bed cleaning and a nozzle wipe, it came out cleanly. I had one spot with a minor quality issue, but it's on the bottom and not visible.

And it's working out really well! Mostly! The good things here are what make it usable. It is lightweight (about 280 grams), which is comparable to my lightest previous setup but that one fell apart promptly. It seems durable; we'll see over time, but it did survive multiple backpack loadings and a trip to Brooklyn today, where I hauled it around the city with me. And it's pretty fast to deploy: I can put it together in 15 seconds. The keyboard width is very easy to adjust, and it's solidly in place where it won't slide by accident. The laptop screen is at a good height. It's reproducible: others could print it as well, with access to the files. (I'm considering making them open source, but I don't think they're quite ready to share. It needs some iteration first.) And I quite like the way it looks. However, it's not all good.
I want to make some changes to it soon, after a break from the long print times and iterations. Here's the list to address: I don't know if addressing those is all feasible, or if it will satisfy my dream setup. But I do know by now that I'll not be done with this for a long, long time. Everyone needs a hobby, apparently this is one of mine. It's been surprisingly rewarding to work on my own ergonomic setup like this. I have made this setup specifically for health reasons: without it, I cannot use a laptop without severe nerve pain, and I rather like being able to work from anywhere. I have a very uncommon setup in that I'm able to use my Keyboardio Model 100 from a train; I've not seen that before. The amazing thing about 3D printers is enabling this kind of solution. I made my previous versions in my workshop out of mostly wood. It took time and iteration was a big challenge. With a 3D printer, it's doable to design it and even send it off to someone else to print. And we can make exactly what we need, at relatively low cost. It's a technology that truly changes things in making custom tailored solutions far more accessible. As far as I know, the main laptops that do this are the Framework 13 and some Lenovo Thinkpads. No Apple laptop does this. It's a big constraint and I haven't been able to design it out of my setup. I'm starting to wonder if the ticket is a headless small form factor computer with a portable monitor. ↩ I am annoyed at this, because it limits my keyboard options and I would love something lighter. Don't get me wrong, I love my Model 100. But I'm uncomfortable relying only on one keyboard from one company. ↩ My first one was difficult to adjust the keyboard width . You had to flip it over and loosen hardware from the bottom. It was also a little heavy . There's a limit to how far I can reduce weight when using a Keyboardio Model 100, but we can get closer. However, this rig was very fast to set up. It also did keep my keyboard at a good width. 
My second one used hinges made from fabric and hook-and-loop fasteners, which was neat, but it ultimately fell apart, was tedious to adjust, and took a long time to set up. The big benefit of that setup was that it was extremely light, which was helpful when I was suffering from a lot of fatigue and POTS. My third one had a neat hinging mechanism which was useful for smaller spaces but wasn't much faster to set up. It used a smaller, lighter keyboard, but that keyboard ultimately relapsed my nerve pain. My fourth one, not previously written about, was... way too heavy. It was also a little tedious to set up, but the weight was its biggest problem. I made that one from off-the-shelf parts (mostly), with the goal of making something reproducible for others. And it worked with any laptop, not just ones with a 180 degree hinge like mine [1]. But, with how heavy and annoying it was, it's not worth reproducing.

What I want from this iteration:

- relatively lightweight: it's not going to get super light with both a laptop and my keyboard, but I want to minimize the weight beyond those
- solid mount for my Keyboardio Model 100: this keyboard is, vexingly [2], the only keyboard that keeps my nerve pain in remission. I need to use it.
- good laptop screen height: another problem with laptop use generally is that the screen is usually too low or the keyboard is too high. I want to make sure the screen is at a reasonable height so that I don't wreck my body through poor posture.
- durability: it needs to be pretty durable since I'm going to use this rig for travel. I don't abuse my laptop or my setup, but it has to stand up to regularly being taken in and out of a bag and being used in random places. It has to stand up to a variety of environmental conditions, too.
- as easy as opening my laptop: a lot of ergonomic problems stem from ergonomic setups being inconvenient, so if I can reduce that inconvenience, I can reduce the problems
- easily adjustable keyboard width: I shift around my keyboard position as my body asks for it, and having dynamic positioning helps me feel comfortable. I'd like to be able to do this with little fuss, or else I won't do it (see the previous point).
- mounting points for accessories: I use an eink tablet to take notes, and would love to be able to put it on a little mount on the rig. I also want to be able to mount USB hubs or the mic I use for Talon. Having options for attaching accessories would make it not just equivalent to a laptop, but far more flexible.
- reproducible: this setup gets a lot of comments from people, and it solves real problems for me that other people have as well. I want more people to be able to use it.
- interesting: whenever I take this thing out, I get comments on it. It's how I find other engineers and software folks: most people are all "ignore the lady with the weird rig" but y'all actually strike up conversations with me about it. (If you ever run into me in public, please do talk to me! Even if it looks like I'm working!) I don't want this social benefit to go away!
- attractive aesthetic: I've been fine using my homebrew wood setups, but they're so obviously homemade and don't look good. My dream is that it would look like it's not homemade, and would simply look like it's how the computer is intended to be used.

Things I'd still like to improve:

- Replacements for the camera z-mounts: I'd like to 3d print something for this, and it will be the first iteration I make. The z-mounts are over a pound of metal together, so I could bring down the weight a bit more this way. However, it may not be worth it.
- Add non-slip feet and extra rails on the bottom: I'd like to raise it off the surface it's on a little bit and add some rails on the bottom for a little more rigidity.
- Make it more rigid: it is a little bit floppy, but not to the point of being distracting when using it. I'd like it to feel a little sturdier, especially if anyone else were going to use it.
- Add attachment points for accessories: on Friday, someone at Recurse Center saw my coffee perched in the middle and he suggested a cupholder. I'd like that, or mounts for my mic or USB hub or myriad other things. I can use the honeycomb grid for attachment points, if I add those rails/feet on the bottom to raise it all up a little bit.
- Make it modular and customizable: it only works today if you have a split keyboard with a tripod mount on the bottom of it. So, that's not great for people who don't have the exact same keyboard I do! And if you have other laptops, well, it would need to be adjusted for that. I want to address this before releasing the files. (If you do have the hardware that makes this useful for you today, let me know. I'm happy to help people out with that, I just don't want to do a big public release.)

[1] As far as I know, the main laptops that do this are the Framework 13 and some Lenovo Thinkpads. No Apple laptop does this. It's a big constraint and I haven't been able to design it out of my setup. I'm starting to wonder if the ticket is a headless small form factor computer with a portable monitor.

[2] I am annoyed at this, because it limits my keyboard options and I would love something lighter. Don't get me wrong, I love my Model 100. But I'm uncomfortable relying only on one keyboard from one company.

Phil Eaton 1 month ago

LLMs and your career

The most conservative way to build a career as a software developer is 1) to be practical and effective at problem solving but 2) not to treat all existing code as a black box. 1 means that as a conservative developer you should generally use PostgreSQL or MySQL (or whatever existing database), Rails or .NET (or whatever existing framework), and adapt code from Stack Overflow or LLMs. 2 means that you're curious and work over time to better understand how web servers and databases and operating systems and the browser actually work so that you can make better decisions for your own problems as you adapt other people's code and ideas. Zooming out, coding via LLM is not fundamentally different from coding with Rails or coding by perusing Stack Overflow. It's faster and more direct but it's still potentially just a human mindlessly adapting existing code. The people who were only willing to look at existing frameworks and libraries and applications as black boxes were already not the most competitive when it came to finding and retaining work. And on the other hand, the most technically interesting companies always wanted to hire developers who understood fundamentals because they're 1) operating at such a scale that the way the application is written matters or they're 2) building PostgreSQL or MySQL or Rails or .NET or Stack Overflow or LLMs, etc. The march of software has always been to reduce the need for (ever larger sizes of) SMBs (and teams within non-SMBs) to hire developers to solve problems or increase productivity. LLMs are part of that march. That doesn't change that at some point companies (or teams) need to hire developers because the business or its customer base has become too complex or too large. The jobs that were dependent on fundamentals of software aren't going to stop being dependent on fundamentals of software. 
And if more non-developers are using LLMs it's going to mean all the more stress on tools and applications and systems that rely on fundamentals of software. All of this is to say that if you like doing software development, I don't think interesting software development jobs are going to go away. So keep learning and keep building compilers and databases and operating systems and keep looking for companies that have compiler and database and operating system products, or companies with other sorts of interesting problems where fundamentals matter due to their scale.

Manuel Moreale 1 month ago

Web, Social Networks, Social Web

The other day, a podcast episode caught my attention. It was titled “Can We Build a Better Social Network”, and it was a collaboration between Hard Fork and Search Engine. I thought it was just a discussion about the state of social networks, but then I read the description of the episode: Over the past year, we've been working with the podcast "Search Engine" on a project that reimagines what the internet can be. What if instead of rage-baiting, a social platform incentivized friendly interaction and good-faith discussion? Today, we're bringing "Hard Fork" listeners an episode we made with the "Search Engine" team called "The Fediverse Experiment", where we end up creating our own social media platform. A year of work? Creating a social media platform? Reimagining the internet? Sounds ambitious, and also very interesting. As you probably know, calling me a skeptic of social media would be an understatement, but I’m still very much intrigued by people who want to try different approaches, and so I started listening. Not even 5 minutes in, the conversation was already off the rails, and they were saying things that made absolutely no sense. «So the fediverse is a way for people to take back the internet for themselves.» I’m sorry what? «It's a way to have a identity and connect to other things that are important to you online and just not worry about having to fight through a Google algorithm or a Facebook algorithm. In fact, you could bring your own algorithm if you want to. I'm already doing such a bad job of explaining what the Fedverse is.» Ok at least they were aware that it was an awful explanation. The first interesting bit of the podcast is at around 7 minutes, where they say something I find so infuriatingly wrong that I was about to stop listening. The story these people told me went like this. Basically all of them, as different as they were from one another, had a shared view of what had gone wrong with our internet. 
The way they saw it in the nineties, even in the early two thousands, our internet had truly been an open place. Infinite websites, infinite message boards populated by all sorts of people with all sorts of values, free to live how they wanted in the little neighborhoods they'd made. If you wanted to move homes on that internet, say switch your email from Yahoo to Gmail, it was mildly annoying, but not a huge deal. So far, so good. But then social media arrived. To access those platforms, you usually needed a dedicated account. Once you started posting on that account, you were now in a game to build as large a following as possible. Already, the fuck? First, even to access earlier platforms, you needed a dedicated account. Heck, you needed accounts for everything. Forums, message boards, you name it. Also, «Once you started posting on that account, you were now in a game to build as large a following as possible» ? Says who? This is what social media became over time, sure, but social media didn’t start this way, and in the early days, it sure wasn’t only a matter of amassing an audience. But the architects of the Fediverse, they had a more radical idea. The vision they held was that they could take control of social media out of the hands of the Musks and Zuckerbergs and reroute it back towards more open internet where no mogul would ever have the same kind of power they do now. Did you spot the shift? We started with “our internet had truly been an open place”, and now we’re trying to take back control of social media. I don’t know about you, but to me, the internet ≠ social media. Wild take, I know. Anyway, they then embark on this journey of, their words not mine, «finish building the fediverse» and I can only hope it was said jokingly. The whole episode is a wild ride if you know anything about these topics, and the very underwhelming outcome of all this is that what they built was…a Mastodon instance. And they’re not even self-hosting it. 
What they “built” is a Mastodon instance hosted by masto.host and, of course, since this is 2026, they had to use AI somehow to do it. Sigh… If the episode was titled “We have set up a Mastodon server”, I’d not have bothered listening to it. That said, listening to the episode made me realize how some people have a very narrow view of what the internet is and can be from a social interaction standpoint. Imagine a social platform that’s not controlled by a single billionaire. A platform that’s not powered by a closed-source algorithm. Usernames are unique, and the underlying protocol powering it is flexible and very robust. Your profile page is infinitely customizable, and no two profiles need to look the same. It supports DMs and chats. A platform where you can post videos, photos, audio, 3D content, you name it, and where you can follow other people’s pages and be sure that no algorithm will hide that content from you. A platform that's not censored or moderated by arbitrary rules set by a Silicon Valley billionaire. How good does that sound to you? Because to me, a platform like that looks like a dream, if only we could figure out a way to build it.

André Arko 1 month ago

Announcing rv clean-install

Originally posted on the Spinel blog. As part of our quest to build a fast Ruby project tool, we’ve been hard at work on the next step of project management: installing gems. As we’ve learned over the last 15 years of working on Bundler and RubyGems, package managers are really complicated! It’s too much to try to copy all of rbenv, and ruby-build, and RubyGems, and Bundler, all at the same time. Since we can’t ship everything at once, we spent some time discussing the first project management feature we should add after Ruby versions. Inspired by similar commands in other package managers, we decided to build clean-install. Today, we’re releasing the rv clean-install command as part of version 0.4. So, what is a clean install? In this case, clean means “from a clean slate”. You can use rv clean-install to install the packages your project needs after a fresh checkout, or before running your tests in CI. It’s useful by itself, and it’s also a concrete step towards managing a project and its dependencies. Even better, it lays a lot of the groundwork for future gem management functionality, including downloading, caching, and unpacking gems, compiling native gem extensions, and providing libraries that can be loaded by Bundler at runtime. While we don’t (yet!) handle adding, removing, or updating gem versions, we’re extremely proud of the progress that we’ve made, and we’re looking forward to improving based on your feedback. Try running rv clean-install today, and see how it goes. Is it fast? Slow? Are there errors? What do you want to see next? Let us know what you think.

Tenderlove Making 1 month ago

Pixoo64 Ruby Client

I bought a Pixoo64 LED Display to play around with, and I love it! It connects to WiFi and has an on-board HTTP API so you can program it. I made a Ruby client for it that even includes code to convert PNG files to the binary format the sign wants. One cool thing is that the display can be configured to fetch data from a remote server, so I configured mine to fetch PM2.5 and CO2 data for my office. Here’s what it’s looking like so far: Yes, this is how I discovered I need to open a window 😂
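To give a flavor of what talking to the display looks like, here is a minimal sketch of a client in Ruby. The device exposes a single JSON-over-HTTP endpoint; the path (`/post`) and the `Channel/SetBrightness` command follow my reading of Divoom's published HTTP API, and the device address is made up, so treat all of those as assumptions rather than a description of the actual client linked above.

```ruby
require "net/http"
require "json"
require "uri"

# Minimal sketch of a Pixoo64 client. Commands are JSON objects with a
# "Command" key, POSTed to http://<device-ip>/post on the local network.
class PixooClient
  def initialize(host)
    @uri = URI("http://#{host}/post")
  end

  # Build the JSON payload for a command. Kept separate from the HTTP
  # call so it can be inspected without a device on the network.
  def payload(command, args = {})
    JSON.generate({ "Command" => command }.merge(args))
  end

  # Send a command to the display and return the HTTP response.
  def send_command(command, args = {})
    Net::HTTP.post(@uri, payload(command, args),
                   "Content-Type" => "application/json")
  end
end

# Usage (assumes a Pixoo64 on your LAN at this hypothetical address):
#   client = PixooClient.new("192.168.1.50")
#   client.send_command("Channel/SetBrightness", "Brightness" => 80)
```

The same shape works for the drawing commands: convert your PNG to the pixel format the sign wants, then wrap the encoded frame in one of the `Draw/...` commands from the device docs.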

Rodney Brooks 1 month ago

Predictions Scorecard, 2026 January 01

Nothing is ever as good as it first seems and nothing is ever as bad as it first seems. — A best-memory paraphrase of advice given to me by Vice Admiral Joe Dyer, former chief test pilot of the US Navy and former Commander of NAVAIR. [You can follow me on social media: @rodneyabrooks.bsky.social and see my publications etc., at https://people.csail.mit.edu/brooks ] This is my eighth annual update on how my dated predictions from January 1st, 2018 concerning (1) self driving cars, (2) robotics, AI, and machine learning, and (3) human space travel, have held up. I promised then to review them at the start of the year every year until 2050 (right after my 95th birthday), thirty-two years in total. The idea was to hold myself accountable for those predictions. How right or wrong was I? The summary is that my predictions held up pretty well, though overall I was a little too optimistic. That is a little ironic, as I think that many people who read my predictions back on January 1st, 2018 thought that I was very pessimistic compared to the then zeitgeist. I prefer to think of myself as being a realist. And did I see LLMs coming? No and yes. Yes, I did say that something new and big that everyone accepted as the new and big thing in AI would come along no earlier than 2023, and that the key paper for its success had already been written before I made my first predictions. And indeed LLMs were generally accepted as the next big thing in 2023 (I was lucky on that date), and the key paper, Attention Is All You Need, was indeed already written, having first appeared in June of 2017. I wrote about this extensively in last year’s scorecard. But no, I had no idea it would be LLMs at the time of my correct prediction that something big would appear. And that lack of specificity on the details of exactly what will be invented and when is the case with all my predictions from the first day of 2018.
I did not claim to be clairvoyant about exactly what would happen; rather, I was making predictions about the speed of new research ideas, the speed of hype generation, the speed of large scale deployments of new technologies, and the speed of fundamental changes propagating through the world’s economy. Those speeds are very different and driven by very different realities. I think that many people get confused by that and make the mistake of jumping between those domains of reality, thinking all the speeds will be the same. In my case my estimates of those speeds are informed by watching AI and robotics professionally for 42 years at the time of my predictions. I became a graduate student in Artificial Intelligence in January of 1976, just shy of 20 years after the initial public outing of the term Artificial Intelligence at the summer workshop in 1956 at Dartmouth. And now as of today I have been in that field for 50 years. I promised to track my predictions made eight years ago today for 32 years. So I am one quarter of the way there. But the density of specific years of events, or markers of percentages of adoption, that I predicted starts to fall off right around now. Sometime during 2026 I will bundle up all my comments over the eight years specifically mentioning years that have now passed, and put them in an archival mid-year post. Then I will get rid of the three big long tables that dominate the body of this annual post, and have short updates on the sparse dates for the next 24 years. I will continue to summarize what has happened in self-driving cars generally, including electrification progress and the forever promised flying cars, along with AI and robotics, and human space flight. But early in 2025 I made five new predictions for the coming ten years, without specific dates, but which summarize what I think will happen. I will track these predictions too.
What I Nearly Got Wrong

The day before my original prediction post in 2018, the price of Bitcoin had opened at $12,897.70 and topped out at $14,377.40, and 2017 had been the first year it had ever traded at over $1,000. The price seemed insane to me, as Bitcoin wasn’t being used for the task for which it had been designed. The price seemed to me then, and now, to be purely about speculation. I almost predicted when it would be priced at $200, on the way down. But, fortunately, I checked myself, as I realized that the then-current state of the market made no sense to me and so any future state may not either. Besides, I had no experience or expertise in crypto pricing. So I left that prediction out. I had no basis to make a prediction. That was a wise decision, and I revisit that reasoning as I make new predictions now, and implore myself to only make predictions in fields where I know something.

What Has Surprised Me, And That I Missed 8 Years Ago

I made some predictions about the future of SpaceX, although I didn’t always label them as being about SpaceX. A number of my predictions were in response to pronouncements by the CEO of SpaceX. My predictions were much more measured and some might say even pessimistic. Those predictions so far have turned out to be more optimistic than how reality has unfolded. I had made no specific predictions about Falcon 9, though I did make predictions about the subsequent SpaceX launch family, now called Starship, but then known as BFR, which eight years later has not gotten into orbit. In the meantime SpaceX has scaled the Falcon 9 launch rate at a phenomenal speed, and the magnitude of the growth is very surprising. Eight years ago, Falcon 9 had been launched 46 times, all successful, over the previous eight years, and it had recently had a long run of successful landings of the booster whenever attempted.
At that time five launches had been on a previously used booster, but there had been no attempts to launch Falcon Heavy with its three boosters strapped together. Now we are eight years on from those first eight years of Falcon 9 launches. The scale and success rate of the launches has made each individual launch an unremarkable event, with humans being launched a handful of times per year. Now the Falcon 9 scorecard stands at 582 launches with only one failed booster, and there have been 11 launches of the three-booster Falcon Heavy, all successful. That is a sustained growth rate of 38% year over year for eight years. And that is a very high sustained deployment growth rate for any complex technology. There is no other modern rocket with such a volume of launches that comes even close to the Falcon 9 record. And I certainly did not foresee this volume of launches. About half the launches have had SpaceX itself as the customer, starting in February 2018, launching an enormous satellite constellation (about two thirds of all satellites ever orbited) to support Starlink bringing internet to everywhere on the surface of Earth. But… there is one historical rocket, a suborbital one, which has a much higher record of use than Falcon 9 over a much briefer period. The German V-2 was the first rocket to fly above the atmosphere and the first ballistic missile to be used to deliver bombs. It was fueled with ethanol and liquid oxygen, and was steered by an analog computer that also received inputs from radio guide signals; it was the first operational liquid fueled rocket. It was developed in Germany in the early 1940s and, after more than a thousand test launches, was first put into operation on September 7th, 1944, landing a bomb on Paris less than two weeks after the Allied liberation of that city. In the remaining 8 months of the war 3,172 armed V-2 rockets were launched at targets in five countries — 1,358 were targeted at London alone.
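The Falcon 9 growth figure quoted above is easy to sanity-check: with 46 cumulative launches after the first eight years and 582 cumulative launches eight years later, the compound annual growth of the cumulative total works out to roughly the stated rate.

```ruby
# Sanity check on the "38% year over year" claim, using the launch
# counts from the text: 46 cumulative Falcon 9 launches after its first
# eight years, 582 cumulative launches eight years later.
early, current, years = 46.0, 582.0, 8
rate = (current / early)**(1.0 / years) - 1
puts format("sustained growth: %.1f%% per year", rate * 100)
# Roughly 37.3% per year of compound growth, in line with the ~38% figure.
```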
My Color Scheme and Past Analysis

The acronyms I used for predictions in my original post were as follows. NET year means it will not happen before that year (No Earlier Than). BY year means I predict that it will happen by that year. NIML, Not In My Lifetime, i.e., not before 2050. As mentioned years pass I color them as accurate, too pessimistic, or too optimistic. Last year I added hemming and hawing. This is for when something that looks just like what I said would take a lot longer appears to have happened, but the underlying achievement is not what everyone expected, and is not what was delivered. This is mostly for things that were talked about as being likely to happen with no human intervention, and it now appears to happen that way, but in reality there are humans in the loop that the companies never disclose. So the technology that was promised to be delivered hasn’t actually been delivered, but everyone thinks it has been. When I quote myself I do so in orange, and when I quote others I do so in blue. I have not changed any of the text of the first three columns of the prediction tables since their publication on the first day of 2018. I only change the text in the fourth column to say what actually happened. This meant that by four years ago that fourth column was getting very long and skinny, so I removed them and started with fresh comments two years ago. I have kept the last two years’ comments and added new ones, with yellow backgrounds, for this year, removing the yellow backgrounds from 2025 comments that were there last year. If you want to see the previous five years of comments you can go back to the 2023 scorecard. On March 26th I skeeted out five technology predictions, talking about developments over the next ten years through January 1st, 2036. Three weeks later I included them in a blog post. Here they are again.

1. Quantum Computers. The successful ones will emulate physical systems directly for specialized classes of problems rather than translating conventional general computation into quantum hardware. Think of them as 21st century analog computers. Impact will be on materials and physics computations.

2. Self Driving Cars. In the US the players that will determine whether self driving cars are successful or abandoned are #1 Waymo (Google) and #2 Zoox (Amazon). No one else matters. The key metric will be human intervention rate as that will determine profitability.

3. Humanoid Robots. Deployable dexterity will remain pathetic compared to human hands beyond 2036. Without new types of mechanical systems walking humanoids will remain too unsafe to be in close proximity to real humans.

4. Neural Computation. There will be small and impactful academic forays into neuralish systems that are well beyond the linear threshold systems, developed by 1960, that are the foundation of recent successes. Clear winners will not yet emerge by 2036 but there will be multiple candidates.

5. LLMs. LLMs that can explain which data led to what outputs will be key to non annoying/dangerous/stupid deployments. They will be surrounded by lots of mechanism to keep them boxed in, and those mechanisms, not yet invented for most applications, will be where the arms races occur.

These five predictions are specifically about what will happen in these five fields during the ten years from 2026 through 2035, inclusive. They are not saying when particular things will happen; rather, they are saying whether or not certain things will happen in that decade. I will do my initial analysis of these five new predictions immediately below. For the next ten years I will expand on each of these reviews in this annual scorecard, along with reviews of my earlier predictions. The ten years for these predictions are up on January 1st, 2036.
I will have just turned 81 years old then, so let’s see if I am still coherent enough to do this.

Quantum Computers

The successful ones will emulate physical systems directly for specialized classes of problems rather than translating conventional general computation into quantum hardware. Think of them as 21st century analog computers. Impact will be on materials and physics computations.

The original excitement about quantum computers was stimulated by a paper by Peter Shor in 1994 which gave a digital quantum algorithm to factor large integers much faster than a conventional digital computer. Factoring integers is often referred to as “the IFP”, for the integer factorization problem. So what? The excitement around this was based on how modern cryptography, which provides our basic security for on-line commerce, works under the hood. Much of the internet’s security is based on it being hard to factor a large number. For instance, in the RSA algorithm Alice tells everyone a large number (in different practical versions it has 1024, 2048, or 4096 bits) for which she knows its prime factors. But she tells people only the number, not its factors. In fact she chose that number by multiplying together some very large prime numbers — very large prime numbers are fairly easy to generate (using the Miller-Rabin test). Anyone, usually known as Bob, can then use that number to encrypt a message intended for Alice. No one, neither Tom, Dick, nor Harry, can decrypt that message unless they can find the prime factors of Alice’s public number. But Alice knows them and can read the message intended only for her eyes. So… if you could find prime factors of large numbers easily, then the backbone of digital security would be broken. Much excitement! Shor produced his algorithm in 1994. By the year 2001 a group at IBM had managed to find the prime factors of the number 15 using a digital quantum computer, as published in Nature. All the prime factors. Both 3 and 5.
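The Alice-and-Bob scheme just described fits in a dozen lines of Ruby at toy scale. This is a sketch with textbook-sized primes chosen purely for illustration; real RSA uses primes of 512 bits and up, plus padding schemes omitted here.

```ruby
require "openssl"

# Toy-scale RSA: Alice publishes n = p*q and e; anyone can encrypt,
# but decrypting requires d, which is easy to compute only if you
# know the secret factors p and q.
p1, q1 = 61, 53                       # Alice's secret primes (tiny, for show)
n   = p1 * q1                         # public modulus, 3233
phi = (p1 - 1) * (q1 - 1)             # 3120
e   = 17                              # public exponent, coprime with phi
d   = OpenSSL::BN.new(e).mod_inverse(phi).to_i  # private exponent, 2753

m = 42                                # a message encoded as a number < n
c = m.pow(e, n)                       # Bob encrypts with the public (e, n)
raise "decryption failed" unless c.pow(d, n) == m
puts "ciphertext #{c} decrypts back to #{c.pow(d, n)}"
```

Generating the large primes in the first place relies on probabilistic tests such as Miller-Rabin, which Ruby exposes via OpenSSL (for example `OpenSSL::BN#prime?`).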
Notice that 15 has only four bits, which is a lot smaller than the number of bits used in commercial RSA implementations, namely 1024, 2048, or 4096. Surely things got better fast. By late 2024 the biggest numbers that had been factored by an actual digital quantum computer had 35 bits, which allows for numbers no bigger than 34,359,738,367. That is way smaller than the size of the smallest numbers used in RSA applications. Nevertheless it does represent 31 doublings in magnitude of numbers factored in 23 years, so progress has been quite exponential. But it could take another 500 years of that particular version of exponential growth rate to get to conquering today’s smallest version of RSA digital security. In the same report the authors say that a conventional, but very large, computer (2,000 GPUs along with a JUWELS booster, which itself has 936 compute nodes, each consisting of four NVIDIA A100 Tensor Core GPUs, themselves each hosted by 48 dual-threaded AMD EPYC Rome cores; that is quite a box of computing) simulating a quantum computer running Shor’s algorithm had factored a 39 bit number, finding that 549,755,813,701 = 712,321 × 771,781, the product of two 20 bit prime numbers. That was its limit. Nevertheless, an actual digital quantum computer can still be outclassed by one simulated on conventional digital hardware. The other early big excitement for digital quantum computers was Grover’s search algorithm, but work on that has not been as successful as for Shor’s IFP solution. Digital quantum computation nirvana has not yet been demonstrated. Digital quantum computers work a little like regular digital computers in that there is a control mechanism which drives the computer through a series of discrete steps. But today’s digital quantum computers suffer from accumulating errors in quantum bits. Shor’s algorithm assumes no such errors. There are techniques for correcting those errors but they slow things down and cause other problems.
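The arithmetic in the factoring discussion above checks out, and a few lines of Ruby make the extrapolation concrete. The decade extrapolation at the end is my own rough calculation based on the doubling rate stated in the text, not a figure from the scorecard itself.

```ruby
# The largest 35 bit number:
raise unless 2**35 - 1 == 34_359_738_367
# The 39 bit factorization found on the simulated quantum computer:
raise unless 712_321 * 771_781 == 549_755_813_701
raise unless 712_321.bit_length == 20 && 771_781.bit_length == 20

# From 4 bits (the number 15, factored in 2001) to 35 bits (late 2024)
# is 31 doublings in 23 years:
doublings_per_year = (35 - 4) / 23.0
# Crude extrapolation to the smallest RSA modulus, 1024 bits:
years_to_rsa = (1024 - 35) / doublings_per_year
puts format("~%d more years at this pace", years_to_rsa)
# Several centuries either way, which is the point being made above.
```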
One way that digital quantum computers may get better is if new methods of error correction emerge. I am doubtful that something new will emerge, get fully tested, and then make it into production at scale, all within the next ten years. So we may not see a quantum (ahem) leap in performance of quantum digital computers in the next decade. Analog quantum computers are another matter. They are not switched, but instead are configured to directly simulate some physical system and the quantum evolution and interactions of components of that system. They are an embodied quantum model of that system. And they are ideally suited to solving these sorts of problems, and cannot be emulated by conventional digital systems as they can be in the 39 bit number case above. I find people working on quantum computers are often a little squirrelly about whether their computer acts more like a digital or analog computer, as they like to say they are “quantum” only. The winners over the next 10 years will be ones solving real problems in materials science and other aspects of chemistry and physics.

Self Driving Cars

In the US the players that will determine whether self driving cars are successful or abandoned are #1 Waymo (Google) and #2 Zoox (Amazon). No one else matters. The key metric will be human intervention rate as that will determine profitability.

Originally the term “self driving car” was about any sort of car that could operate without a driver on board, and without a remote driver offering control inputs. Originally they were envisioned as an option for privately owned vehicles used by individuals, a family car where no person needed to drive, but simply communicated to the car where it should take them. That conception is no longer what people think of when self driving cars are mentioned. Self driving cars today refer to taxi services that feel like Uber or Lyft, but for which there is no human driver, just paying passengers.
In the US the companies that have led in this endeavor have changed over time. The first leader was Cruise, owned by GM. They were the first to have a regular service in the downtown area of a major city (San Francisco), and then in a number of other cities, where there was an app that anyone could download and then use their service. They were not entirely forthcoming with operational and safety problems, including when they dragged a person, who had just been hit by a conventionally driven car, for tens of feet under one of their vehicles. GM suspended operations in late 2023 and completely disbanded the unit in December 2024. Since then Waymo (owned by Google) has been the indisputable leading deployed service. Zoox (owned by Amazon) has been a very distant, but operational, second place. Tesla (owned by Tesla) has put on a facade of being operational, but it is not operational in the sense of the other two services, and faces regulatory headwinds that both Waymo and Zoox have long been able to satisfy. They are not on a path to becoming a real service. See my traditional section on self driving cars below, as it explains in great detail the rationale for these evaluations. In short, Waymo looks to have a shot at succeeding and it is unlikely they will lose first place in this race. Zoox may also cross the finish line, and it is very unlikely that anyone will beat them. So if both Waymo and Zoox fail, for whatever reason, the whole endeavor will grind to a halt in the US. But what might go wrong that makes one of these companies fail? We got a little insight into that in the last two weeks of 2025. On Saturday, December 20th, 2025, there was an extended power outage in San Francisco that started small in the late morning but by nightfall had spread to large swaths of the city. And lots and lots of normally busy intersections were by that time blocked by many stationary Waymos.
Traffic regulations in San Francisco say that an intersection whose traffic lights are all dark should be treated as though it has stop signs at every entrance. Human drivers who don’t know the actual regulation tend to fall back to that behavior in any case. It seemed that Waymos were waiting indefinitely for green lights that never came, and at intersections through which many Waymos were routed there were soon enough waiting Waymos that the intersections were blocked. Three days later, on December 23rd, Waymo issued an explanation on their blog site, which includes the following: Navigating an event of this magnitude presented a unique challenge for autonomous technology. While the Waymo Driver is designed to handle dark traffic signals as four-way stops, it may occasionally request a confirmation check to ensure it makes the safest choice. While we successfully traversed more than 7,000 dark signals on Saturday, the outage created a concentrated spike in these requests. This created a backlog that, in some cases, led to response delays contributing to congestion on already-overwhelmed streets. We established these confirmation protocols out of an abundance of caution during our early deployment, and we are now refining them to match our current scale. While this strategy was effective during smaller outages, we are now implementing fleet-wide updates that provide the Driver with specific power outage context, allowing it to navigate more decisively. As the outage persisted and City officials urged residents to stay off the streets to prioritize first responders, we temporarily paused our service in the area. We directed our fleet to pull over and park appropriately so we could return vehicles to our depots in waves. This ensured we did not further add to the congestion or obstruct emergency vehicles during the peak of the recovery effort. The key phrase is that Waymos “request a confirmation check” at dark signals.
This means that the cars were asking for a human to look at images from their cameras and manually tell them how to behave. With 7,000 dark signals and perhaps 1,000 vehicles on the road, Waymo clearly did not have enough humans on duty to handle the volume of requests that were coming in. Waymo has not disclosed whether anyone noticed the rise in these incidents early in the day and called in more staff, or whether they simply did not have enough employees to handle them all. At a deeper level it looks like they had a debugging feature in their code, and not enough people to supply real-time support to handle the implications of that debugging feature. And it looks like Waymo is going to remove that debugging safety feature as a way of solving the problem. This is not an uncommon sort of engineering failure during early testing. Normally one would hope that the need for that debugging feature had been resolved before large scale deployment. But who are these human staff? Besides those in Waymo control centers, it turns out there is a gig-work operation with an app named Honk (the headline of the story is When robot taxis get stuck, a secret army of humans comes to the rescue) whereby Waymo pays people around $20 to do minor fixups to stuck Waymos by, for instance, going and physically closing a door that a customer left open. Tow truck operators use the same app to find Waymos that need towing because of some more serious problem. It is not clear whether it was a shortage of those gig workers, or a shortage of people in the Waymo remote operations center, that caused the large scale failures.
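The backlog dynamic described above is simple queueing arithmetic: once confirmation requests arrive faster than the pool of human reviewers can clear them, the queue grows without bound. Here is a minimal sketch of that intuition; every number in it is invented for illustration, not a Waymo figure.

```python
def queue_after(minutes, arrivals_per_min, operators, secs_per_review):
    """Track a backlog of confirmation requests minute by minute.

    Each minute, `arrivals_per_min` new requests arrive and the
    operator pool can clear `operators * 60 / secs_per_review`.
    """
    served_per_min = operators * 60 / secs_per_review
    backlog = 0.0
    for _ in range(minutes):
        backlog = max(0.0, backlog + arrivals_per_min - served_per_min)
    return backlog

# 50 requests/min against 20 operators taking 90 s each: the
# backlog grows steadily over the hour instead of draining.
backlog = queue_after(60, 50, 20, 90)
```

With those made-up numbers the operators clear only about 13 requests a minute against 50 arriving, so the queue never drains; halve the arrival rate and it stays empty. A concentrated spike, like thousands of dark signals at once, flips the system from the second regime into the first.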
But it is worth noting that current generation Waymos need a lot of human help to operate as they do, from people in the remote operations center who intervene and provide human advice when something goes wrong, to Honk gig-workers scampering around the city physically fixing problems with the vehicles, to people who clean the cars and plug them in to recharge when they return to their home base. Human operated ride services, whether traditional taxi companies or gig services such as Uber and Lyft, do not need these external services. There is a human with the car at all times who takes care of these things. The large scale failure on the 20th did get people riled up about these robots causing large scale traffic snarls, and made them wonder about whether the same thing will happen when the next big earthquake hits San Francisco. Will the human support worker strategy be stymied by other infrastructure failures (e.g., the cellular network necessary for Honk workers to communicate) or by the self preservation needs of the human workers themselves? The Waymo blog post revealed another piece of strategy. This is one of three things they said that they would do to alleviate the problems:

Expanding our first responder engagement: To date, we’ve trained more than 25,000 first responders in the U.S. and around the world on how to interact with Waymo. As we discover learnings from this and other widespread events, we’ll continue updating our first responder training.

The idea is to add more responsibility to police and fire fighters to fix the inadequacies of the partial-only autonomy strategy for Waymo’s business model. Those same first responders will have more than enough on their plates during any natural disasters. Will it become a political issue where the self-driving taxi companies are taxed enough to provide more first responders? Will those costs ruin their business model?
Will residents just get so angry that they take political action to shut down such ride services?

Humanoid Robots

Deployable dexterity will remain pathetic compared to human hands beyond 2036. Without new types of mechanical systems walking humanoids will remain too unsafe to be in close proximity to real humans.

Despite this prediction it is worth noting that there is a long distance between current deployed dexterity and dexterity that is still pathetic. In the next ten years deployable dexterity may improve markedly, but not in the way the current hype for humanoid robots suggests. I talk about this below in my annual section scoring my 2018 predictions on robotics, AI, and machine learning, in a section titled Dexterous Hands. Towards the end of 2025 I published a long blog post summarizing the status of, and problems remaining for, humanoid robots. I started building humanoid robots in my research group at MIT in 1992. My previous company, Rethink Robotics, founded in 2008, delivered thousands of upper body Baxter and Sawyer humanoid robots (built in the US) to factories between 2012 and 2018. At the top of this blog page you can see a whole row of Baxter robots in China. A Sawyer robot that had operated in a factory in Oregon just got shut down in late 2025 with 35,236 hours on its operations clock. You can still find many of Rethink’s humanoids in use in teaching and research labs around the world. Here is the cover of Science Robotics from November 2025, showing a Sawyer used in the research for this article out of Imperial College, London. Here is a slide from a 1998 PowerPoint deck that I was using in my talks, six years after my graduate students and I had started building our first humanoid robot, Cog. It is pretty much the sales pitch that today’s humanoid companies use. You are seeing here my version from almost twenty eight years ago.
I point this out to demonstrate that I am not at all new to humanoid robotics and have worked on them for decades in both academia and in producing and selling humanoid robots that were deployed at scale (which no one else has done) doing real work. My blog post from September details why the current learning based approaches to getting dexterous manipulation will not get there anytime soon. I argue that the players are (a) collecting the wrong data and (b) trying to learn the wrong thing. I also give an argument (c) for why learning might not be the right approach. My argument for (c) may not hold up, but I am confident that I am right on both (a) and (b), at least for the next ten years. I also outline in that blog post why the current (and indeed pretty much the only, for the last forty years) method of building bipeds and controlling them will remain unsafe for humans to be nearby. I pointed out that the danger is roughly cubically proportional to the weight of the robot. Many humanoid robot manufacturers are introducing lightweight robots, so I think they have come to the same conclusion. But the side effect is that the robots can not carry much payload, and certainly can’t provide physical support to elderly humans, which is a thing that human carers do constantly — these small robots are just not strong enough. And elder care and in home care is one of the main arguments for having human shaped robots, adapted to the messy living environments of actual humans. Given that careful analysis from September I do not share the hype that surrounds humanoid robotics today. Some of it is downright delusional across many different levels. To believe the promises of many CEOs of humanoid companies you have to accept the following conjunction. The declarations being made about humanoid robots are just not plausible. We’ll see what actually happens over the next ten years, but it does seem that the fever is starting to crack at the edges.
Here are two news stories from the last few days of 2025. From The Information on December 22nd there is a story about how humanoid robot companies are wrestling with safety standards. All industrial and warehouse robots, whether stationary or mobile, have a big red safety stop button, in order to comply with regulatory safety standards. The button cuts the power to the motors. But cutting power to the motors of a balancing robot might make it fall over and cause more danger and damage to people nearby. For the upper torso humanoid robots Baxter and Sawyer from my company Rethink Robotics we too had a safety stop button that cut power to all the motors in the arms. They were collaborative robots, and often a person, or part of their limbs or body, could be under an arm, and it would have been dangerous for the arms to fall quickly on cutoff of power. To counter this we developed a unique circuit that required no active power, which made it so that the back current generated by a motor when powered off acted as a very strong brake. Perhaps there are similar possible solutions for humanoid robots and falling, but they have yet to be invented. On December 25th the Wall Street Journal had a story headlined “Even the Companies Making Humanoid Robots Think They’re Overhyped”, with a lede of “Despite billions in investment, startups say their androids mostly aren’t useful for industrial or domestic work yet”. Here are the first two paragraphs of the story:

Billions of dollars are flowing into humanoid robot startups, as investors bet that the industry will soon put humanlike machines in warehouses, factories and our living rooms. Many leaders of those companies would like to temper those expectations. For all the recent advances in the field, humanoid robots, they say, have been overhyped and face daunting technical challenges before they move from science experiments to a replacement for human workers.
And then they go on to quote various company leaders:

“We’ve been trying to figure out how do we not just make a humanoid robot, but also make a humanoid robot that does useful work,” said Pras Velagapudi, chief technology officer at Agility Robotics.

Then, talking about a recent humanoid robotics industry event, the story says:

On stage at the summit, one startup founder after another sought to tamp down the hype around humanoid robots. “There’s a lot of great technological work happening, a lot of great talent working on these, but they are not yet well defined products,” said Kaan Dogrusoz, a former Apple engineer and CEO of Weave Robotics. Today’s humanoid robots are the right idea, but the technology isn’t up to the premise, Dogrusoz said. He compared it to Apple’s most infamous product failure, the Newton hand-held computer.

There are more quotes from other company leaders, all pointing out the difficulties in making real products that do useful work. Reality seems to be setting in as promised delivery dates come and go. Meanwhile here is what I said at the end of my September blog post about humanoid robots and teaching them dexterity. I am not at all negative about a great future for robots, and in the nearish term. It is just that I completely disagree with the hype arguing that building robots with humanoid form will magically make robots useful and deployable. These particular paragraphs followed where I had described there, as I do again in this blog post, how the meaning of self driving cars has drifted over time. Following that pattern, what it means to be a humanoid robot will change over time. Before too long (and we already start to see this) humanoid robots will get wheels for feet, at first two, and later maybe more, with nothing that any longer really resembles human legs in gross form. But they will still be called humanoid robots. Then there will be versions which variously have one, two, and three arms.
Some of those arms will have five fingered hands, but a lot will have two fingered parallel jaw grippers. Some may have suction cups. But they will still be called humanoid robots. Then there will be versions which have a lot of sensors that are not passive cameras, and so they will have eyes that see with active light, or in non-human frequency ranges, and they may have eyes in their hands, and even eyes looking down from near their crotch to see the ground so that they can locomote better over uneven surfaces. But they will still be called humanoid robots. There will be many, many robots with different forms for different specialized jobs that humans can do. But they will all still be called humanoid robots. As with self driving cars, most of the early players in humanoid robots will quietly shut up shop and disappear. Those that remain will pivot and redefine what they are doing, without renaming it, to something more achievable and with, finally, plausible business cases. The world will slowly shift, but never fast enough to need a change of name from humanoid robots. But make no mistake, the successful humanoid robots of tomorrow will be very different from those being hyped today.

Neural Computation

There will be small and impactful academic forays into neuralish systems that are well beyond the linear threshold systems, developed by 1960, that are the foundation of recent successes. Clear winners will not yet emerge by 2036 but there will be multiple candidates.

Current machine learning techniques are largely based on having millions, and more recently tens (to hundreds?) of billions, of linear threshold units. They look like this.
Each of these units has a fixed number of inputs, where some numerical value comes in and is multiplied by a weight, usually a floating point number, and the results of all of the multiplications are summed, along with an adjustable threshold, which is usually negative, and then the sum goes through some sort of squishing function to produce a number between zero and one, or in this case minus one and plus one, as the output. In this diagram, which, by the way, is taken from Bernie Widrow’s technical report from 1960, the output value is either minus one or plus one, but in modern systems it is often a number from anywhere in that, or another, continuous interval. This was based on previous work, including that of Warren McCulloch and Walter Pitts’ 1943 formal model of a neuron, Marvin Minsky’s 1954 Ph.D. dissertation on using reinforcement for learning in a machine based on model neurons, and Frank Rosenblatt’s 1957 use of weights (see page 10) in an analog implementation of a neural model. These are what current learning mechanisms have at their core. These! A model of biological neurons that was developed in a brief moment of time from 83 to 65 years ago. We use these today. They are extraordinarily primitive models of neurons compared to what neuroscience has learned in the subsequent sixty five years. Since the 1960s higher levels of organization have been wrapped around these units. In 1979 Kunihiko Fukushima published (at the International Joint Conference on Artificial Intelligence, IJCAI 1979, Tokyo — coincidentally the first place where I published in an international venue) his first English language description of convolutional neural networks (CNNs), which allowed for position invariant recognition of shapes (in his case, hand written digits), without having to learn about those shapes in every position within images.
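The whole unit described above is a few lines of code. Here is a minimal sketch, using tanh as the squishing function so the output falls between minus one and plus one; the function name and the example weights are mine, invented for illustration, not taken from Widrow’s report.

```python
import math

def linear_threshold_unit(inputs, weights, threshold):
    """One linear threshold unit: each input is multiplied by its
    weight, the products are summed along with the adjustable
    threshold, and the sum is squished into (-1, +1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + threshold
    return math.tanh(total)  # smooth stand-in for the hard quantizer

# A unit with three inputs and illustrative weights:
out = linear_threshold_unit([1.0, -0.5, 0.25], [0.4, 0.8, -0.2], threshold=-0.1)
```

Widrow’s original diagram uses a hard quantizer (output exactly minus one or plus one); modern systems replace it with a smooth function like the tanh here precisely so that the continuous output can be differentiated during training.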
Then came backpropagation, a method where a network can be told the correct output it should have produced, and by propagating the error backwards through the derivative of the quantizer in the diagram above (note that the quantizer shown there is not differentiable – a continuous differentiable quantizer function is needed to make the algorithm work), a network can be trained on examples of what it should produce. The details of this algorithm are rooted in the chain rule of Gottfried Leibniz in 1676, carried through a series of modern workers from around 1970 through about 1982. Frank Rosenblatt (see above) had talked about a “back-propagating error correction” in 1962, but did not know how to implement it. In any case, the linear threshold neurons, CNNs, and backpropagation are the basis of modern neural networks. After an additional 30 years of slow but steady progress they burst upon the scene as deep learning, and unexpectedly crushed many other approaches to computer vision — the research field of getting computers to interpret the contents of an image. Note that “deep” learning refers to there being lots of layers (around 12 layers in 2012) of linear threshold neurons rather than the smaller number of layers (typically two or three) that had been used previously. Now LLMs are built on top of these sorts of networks with many more layers, and many subnetworks. This is what got everyone excited about Artificial Intelligence, after 65 years of constant development of the field. Despite their successes with language, LLMs come with some serious problems of a purely implementation nature.
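For a single unit the whole mechanism, forward pass, chain rule through the differentiable squashing function, and weight update, fits in a few lines. This is a sketch of one training step for one tanh unit, not the full multi-layer algorithm; the learning rate and target are arbitrary illustrative values.

```python
import math

def train_step(x, w, b, target, lr=0.1):
    """One backpropagation step for a single tanh unit.
    Forward: y = tanh(w*x + b).
    Backward: chain rule through the squashing function,
    using tanh'(s) = 1 - tanh(s)**2."""
    s = w * x + b
    y = math.tanh(s)
    err = y - target            # dE/dy for the error E = 0.5*(y - target)**2
    grad_s = err * (1 - y * y)  # chain rule through the differentiable quantizer
    return w - lr * grad_s * x, b - lr * grad_s

# Repeated steps pull the unit's output toward the target:
w, b = 0.0, 0.0
for _ in range(200):
    w, b = train_step(1.0, w, b, target=0.8)
```

In a multi-layer network the same `grad_s` term is what gets propagated further backwards, multiplied by each earlier layer’s weights and activation derivatives, which is why a non-differentiable hard quantizer breaks the method.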
First, the number of examples that need to be shown to a network for it to learn to be facile in language takes up enormous amounts of computation, so the cost of training new versions of such networks is now measured in the billions of dollars, consuming an amount of electrical power that requires major new investments in electrical generation, and the building of massive data centers full of millions of the most expensive CPU/GPU chips available. Second, the number of adjustable weights shown in the figure is counted in the hundreds of billions, meaning they occupy over a terabyte of storage. RAM that is that big is incredibly expensive, so the models can not be used on phones or even lower cost embedded chips in edge devices, such as point of sale terminals or robots. These two drawbacks mean there is an incredible financial incentive to invent replacements for each of (1) our humble single neuron models that are close to seventy years old, (2) the way they are organized into networks, and (3) the learning methods that are used. That is why I predict that there will be lots of explorations of new methods to replace our current neural computing mechanisms. They have already started and next year I will summarize some of them. The economic argument for them is compelling. How long they will take to move from initial laboratory explorations to viable scalable solutions is much longer than everyone assumes. My prediction is there will be lots of interesting demonstrations but that ten years is too small a time period for a clear winner to emerge. And it will take much, much longer for the current approaches to be displaced. But plenty of researchers will be hungry to do so.

LLMs that can explain which data led to what outputs will be key to non annoying/dangerous/stupid deployments. They will be surrounded by lots of mechanism to keep them boxed in, and those mechanisms, not yet invented for most applications, will be where the arms races occur.
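The storage claim in the second point above is easy to check with back-of-envelope arithmetic; the weight count here is an illustrative round number standing in for "hundreds of billions", not any particular model's size.

```python
# Back-of-envelope check of the "over a terabyte" storage claim.
weights = 300e9           # illustrative: "hundreds of billions" of adjustable weights
bytes_fp32 = weights * 4  # each weight as a 32-bit float
bytes_fp16 = weights * 2  # half precision, common in deployment

tb_fp32 = bytes_fp32 / 1e12  # terabytes at 32-bit precision
tb_fp16 = bytes_fp16 / 1e12  # terabytes at 16-bit precision
```

At 32-bit precision 300 billion weights is 1.2 TB, and even halving the precision still leaves 600 GB, far beyond the RAM of any phone or low cost embedded chip, which is the point being made.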
The one thing we have all learned, or should have learned, is that the underlying mechanism for Large Language Models does not answer questions directly. Instead, it gives something that sounds like an answer to the question. That is very different from saying something that is accurate. What they have learned is not facts about the world but instead a probability distribution of what word is most likely to come next given the question and the words so far produced in response. Thus the results of using them, uncaged, are lots and lots of confabulations that sound like real things, whether they are or not. We have seen all sorts of stories about lawyers using LLMs to write their briefs, and judges using them to write their opinions, where the LLMs have simply made up precedents and fake citations (that sound plausible) for those precedents. And there are lesser offenses that are still annoying and time consuming. The first time I used ChatGPT was when I was retargeting the backend of a dynamic compiler that I had used on half a dozen architectures and operating systems over a thirty year period, and wanted to move it to the then new Apple M1 chips. The old methods of changing a chunk of freshly compiled binary from data, as it was spit out by the compiler, into executable program no longer worked, deliberately so as part of Apple’s improved security measures. ChatGPT gave me detailed instructions on what library calls to use, what their arguments were, etc. The names looked completely consistent with other calls I knew within the Apple OS interfaces. When I tried to use them from C, the C linker complained they didn’t exist. And then when I asked ChatGPT to show me the documentation it groveled that indeed they did not exist and apologized. So we all know we need guard rails around LLMs to make them useful, and that is where there will be a lot of action over the next ten years. They can not be simply released into the wild as they come straight from training.
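The "probability distribution over the next word" mechanism above can be made concrete with a toy. This is not an LLM, just an illustration of the sampling step at its core; the words and probabilities are invented for the example.

```python
import random

# A made-up next-word distribution, as might follow the prompt
# "The capital of France is".  Note that "pizza" gets nonzero
# probability: the model ranks plausible-sounding continuations,
# it does not consult facts.
next_word_probs = {"Paris": 0.70, "Lyon": 0.15, "Berlin": 0.10, "pizza": 0.05}

def sample_next_word(probs, rng=random):
    """Draw one word according to the distribution."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

word = sample_next_word(next_word_probs)
```

Most samples come out right, which is why the output sounds like an answer; but nothing in the mechanism distinguishes the correct continuation from a merely probable one, which is exactly how confident-sounding confabulations arise.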
This is where the real action is now. More training doesn’t necessarily make things better. Boxing things in does. Already we see companies trying to add explainability to what LLMs say. Google’s Gemini now gives real citations with links, so that human users can oversee what they are being fed. Likewise, many companies are trying to box in what their LLMs can say and do. Those that can control their LLMs will be able to deliver usable products. A great example of this is the rapid evolution of coding assistants over the last year or so. These are specialized LLMs that do not give the same sort of grief to coders that I experienced when I first tried to use generic ChatGPT to help me. Peter Norvig, former chief scientist of Google, has recently produced a great report on his explorations of the new offerings. Real progress has been made in this high impact, but narrow use, field. New companies will become specialists in providing this sort of boxing in and control of LLMs. I had seen an ad on a Muni bus in San Francisco for one such company, but it was too fleeting to get a photo. Then I stumbled upon this tweet that has three such photos of different ads from the same company, and here is one of them: The four slogans on the three buses in the tweet are: Get your AI to behave, When your AI goes off leash, Get your AI to work, and Evaluate, monitor, and guardrail your AI. And “the AI” is depicted as a little devil of sorts that needs to be made to behave.

This is one of my three traditional sections where I update one of my three initial tables of prediction from my predictions exactly eight years ago today. In this section I talk about self driving cars, driverless taxi services, and what that means, my own use of driverless taxi services in the previous year, adoption of electric vehicles in the US, and flying cars and taxis, and what those terms mean.
No entries in the table specifically involve 2025 or 2026, and the status of predictions that are further out in time remains the same. I have only put in one new comment, about how many cities in the US will have self-driving (sort of) taxi services in 2026, and that comment is highlighted.

A Brief Recap of what “Self Driving” Cars Means and Meant

This is a much abridged and updated version of what I wrote exactly one year ago today. The definition, or common understanding, of what self driving cars really means has changed since my post on predictions eight years ago. At that time self driving cars meant that the cars would drive themselves to wherever they were told to go with no further human control inputs. It was implicit that it meant level 4 driving. Note that there is also a higher level of autonomy, level 5, that is defined. Note that in the second row of content, it says that there will be no need for a human to take over for either level 4 or level 5. For level 4 there may be pre-conditions on weather and a supported geographic area. Level 5 eliminates pre-conditions and geographic constraints. So far no one is claiming to have level 5. However the robot taxi services such as Cruise (now defunct), Waymo, currently operating in five US cities, and Zoox, currently operating in two cities with limited service (Las Vegas and San Francisco), all relied, or rely, on having remote humans who the car can call on to help get them out of situations they cannot handle. That is not what level 4 promises. To an outside observer it looks like level 4, but it is somewhat less than that in reality. This is not the same as a driver putting their hands back on the steering wheel in real time, but it does mean that there is sometimes a remote human giving high level commands to the car. The companies do not advertise how often this happens, but it is believed to be every few miles of driving.
The Tesla self driving taxis in Austin have a human in the passenger seat to intervene when there is a safety concern. One of the motivations for self driving cars was that the economics of taxis, cars that people hire at any time for a short ride of a few miles from where they are to somewhere else of their choosing, would be radically different as there would be no driver. Systems which do require remote operations assistance to get full reliability cut into that economic advantage, have a higher burden on their ROI calculations to make a business case for their adoption, and therefore face a longer time horizon to scaling across geographies. Actual self-driving is now generally accepted to be much harder than everyone believed. As a reminder of how strong the hype was, and how certain the promises were that it was just around the corner, here is a snapshot of a whole bunch of predictions by major executives from 2017. I have shown this many times before but there are three new annotations here for 2025 in the lines marked by a little red car. The years in parentheses are when the predictions were made. The years in blue are the predicted years of achievement. When a blue year is shaded pink it means that it did not come to pass by then. The predictions with orange arrows are those that I had noticed had later been retracted. It is important to note that every prediction that said something would happen by a year up to and including 2025 did not come to pass by that year. In fact none of those have even come to pass by today. NONE. Eighteen of the twenty predictions were about things that were supposed to have happened by now, some as long as seven years ago. NONE of them have happened yet.

My Own Experiences with Waymo in 2025

I took two dozen rides with Waymo in San Francisco this year. There is still a longer wait than for an Uber at most times, at least for where I want to go.
My continued gripe with Waymo is that it selects where to pick me up, and it rarely drops me right at my house — but without any indication of when it is going to choose some other drop off location for me. The other interaction I had was in early November, when I felt like I was playing bull fighter, on foot, to a Waymo vehicle. My house is on a very steep hill in San Francisco, with parallel parking on one side and ninety degree parking on the other side. It is rare that two cars can pass each other traveling in opposite directions without one having to pull over into some empty space somewhere. In this incident I was having a multi-hundred pound pallet of material delivered to my home. There was a very big Fedex truck parked right in front of my house, facing uphill, and the driver/operator was using a manual pallet jack to get it onto the back lift gate, but the load was nine feet long so it hung out past the boundary of the truck. An unoccupied Waymo came down the hill and was about to try to squeeze past the truck on that side. Perhaps it would have made it through if there was no hanging load. So I ran up to just above the truck on the slope and tried to get the Waymo to back up by walking straight at it. Eventually it backed up and pulled in a little bit and sat still. Within a minute it tried again. I pushed it back with my presence again. Then a third time. Let’s be clear: it would have been a dangerous situation if it had done what it was trying to do, and it could have injured the Fedex driver, who it had not seen at all. But any human driver would have figured out what was going on and that the Fedex truck would never go down the hill backwards but would eventually drive up the hill. Any human driver would have replanned and turned around. After the third encounter the Waymo stayed still for a while.
Then it came to life and turned towards the upwards direction, and when it was at about a 45 degree angle to the upward line of travel it stopped for a few seconds. Then it started again and headed up and away. I infer that eventually the car had called for human help, and when the human got to it, they directed it where on the road to go (probably with a mouse click interface), and once it got there it paused and replanned and then headed in the appropriate direction that the human had already made it face.

Self Driving Taxi Services

There have been three self driving taxi services in the US in various stages of play over the last handful of years, though it turns out, as pointed out above, that all of them have remote operators. They are Waymo, Cruise, and Zoox. Cruise died in both 2023 and 2024, and is now dead, deceased, an ex self driving taxi service. Gone. I see its old cars driving around the SF Bay Area, with their orange paint removed, and with humans in the driver seat. On the left below are two photos I took on May 30th at a recharge station. “Birdie” looked just like an old Cruise self driving taxi, but without the orange paint. I hunted around in online stories about Cruise and soon found another “Birdie”, with orange paint, and the same license plate. So GM is using them to gather data, perhaps for training their level 3 driving systems. Tesla announced to much hoopla that they were starting a self driving taxi service this year, in Austin. It requires a safety person to be sitting in the front passenger seat at all times. Under the certification with which they operate, on occasion that front seat person is required to move to the driver’s seat. Then it just becomes a regular Tesla with a person driving it and FSD enabled. The original fleet was just 30 vehicles, with at least seven accidents reported by Tesla by October, even with the front seat Tesla person.
In October the CEO announced that the service would expand to 500 vehicles in Austin in 2025. By November he had changed to saying they would double the fleet. That makes 60 vehicles. I have no information that it actually happened. He also said he wanted to expand the “Robotaxi” service to Phoenix, San Francisco, Miami, Las Vegas, Dallas, and Houston by the end of 2025. It appears that Tesla can not get permits to run even supervised operations (mirroring the Austin deployment) in any of those cities. And no, they are not operating in any of those cities, and now 2025 has reached its end. In mid-December there were confusing reports saying that Tesla now had Model Ys driving in Austin without a human safety monitor on board, but that the Robotaxi service for paying customers (who are still people vetted by Tesla) resumed their human safety monitors. So that is about three or four years behind Waymo in San Francisco, and not at all at scale. The CEO of Tesla has also announced (there are lots of announcements and they are often very inconsistent…) that actually the self driving taxis will be a new model with no steering wheel or other driver controls. So they are years away from any realistic deployment. I will not be surprised if it never happens, as the lure of humanoids completely distracts the CEO. If driving with three controls, (1) steering angle of the front wheels, (2) engine torque (on a plus minus continuum), and (3) brake pedal pressure, is too hard to make actually work safely for real, how hard can it be to have a program control a heavy unstable balancing platform with around 80 joints in hips and waist, two legs, two arms, and five articulated fingers on each hand? Meanwhile Waymo had raised $5.6B to expand to new cities in 2025. It already operated in parts of San Francisco, Los Angeles, and Phoenix. During 2025 it expanded to Austin and Atlanta, the cities it had promised.
It also increased its geographic reach in its existing cities and surrounding metropolitan areas. In the original three cities users have a Waymo app on their phone and specifically summon a Waymo. In the new cities, however, Waymo used a slightly different playbook. In both Austin and Atlanta people use their standard Uber app. They can update their preference to say that they prefer to get a Waymo rather than a human driven car, but there is no guarantee that a Waymo is what they will get. And any regular user of the Uber app in those cities may be offered a Waymo, but they do get an option to decline and to continue to wait for a human driven offer. In the San Francisco area, beyond the city itself, Waymo first expanded by operating in Palo Alto, in a geographically separate area. Throughout the year one could see human operated Waymos driving in locations all along the peninsula from San Francisco to Palo Alto and further south to San Jose. By November Waymo had announced driverless operations throughout that complete corridor, an area of 260 square miles, but not quite yet on the freeways – the Waymos are operating on specific stretches of both 101 and 280, but only for customers who have specifically signed up for that possibility. Waymo is now also promising to operate at the two airports, San Jose and San Francisco. The San Jose airport came first, and San Francisco airport is operating in an experimental mode with a human in the front seat. Waymo has announced that it will expand to five more cities in the US during 2026: Miami, Dallas, Houston, San Antonio, and Orlando. It seems likely, given their step by step process and their track record of meeting their promises, that Waymo has a good shot at getting operations running in these five cities, doubling their total number of US cities to 10. Note that although it does very occasionally snow in five of these ten cities (Atlanta, Austin, Houston, San Antonio, and Orlando) it is usually only a dusting.
It is not yet clear whether Waymo will operate when it does snow. It does not snow in the other five cities, and in San Francisco Waymo is building toward being a critical part of the transportation infrastructure. If a self-driving taxi service were subject to tighter weather restrictions than human-driven services, that could turn into a logistical nightmare for the cities themselves. In the early days of Cruise they did shut down whenever there was a hint of fog in San Francisco, and that is a common occurrence. It was annoying for me, but Cruise never reached the footprint size in San Francisco that Waymo now enjoys. No promises yet from Waymo about when it might start operating in cities that do commonly have significant snow accumulations.

In May of 2025 Waymo announced a bunch of things in one press release. First, that they had 1,500 Jaguar-based vehicles at that time, operating in San Francisco, Los Angeles, Phoenix, and Austin. Second, that they were no longer taking deliveries of any more Jaguars from Jaguar, but that they were now building two thousand of their own Jaguars in conjunction with Magna (a tier-one auto supplier that also builds small-run models of big brands, e.g., they build all the Mini Coopers that BMW sells) in Mesa, Arizona. Third, that they would also start building, in late 2025, versions of the Zeekr RT, a vehicle that they co-designed with Chinese company Geely, that can be built with no steering wheel or other controls for humans, but with sensor systems that are self-cleaning. It is hard to track exactly how many Waymos are deployed, but in August 2025, this website, citing various public disclosures by Waymo, put together the following estimates for the five cities in which Waymo was operating. No doubt those numbers have increased by now. Meanwhile Waymo has annualized revenues of about $350M and is considering an IPO with a valuation of around $100B.
With numbers like those it can probably raise significant growth capital independently from its parent company.

The other self-driving taxi system deployed in the US is Zoox, which is currently operating only in small geographical locations within Las Vegas and San Francisco. Their deployment vehicles have no steering wheel or other driver controls–they have been in production for many years. I do notice, by direct observation as I drive and walk around San Francisco, that Zoox has recently enlarged the geographic areas where its driverful vehicles operate, collecting data across all neighborhoods. So far the rides on Zoox are free, but only for people who have gone through an application process with the company. Zoox is following a pattern established by both Cruise and Waymo. It is roughly four years behind Cruise and two years behind Waymo, though it is not clear that it has the capital available to scale as quickly as either of them. All three companies that have deployed actual uncrewed self-driving taxi services in the US have been partially or fully owned by large corporations. GM owned Cruise, Waymo is partially spun out of Google/Alphabet, and Zoox is owned by Amazon. Cruise failed. If any other company wants to compete with Waymo or Zoox, even in cities where they do not operate, it is going to need a lot of capital. Waymo and Zoox are out in front. If one or both of them fail, or lose traction and fail to grow, and grow very fast, it will be near impossible for other companies to raise the necessary capital. So it is up to Waymo and Zoox. Otherwise, no matter how well the technology works, the dream of driverless taxis is going to be shelved for many years.

Electric Cars

In my original predictions I said that electric car (and I meant battery electric, not hybrid) sales would reach 30% of the US total no earlier than 2027. A bunch of people on Twitter claimed I was a pessimist.
Now it looks like I was an extreme optimist, as it is going to take a real growth spurt to reach even 10% in 2026 (i.e., earlier than 2027). Here is the report that I use to track EV sales; it is updated every few weeks. In this table I have collected the quarterly numbers that are finalized. The bottom row is the percentage of new car sales that were battery electric. Although late in 2024 EV sales were pushing up into the high eight percentage points, they dropped back into the sevens in the first half of this year. Then they picked up to 10.5% in the third quarter of 2025, but that jump was expected, as the Federal electric vehicle (EV) tax credits ended for all new and used vehicles purchased after September 30, 2025, as part of the “One Big Beautiful Bill Act”. People bought earlier than they might have in order to get that tax credit, so the industry is expecting quite a slump in the fourth quarter, but it will be a couple more months before the sales figures are all in. YTD 2025 is still under 8.5%, and is likely to end at under 8%. The trends just do not look like we will get to EVs reaching 12% of US cars sold in 2027, even with a huge uptick. 30% is just not going to happen. As for which brands are doing better than others, Tesla’s sales dropped a lot more than the rest of the market. Brand winners were GM, Hyundai, and Volkswagen. The US experience is not necessarily the experience across the world. For instance, Norway reached 89% fully electric vehicles among all cars sold in 2024, largely due to taxes on gasoline-powered car purchases. But that is a social choice of the people of Norway, not at all driven by oil availability. With a population of 5.6 million compared to the US with 348 million, and domestic oil production of 2.1 million barrels per day compared to the US with 13.4 million b/d, Norway has a per-capita advantage of almost ten times as much oil per person (9.7 to be more precise).
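For what it is worth, the per-capita arithmetic checks out. A quick sketch, using only the population and production figures quoted above (the variable names are mine):

```python
# Per-capita oil production, using the figures quoted in the text.
norway_bpd, norway_pop = 2.1e6, 5.6e6   # barrels/day, people
us_bpd, us_pop = 13.4e6, 348e6

ratio = (norway_bpd / norway_pop) / (us_bpd / us_pop)
print(f"Norway's per-capita advantage: {ratio:.1f}x")  # ~9.7x
```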
Electrification levels of cars are a choice that a country makes.

Flying Cars

The next two paragraphs are reproduced from last year’s scorecard.

Flying cars are another category where the definitions have changed. Back when I made my predictions it meant a vehicle that could both drive on roads and fly through the air. Now it has come to mean an electric multi-rotor helicopter that can operate like a taxi between various fixed landing locations. Often touted are versions that have no human pilot. These are known as eVTOLs, for “electric vertical take off & landing”. Large valuations have been given to startups who make nice videos of their electric air taxis flying about. But on inspection one sees that they don’t have people in them. Often, you might notice, even those flights are completely over water rather than land. I wrote about the lack of videos of viable prototypes back in November 2022.

The 2022 post referred to in the last sentence was trying to make sense of a story about a German company, Volocopter, receiving a $352M Series E investment. The report from PitchBook predicted worldwide $1.5B in revenue in the eVTOL taxi service market for 2025. I was bewildered, as I could not find a single video, as of the end of 2022, of a demo of an actual flight profile with actual people in an actual eVTOL of the sort of flights that the story claimed would be generating that revenue in just three years. I still can’t find such a video. And the actual revenue for actual flights in 2025 turned out to be $0.0B (and there are no rounding errors there–it was $0), and Volocopter has gone into receivership, with a “reorganization success” in March 2025. In my November 2022 blog post above I talked about another company, Lilium, which came the closest to having a video of a real flight, but it was far short of carrying people and it did not fly as high as is needed for air taxi service. At the time Lilium had 800 employees.
Since then Lilium has declared bankruptcy not once (December 2024), but twice (February 2025), after the employees had been working for some time without pay. But do not fear. There are other companies on the very edge of succeeding. Oh, and an edge means that sometimes you might fall off of it. Here is an interesting report on the two leading US eVTOL companies, Archer and Joby Aviation, both aiming at the uncrewed taxi service market, both with valuations in the billions, and both missing just one thing: a for-real, live, working prototype. The story focuses on a pivotal point, the moment when an eVTOL craft has risen vertically and now needs to transition to forward motion. In particular it points out that Archer has never demonstrated that transition, even with a pilot onboard, and during 2025 they cancelled three scheduled demonstrations at three different air shows. They did get some revenue in 2025 by selling a service forward to the city of Abu Dhabi, but zero revenue for actual operations–they have no actual operations. They promise that for this year, 2026, with revenue-producing flights in the second half of the year. Joby Aviation did manage to demonstrate the transition maneuver in April of 2025. And in November they made a point-to-point flight in Dubai, i.e., their test vehicle managed to take off somewhere and land at a different place. The fact that there were press releases for these two human-piloted, pretty basic capabilities for an air taxi service suggests to me that they are still years away from doing anything that is an actual taxi service (and with three announced designated places to land and take off from, it seems more like a rail network with three stations than a taxi service–again, slippery definitions do indeed slip and slide). And many more years away from a profitable service. But perhaps it is naive of me to think that a profitable business is the goal.
As with many such technology demonstrators, the actual business model seems to be getting cities to spend lots of money on a Kabuki-theater technology show, to give the city credit for being technology forward. Investors, meanwhile, invest in the air taxi company thinking it is going to be a real transportation business. But what about personal transport that you own, not an eVTOL taxi service at all, but an eVTOL that you can individually own, hop into whenever you want, and fly anywhere? In October there was a story in the Wall Street Journal: “I Test Drove a Flying Car. Get Ready, They’re Here.” The author of the story spent three days training to be the safety person in a one-seat Pivotal Helix (taking orders at $190,000 apiece, though not yet actually delivering them; also take a look at how the vehicles lurch as they go through the pilot-commanded transition maneuver). It is a one-seater, so the only person in the vehicle has to be the safety person in case something fails. He reports:

After three hellish days in a drooling, Dramamine-induced coma, I failed my check ride.

The next month he tried again. This time he had a prescription for the anti-emetic Zofran and a surplus-store flight suit. The flight suit was to collect his vomit and save his clothes. After four more days of training (that is seven total days of training), he qualified and finally took his first flight, and mercifully did not live up to his call sign of “Upchuck Yeager”. $190,000 to buy the plane, train for seven days, vomit wildly, have to dress in a flight suit, and be restricted to taking off, landing, and flying only over privately owned agricultural land or water. This is not a consumer product, and this is not a flying car that is here, despite the true-believer headline. Two years ago I ended my review of flying cars with: Don’t hold your breath. They are not here. They are not coming soon. Last year I ended my review with: Nothing has changed.
Billions of dollars have been spent on this fantasy of personal flying cars. It is just that, a fantasy, largely fueled by spending by billionaires. There are a lot of people spending money from all the investments in these companies, and for many of them it is a real dream that they want to succeed. But it is not happening, even at a tiny scale, anytime soon.

We are at peak popular hype in all of robotics, AI, and machine learning. In January 1976, exactly fifty years ago, I started work on a Masters in machine learning. I have seen a lot of hype and crash cycles in all aspects of AI and robotics, but this time around is the craziest. Perhaps it is the algorithms themselves that are running all our social media that have contributed to this. But it does not mean that the hype is justified, or that the results over the next decade will pay back the massive investments that are going into AI and robotics right now. The current hype is about two particular technologies, with the assumption that these particular technologies are going to deliver on all the competencies we might ever want. This has been the mode of all the hype cycles that I have witnessed in these last fifty years. One of the current darling technologies is large X models for many values of X (including VLMs and VLAs), largely, at the moment, using massive data sets, and transformers as their context and sequencing method. The other isn’t even really a technology, but just a dream of a form of a technology: robots with humanoid form. I have now put these two things in the five topics of my new predictions shared at the beginning of this post and will talk about them explicitly for each of the next ten years. Back in 2018 I did not talk about either of these technologies in my predictions, but rather talked about competences and capabilities.
I fear that I may have been overly optimistic about many of these, and in the table below I point out that my predicted time of arrival has now come, but the capabilities or competencies have not. I’m sure that many true believers in the two technologies mentioned above will have very short time scales on when they say this will be achieved. I pre-emptively disagree with them.

Capabilities and Competences

The predictions that are commented upon in the table above are all about when we would see robots and AI systems doing some things that simple creatures can do, and others that any child of age nine, or perhaps less, can do without any difficulty. Even children aged three or four can navigate around cluttered houses without damaging them (that is different from when they may want to damage them). They can get up and down single stairs, and even full staircases, on two legs without stumbling (or resorting to four-limb walking as a two year old might). By age four they can open doors with door handles and mechanisms they have never seen before, and safely close those doors behind them. They can do this when they enter a particular house for the first time. They can wander around and up and down and find their way. One of the many promises about humanoid robots is that they too will be able to do this. But that is not what they can do today. But wait, you say, “I’ve seen them dance and somersault, and even bounce off walls.” Yes, you have seen humanoid robot theater. All those things are done on hard surfaces, and anything specific beyond walking has been practiced and optimized by reinforcement learning, for exactly the situation of the floors and walls as they are. There is no real-time sensing and no ability to wander in previously unseen environments, especially not those with slipping hazards such as towels or sheets of cardboard on the floor. Children can do so easily.
While four-legged robots are much better at it than humanoid robots, they are wider than people, still have significant foot-slipping problems, and cannot open random doors themselves as children can. A nine year old child can pretty much do any task (but with less weighty packages) that any delivery driver can do. That includes climbing out of a van, walking up and down slopes, going up and down previously unseen external staircases, sometimes ending in a dark porch or vestibule area, then putting the package on the ground, or putting it into a drop bin after grasping and pulling on the handle–again never having encountered that particular design of bin and handle. All this can be done immediately upon seeing the scene for the first time. We have not seen anything remotely like that in a lab demo for robots, despite my hope from eight years ago that by now such would have been demonstrated. And again, here a four-legged robot might be able to do the walking and stair climbing, but it won’t be able to manipulate the package. Also note that humans doing these tasks don’t just carry single packages out in front of them with two outstretched arms, but often use their elbows, their hips, and their bellies to support multiple packages as they locomote. Elder care is a commonly quoted target market for robots, and with good reason, given the current and growing demographic inversions in much of the world. There are far fewer younger people relative to the number of older people than there have been historically, and so fewer people to provide elder care. In providing care to the very elderly, there is a need to support those people physically, both passively, providing compliant support for them to lean on, and actively, getting people into and out of bed, into and out of bathtubs or shower enclosures, and getting people onto and off of toilets. And sometimes wiping their bums.
There are no force sensing and control capabilities on any of today’s robots that are remotely capable of doing any of these sorts of things safely and comfortably. And machine learning is not going to provide those capabilities. There are many fundamental design, materials, and engineering problems to solve to make these things possible. A bitter lesson, perhaps, for those who think that more data will solve everything. But the other unresolved capability that I have in my predictions table above is an agent that understands the world in an ongoing way as we all understand it. That includes knowing what to expect to be the same as it was yesterday, and will be tomorrow, and what has changed about the world since yesterday or is likely to change today or tomorrow. Such an understanding of the world will be important for any deployable systems that can take care of real and vulnerable humans, including the elderly. And the young. And the rest of us. In summary, I thought that more progress would be made on many of these problems than has been achieved over the last eight years. That lack of progress is going to have real, and negative, impact on the quality of life of the newly elderly for the next couple of decades. Ouch! VCs, please take note: there is real pull for technologies that can help the elderly, and being in there first with something that can actually deliver value in the next three to five years will come with a very large upside.

World Models

Lots of people are talking about world models and their importance, as add-ons to LLMs, as mechanisms for agentic AI to exploit, and for allowing robots to do real tasks. These aspirations are probably reasonable to have, and successfully working on them can have real impacts. Unfortunately the talkers are not the doers, and not the deployers, and not the people who have to solve real problems. And so they all have different, and convenient for themselves, understandings of what world models are.
That, along with the worship of big data and the belief that machine learning will solve all problems, means we have a big mess, with people jumping to “solutions” before they understand the problems. Some people are even claiming that they will build world models by learning them from having agents play video games. But how do those video games work? They have a coded geometry-based world model, with a little physics engine. It is already built! Using machine learning (and tens of millions of dollars) to extract it, rather than just looking at the source code (and perhaps buying or licensing that code), is just wacky. Expect more confusion and lots and lots of reinvention. This fever has quite a ways to go before today’s memes and slogans get replaced by the next generation of memes and slogans, with perhaps some good work coming out in a rational interregnum. We can hope.

Situatedness vs Embodiment

One of the new things that people are getting excited about is Embodied Intelligence. I agree that it is worth being excited about, as it is what I have spent the last forty years working on. It is certainly about robots being in the world. But since 1991 I have made a distinction between two concepts, where a machine, or creature, can be either, neither, or both situated and embodied. Here are the exact definitions that I wrote for these back then:

[Situatedness] The robots are situated in the world—they do not deal with abstract descriptions, but with the here and now of the world directly influencing the behavior of the system.

[Embodiment] The robots have bodies and experience the world directly—their actions are part of a dynamic with the world and have immediate feedback on their own sensations.

At first glance they might seem very similar. And they are, but they are also importantly different.
And, spoiler alert, I think much of the work at companies, large and small, right now, is trying to abstract out the embodiment of a robot, turning it into a machine that is merely situated. An algorithm, written as code, to find the greatest common divisor of two numbers, when running, is neither situated nor embodied. A robot that is thrown into the air with just an inertial measurement unit (IMU) as its sensor, that moves its limbs about to zero out rotations and then is caught by a net, is embodied but not situated. A robot that has a physical face that can make expressions, a voice synthesizer, cameras, and microphones, and that can talk to a person giving appropriate responses, both with its choice of words and with appropriate prosody and facial expressions, to some purpose and in response to how the person talks and moves, is situated but not really embodied. Embodied in its presence, yes, but not embodied in any physical interactions with its environment. A robot that can roll around without hitting stationary objects, wherever they are, nor hitting moving people or other vehicles, that can go to a location specified by a warehouse management system, that responds safely to people grabbing it anywhere, and that can give a person who grabs its control handle agency over it, going wherever the person pushes it with a light touch no matter how much weight it is currently carrying, is both embodied and situated. [And yes, this is what our Carter robots do at Robust.AI.] These are just point examples of the four classes of entities that come from having or not having the two properties of situatedness and embodiment. Real robots that do real work in dynamic human-occupied environments must be both situated and embodied. For instance, a robot that is to help with in-home elder care needs to be aware of the situation in the world in order to know what to do to help the person.
It needs to be able to open doors with different handles and latching mechanisms, and then control the inertia of the closing door so that the environment is both safe and quiet for the person. The robot needs to be able to accommodate the person reaching for it dynamically, looking for support so that they don’t fall. The robot needs to be able to take things handed to it by the person, and pass things to the person in a way that is both safe and makes it easy for the person to grasp. Etc., etc. In short, the robot needs to control forces and inertias in the world, and to be responsive to them, at the same time as it is acting in a way that can be understood as sentient. Being both situated and embodied is still a challenge for robots in the world. [[Now here is the most important sentence of this whole blog post.]] I think the training regimes that are being used for both locomotion and dexterity are either ignoring or trying to zero out the embodiment of physical robots, their inertias and forces, reducing them to merely being situated, just apps with legs and arms, characters in video games, not the reality of real physical beings that the tasks we want them to do require.

Dexterous Hands

I talked about the challenges for dexterity earlier this year. In the table above I have a new comment this year saying that there has been improvement in the dexterity of suction-based grippers but not for articulated grippers. Suction grippers have plastic suction cups which are themselves compliant. Under the force of the suction they can change shape, to a degree, to accommodate unknown shapes in the thing being grasped (sucked up to). They also allow for a little bit of torsional rotation about the axis of sucking, and a bit of rocking of the suction cup in the two degrees of freedom in the plane orthogonal to the suction axis.
While suction cups have evolved to better pick things up, and so are common for handling packaged goods, the companies that package materials to be shipped through automated systems choose versions of plastics for bags that won’t be sheared open by the suction pulling against the outer parts of such cups. The result is that the control of the embodied action of grasping can become much more of a simply situated action. Once the pick orientation and vacuum gripper selection have been made it is really open loop, as all the work is done by the indiscriminate force of suction and the mutual compliance of the gripper and the grippee. Above I had argued against doing this with a general-purpose humanoid hand. It makes no sense there, as the adaptability of the hand is its most precious attribute. But here, in a special-purpose hand, a suction gripper, it actually simplifies things within the specialization of the task, and here a purely situated hand may make sense. And it may be possible to train it with purely visual data. So what does this tell us? It says that there is plenty of room for mechanical design, and simpler computational embodied control, for all sorts of grippers and things in the world that need to be gripped. The end of Moore’s Law, at least the version that said we could reduce feature size on silicon by a factor of two every year, opened up a new golden era of chip design. The winners (through early luck and then dogged determination) matched untraditional designs to new problems (machine learning) and achieved speedups (and corporate valuations) that were unheard of. In the last 10 years we have moved from general-purpose silicon to special-purpose silicon for our most high volume computations. That was not on most people’s expectation list twenty years ago.
So too today: with stalled capabilities from full human-hand emulation efforts through machine learning from visual observation, there is a rich array of more specialized manipulation tasks where special-purpose grippers, clever interplay of materials and force applications, geometric planning, specialized sensing, and maybe even some machine learning may lead to enormous application markets. For instance, a specialized robot body, hands (of some sort), arms, and support limbs or wheels that can safely manipulate an elderly human could have enormous impact on elder care around the world. A single human care-giver along with one human-manipulator robot could provide a lot more care for a frail elderly person than the care-giver alone could. Special-purpose manipulators for fruits, or for some range of small mechanical parts, or clothing, could each open enormous markets for automation in particular handling tasks. And countless other specialties. Economic pull is out there. Being the smart academic researcher, entrepreneur, or technology investor here may lead to enormous new types of deployable automation. The new dexterity may turn out to be special purpose. And eventually we may come to understand that just because the hands we know best happen to be our own does not mean that our own hands are the best for the majority of tasks in our human world. Humanoid romanticism may not be our future after all.

Looking at the missions and numbers over the last three years, it appears that human spaceflight is at a steady plateau, with, by the way, far fewer people going into orbit than in the time of the Space Shuttle. Underneath, though, there is a lot of churn, a likely new player, and the return of humans to lunar distances for the first time in 54 years. Below is the updated scoring of my 2018 predictions for human spaceflight. There are six new comments in this table, but no new specific calling of predicted dates as right or wrong.
It is now clear to me that I was way too optimistic in regard to my predictions for Mars, even though I was wildly out of step and much more pessimistic than the predictions coming out of SpaceX eight years ago. Given how slow things have turned out trying to land people on the Moon, the hoped-for crewed colony on the Moon (think of it as the ISS (International Space Station) on the lunar surface) may well slip to what I had predicted for Mars. Mars is going to take much longer than the Moon. Following the table there are the detailed numbers and trends on both orbital crewed flights and suborbital crewed flights. Things will change from stasis in 2026. A crewed flight to the Moon is scheduled to happen in a matter of weeks, with the vehicle already stacked, now. And suborbital crewed flights may possibly have quite an uptick in 2026. Following those two sections I have more on Boeing’s Starliner, SpaceX’s Starship, and Blue Origin’s New Glenn, NASA and the Moon, and what is going to happen with space stations given the scheduled end of life of the ISS in 2030.

Orbital Crewed Flights

In both 2024 and 2025 the US put 16 people into orbit and Russia and China each put 6 people into orbit; 28 people total went to orbit in each year. We have gone from a historical low of only eight people going to orbit in 2020 to a steady-ish state of roughly 28 people per year now. That may jump to over 30 people in 2026 because of the additional Artemis II flight to the Moon, following checkout in LEO (Low Earth Orbit). But even with that bump there may be other pressures which keep it from rising above the high twenties for 2026. We are certainly not seeing steady growth in the number of humans getting launched to orbit, and the numbers are significantly lower than the heydays of Shuttle launches in the nineties and early two thousands.
There is no growth trend visible, and the long promised exponential growth of people going to orbital space has not even made a brief guest appearance. Here is a more detailed history for the last six years, where the first line in each box says how many crewed launches of the particular vehicle there were, and the second line, in square brackets, says how many people, total, were onboard those flights. Wherever there are three numbers separated by forward slashes you have to sum the numbers to get the total. The three countries with current crewed orbital launch capabilities are the US, Russia, and China. All Chinese flights are state astronauts (or taikonauts) and all of them go to the Chinese space station. And there are no tourists, so far, on Chinese flights, so we just have single numbers for both launches and people. All the state astronauts for both the US and Russia go to the International Space Station (ISS), but a state player (in Russia) and a non-state player in the US (SpaceX) have also launched tourist flights in the last six years. So for those two countries we have three numbers separated by slashes for both launches and people. The first of the three numbers refers to purely state launches to the ISS (note that the US and Russia each launch the other’s state astronauts to the ISS, so that both countries have astronauts with up-to-date training on the other’s launch systems, in case of emergencies arising at some point). The second number in the triples is space tourists whose destinations have also been the ISS, while the third number (for both launches and people) is for tourist flights that have been independent of going to the ISS–there have been a total of three of these, all launched by SpaceX. Two of those three flights were purchased personally by Jared Isaacman, who was sworn in as the NASA administrator just two weeks ago.
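To make the table’s slash convention concrete, here is a trivial helper (my own illustration, not part of the table itself) that sums a slash-separated entry; single-number entries, as in the Chinese rows, pass through unchanged:

```python
def total(entry: str) -> int:
    """Sum a slash-separated table entry, e.g. '10/1/3' -> 14."""
    return sum(int(part) for part in entry.split("/"))

print(total("10/1/3"))  # 14 total launches
print(total("6"))       # 6, a single-number entry
```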
The one year in the last six where Russia has launched space tourists (after being the leaders in this endeavor early in the century) was 2021, where two flights of Soyuz to the ISS had one regular state cosmonaut and two space tourists. And, there was one slightly wobbly other launch of a Soyuz in 2024, not called out in the table, where a flight attendant from the state airline of Belarus was sent as a cosmonaut from that country to the ISS on a Russian Soyuz. That was most likely an event orchestrated by Russia to keep support from Belarus for their war against Ukraine. Ugly. The term tourist needs some explanation. The people (as with suborbital Blue Origin flights) are a mixture of private people paying for the experience (or having some other individual pay for them) or they are astronauts from countries that do not have their own launch capability. In the case of the three tourist flights to the ISS on a SpaceX Dragon, all have been paid for by the company Axiom, with a former NASA astronaut in command. The three others on each of those flights are people in the fledgling astronaut programs of other countries who have paid Axiom for the seats. Axiom has commercial relationships with both SpaceX and NASA for the use of the Falcon 9 launch vehicle, the Dragon craft, and use fees for the ISS. Suborbital Crewed Flights Virgin Galactic is on a multi-year hiatus on flights as they develop new flight vehicles, but they may well fly again in 2026. Thus, for the last year, only Blue Origin has been launching tourists on suborbital flights (again a mixture of private individuals and astronauts from other countries that have not yet developed their own crewed launch capability, but may be aiming to do so). Blue Origin also sells uncrewed launches for experiments that need to be exposed to the environment of space and/or operation in microgravity, if only for a few minutes. In 2025 Blue Origin had seven launches, each with six people on board.
Previously they had had three crewed launches in each of 2021, 2022, and 2024, each with six people on board, with a hiatus in 2023. Blue Origin has been quite careful with forward projections for both suborbital and orbital flights, so when they say what they intend to do and when, they are likely to come close to achieving that promise. Recently they said that they are going to introduce three new flight vehicles starting in 2026 to run their suborbital flights, that they are looking at developing a second launch site, somewhere other than Texas, and that they believe they have the customer demand to support one flight per week. They do not disclose what they charge for the flights. Nor did they give any firm dates for reaching these goals. But I think it is likely that we will see a jump in the number of flights in 2026. In December of 2025 I was at an event centered on solar system orbital dynamics and met a sub-orbital tourist there. He has already paid for and flown above the Kármán line on Virgin Galactic. Now he has paid for a Blue Origin sub-orbital flight and is waiting for a launch assignment. There is definitely a market for these flights; it remains to be seen whether the prices and demand combine in a way that makes it profitable for seat suppliers to keep doing it. Boeing's Starliner (not to be confused with the SpaceX Starship) When it was first announced in 2010, Boeing's Starliner was scheduled to fly a human test crew in 2018. It was supposed to send the crew to the ISS, then it would be under contract to launch six crews to the ISS, much as SpaceX has already launched 11 regular crews to the ISS. In mid-2024 it delivered a human test crew to the ISS, Barry Wilmore and Sunita Williams, but after much analysis of anomalies it returned to Earth without them.
NASA bumped two crew members from the next crew going on a SpaceX flight to the ISS to provide room for their return, on that SpaceX Dragon, which they did after an unexpected extra nine months on top of their originally scheduled week at the ISS. Last year in my yearly update I said: We do not know at this point, but I think it would not be a huge surprise if Starliner never flies again. It turns out it is going to fly again! Including potentially twice in 2026. But there are some changes. The six missions which were contracted to take astronauts on regular assignment to the ISS were called Starliner-1 through Starliner-6. The contract with NASA has been modified to make the last two flights future options rather than sure things. And Starliner-1, scheduled for the first half of 2026, will be uncrewed again. Then the three remaining flights in the modified contract would each take four astronauts on regular rotations to the ISS. There is one little hiccup. Sunita Williams is the only active astronaut, not committed to other current or upcoming missions, who has trained to fly on Starliner. She now has over 600 days in space and another six month mission to the ISS would take her over radiation exposure limits. SpaceX Falcon 9 I gave the statistics for Falcon 9 in the introduction, talking about what has surprised me in the last 8 years. When I made my predictions Falcon 9 had been launched 46 times over 8 years. Only five of those launches re-used a previously flown first stage, and only in the previous year had successful landings of the first stage become reliable. Now Falcon 9s are getting launched at a sustained rate of more than three per week, all attempts at landing boosters are successful, and typically each booster flies over 20 times. Just phenomenal, unmatched reliability and performance.
NASA, Artemis, and Returning to the Moon I am jumping ahead of Starship (SpaceX) and New Glenn (Blue Origin) to talk about NASA's plan to get people back to the lunar surface, and perhaps setting up a more or less permanent outpost there, occupied in the way the ISS has been continuously occupied for 25 years, with crew members rotating in and out twice a year. (China's space station follows the same model, but with only 3 occupants compared to 7 for ISS). 2026 promises to be a big year for humanity and the Moon. No one has been beyond low Earth orbit (LEO) since the Apollo 17 mission had three people go to lunar orbit and two of them landed in December 1972, fifty-three years ago. In November 2022 the first launch of NASA's SLS (Space Launch System) occurred, taking its uncrewed Orion capsule in a looping orbit past the Moon and back. It approached the surface of the Moon in each direction, and then came back to Earth and splashed down. Note that this was the FIRST flight of both the multi-stage rocket, and the habitable capsule. It all worked the FIRST time. Everything was built by contractors, but it underwent NASA's methodology to make sure things worked rather than failed. The first stage consists of a liquid-fueled rocket using four RS-25 engines, the same engine type as the three on the Space Shuttle. It also has two solid fuel boosters strapped on, larger versions of the Space Shuttle solid fuel boosters. The second stage is essentially an off-the-shelf stage from the past Delta program. There will be a third stage added for the fourth and subsequent flights. This is a derivative vehicle, with a long history of successful use of its components. When Vice President Mike Pence announced the details of the program in 2019 the landing of people on the Moon was supposed to happen in 2024. Things have slipped a little since then.
The first crewed mission to the vicinity of the Moon (no landing), Artemis II, had slipped to April 2026, but now it has been pulled forward to February 2026 (next month!), when a crew of four will spend over ten days in space on Artemis II in a flight to the Moon approaching to within 4,600 miles, then, on a free-return trajectory (no need for working engines), they will head back towards Earth. All their energy will be removed by heat shields hitting the Earth's atmosphere and then by the use of 11 parachutes, finally splashing down in the ocean. Note that on all 9 flights to the Moon of the Apollo program, the spacecraft came much closer to the Moon than this, and 8 of the flights went into orbit at around 60 to 70 miles above the surface. So this is a more conservative mission than those of Apollo. Things at this stage are looking good for Artemis II to fly in February 2026. The next step of the Artemis program is where things get wobbly. Rather than 2024, the first landing of astronauts on the Moon is currently scheduled for 2027. But that is not going to happen. Here is what the architecture of the mission currently looks like: Here we see the problem with the schedule, even with it currently slipped to landing two astronauts on the Moon in 2027. The architecture uses the SLS and Orion to get the astronauts to lunar orbit. Given there is a lunar flyby with astronauts onboard, scheduled for just two months from now (and the rocket is already stacked for that mission), that looks like a reasonable interpolation from existing progress. The problem with the new plan is the landing vehicle and getting it to lunar orbit. It is all based on SpaceX's Starship. So far, Starship has had 11 flights, six of which have been successful in reaching their own goals, and five of which have been failures. But there has not yet, in eleven flights, been a goal of getting anything into orbit.
And just in 2025 two vehicles have been destroyed by failures on the ground when the tanks have been pressure tested. In the section on Starship below I will talk more about what I see as conflicting product requirements which together doom Starship to a very long development process. For comparison, the Saturn V which took astronauts to the Moon nine times had a total of 13 flights, every one of which got payloads to Earth orbit. Two were uncrewed tests (and there were problems with the second and third stages on the second of these test flights). Its very first crewed flight (Apollo 8) took people to the Moon, and a total of 9 launches got people to the Moon. The other two flights were (Apollo 9) a crewed flight to test the Lunar Lander and orbital rendezvous in Earth orbit, and the uncrewed launch of the first space station, Skylab. Now look again at the plan for the Artemis III mission. It requires multiple (reported numbers range from 14 to somewhere into the twenties) launches of the Starship to orbit. One of those launches uses the Super Heavy Booster and a special version of the second stage, the actual Starship, known as Starship HLS (Human Landing System). That special version is expendable: after it lands astronauts on the Moon, it hosts them for perhaps two weeks, then brings them back to lunar orbit where they transfer to NASA's Orion. Then it sends itself off into heliocentric orbit for all eternity. The HLS version is special in two ways. First, it does not have to get back to Earth, and so needs neither heat shields nor the three in-atmosphere Raptors for a soft landing on Earth (see the section on Starship below). That is good for all the mass equations. But it does, or might, have a second set of engines for landing on the Moon that are attached halfway up its body so that they cause less lunar dust to fly around as it lands. We have not yet seen a prototype of that version, not even a public rendering as far as I can tell.
I have talked to people who are in regular communication with people inside SpaceX. They report not a peep about what work has been done to design or build the lander. That is not good for the current public schedule. BUT the really, really bad thing is that the lunar lander stage will use up most of its fuel getting into Earth orbit — it is the second stage of the rocket after all. So it cannot get to the Moon unless it is refueled. That will be done by sending up regular versions of the Starship second stage, all on reusable Super Heavy Boosters. They too will use up most of their fuel getting to orbit, and will need to keep some to get back to Earth to be reused on another flight. But each will have a little margin, and that extra fuel will be transferred to the lunar landing Starship in orbit. No one has ever demonstrated transfer of liquid fuel in space. Because of the way the numbers work out it takes somewhere in the teens of these refueling operations, and depending on how quickly certified higher performance engines can be developed and tested for both the Super Heavy Booster and Starship itself, that number of refueling flights might range into the twenties. As an engineer, this architecture looks to me like trouble, with an impossible future. I am sure it will not happen in 2027, and I have doubts that it ever will. The acting administrator of NASA, Sean Duffy, who is also the head of the US Department of Transportation, was worried about this too, and in October of 2025 he reopened bidding on the contract for a crewed lander for the Moon that collects and returns its crew from Orion in lunar orbit. The day after this announcement SpaceX said they were working on a simplified architecture to land people on the Moon. They have given no details of what this architecture looks like, but here are some options proposed by the technical press.
A couple of weeks later the President announced the renomination of Jared Isaacman to be the NASA administrator, having withdrawn his nomination a few months before. Isaacman is a private citizen who personally paid for, and flew on, two of the three SpaceX crewed missions which have not flown to the ISS. He was confirmed to the NASA position on December 17th, 2025, just two weeks ago. At the very least expect turbulence, both political and technical, in getting astronauts landed on the Moon. And see a possible surprise development below. SpaceX Starship (not to be confused with Boeing's Starliner) Starship is SpaceX's superheavy two stage rocket, designed to put 150(?) tons of payload into orbit, with components having been under development since 2012, going through extensive redesigns along the way. There have also been three major designs, builds, and tests of the Raptor engines that power both stages. This is how Wikipedia currently introduces them: Raptor is a family of rocket engines developed and manufactured by SpaceX. It is the third rocket engine in history designed with a full-flow staged combustion fuel cycle, and the first such engine to power a vehicle in flight. The engine is powered by cryogenic liquid methane and liquid oxygen, a combination known as methalox. SpaceX's super-heavy-lift Starship uses Raptor engines in its Super Heavy booster and in the Starship second stage. Starship missions include lifting payloads to Earth orbit and is also planned for missions to the Moon and Mars. The engines are being designed for reuse with little maintenance. Currently the Raptor 3 version is expected to be used for operational Starship launches, and it comes in two versions. There are 33 Raptors in the first stage designed to operate optimally in the atmosphere, along with three such engines in the second stage, which also houses three vacuum-optimized Raptors.
The first stage engines and the second stage vacuum engines are designed to get payloads to orbit. The vacuum engines on the second stage would also be used for further operations on the way to the Moon and descending towards the surface there. And for non-expendable second stages they would be used for the initial de-orbit burn for landing the second stage Starship back on Earth. After using the heat shields to burn off some more energy as it enters the atmosphere, the second set of engines, the atmosphere optimized Raptors, are used to slow it down to a soft landing. Other systems for returning to Earth have used different tradeoffs. The Space Shuttle used its wings to slow down to a very high horizontal landing speed, and then a combination of a drag parachute after touchdown and brakes on the wheels to get down to zero velocity. US capsules, such as Mercury, Gemini, Apollo, Orion, and Dragon, have all used heat shields followed by parachutes during vertical fall, and lastly dropped into the sea to damp the final residual velocity. (Soyuz, Starliner, and New Shepard all use last-second retro rockets before hitting the ground, rather than water.) This means that unlike all the other solutions Starship has to carry a complete set of engines into orbit just for use during landing, along with enough fuel and oxidant to land. This is a high performance price for a vehicle that mostly flies in space. The engines on the Starship first stage, like those on Falcon 9 and Blue Origin's New Glenn, do get to space but never get to more than a small fraction of orbital speed, so returning them to Earth is a much, much lower performance price than Starship's second stage return of engines and fuel. The 2025 flights of Starship were, on average, better than the 2024 flights, but two vehicles destroyed themselves before getting to the flight stage, and still nothing got into orbit. How close is it to working? I don't know.
But I do keep tabs on promises that have been made. In November of 2024 the President of SpaceX said “I would not be surprised if we fly 400 Starship launches in the next four years”. A year ago today I said in response: “Looking at the success of Falcon 9 it is certainly plausible that I may live to see 400 Starship launches in a four year period, but I am quite confident that it will not happen in the next four years (2025 through 2028)”. We are a quarter of the way through her predicted time frame and we have gone from being 400 orbital launches away from her goal down to being a mere 400 away. Blue Origin Gets to Orbit The suborbital tourist flights that Blue Origin operates are not its main business. It has ambitions to compete head to head with SpaceX. But it is almost 600 launches behind; how can it be competitive? In 2025 Blue Origin made clear that it is not to be dismissed. It went from zero orbital launches at the start of 2025 to having two orbiters on their way to Mars (SpaceX has not yet done that) and showing that it can land a booster whose performance comes very, very close to that of Falcon Heavy's three-booster configuration when all three of those boosters are landed. And it may well do a soft landing on the Moon in 2026 (SpaceX won't come close to that goal for a number of years). In February Blue Origin launched its first New Glenn rocket. Its first stage is powered by seven BE-4 engines (“Blue Engine 4”), a methane burning engine that is more powerful than the Raptor 3 which will power new versions of SpaceX's Starship. New Glenn reached orbit on its first attempt, and delivered a Blue Origin payload to space (a test version of their Blue Ring for in-space communications). The first stage attempted to land on Blue Origin's Jacklyn landing platform at sea but failed. The BE-4 had previously powered two United Launch Alliance Vulcan rockets to orbit under a deal where Blue Origin sells engines to ULA.
The second stage of New Glenn is powered by two BE-3 engines, which are a variant of the single engine used on Blue Origin's New Shepard. In their second launch, in November, Blue Origin not only delivered three paid payloads to orbit (two of which are headed to Mars, where they will orbit the planet and carry out science experiments for UC Berkeley on what happened to Mars' atmosphere), but the first stage (much larger than the first stage of a Falcon 9) also landed on Jacklyn with an unrivaled level of precise control. Blue Origin plans to reduce the time spent hovering in future landings, to reduce the fuel that must be held in reserve, now that it has mastered return-from-orbit vertical landing. (Recall that they have landed dozens of New Shepard vertical landings on return from non-orbital flights.) Soon after this impressive second outing for New Glenn, Blue Origin announced a number of upgrades. They renamed the base vehicle that has now flown twice to be “New Glenn 7×2”, where 7 and 2 refer to the number of first stage and second stage engines respectively. They also announced that those flight engines would be upgraded to levels of thrust and duration that had already been demonstrated in ground tests. These are the new total thrust numbers, in pounds force. Additionally, Blue Origin announced a new version, the “New Glenn 9×4”: heavier, taller, with a larger payload fairing, and with two extra engines on each stage. Looking up from below the first stage, the engine arrangement goes from the one on the left to the one on the right. And here is how the two variants look compared to the Saturn V which took humans to the Moon in 1969. The kicker to these successes is that the New Glenn 7×2 with a reusable first stage is very nearly equivalent to the Falcon Heavy when its three first stage boosters are reused. The reusable New Glenn 9×4 beats Falcon Heavy on all measures even when all three of Falcon Heavy's boosters are sacrificed and not recovered.
I can’t quite get all the numbers but this table makes the comparisons with the numbers I can find. Note that a “tonne” is the British spelling for a metric ton, which is 1,000 kg. That is approximately 2,205 lbs, which is about 205 lbs more than a US (short) ton, and about 35 lbs less than a British (long) ton. Meanwhile expectations are high for another launch of a New Glenn, the 7×2 version, sometime early in the new year. There has been no announcement from Blue Origin, nor any indication of the payload. But there is a general feeling that it may actually be a launch of Blue Origin's Blue Moon Mark 1, an all-up, single-launch mission to soft land on the Moon. It was announced almost a year ago that Blue Origin has a deal to deliver a NASA payload to the Moon in the Blue Moon Pathfinder mission no earlier than 2026. The Mark 1 uses a BE-7 engine to soft land. Here is where things get interesting for a re-appraisal of how NASA astronauts might first land on the Moon again. Blue Origin is already under contract with NASA to land two astronauts on the Moon for a 30 day stay in 2030 using their much larger Blue Moon Mark 2. The Mark 2 and Mark 1 share control systems and avionics, so a successful landing of Mark 1 will boost confidence in the Mark 2. The architecture for the 2030 mission involves multiple launches. A NASA SLS launches a crewed Orion capsule to the vicinity of the Moon. A New Glenn gets a Mark 2 Blue Moon to an orbit that approaches the Moon. A “Cislunar Transporter” is launched separately and it gets fueled in LEO. Then it heads off to the same orbit as the Mark 2 and refuels it. The Mark 2 and the transporter both use three Blue Origin BE-7 engines, which are now fully operational. Then the astronauts transfer to the Mark 2 to land on the Moon. Note that this architecture uses in-flight refueling, as does the SpaceX version, though with far fewer launches involved.
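As a quick sanity check on those unit conversions, here is a small sketch in Ruby (the conversion factor 1 kg ≈ 2.20462 lbs is standard; the variable names are just for illustration):

```ruby
# Compare a metric tonne (1,000 kg) with the US short ton and British long ton.
LBS_PER_KG = 2.20462

tonne_lbs  = 1_000 * LBS_PER_KG # ≈ 2204.6 lbs
us_ton_lbs = 2_000.0            # US (short) ton
uk_ton_lbs = 2_240.0            # British (long) ton

puts "tonne:            #{tonne_lbs.round(1)} lbs"
puts "vs US short ton: +#{(tonne_lbs - us_ton_lbs).round(1)} lbs" # ≈ +204.6
puts "vs UK long ton:  -#{(uk_ton_lbs - tonne_lbs).round(1)} lbs" # ≈ -35.4
```

So a tonne is roughly 205 lbs heavier than a US ton and roughly 35 lbs lighter than a British ton.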
BUT, soon after then-NASA administrator Sean Duffy announced the re-opening of the contract for the lander for Artemis III, it appeared that he was considering having Blue Origin use their Mark 1 version for the crewed mission. Whether that enthusiasm survives the changing of the guard to Jared Isaacman, the new and current NASA administrator, remains to be seen. And whether Blue Origin can pull off a rendezvous in lunar orbit with an orbiting Orion capsule, to pick up and later return the crew members going to the lunar surface, is also an open question. I think the key idea with this option is to remove the need for any in-flight refueling for the first crewed landing. There is going to be some stiff competition between SpaceX and Blue Origin. Either might win. New space stations The International Space Station will be at end of life in 2030 after continuous human habitation for almost thirty years. The other space station currently in orbit is the Chinese Tiangong station. Expect to see a real pick-up in the building of space stations over the next few years, in anticipation of the end of the ISS. The Russian Orbital Service Station (ROS) is scheduled to begin construction, by Roscosmos, in orbit in 2027. There is risk to this plan from the deterioration of the Russian economy. India plans to start building their Bharatiya Antariksh Station (BAS) in 2028 and for it to be fully operational in 2035. India has had uncrewed orbital capability since 1980, and sent its first uncrewed mission to Mars in 2013. For BAS it is developing crewed launch capability. In 2025 India sent one of its own astronauts to the ISS on a SpaceX Dragon under an agreement with the company Axiom. A consortium of countries (US, Canada, Japan, the European Union, and the United Arab Emirates) is collaborating on building the Lunar Gateway, a space station orbiting the Moon. Launch of the first module is scheduled for 2027 on a SpaceX Falcon Heavy.
Blue Origin is competing for additional components and launches for the Gateway. A host of private companies plan on launching smaller private space stations in the near term, with one claiming it will do so in May 2026. This is going to be an active frontier, and may lead to more humans going on orbital flights than the current status quo of about 28 per year.

Humanoid Robots

The robots from the humanoid robot companies have not demonstrated any practical work (I don't count dancing in a static environment doing exactly the same set of moves each time as practical work). The demonstrated grasping, usually just a pinch grasp, in the videos they show is at a rate which is painfully slow and not something that will be useful in practice. They claim that their robots will learn human-like dexterity, but they have not shown any videos of multi-fingered dexterity where humans can and do grasp things that are unseen, and grasp and simultaneously manipulate multiple small objects with one hand. And no demonstrations of using the body with the hands, which is how humans routinely carry many small things or one or two heavy things. They show videos of non-tele-operated manipulation, but all in-person demonstrations of manipulation are tele-operated. Their current plans for robots working in customer homes all involve a remote person tele-operating the robot. Their robots are currently unsafe for humans to be close to when they are walking. Their robots have no recovery from falling and need human intervention to get back up. Their robots have a battery life measured in minutes rather than hours. Their robots cannot currently recharge themselves. Unlike human carers for the elderly, humanoids are not able to provide any physical assistance to people: stabilizing support for a person walking, help getting into and out of bed, help getting onto and off of a toilet, or indeed any touch-based assistance at all.
The CEOs claim that their robots will be able to do everything, or many things, or a lot of things, that a human can do in just a few short years. They currently do none. The CEOs claim a rate of adoption of these humanoid robots into homes and industries that is multiple orders of magnitude faster than any other technology in human history, including mainframe computers, home computers, mobile phones, and the internet. Many orders of magnitude faster. Here is a CEO of a humanoid robot company saying that they will be in 10% of US households by 2030. Absolutely no technology (even without the problems above) has ever come close to scaling at that rate.

Jameel Ur Rahman 2 months ago

The story of OnlineOTP

A few months ago I faced an annoying problem. I wanted to redeem my Cathay Pacific Miles but I was unable to log into my account. The SMS OTP never arrived. My account was tied to my Sri Lankan number, which I've had for many years, yet I never received their OTP. After wasting an inordinate amount of time, first with their chat support, then with call support, I was told this was a "known problem". When I browsed through /r/SriLanka I immediately noticed this was a recurring problem that had gone back for more than a year on a number of services. I really wanted to scratch this itch. I knew from past experience that jumping to building was not the best solution, but at the same time I wanted to ride the momentum of this idea. Two weeks later I had an MVP ready to go. Powered by Ruby on Rails, Tailwind, Render, Twilio, Resend, Tally, Hopes and Wishes. The Security guy in me was crying as I built this product. The Pragmatist in me was satisfied that I had a use case when the product flow was broken. The Entrepreneur in me watched me scratch this itch, knowing the pitfall I was putting myself into; after all, I hadn't validated this product yet. With this I went live! ...in Beta. My hope was that the survey would get me my initial customers and help me validate this product. I was all too willing to keep this product live for a year in case I got a single Beta user.
One month later:
I had 5 survey submissions
I gave out 4 beta codes
And got 0 signups that redeemed a code
Fun Fact: Someone created an account on my site before I could 😅
After I went live in Beta, itch-scratching satisfied, the Entrepreneur in me finally got a hold of the steering wheel and went to work. I talked to a number of people and one very helpful interested customer who reached out to me on LinkedIn. When trying to list every real-world situation where someone might need OnlineOTP, or less gloriously "SMS to email", I came up with a surprising number of use cases.
Expats who lose access to home-country services
Professionals who must verify accounts across multiple countries
Travelers who need OTPs reliably and without roaming fees
People in countries with unreliable carriers
People with privacy or security concerns
I went on the hunt. I stalked through forums trying to find users who face this problem and to pick up how they solved it for themselves. I had some success, and some failures. Overall I came to the overwhelming conclusion that I had a problem, but not really a solution that would reliably work. Here's a snippet from a report I wrote to my coach. The majority of people want to receive OTPs from financial institutions like banks. Banks do not like virtual numbers, as they somewhat defeat the purpose of multi-factor authentication. Which means, as a product calling itself OnlineOTP, I cannot guarantee service quality, as banks may not send their OTPs to VOIP numbers even though they accept them during registration. #strike1
The TAM is quite small and will get smaller:
The people who want the solution seem to be travellers who feel this momentary pain and then they are back home.
As banks move towards the industry standard of two-factor auth via apps, passkeys or authenticator apps, this will reduce in value.
People don't want to let others get access to OTPs, especially since they're coming from banks. Trust factor issue. #strike2
When I initially started the project I got an "OK" from Twilio for my use case, but before I went past Beta I wanted to be doubly sure, and this time there was a lot of push back and a polite no: this is against their acceptable use policy. Researching other providers I found that almost all of them have terms that imply they won't be happy with reselling phone numbers or using them to receive OTPs. #strike3
I'm fairly sure a real problem exists, but I don't think the solution I've come up with is the right one.
At the moment there doesn't seem to technically be a way to provide SMS to Email without becoming a Telecom Provider myself (an MVNO specifically), which is not practical. A bit disappointing. That said, I regret nothing. It's been fun going through this process even though it's resulted in me shutting down a product a month after launch. I'm just glad no one redeemed a beta code, as I would be honour-bound to support the product for at least a year then. With this blog post, I'm closing up OnlineOTP. Excited to see what 2026 holds. Happy New Year!
Cool Logo… Check
Shareworthy Landing page… Check
SEO… Check
Focusing on the problem… Check
Functioning "buy a VoIP number and then get SMS to Email" flow… Check
Handling edge cases when buying a number… Check [Entrepreneur: What, why?]
Live dashboard showing SMS as you receive it… Check [Entrepreneur: seriously?] [Security guy: mate, you're receiving OTPs… you should be self-destructing them instead.]
Mandatory FAQ explaining caveats with this product… Check
I'm a UK expat living in Malaysia who needs a UK phone number that can receive SMS while in Malaysia.
I'm a Certified Public Accountant based in the Philippines with clients in Singapore and Hong Kong. I am unable to reliably receive SMS OTPs to process payments while sitting in the Philippines.
I'm a Virtual Assistant who manages their clients' accounts remotely and needs OTP access to complete tasks.
I'm a Freelancer who needs a local number in multiple countries to access region-specific apps.
I'm a business owner who manages accounts in multiple regions and needs OTPs from each region forwarded to one inbox.
I'm a businessman who wants to receive OTPs on my Canadian phone number without having to pay roaming charges while I travel. And I travel frequently.
I'm a backpacker on a tour around the world who uses a temporary number but still needs to reliably access OTPs from their local bank.
- I'm a digital nomad who cycles through countries every few months and can't maintain SMS reliability.
- I'm a cruise passenger relying on ship WiFi, unable to receive SMS at sea (or in flight).
- I'm a traveler who temporarily uses a local SIM card but still needs OTPs from my home-country number.
- I'm a Sri Lankan with a local Sri Lankan number who does not reliably receive SMS from Cathay Pacific on it.
- I'm someone living in a rural area where cellular coverage is weak, but email over WiFi works.
- People accessing platforms that require local numbers: I'm an online seller/buyer who needs verification codes from marketplaces that only text local numbers. (If I remember correctly, Carousell in Singapore had that issue when I tried to buy something while visiting SG.)
- I'm someone who wants an international virtual number for privacy but needs guaranteed SMS delivery.
- I'm someone who frequently relocates and prefers a stable, long-term virtual number.
- I'm a business founder who doesn't want to expose their personal number to dozens of SaaS platforms.

Herman's blog 2 months ago

Discovery and AI

I browse the discovery feed on Bear daily, both as part of my role as a moderator and because it's a space I love, populated by a diverse group of interesting people. I've read the posts regarding AI-related content on the discovery feed, and I get it. It's such a prevalent topic right now that it feels inescapable, cropping up everywhere from Christmas dinner to overheard conversations on the subway. It's also becoming quite a polarising one, since it has broad impacts on society and the natural environment. This conversation also raises the question of popular bloggers and how pre-existing audiences should affect discoverability. As with all creative media, once you have a big enough audience, visibility becomes self-perpetuating: the more of it you have, the more you get. Think Spotify's 1%. Conveniently, Bear is small enough that bloggers with no audience can still be discovered easily, and that's something I'd like to preserve on the platform. In this post I'll try to explain my thinking on these matters and clear up a few misconceptions. First off, posts that get many upvotes through a large pre-existing audience, or from doing well on Hacker News, do not spend disproportionately more time on the discovery feed. Due to how the algorithm works, after a certain number of upvotes, additional upvotes have little to no effect. Even a post with 10,000 upvotes won't spend more than a week on page #1. I want Trending to be equally accessible to all bloggers on Bear. While this cap solves the problem of sticky posts, there is a second, less pressing issue: if a blogger has a pre-existing audience, say in the form of a newsletter or Twitter account, some of that audience will likely upvote, and the post has a good chance of featuring on the Trending page. One of the potential solutions I've considered is either making upvotes available to logged-in users only, or giving extra weight to upvotes from Bear account holders.
However, due to how domains work, each blog is a separate website as far as the browser is concerned, so logins don't persist between blogs. Upvoting would require logging in on each site, which isn't feasible. While I moderate Bear for spam, AI-generated content, and people breaking the Code of Conduct, I don't moderate by topic. That would remove the egalitarian nature of the platform and put up topic rails like an interest-group forum or subreddit. While I'm not particularly interested in AI as a topic, I don't feel it's my place to remove it, in the same way that I don't feel particularly strongly about manga. There is a hide-blog feature on the discovery page. If you don't want certain blogs showing up in your feed, add them to the hidden textarea to never see them again. Similarly to how Bear gives bloggers the ability to create their own tools within the dashboard, I would like to lean into this kind of extensibility for the discovery feed, with hiding blogs being the start. Curation instead of exclusion. This post is just a stream of consciousness on the matter. I have been contemplating it for a while, and, as with most things, it's a nuanced problem to solve. If you have any thoughts or potential solutions, send me an email. I appreciate your input. Enjoy the last 2 days of 2025!

Tenderlove Making 2 months ago

Can Bundler Be as Fast as uv?

At RailsWorld earlier this year, I got nerd sniped by someone. They asked “why can’t Bundler be as fast as uv?” Immediately my inner voice said “YA, WHY CAN’T IT BE AS FAST AS UV????” My inner voice likes to shout at me, especially when someone asks a question so obvious I should have thought of it myself. Since then I’ve been thinking about and investigating this problem, going so far as to give a presentation at XO Ruby Portland about Bundler performance. I firmly believe the answer is “Bundler can be as fast as uv” (where “as fast” has a margin of error lol). Fortunately, Andrew Nesbitt recently wrote a post called “How uv got so fast”, and I thought I would take this opportunity to review some of the highlights of the post and discuss how techniques applied in uv can (or can’t) be applied to Bundler / RubyGems. I’d also like to discuss some of the existing bottlenecks in Bundler and what we can do to fix them. If you haven’t read Andrew’s post, I highly recommend giving it a read. I’m going to quote some parts of the post and try to reframe them with RubyGems / Bundler in mind. Andrew opens the post talking about rewriting in Rust: uv installs packages faster than pip by an order of magnitude. The usual explanation is “it’s written in Rust.” That’s true, but it doesn’t explain much. Plenty of tools are written in Rust without being notably fast. The interesting question is what design decisions made the difference. This is such a good quote. I’m going to address “rewrite in Rust” a bit later in the post. But suffice to say, I think if we eliminate bottlenecks in Bundler such that the only viable option for further performance improvements is “rewrite in Rust”, then I’ll call it a success. I think rewrites give developers the freedom to “think outside the box” and try techniques they might not have tried. In the case of uv, I think it gave the developers a good way to say “if we don’t have to worry about backwards compatibility, what could we achieve?”.
I suspect it would be possible to write a uv in Python (PyUv?) that approaches the speeds of uv, and in fact much of the blog post goes on to talk about performance improvements that aren’t related to Rust. pip’s slowness isn’t a failure of implementation. For years, Python packaging required executing code to find out what a package needed. I didn’t know this about Python packages, and it doesn’t really apply to Ruby gems, so I’m mostly going to skip this section. Ruby gems are tar files, and one of the files in the tar file is a YAML representation of the GemSpec. This YAML file declares all dependencies for the gem, so RubyGems can know, without evaling anything, what dependencies it needs to install before it can install any particular gem. Additionally, RubyGems.org provides an API for asking about dependency information, which is actually the normal way of getting dependency info (again, no eval required). There’s only one other thing from this section I’d like to quote: PEP 658 (2022) put package metadata directly in the Simple Repository API, so resolvers could fetch dependency information without downloading wheels at all. Fortunately RubyGems.org already provides the same information about gems. Reading through the number of PEPs required, as well as the amount of time it took to get the standards in place, was very eye-opening for me. I can’t help but applaud folks in the Python community for doing this. It seems like a mountain of work, and they should really be proud of themselves. I’m mostly going to skip this section except for one point: Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive. I think this is very, very interesting.
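To make that "ignore the upper bound" idea concrete, here is a minimal Ruby sketch. This is purely my own illustration using the real `Gem::Requirement` / `Gem::Version` APIs, not uv's or Bundler's actual code:

```ruby
require "rubygems"

# Illustrative only: drop upper-bound constraints ("<", "<=") from a
# Gem::Requirement, keeping lower bounds and exact pins intact.
def relax_upper_bounds(requirement)
  kept = requirement.requirements.reject { |op, _| op == "<" || op == "<=" }
  return Gem::Requirement.default if kept.empty?
  Gem::Requirement.new(kept.map { |op, version| "#{op} #{version}" })
end

strict  = Gem::Requirement.new([">= 2.0", "< 4.0"])
relaxed = relax_upper_bounds(strict)

puts strict.satisfied_by?(Gem::Version.new("5.0"))  # false: blocked by "< 4.0"
puts relaxed.satisfied_by?(Gem::Version.new("5.0")) # true: upper bound ignored
puts relaxed.satisfied_by?(Gem::Version.new("1.0")) # false: lower bound still applies
```

The resolver payoff is that a defensive `< 4.0` stops triggering backtracking for versions that would almost certainly have worked anyway.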
I don’t know how much time Bundler spends on “required Ruby version” bounds checking, but it feels like if uv can do it, so can we. I really love that Andrew pointed out optimizations that could be made that don’t involve Rust. There are three points in this section that I want to pull out: Parallel downloads. pip downloads packages one at a time. uv downloads many at once. Any language can do this. This is absolutely true, and is a place where Bundler could improve. Bundler currently has a problem when it comes to parallel downloads, and needs a small architectural change as a fix. The first problem is that Bundler tightly couples installing a gem with downloading the gem. You can read the installation code here. The problem with the method in question is that it inextricably links downloading the gem with installing it. This is a problem because we could be downloading gems while installing other gems, but we’re forced to wait because the installation method couples the two operations. Downloading gems can trivially be done in parallel, since the files are just archives that can be fetched independently. The second problem is the queuing system in the installation code. After gem resolution is complete and Bundler knows which gems need to be installed, it queues them up for installation. You can find the queueing code here. The code takes some effort to understand. Basically, it allows gems to be installed in parallel, but only gems that have already had their dependencies installed. So for example, if your dependency tree is a straight chain, where each gem depends on the next, then no gems will be installed (or downloaded) in parallel. To demonstrate this problem in an easy-to-understand way, I built a slow Gem server. It generates a three-gem dependency chain, then starts a Gem server.
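The "dependencies install before dependents" rule described above can be modeled with a toy wave scheduler. This is my own sketch for illustration, not Bundler's actual queue, but it shows why a chain serializes everything while siblings run together:

```ruby
# Toy model of the "dependencies must install before dependents" rule.
# Each wave contains the gems eligible to install simultaneously.
def install_waves(deps)
  installed = []
  waves = []
  until installed.size == deps.size
    # Every gem whose dependencies are all installed may go in this wave.
    wave = deps.keys.reject { |g| installed.include?(g) }
               .select { |g| (deps[g] - installed).empty? }
    waves << wave
    installed.concat(wave)
  end
  waves
end

chain    = { "a" => [], "b" => ["a"], "c" => ["b"] }
siblings = { "x" => [], "y" => [], "z" => [] }

p install_waves(chain)    # [["a"], ["b"], ["c"]] -- fully serial
p install_waves(siblings) # [["x", "y", "z"]]     -- one parallel wave
```

With a chain there is never more than one eligible gem per wave, so no parallelism is possible, which is exactly what the slow-server experiment demonstrates.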
The Gem server takes 3 seconds to return any gem, so if we point Bundler at this Gem server and then profile Bundler, we can see the impact of the queueing system and download scheme. In my test app, the Gemfile pulls in the top of that dependency chain. If we profile bundle install with Vernier, the swim lanes in the marker chart show that we get no parallelism during installation: we spend 3 seconds downloading the first gem, then we install it; then 3 seconds downloading the second gem, then we install it; finally 3 seconds downloading the third gem, and we install it. Timing the process shows we take over 9 seconds to install (3 seconds per gem). Contrast this with a Gemfile containing three gems that have no dependencies on each other, but still take 3 seconds each to download: timing that Gemfile shows it takes about 4 seconds. We were able to install the same number of gems in a fraction of the time, because Bundler is able to download siblings in the dependency tree in parallel, but unable to handle other relationships. There is actually a good reason that Bundler insists dependencies are installed before the gems themselves: native extensions. When installing a native extension, the installation process must run Ruby code (the extconf.rb file). Since the extconf.rb could require dependencies to be installed in order to run, we must install dependencies first. For example, a native gem may depend on a helper gem that is only used during the installation process, so the helper needs to be installed before the native gem can be compiled and installed. However, if we were to decouple downloading from installation, it would be possible to maintain the “dependencies are installed first” business requirement but still speed up installation. In the chained case, we could have been downloading the second and third gems at the same time as the first (or even while waiting on dependencies to be installed). Additionally, pure Ruby gems don’t need to execute any code on installation.
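Decoupling the download step is straightforward in plain Ruby, precisely because network waits release the GVL. Here is a hedged sketch (illustrative names, not Bundler's API) where every archive is fetched concurrently, independent of dependency order:

```ruby
# Illustrative sketch: download every gem in parallel threads, then hand
# the archives to the (possibly dependency-ordered) install phase.
# sleep stands in for the network fetch and releases Ruby's GVL,
# so the waits overlap instead of accumulating.
def download_all(names)
  names.map { |name|
    Thread.new do
      sleep 0.1 # stand-in for fetching one .gem archive
      "#{name}.gem"
    end
  }.map(&:value)
end

start    = Process.clock_gettime(Process::CLOCK_MONOTONIC)
archives = download_all(%w[one two three])
elapsed  = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

p archives          # ["one.gem", "two.gem", "three.gem"]
puts elapsed < 0.25 # ~0.1s total, not 0.3s: the downloads overlapped
```

The install phase can keep its ordering constraint; only the fetches move off the critical path, which is exactly the architectural change proposed above.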
If we knew that we were installing a pure Ruby gem, it would be possible to relax the “dependencies are installed first” business requirement and get even more performance increases. The sibling case above could install all three gems in parallel, since none of them execute Ruby code during installation. I would propose we split installation into 4 discrete steps: download the gem, unpack the gem, compile the gem (if it has a native extension), and install the gem. Downloading and unpacking can be done trivially in parallel. We should unpack the gem to a temporary folder so that if the process crashes or the machine loses power, the user isn’t stuck with a half-installed gem. After we unpack the gem, we can discover whether the gem is a native extension or not. If it’s not a native extension, we “install” the gem simply by moving the temporary folder to the “correct” location. This step could even be a “hard link” step, as discussed in the next point. If we discover that the gem is a native extension, then we can “pause” installation of that gem until its dependencies are installed, then resume (by compiling) at an appropriate time. Side note: there is a Bundler alternative that works mostly in this manner today. Let’s move on to the next point: Global cache with hardlinks. pip copies packages into each virtual environment. uv keeps one copy globally and uses hardlinks I think this is a great idea, but I’d actually like to split the idea in two. First, RubyGems and Bundler should have a combined, global cache, full stop, and we should store gem files there when they are downloaded. Currently, both Bundler and RubyGems use a Ruby-version-specific cache folder. In other words, if you install Rails on two different versions of Ruby, you get two copies of Rails and all its dependencies. Interestingly, there is an open ticket to implement this; it just needs to be done. The second point is hardlinking on installation.
The idea here is that rather than unpacking the gem multiple times, once per Ruby version, we simply unpack once and then hard link per Ruby version. I like this idea, but I think it should be implemented after some technical debt is paid: namely, implementing a global cache and unifying the Bundler / RubyGems code paths. On to the next point: PubGrub resolver Actually, Bundler already uses a Ruby implementation of the PubGrub resolver. You can see it here. Unfortunately, RubyGems still uses the molinillo resolver. In other words, you use a different resolver depending on whether you install through Bundler or directly through RubyGems. I don’t really think this is a big deal, since the vast majority of users will be going through Bundler most of the time. However, I do think this discrepancy is some technical debt that should be addressed, and I think it should be addressed via unification of the RubyGems and Bundler codebases (today they both live in the same repository, but the code isn’t necessarily combined). Let’s move on to the next section of Andrew’s post: Andrew first mentions “Zero-copy deserialization”. This is of course an important technique, but I’m not 100% sure where we would utilize it in RubyGems / Bundler. I think that today we parse the YAML spec on installation, and that could be a target. But I also think we could install most gems without looking at the YAML gemspec at all. Thread-level parallelism. Python’s GIL forces parallel work into separate processes, with IPC overhead and data copying. This is an interesting point. I’m not sure what work pip needed to do in separate processes. Installing a pure-Ruby gem is mostly an IO-bound task, with some ZLIB mixed in. Both of these things (IO and ZLIB processing) release Ruby’s GVL, so it’s possible for us to do things truly in parallel. I imagine this is similar for Python / pip, but I really have no idea. Given the stated challenges with Python’s GIL, you might wonder whether Ruby’s GVL presents similar parallelism problems for Bundler.
I don’t think so, and in fact I think Ruby’s GVL gets kind of a bad rap. It prevents us from running CPU-bound Ruby code in parallel. Ractors address this, and Bundler could possibly leverage them in the future, but since installing gems is mostly an IO-bound task I’m not sure what the advantage would be (possibly the version solver, but I’m not sure what can be parallelized in there). The GVL does allow us to run IO-bound work in parallel with CPU-bound Ruby code. CPU-bound native extensions are allowed to release the GVL, allowing Ruby code to run in parallel with the native extension’s CPU-bound code. In other words, Ruby’s GVL allows us to safely run work in parallel. That said, the GVL can work against us, because releasing and acquiring the GVL takes time. If you have a system call that is very fast, releasing and acquiring the GVL could end up being a large percentage of that call. For example, if you do a read with a very small buffer, you could encounter a situation where GVL bookkeeping takes the majority of the time. A bummer is that Ruby gem packages usually contain lots of very small files, so this problem could be impacting us. The good news is that this problem can be solved in Ruby itself, and indeed some work is being done on it today. No interpreter startup. Every time pip spawns a subprocess, it pays Python’s startup cost. Obviously Ruby has this same problem. That said, we only start Ruby subprocesses when installing native extensions. I think native extensions make up the minority of gems installed, and even when installing a native extension, it isn’t Ruby startup that is the bottleneck. Usually the bottleneck is compilation / linking time (as we’ll see in the next post). Compact version representation. uv packs versions into u64 integers where possible, making comparison and hashing fast. This is a cool optimization, but I don’t think it’s actually Rust specific. Comparing integers is much faster than comparing version objects.
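As a concrete sketch of the packing idea, here is a minimal Ruby version. The 16-bits-per-segment layout is my own assumption for illustration; it is not uv's actual encoding scheme:

```ruby
# Pack a "major.minor.patch" version into one integer so that plain
# integer comparison agrees with semantic version ordering.
# 16 bits per segment is an illustrative assumption, not uv's scheme.
BITS = 16
MASK = (1 << BITS) - 1

def pack_version(str)
  major, minor, patch = str.split(".").values_at(0, 1, 2).map { |s| (s || "0").to_i }
  raise ArgumentError, "segment too large for #{BITS} bits" if [major, minor, patch].any? { |s| s > MASK }
  (major << (2 * BITS)) | (minor << BITS) | patch
end

# Integer comparison now matches semantic ordering, including the case
# where naive string comparison would get it wrong:
p pack_version("1.2.3")  < pack_version("1.10.0")  # true ("1.10" sorts after "1.2")
p pack_version("2.0.0")  > pack_version("1.99.99") # true
p pack_version("1.2")   == pack_version("1.2.0")   # true (missing segment is 0)
```

Small Integers in Ruby are immediates (no heap allocation), so comparisons like these avoid object dereferences entirely, which is the property the resolver would exploit.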
The idea is that you take a version number and pack each part of it into a single integer, so that comparing two versions becomes a single integer comparison. It should be possible to use this trick in Ruby and encode versions as integer immediates, which would unlock performance in the resolver. Rust has an advantage here: compiled native code comparing u64s will always be faster than Ruby, even with immediates. However, I would bet that with YJIT or ZJIT in play, this gap could be closed enough that no end user would notice the difference between a Rust or Ruby implementation of Bundler. I started refactoring the version object so that we might start doing this, but we ended up reverting it because of backwards compatibility (I am jealous of uv in that regard). I think the right way to do this is to refactor the solver entry point and ensure all version requirements are encoded as integer immediates before entering the solver. We could keep the existing API as “user facing” and design a more internal API that the solver uses. I am very interested in reading the version encoding scheme in uv. My intuition is that minor numbers tend to get larger than major numbers, so would minor numbers have more dedicated bits? Would it even matter with 64 bits? I’m going to quote Andrew’s last 2 paragraphs: uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still. pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn’t, largely because backwards compatibility with fifteen years of edge cases takes precedence. But it means pip will always be slower than a tool that starts fresh with modern assumptions. I think these are very good points.
The difference is that in RubyGems and Bundler, we already have the infrastructure in place for writing a “fast as uv” package manager. The difficult part is dealing with backwards compatibility and navigating two legacy codebases. I think this is the real advantage the uv developers had. That said, I am very optimistic that we could “repair the plane mid-flight”, so to speak, and have the best of both worlds: backwards compatibility and speed. I mentioned at the top of the post that I would address “rewrite it in Rust”, and I think Andrew’s own quote mostly does that for me. I think we could have 99% of the performance improvements while still maintaining a Ruby codebase. Of course, if we rewrote it in Rust, you could squeeze an extra 1% out, but would it be worthwhile? I don’t think so. I have a lot more to say about this topic, and I feel like this post is getting kind of long, so I’m going to end it here. Please look out for part 2, which I’m tentatively calling “What makes Bundler / RubyGems slow?” This post was very “can we make RubyGems / Bundler do what uv does?” (the answer is “yes”). In part 2 I want to get more hands-on by discussing how to profile Bundler and RubyGems, what specifically makes them slow in the real world, and what we can do about it. I want to end this post by saying “thank you” to Andrew for writing such a great post about how uv got so fast.
