Posts in Ruby (20 found)
André Arko 1 week ago

Announcing `rv clean-install`

Originally posted on the Spinel blog. As part of our quest to build a fast Ruby project tool, we’ve been hard at work on the next step of project management: installing gems. As we’ve learned over the last 15 years of working on Bundler and RubyGems, package managers are really complicated! It’s too much to try to copy all of rbenv, and ruby-build, and RubyGems, and Bundler, all at the same time. Since we can’t ship everything at once, we spent some time discussing the first project management feature we should add after Ruby versions. We decided to build `rv clean-install`. Today, we’re releasing the command as part of version 0.4.

So, what is a clean install? In this case, clean means “from a clean slate”. You can use `rv clean-install` to install the packages your project needs after a fresh checkout, or before running your tests in CI. It’s useful by itself, and it’s also a concrete step towards managing a project and its dependencies. Even better, it lays a lot of the groundwork for future gem management functionality, including downloading, caching, and unpacking gems, compiling native gem extensions, and providing libraries that can be loaded by Bundler at runtime.

While we don’t (yet!) handle adding, removing, or updating gem versions, we’re extremely proud of the progress that we’ve made, and we’re looking forward to improving based on your feedback. Try running `rv clean-install` today, and see how it goes. Is it fast? Slow? Are there errors? What do you want to see next? Let us know what you think.


Pixoo64 Ruby Client

I bought a Pixoo64 LED Display to play around with, and I love it! It connects to WiFi and has an on-board HTTP API so you can program it. I made a Ruby client for it that even includes code to convert PNG files to the binary format the sign wants. One cool thing is that the display can be configured to fetch data from a remote server, so I configured mine to fetch PM2.5 and CO2 data for my office. Here’s what it’s looking like so far: Yes, this is how I discovered I need to open a window 😂
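For a sense of what talking to the display involves, here is a minimal Ruby sketch of the kind of JSON-over-HTTP request the client makes. The device address, the /post endpoint, and the Channel/SetBrightness command name are assumptions based on Divoom's published API, not excerpts from the client mentioned above, so treat it as a starting point rather than a reference.

```ruby
# Minimal sketch of poking the Pixoo64's on-board HTTP API from Ruby.
# Assumptions: the display is reachable on the local network and accepts
# JSON commands POSTed to /post (the command name below is illustrative;
# check Divoom's API docs or the client above for the real ones).
require "net/http"
require "json"
require "uri"

PIXOO_IP = "192.168.1.50" # replace with your display's address

def pixoo_command(payload)
  uri = URI("http://#{PIXOO_IP}/post")
  res = Net::HTTP.post(uri, payload.to_json, "Content-Type" => "application/json")
  JSON.parse(res.body)
end

# Example: dim the panel for the evening.
pixoo_command("Command" => "Channel/SetBrightness", "Brightness" => 30)
```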

Rodney Brooks 1 week ago

Predictions Scorecard, 2026 January 01

Nothing is ever as good as it first seems and nothing is ever as bad as it first seems. — A best memory paraphrase of advice given to me by Vice Admiral Joe Dyer, former chief test pilot of the US Navy and former Commander of NAVAIR.

[You can follow me on social media: @rodneyabrooks.bsky.social and see my publications etc., at https://people.csail.mit.edu/brooks]

This is my eighth annual update on how my dated predictions from January 1st, 2018 concerning (1) self driving cars, (2) robotics, AI, and machine learning, and (3) human space travel have held up. I promised then to review them at the start of the year every year until 2050 (right after my 95th birthday), thirty-two years in total. The idea was to hold myself accountable for those predictions. How right or wrong was I?

The summary is that my predictions held up pretty well, though overall I was a little too optimistic. That is a little ironic, as I think that many people who read my predictions back on January 1st, 2018 thought that I was very pessimistic compared to the then zeitgeist. I prefer to think of myself as being a realist.

And did I see LLMs coming? No and yes. Yes, I did say that something new and big that everyone accepted as the new and big thing in AI would come along no earlier than 2023, and that the key paper for its success had already been written before I made my first predictions. And indeed LLMs were generally accepted as the next big thing in 2023 (I was lucky on that date), and the key paper, Attention Is All You Need, was indeed already written, and had first appeared in June of 2017. I wrote about this extensively in last year’s scorecard. But no, I had no idea it would be LLMs at the time of my correct prediction that something big would appear.

And that lack of specificity on the details of exactly what will be invented and when is the case with all my predictions from the first day of 2018. I did not claim to be clairvoyant about exactly what would happen; rather I was making predictions about the speed of new research ideas, the speed of hype generation, the speed of large scale deployments of new technologies, and the speed of fundamental changes propagating through the world’s economy. Those speeds are very different and driven by very different realities. I think that many people get confused by that and make the mistake of jumping between those domains of reality, thinking all the speeds will be the same. In my case my estimates of those speeds are informed by watching AI and robotics professionally for 42 years at the time of my predictions. I became a graduate student in Artificial Intelligence in January of 1976, just shy of 20 years after the initial public outing of the term Artificial Intelligence at the summer workshop in 1956 at Dartmouth. And now as of today I have been in that field for 50 years.

I promised to track my predictions made eight years ago today for 32 years. So I am one quarter of the way there. But the density of specific years of events, or marked percentages of adoption, that I predicted starts to fall off right around now. Sometime during 2026 I will bundle up all my comments over the eight years specifically mentioning years that have now passed, and put them in an archival mid-year post. Then I will get rid of the three big long tables that dominate the body of this annual post, and have short updates on the sparse dates for the next 24 years.
I will continue to summarize what has happened in self-driving cars generally, including electrification progress and the forever promised flying cars, along with AI and robotics, and human space flight. But early in 2025 I made five new predictions for the coming ten years, without specific dates, but which summarize what I think will happen. I will track these predictions too.

What I Nearly Got Wrong

The day before my original prediction post in 2018 the price of Bitcoin had opened at $12,897.70 and topped out at $14,377.40, and 2017 had been the first year it had ever traded at over $1,000. The price seemed insane to me as Bitcoin wasn’t being used for the task for which it had been designed. The price seemed to me then, and now, to be purely about speculation. I almost predicted when it would be priced at $200, on the way down. But, fortunately, I checked myself as I realized that the then current state of the market made no sense to me and so any future state may not either. Besides, I had no experience or expertise in crypto pricing. So I left that prediction out. I had no basis to make a prediction. That was a wise decision, and I revisit that reasoning as I make new predictions now, and implore myself to only make predictions in fields where I know something.

What Has Surprised Me, And That I Missed 8 Years Ago

I made some predictions about the future of SpaceX although I didn’t always label them as being about SpaceX. A number of my predictions were in response to pronouncements by the CEO of SpaceX. My predictions were much more measured and some might say even pessimistic. Those predictions so far have turned out to be more optimistic than how reality has unfolded. I had made no specific predictions about Falcon 9, though I did make predictions about the subsequent SpaceX launch family, now called Starship, but then known as BFR, which eight years later has not gotten into orbit. In the meantime SpaceX has scaled the Falcon 9 launch rate at a phenomenal speed, and the magnitude of the growth is very surprising.

Eight years ago, Falcon 9 had been launched 46 times, all successful, over the previous eight years, and it had recently had a long run of successful landings of the booster whenever attempted. At that time five launches had been on a previously used booster, but there had been no attempts to launch Falcon Heavy with its three boosters strapped together. Now we are eight years on from those first eight years of Falcon 9 launches. The scale and success rate of the launches has made each individual launch an unremarkable event, with humans being launched a handful of times per year. Now the Falcon 9 scorecard stands at 582 launches with only one failed booster, and there have been 11 launches of the three booster Falcon Heavy, all successful. That is a sustained growth rate of 38% year over year for eight years, and that is a very high sustained deployment growth rate for any complex technology. There is no other modern rocket with such a volume of launches that comes even close to the Falcon 9 record. And I certainly did not foresee this volume of launches. About half the launches have had SpaceX itself as the customer, starting in February 2018, launching an enormous satellite constellation (about two thirds of all satellites ever orbited) to support Starlink bringing internet to everywhere on the surface of Earth. But… there is one historical rocket, a suborbital one, which has a much higher record of use than Falcon 9 over a much briefer period.
The German V-2 was the first rocket to fly above the atmosphere and the first ballistic missile to be used to deliver bombs. It was fueled with ethanol and liquid oxygen, and was steered by an analog computer that also received inputs from radio guide signals; it was the first operational liquid fueled rocket. It was developed in Germany in the early 1940s and, after more than a thousand test launches, was first put into operation on September 7th, 1944, landing a bomb on Paris less than two weeks after the Allied liberation of that city. In the remaining 8 months of the war 3,172 armed V-2 rockets were launched at targets in five countries — 1,358 were targeted at London alone.

My Color Scheme and Past Analysis

The acronyms I used for predictions in my original post were as follows.

NET year means it will not happen before that year (No Earlier Than).
BY year means I predict that it will happen by that year.
NIML, Not In My Lifetime, i.e., not before 2050.

As time passes I color mentioned years as accurate, too pessimistic, or too optimistic. Last year I added hemming and hawing. This is for when it looks like something I said would take a lot longer has in fact happened, but the underlying achievement is not what everyone expected, and is not what was delivered. This is mostly for things that were talked about as being likely to happen with no human intervention, and it now appears to happen that way, but in reality there are humans in the loop that the companies never disclose. So the technology that was promised to be delivered hasn’t actually been delivered, but everyone thinks it has been. When I quote myself I do so in orange, and when I quote others I do so in blue.

I have not changed any of the text of the first three columns of the prediction tables since their publication on the first day of 2018. I only change the text in the fourth column to say what actually happened. This meant that by four years ago that fourth column was getting very long and skinny, so I removed those comments and started with fresh ones two years ago. I have kept the last two years’ comments and added new ones, with yellow backgrounds, for this year, removing the yellow backgrounds from 2025 comments that were there last year. If you want to see the previous five years of comments you can go back to the 2023 scorecard.

On March 26th I skeeted out five technology predictions, talking about developments over the next ten years through January 1st, 2036. Three weeks later I included them in a blog post. Here they are again.

1. Quantum Computers. The successful ones will emulate physical systems directly for specialized classes of problems rather than translating conventional general computation into quantum hardware. Think of them as 21st century analog computers. Impact will be on materials and physics computations.

2. Self Driving Cars. In the US the players that will determine whether self driving cars are successful or abandoned are #1 Waymo (Google) and #2 Zoox (Amazon). No one else matters. The key metric will be human intervention rate as that will determine profitability.

3. Humanoid Robots. Deployable dexterity will remain pathetic compared to human hands beyond 2036. Without new types of mechanical systems walking humanoids will remain too unsafe to be in close proximity to real humans.

4. Neural Computation.
There will be small and impactful academic forays into neuralish systems that are well beyond the linear threshold systems, developed by 1960, that are the foundation of recent successes. Clear winners will not yet emerge by 2036 but there will be multiple candidates.

5. LLMs. LLMs that can explain which data led to what outputs will be key to non-annoying/dangerous/stupid deployments. They will be surrounded by lots of mechanism to keep them boxed in, and those mechanisms, not yet invented for most applications, will be where the arms races occur.

These five predictions are specifically about what will happen in these five fields during the ten years from 2026 through 2035, inclusive. They are not saying when particular things will happen; rather they are saying whether or not certain things will happen in that decade. I will do my initial analysis of these five new predictions immediately below. For the next ten years I will expand on each of these reviews in this annual scorecard, along with reviews of my earlier predictions. The ten years for these predictions are up on January 1st, 2036. I will have just turned 81 years old then, so let’s see if I am still coherent enough to do this.

Quantum Computers

The successful ones will emulate physical systems directly for specialized classes of problems rather than translating conventional general computation into quantum hardware. Think of them as 21st century analog computers. Impact will be on materials and physics computations.

The original excitement about quantum computers was stimulated by a paper by Peter Shor in 1994 which gave a digital quantum algorithm to factor large integers much faster than a conventional digital computer. Factoring integers is often referred to as “the IFP”, for the integer factorization problem. So what? The excitement around this was based on how modern cryptography, which provides our basic security for on-line commerce, works under the hood. Much of the internet’s security is based on it being hard to factor a large number. For instance, in the RSA algorithm Alice tells everyone a large number (in different practical versions it has 1024, 2048, or 4096 bits) for which she knows its prime factors. But she tells people only the number, not its factors. In fact she chose that number by multiplying together some very large prime numbers — very large prime numbers are fairly easy to generate (using the Miller-Rabin test). Anyone, usually known as Bob, can then use that number to encrypt a message intended for Alice. No one, neither Tom, Dick, nor Harry, can decrypt that message unless they can find the prime factors of Alice’s public number. But Alice knows them and can read the message intended only for her eyes. So… if you could find prime factors of large numbers easily then the backbone of digital security would be broken. Much excitement!

Shor produced his algorithm in 1994. By the year 2001 a group at IBM had managed to find the prime factors of the number 15 using a digital quantum computer, as published in Nature. All the prime factors. Both 3 and 5. Notice that 15 has only four bits, which is a lot smaller than the number of bits used in commercial RSA implementations, namely 1024, 2048, or 4096. Surely things got better fast. By late 2024 the biggest numbers that had been factored by an actual digital quantum computer had 35 bits, which allows for numbers no bigger than 34,359,738,367. That is way smaller than the size of the smallest numbers used in RSA applications.
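As a toy illustration of the RSA mechanics just described, here is a short Ruby sketch that builds a key pair from the same two primes, 3 and 5, that the IBM experiment recovered, so the public modulus is that famous 15. The parameter choices are illustrative assumptions only; real deployments use the 1024-bit and larger moduli mentioned above.

```ruby
# Toy RSA with the primes 3 and 5, so the public modulus is 15, the number
# the 2001 IBM experiment factored. Illustrative only; nothing here is secure.
p, q = 3, 5
n    = p * q             # public modulus (Alice publishes this): 15
phi  = (p - 1) * (q - 1) # Euler's totient, kept secret: 8

e = 3                                        # public exponent, coprime to phi
d = (1..phi).find { |x| (e * x) % phi == 1 } # private exponent: 3, since 3*3 = 9 ≡ 1 (mod 8)

m = 2                   # Bob's message, must be smaller than n
c = m.pow(e, n)         # Bob encrypts with the public key: 2**3 mod 15 = 8
puts "ciphertext: #{c}"             # => 8
puts "decrypted:  #{c.pow(d, n)}"   # => 2, only Alice (who knows d) can do this

# Knowing the factors 3 and 5 is exactly what lets an attacker recompute phi
# and hence d, which is why fast factoring would break RSA.
```

With the real moduli used in practice, recovering the private exponent from the public number alone requires factoring it, which is the step Shor’s algorithm would make easy on a sufficiently large, error-free quantum computer.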
Nevertheless, that 35-bit result does represent 31 doublings in the magnitude of numbers factored in 23 years, so progress has been quite exponential. But it could take another 500 years of that particular version of exponential growth rate to get to conquering today’s smallest version of RSA digital security. In the same report the authors say that a conventional, but very large, computer (2,000 GPUs along with a JUWELS booster, which itself has 936 compute nodes each consisting of four NVIDIA A100 Tensor Core GPUs, themselves each hosted by 48 dual threaded AMD EPYC Rome cores; that is quite a box of computing) simulating a quantum computer running Shor’s algorithm had factored a 39 bit number, finding that 549,755,813,701 = 712,321 × 771,781, the product of two 20 bit prime numbers. That was its limit. Nevertheless, an actual digital quantum computer can still be outclassed by one simulated on conventional digital hardware.

The other early big excitement for digital quantum computers was Grover’s search algorithm, but work on that has not been as successful as for Shor’s IFP solution. Digital quantum computation nirvana has not yet been demonstrated. Digital quantum computers work a little like regular digital computers in that there is a control mechanism which drives the computer through a series of discrete steps. But today’s digital quantum computers suffer from accumulating errors in quantum bits. Shor’s algorithm assumes no such errors. There are techniques for correcting those errors but they slow things down and cause other problems. One way that digital quantum computers may get better is if new methods of error correction emerge. I am doubtful that something new will emerge, get fully tested, and then make it into production at scale all within the next ten years. So we may not see a quantum (ahem) leap in performance of quantum digital computers in the next decade.

Analog quantum computers are another matter. They are not switched, but instead are configured to directly simulate some physical system and the quantum evolution and interactions of components of that system. They are an embodied quantum model of that system. And they are ideally suited to solving these sorts of problems, and cannot be emulated by conventional digital systems the way digital quantum computers can be, as in the 39 bit number case above. I find people working on quantum computers are often a little squirrelly about whether their computer acts more like a digital or analog computer, as they like to say they are “quantum” only. The winners over the next 10 years will be ones solving real problems in materials science and other aspects of chemistry and physics.

Self Driving Cars

In the US the players that will determine whether self driving cars are successful or abandoned are #1 Waymo (Google) and #2 Zoox (Amazon). No one else matters. The key metric will be human intervention rate as that will determine profitability.

Originally the term “self driving car” was about any sort of car that could operate without a driver on board, and without a remote driver offering control inputs. Originally they were envisioned as an option for privately owned vehicles used by individuals, a family car where no person needed to drive, but simply communicated to the car where it should take them. That conception is no longer what people think of when self driving cars are mentioned. Self driving cars today refer to taxi services that feel like Uber or Lyft, but for which there is no human driver, just paying passengers.
In the US the companies that have led in this endeavor have changed over time. The first leader was Cruise, owned by GM. They were the first to have a regular service in the downtown area of a major city (San Francisco), and then in a number of other cities, where there was an app that anyone could download and then use their service. They were not entirely forthcoming with operational and safety problems, including when they dragged a person, who had just been hit by a conventionally driven car, for tens of feet under one of their vehicles. GM suspended operations in late 2023 and completely disbanded it in December 2024.

Since then Waymo (owned by Google) has been the indisputable leading deployed service. Zoox (owned by Amazon) has been a very distant, but operational, second place. Tesla (owned by Tesla) has put on a facade of being operational, but it is not operational in the sense of the other two services, and faces regulatory requirements that both Waymo and Zoox have long been able to satisfy. They are not on a path to becoming a real service. See my traditional section on self driving cars below, as it explains in great detail the rationale for these evaluations. In short, Waymo looks to have a shot at succeeding and it is unlikely they will lose first place in this race. Zoox may also cross the finish line, and it is very unlikely that anyone will beat them. So if both Waymo and Zoox fail, for whatever reason, the whole endeavor will grind to a halt in the US.

But what might go wrong that makes one of these companies fail? We got a little insight into that in the last two weeks of 2025. On Saturday, December 20th of 2025, there was an extended power outage in San Francisco that started small in the late morning but by nightfall had spread to large swaths of the city. And lots and lots of normally busy intersections were by that time blocked by many stationary Waymos. Traffic regulations in San Francisco say that an intersection whose traffic lights are all dark should be treated as though it has stop signs at every entrance. Human drivers who don’t know the actual regulation tend to fall back to that behavior in any case. It seemed that Waymos were waiting indefinitely for green lights that never came, and at intersections through which many Waymos were routed there were soon enough waiting Waymos that the intersections were blocked. Three days later, on December 23rd, Waymo issued an explanation on their blog site, which includes the following:

Navigating an event of this magnitude presented a unique challenge for autonomous technology. While the Waymo Driver is designed to handle dark traffic signals as four-way stops, it may occasionally request a confirmation check to ensure it makes the safest choice. While we successfully traversed more than 7,000 dark signals on Saturday, the outage created a concentrated spike in these requests. This created a backlog that, in some cases, led to response delays contributing to congestion on already-overwhelmed streets. We established these confirmation protocols out of an abundance of caution during our early deployment, and we are now refining them to match our current scale. While this strategy was effective during smaller outages, we are now implementing fleet-wide updates that provide the Driver with specific power outage context, allowing it to navigate more decisively.
As the outage persisted and City officials urged residents to stay off the streets to prioritize first responders, we temporarily paused our service in the area. We directed our fleet to pull over and park appropriately so we could return vehicles to our depots in waves. This ensured we did not further add to the congestion or obstruct emergency vehicles during the peak of the recovery effort.

The key phrase is that Waymos “request a confirmation check” at dark signals. This means that the cars were asking for a human to look at images from their cameras and manually tell them how to behave. With 7,000 dark signals and perhaps 1,000 vehicles on the road, Waymo clearly did not have enough humans on duty to handle the volume of requests that were coming in. Waymo does not disclose whether any human noticed a rise in these incidents early in the day and more human staff were called in, or whether they simply did not have enough employees to make handling them all possible. At a deeper level it looks like they had a debugging feature in their code, and not enough people to supply real time support to handle the implications of that debugging feature. And it looks like Waymo is going to remove that debugging safety feature as a way of solving the problem. This is not an uncommon sort of engineering failure during early testing. Normally one would hope that the need for that debugging feature had been resolved before large scale deployment.

But who are these human staff? Besides those in Waymo control centers, it turns out there is a gig-work operation with an app named Honk (the headline of the story is When robot taxis get stuck, a secret army of humans comes to the rescue) whereby Waymo pays people around $20 to do minor fixups to stuck Waymos by, for instance, going and physically closing a door that a customer left open. Tow truck operators use the same app to find Waymos that need towing because of some more serious problem. It is not clear whether it was a shortage of those gig workers, or a shortage of people in the Waymo remote operations center, that caused the large scale failures. But it is worth noting that current generation Waymos need a lot of human help to operate as they do, from people in the remote operations center who intervene and provide human advice when something goes wrong, to Honk gig-workers scampering around the city physically fixing problems with the vehicles, to people who clean the cars and plug them in to recharge when they return to their home base. Human operated ride services, whether traditional taxi companies or gig services such as Uber and Lyft, do not need these external services. There is a human with the car at all times who takes care of these things.

The large scale failure on the 20th did get people riled up about these robots causing large scale traffic snarls, and made them wonder about whether the same thing will happen when the next big earthquake hits San Francisco. Will the human support worker strategy be stymied by other infrastructure failures (e.g., the cellular network necessary for Honk workers to communicate) or by the self preservation needs of the human workers themselves?

The Waymo blog post revealed another piece of strategy. This is one of three things they said that they would do to alleviate the problems:

Expanding our first responder engagement: To date, we’ve trained more than 25,000 first responders in the U.S. and around the world on how to interact with Waymo.
As we discover learnings from this and other widespread events, we’ll continue updating our first responder training.

The idea is to add more responsibility to police and fire fighters to fix the inadequacies of the partial-only autonomy strategy for Waymo’s business model. Those same first responders will have more than enough on their plates during any natural disasters. Will it become a political issue where the self-driving taxi companies are taxed enough to provide more first responders? Will those costs ruin their business model? Will residents just get so angry that they take political action to shut down such ride services?

Humanoid Robots

Deployable dexterity will remain pathetic compared to human hands beyond 2036. Without new types of mechanical systems walking humanoids will remain too unsafe to be in close proximity to real humans.

Despite this prediction it is worth noting that there is a long distance between currently deployed dexterity and dexterity that is still pathetic. In the next ten years deployable dexterity may improve markedly, but not in the way the current hype for humanoid robots suggests. I talk about this below in my annual section scoring my 2018 predictions on robotics, AI, and machine learning, in a section titled Dexterous Hands.

Towards the end of 2025 I published a long blog post summarizing the status of, and problems remaining for, humanoid robots. I started building humanoid robots in my research group at MIT in 1992. My previous company, Rethink Robotics, founded in 2008, delivered thousands of upper body Baxter and Sawyer humanoid robots (built in the US) to factories between 2012 and 2018. At the top of this blog page you can see a whole row of Baxter robots in China. A Sawyer robot that had operated in a factory in Oregon just got shut down in late 2025 with 35,236 hours on its operations clock. You can still find many of Rethink’s humanoids in use in teaching and research labs around the world. Here is the cover of Science Robotics from November 2025, showing a Sawyer used in the research for this article out of Imperial College, London. Here is a slide from a 1998 powerpoint deck that I was using in my talks, six years after my graduate students and I had started building our first humanoid robot, Cog. It is pretty much the sales pitch that today’s humanoid companies use. You are seeing here my version from almost twenty-eight years ago. I point this out to demonstrate that I am not at all new to humanoid robotics and have worked on them for decades, in both academia and in producing and selling humanoid robots that were deployed at scale (which no one else has done) doing real work.

My blog post from September details why the current learning based approaches to getting dexterous manipulation will not get there anytime soon. I argue that the players are (a) collecting the wrong data and (b) trying to learn the wrong thing. I also give an argument (c) for why learning might not be the right approach. My argument for (c) may not hold up, but I am confident that I am right on both (a) and (b), at least for the next ten years. I also outline in that blog post why the current (and indeed pretty much the only, for the last forty years) method of building bipeds and controlling them will remain unsafe for humans to be nearby. I pointed out that the danger is roughly cubically proportional to the weight of the robot. Many humanoid robot manufacturers are introducing lightweight robots, so I think they have come to the same conclusion.
But the side effect is that the robots cannot carry much payload, and certainly can’t provide physical support to elderly humans, which is a thing that human carers do constantly — these small robots are just not strong enough. And elder care and in home care is one of the main arguments for having human shaped robots, adapted to the messy living environments of actual humans.

Given that careful analysis from September I do not share the hype that surrounds humanoid robotics today. Some of it is downright delusional across many different levels. To believe the promises of many CEOs of humanoid companies you have to accept the following conjunction. The declarations being made about humanoid robots are just not plausible. We’ll see what actually happens over the next ten years, but it does seem that the fever is starting to crack at the edges. Here are two news stories from the last few days of 2025.

From The Information on December 22nd there is a story about how humanoid robot companies are wrestling with safety standards. All industrial and warehouse robots, whether stationary or mobile, have a big red safety stop button, in order to comply with regulatory safety standards. The button cuts the power to the motors. But cutting power to the motors of a balancing robot might make it fall over and cause more danger and damage to people nearby. For the upper torso humanoid robots Baxter and Sawyer from my company Rethink Robotics we too had a safety stop button that cut power to all the motors in the arms. They were collaborative robots, and often a person, or part of their limbs or body, could be under an arm, and it would have been dangerous for the arms to fall quickly on cutoff of power. To counter this we developed a unique circuit that required no active power, which made it so that the back current generated by a motor when powered off acted as a very strong brake. Perhaps there are similar possible solutions for humanoid robots and falling, but they have yet to be invented.

On December 25th the Wall Street Journal had a story headlined “Even the Companies Making Humanoid Robots Think They’re Overhyped”, with a lede of “Despite billions in investment, startups say their androids mostly aren’t useful for industrial or domestic work yet”. Here are the first two paragraphs of the story:

Billions of dollars are flowing into humanoid robot startups, as investors bet that the industry will soon put humanlike machines in warehouses, factories and our living rooms. Many leaders of those companies would like to temper those expectations. For all the recent advances in the field, humanoid robots, they say, have been overhyped and face daunting technical challenges before they move from science experiments to a replacement for human workers.

And then they go on to quote various company leaders:

“We’ve been trying to figure out how do we not just make a humanoid robot, but also make a humanoid robot that does useful work,” said Pras Velagapudi, chief technology officer at Agility Robotics.

Then talking about a recent humanoid robotics industry event the story says:

On stage at the summit, one startup founder after another sought to tamp down the hype around humanoid robots. “There’s a lot of great technological work happening, a lot of great talent working on these, but they are not yet well defined products,” said Kaan Dogrusoz, a former Apple engineer and CEO of Weave Robotics. Today’s humanoid robots are the right idea, but the technology isn’t up to the premise, Dogrusoz said.
He compared it to Apple’s most infamous product failure, the Newton hand-held computer. There are more quotes from other company leaders, all pointing out the difficulties in making real products that do useful work. Reality seems to be setting in as promised delivery dates come and go.

Meanwhile here is what I said at the end of my September blog post about humanoid robots and teaching them dexterity. I am not at all negative about a great future for robots, and in the nearish term. It is just that I completely disagree with the hype arguing that building robots with humanoid form will magically make robots useful and deployable. These particular paragraphs followed a passage where I had described, as I do again in this blog post, how the meaning of self driving cars has drifted over time.

Following that pattern, what it means to be a humanoid robot will change over time. Before too long (and we already start to see this) humanoid robots will get wheels for feet, at first two, and later maybe more, with nothing that any longer really resembles human legs in gross form. But they will still be called humanoid robots. Then there will be versions which variously have one, two, and three arms. Some of those arms will have five fingered hands, but a lot will have two fingered parallel jaw grippers. Some may have suction cups. But they will still be called humanoid robots. Then there will be versions which have a lot of sensors that are not passive cameras, and so they will have eyes that see with active light, or in non-human frequency ranges, and they may have eyes in their hands, and even eyes looking down from near their crotch to see the ground so that they can locomote better over uneven surfaces. But they will still be called humanoid robots. There will be many, many robots with different forms for different specialized jobs that humans can do. But they will all still be called humanoid robots. As with self driving cars, most of the early players in humanoid robots will quietly shut up shop and disappear. Those that remain will pivot and redefine what they are doing, without renaming it, to something more achievable and with, finally, plausible business cases. The world will slowly shift, but never fast enough to need a change of name from humanoid robots. But make no mistake, the successful humanoid robots of tomorrow will be very different from those being hyped today.

Neural Computation

There will be small and impactful academic forays into neuralish systems that are well beyond the linear threshold systems, developed by 1960, that are the foundation of recent successes. Clear winners will not yet emerge by 2036 but there will be multiple candidates.

Current machine learning techniques are largely based on having millions, and more recently tens (to hundreds?) of billions, of linear threshold units. They look like this. Each of these units has a fixed number of inputs, where some numerical value comes in, and it is multiplied by a weight, usually a floating point number, and the results of all of the multiplications are summed, along with an adjustable threshold, which is usually negative, and then the sum goes through some sort of squishing function to produce a number between zero and one, or in this case minus one and plus one, as the output.
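To make that description concrete, here is a minimal Ruby sketch of a single such unit. The specific numbers and the use of tanh as the squashing function are my own illustrative choices, not anything prescribed by the description above; tanh is the kind of continuous, differentiable function that the backpropagation discussion below requires.

```ruby
# One linear threshold unit: a weighted sum of inputs plus an adjustable
# threshold (bias), pushed through a squashing function.
# The weights, bias, and use of tanh here are illustrative choices only.
def unit_output(inputs, weights, bias)
  sum = inputs.zip(weights).sum { |x, w| x * w } + bias
  Math.tanh(sum) # squashes the sum into the range -1.0..1.0
end

inputs  = [0.5, -1.0, 0.25]
weights = [0.8,  0.4, -0.6]
bias    = -0.2  # the "adjustable threshold", typically negative

puts unit_output(inputs, weights, bias)
# Learning (e.g. by backpropagation) amounts to nudging the weights and bias
# so that outputs like this one move closer to the desired outputs.
```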
In this diagram, which, by the way, is taken from Bernie Widrow’s technical report from 1960, the output value is either minus one or plus one, but in modern systems it is often a number from anywhere in that, or another, continuous interval. This was based on previous work, including that of Warren McCulloch and Walter Pitts’ 1943 formal model of a neuron, Marvin Minsky’s 1954 Ph.D. dissertation on using reinforcement for learning in a machine based on model neurons, and Frank Rosenblatt’s 1957 use of weights (see page 10) in an analog implementation of a neural model.

These are what current learning mechanisms have at their core. These! A model of biological neurons that was developed in a brief moment of time from 83 to 65 years ago. We use these today. They are extraordinarily primitive models of neurons compared to what neuroscience has learned in the subsequent sixty five years.

Since the 1960s higher levels of organization have been wrapped around these units. In 1979 Kunihiko Fukushima published (at the International Joint Conference on Artificial Intelligence, IJCAI 1979, Tokyo — coincidentally the first place where I published in an international venue) his first English language description of convolutional neural networks (CNNs), which allowed for position invariant recognition of shapes (in his case, hand written digits), without having to learn about those shapes in every position within images. Then came backpropagation, a method where a network can be told the correct output it should have produced, and by propagating the error backwards through the derivative of the quantizer in the diagram above (note that the quantizer shown there is not differentiable; a continuous differentiable quantizer function is needed to make the algorithm work), a network can be trained on examples of what it should produce. The details of this algorithm are rooted in the chain rule of Gottfried Leibniz in 1676, through a series of modern workers from around 1970 through about 1982. Frank Rosenblatt (see above) had talked about a “back-propagating error correction” in 1962, but did not know how to implement it.

In any case, the linear threshold neurons, CNNs, and backpropagation are the basis of modern neural networks. After an additional 30 years of slow but steady progress they burst upon the scene as deep learning, and unexpectedly crushed many other approaches to computer vision — the research field of getting computers to interpret the contents of an image. Note that “deep” learning refers to there being lots of layers (around 12 layers in 2012) of linear threshold neurons rather than the smaller number of layers (typically two or three) that had been used previously. Now LLMs are built on top of these sorts of networks with many more layers, and many subnetworks. This is what got everyone excited about Artificial Intelligence, after 65 years of constant development of the field.

Despite their successes with language, LLMs come with some serious problems of a purely implementation nature. First, the number of examples that need to be shown to a network for it to learn to be facile in language takes up enormous amounts of computation, so the costs of training new versions of such networks are now measured in the billions of dollars, consuming an amount of electrical power that requires major new investments in electrical generation, and the building of massive data centers full of millions of the most expensive CPU/GPU chips available.
Second, the number of adjustable weights shown in the figure is counted in the hundreds of billions, meaning they occupy over a terabyte of storage. RAM that is that big is incredibly expensive, so the models cannot be used on phones or even lower cost embedded chips in edge devices, such as point of sale terminals or robots.

These two drawbacks mean there is an incredible financial incentive to invent replacements for each of (1) our humble single neuron models that are close to seventy years old, (2) the way they are organized into networks, and (3) the learning methods that are used. That is why I predict that there will be lots of explorations of new methods to replace our current neural computing mechanisms. They have already started, and next year I will summarize some of them. The economic argument for them is compelling. But moving from initial laboratory explorations to viable scalable solutions will take much longer than everyone assumes. My prediction is there will be lots of interesting demonstrations but that ten years is too small a time period for a clear winner to emerge. And it will take much much longer for the current approaches to be displaced. But plenty of researchers will be hungry to do so.

LLMs

LLMs that can explain which data led to what outputs will be key to non-annoying/dangerous/stupid deployments. They will be surrounded by lots of mechanism to keep them boxed in, and those mechanisms, not yet invented for most applications, will be where the arms races occur.

The one thing we have all learned, or should have learned, is that the underlying mechanism for Large Language Models does not answer questions directly. Instead, it gives something that sounds like an answer to the question. That is very different from saying something that is accurate. What they have learned is not facts about the world but instead a probability distribution of what word is most likely to come next given the question and the words so far produced in response. Thus the results of using them, uncaged, are lots and lots of confabulations that sound like real things, whether they are or not. We have seen all sorts of stories about lawyers using LLMs to write their briefs, and judges using them to write their opinions, where the LLMs have simply made up precedents and fake citations (that sound plausible) for those precedents.

And there are lesser offenses that are still annoying and time consuming. The first time I used ChatGPT was when I was retargeting the backend of a dynamic compiler that I had used on half a dozen architectures and operating systems over a thirty year period, and wanted to move it to the then new Apple M1 chips. The old method of changing a chunk of freshly compiled binary from data, as it was spit out by the compiler, into executable program no longer worked, deliberately so, as part of Apple’s improved security measures. ChatGPT gave me detailed instructions on what library calls to use, what their arguments were, etc. The names looked completely consistent with other calls I knew within the Apple OS interfaces. When I tried to use them from C, the C linker complained they didn’t exist. And then when I asked ChatGPT to show me the documentation it groveled that indeed they did not exist and apologized.

So we all know we need guard rails around LLMs to make them useful, and that is where there will be a lot of action over the next ten years. They cannot be simply released into the wild as they come straight from training. This is where the real action is now.
More training doesn’t necessarily make things better. Boxing things in does. Already we see companies trying to add explainability to what LLMs say. Google’s Gemini now gives real citations with links, so that human users can oversee what they are being fed. Likewise, many companies are trying to box in what their LLMs can say and do. Those that can control their LLMs will be able to deliver usable product. A great example of this is the rapid evolution of coding assistants over the last year or so. These are specialized LLMs that do not give the same sort of grief to coders that I experienced when I first tried to use generic ChatGPT to help me. Peter Norvig, former chief scientist of Google, has recently produced a great report on his explorations of the new offerings. Real progress has been made in this high impact, but narrow, field. New companies will become specialists in providing this sort of boxing in and control of LLMs.

I had seen an ad on a Muni bus in San Francisco for one such company, but it was too fleeting to get a photo. Then I stumbled upon this tweet that has three such photos of different ads from the same company, and here is one of them: The four slogans on the three buses in the tweet are: Get your AI to behave, When your AI goes off leash, Get your AI to work, and Evaluate, monitor, and guardrail your AI. And “the AI” is depicted as a little devil of sorts that needs to be made to behave.

This is one of my three traditional sections where I update one of my three initial tables of predictions from my predictions exactly eight years ago today. In this section I talk about self driving cars, driverless taxi services, and what that means, my own use of driverless taxi services in the previous year, adoption of electric vehicles in the US, and flying cars and taxis, and what those terms mean. No entries in the table specifically involve 2025 or 2026, and the status of predictions that are further out in time remains the same. I have only put in one new comment, about how many cities in the US will have self-driving (sort of) taxi services in 2026, and that comment is highlighted.

A Brief Recap of what “Self Driving” Cars Means and Meant

This is a much abridged and updated version of what I wrote exactly one year ago today. The definition, or common understanding, of what self driving cars really means has changed since my post on predictions eight years ago. At that time self driving cars meant that the cars would drive themselves to wherever they were told to go with no further human control inputs. It was implicit that it meant level 4 driving. Note that there is also a higher level of autonomy, level 5, that is defined. Note that in the second row of content, it says that there will be no need for a human to take over for either level 4 or level 5. For level 4 there may be pre-conditions on weather and within a supported geographic area. Level 5 eliminates pre-conditions and geographic constraints. So far no one is claiming to have level 5.

However the robot taxi services such as Cruise (now defunct), Waymo, currently operating in five US cities, and Zoox, currently operating in two cities with limited service (Las Vegas and San Francisco), all relied, or rely, on having remote humans whom the car can call on to help get them out of situations they cannot handle. That is not what level 4 promises. To an outside observer it looks like level 4, but it is somewhat less than that in reality.
This is not the same as a driver putting their hands back on the steering wheel in real time, but it does mean that there is sometimes a remote human giving high level commands to the car. The companies do not advertise how often this happens, but it is believed to be every few miles of driving. The Tesla self driving taxis in Austin have a human in the passenger seat to intervene when there is a safety concern.

One of the motivations for self driving cars was that the economics of taxis, cars that people hire at any time for a short ride of a few miles from where they are to somewhere else of their choosing, would be radically different as there would be no driver. Systems which do require remote operations assistance to get full reliability cut into that economic advantage, raise the burden on the ROI calculations that make the business case for their adoption, and therefore stretch their time horizon for scaling across geographies.

Actual self-driving is now generally accepted to be much harder than everyone believed. As a reminder of how strong the hype was, and the certainty of promises that it was just around the corner, here is a snapshot of a whole bunch of predictions by major executives from 2017. I have shown this many times before, but there are three new annotations here for 2025 in the lines marked by a little red car. The years in parentheses are when the predictions were made. The years in blue are the predicted years of achievement. When a blue year is shaded pink it means that it did not come to pass by then. The predictions with orange arrows are those that I had noticed had later been retracted. It is important to note that every prediction that said something would happen by a year up to and including 2025 did not come to pass by that year. In fact none of those have even come to pass by today. NONE. Eighteen of the twenty predictions were about things that were supposed to have happened by now, some as long as seven years ago. NONE of them have happened yet.

My Own Experiences with Waymo in 2025

I took two dozen rides with Waymo in San Francisco this year. There is still a longer wait than for an Uber at most times, at least for where I want to go. My continued gripe with Waymo is that it selects where to pick me up, and it rarely drops me right at my house — but without any indication of when it is going to choose some other drop off location for me.

The other interaction I had was in early November, when I felt like I was playing bullfighter, on foot, to a Waymo vehicle. My house is on a very steep hill in San Francisco, with parallel parking on one side and ninety degree parking on the other side. It is rare that two cars can pass each other traveling in opposite directions without one having to pull over into some empty space somewhere. In this incident I was having a multi-hundred pound pallet of material delivered to my home. There was a very big FedEx truck parked right in front of my house, facing uphill, and the driver/operator was using a manual pallet jack to get it onto the back lift gate, but the load was nine feet long so it hung out past the boundary of the truck. An unoccupied Waymo came down the hill and was about to try to squeeze past the truck on that side. Perhaps it would have made it through if there had been no hanging load. So I ran up to just above the truck on the slope and tried to get the Waymo to back up by walking straight at it. Eventually it backed up and pulled in a little bit and sat still. Within a minute it tried again.
I pushed it back with my presence again. Then a third time. Let’s be clear: it would have been a dangerous situation if it had done what it was trying to do, and it could have injured the FedEx driver, whom it had not seen at all. But any human driver would have figured out what was going on, and that the FedEx truck would never go down the hill backwards but would eventually drive up the hill. Any human driver would have replanned and turned around. After the third encounter the Waymo stayed still for a while. Then it came to life and turned towards the uphill direction, and when it was at about a 45 degree angle to the upward line of travel it stopped for a few seconds. Then it started again and headed up and away. I infer that eventually the car had called for human help, and when the human got to it, they directed it where on the road to go (probably with a mouse click interface), and once it got there it paused and replanned and then headed off in the appropriate direction, the one the human had already made it face.

Self Driving Taxi Services

There have been three self driving taxi services in the US in various stages of play over the last handful of years, though it turns out, as pointed out above, that all of them have remote operators. They are Waymo, Cruise, and Zoox.

Cruise died in both 2023 and 2024, and is now dead, deceased, an ex self driving taxi service. Gone. I see its old cars driving around the SF Bay Area, with their orange paint removed, and with humans in the driver seat. On the left below are two photos I took on May 30th at a recharge station. “Birdie” looked just like an old Cruise self driving taxi, but without the orange paint. I hunted around in online stories about Cruise and soon found another “Birdie”, with orange paint, and the same license plate. So GM is using them to gather data, perhaps for training their level 3 driving systems.

Tesla announced to much hoopla that they were starting a self driving taxi service this year, in Austin. It requires a safety person to be sitting in the front passenger seat at all times. Under the certification with which they operate, on occasion that front seat person is required to move to the driver’s seat. Then it just becomes a regular Tesla with a person driving it and FSD enabled. The original fleet was just 30 vehicles, with at least seven accidents reported by Tesla by October, even with the front seat Tesla person. In October the CEO announced that the service would expand to 500 vehicles in Austin in 2025. By November he had changed to saying they would double the fleet. That makes 60 vehicles. I have no information that it actually happened. He also said he wanted to expand the “Robotaxi” service to Phoenix, San Francisco, Miami, Las Vegas, Dallas, and Houston by the end of 2025. It appears that Tesla cannot get permits to run even supervised operations (mirroring the Austin deployment) in any of those cities. And no, they are not operating in any of those cities, and now 2025 has reached its end. In mid-December there were confusing reports saying that Tesla now had Model Ys driving in Austin without a human safety monitor on board, but that the Robotaxi service for paying customers (who are still people vetted by Tesla) resumed their human safety monitors. So that is about three or four years behind Waymo in San Francisco, and not at all at scale.
The CEO of Tesla has also announced (there are lots of announcements and they are often very inconsistent…) that actually the self driving taxis will be a new model with no steering wheel or other driver controls. So they are years away from any realistic deployment. I will not be surprised if it never happens, as the lure of humanoids completely distracts the CEO. If driving with three controls, (1) steering angle of the front wheels, (2) engine torque (on a plus/minus continuum), and (3) brake pedal pressure, is too hard to make actually work safely for real, how hard can it be to have a program control a heavy unstable balancing platform with around 80 joints in hips and waist, two legs, two arms, and five articulated fingers on each hand?

Meanwhile Waymo had raised $5.6B to expand to new cities in 2025. It already operated in parts of San Francisco, Los Angeles, and Phoenix. During 2025 it expanded to Austin and Atlanta, the cities it had promised. It also increased its geographic reach in its existing cities and surrounding metropolitan areas. In the original three cities users have a Waymo app on their phone and specifically summon a Waymo. In the new cities, however, they used a slightly different playbook. In both Austin and Atlanta people use their standard Uber app. They can update their preference to say that they prefer to get a Waymo rather than a human driven car, but there is no guarantee that a Waymo is what they will get. And any regular user of the Uber app in those cities may be offered a Waymo, but they do get an option to decline and to continue to wait for a human driven offer.

In the San Francisco area, beyond the city itself, Waymo first expanded by operating in Palo Alto, in a geographically separate area. Throughout the year one could see human operated Waymos driving in locations all along the peninsula from San Francisco to Palo Alto and further south to San Jose. By November Waymo had announced driverless operations throughout that complete corridor, an area of 260 square miles, but not quite yet on the freeways; the Waymos are operating on specific stretches of both 101 and 280, but only for customers who have specifically signed up for that possibility. Waymo is now also promising to operate at the two airports, San Jose and San Francisco. The San Jose airport came first, and San Francisco airport is operating in an experimental mode with a human in the front seat.

Waymo has announced that it will expand to five more cities in the US during 2026: Miami, Dallas, Houston, San Antonio, and Orlando. It seems likely, given their step-by-step process and their track record of meeting their promises, that Waymo has a good shot at getting operations running in these five cities, doubling their total number of US cities to 10. Note that although it does very occasionally snow in five of these ten cities (Atlanta, Austin, Houston, San Antonio, and Orlando), it is usually only a dusting. It is not yet clear whether Waymo will operate when it does snow. It does not snow in the other five cities, and in San Francisco Waymo is building to be a critical part of the transportation infrastructure. If a self driving taxi service were subject to tighter weather restrictions than human driven services, that could turn into a logistical nightmare for the cities themselves. In the early days of Cruise they did shut down whenever there was a hint of fog in San Francisco, and that is a common occurrence.
It was annoying for me, but Cruise never reached the footprint size in San Francisco that Waymo now enjoys. No promises yet from Waymo about when it might start operating in cities that do commonly have significant snow accumulations. In May of 2025 Waymo announced a bunch of things in one press release . First, that they had 1,500 Jaguar-based vehicles at that time, operating in San Francisco, Los Angeles, Phoenix, and Austin. Second, that they were no longer taking deliveries of any more Jaguars from Jaguar, but that they were now building two thousand  of their own Jaguars in conjunction with Magna (a tier one auto supplier that also builds small run models of big brands — e.g., they build all the Mini Coopers that BMW sells) in Mesa, Arizona. Third, that they would also start building, in late 2025, versions of the Zeekr RT, a vehicle that they co-designed with Chinese company Geely, that can be built with no steering wheel or other controls for humans, but with sensor systems that are self-cleaning. It is hard to track exactly how many Waymos are deployed, but in August 2025, this website , citing various public disclosures by Waymo, put together the following estimates for the five cities in which Waymo was operating. No doubt those numbers have increased by now.  Meanwhile Waymo has annualized revenues of about $350M and is considering an IPO with a valuation of around $100B.  With numbers like those it can probably raise significant growth capital independently from its parent company. The other self driving taxi system deployed in the US is Zoox  which is currently operating only in small geographical locations within Las Vegas and San Francisco. Their deployment vehicles have no steering wheel or other driver controls–they have been in production for many years. I do notice, by direct observation as I drive and walk around San Francisco, that Zoox has recently enlarged the geographic areas where its driverful vehicles operate, collecting data across all neighborhoods. So far the rides are free on Zoox, but only for people who have gone through an application process with the company. Zoox is following a pattern established by both Cruise and Waymo. It is roughly four years behind Cruise and two years behind Waymo, though it is not clear that it has the capital available to scale as quickly as either of them. All three companies that have deployed actual uncrewed self driving taxi services in the US have been partially or fully owned by large corporations. GM owned Cruise, Waymo is partially spun out of Google/Alphabet, and Zoox is owned by Amazon. Cruise failed. If any other company wants to compete with Waymo or Zoox, even in cities where they do not operate, it is going to need a lot of capital. Waymo and Zoox are out in front. If one or both of them fail, or lose traction and fail to grow, and grow very fast, it will be near to impossible for other companies to raise the necessary capital. So it is up to Waymo and Zoox.  Otherwise, no matter how well the technology works, the dream of  driverless taxis is going to be shelved for many years. Electric Cars In my original predictions I said that electric car (and I meant battery electric, not hybrids) sales would reach 30% of the US total no earlier than 2027.  A bunch of people on twitter claimed I was a pessimist. Now it looks like I was an extreme optimist as it is going to take a real growth spurt to reach even 10% in 2026, i.e., earlier than 2027. 
Here is the report that I use to track EV sales — it is updated every few weeks. In this table I have collected the quarterly numbers that are finalized. The bottom row is the percentage of new car sales that were battery electric. Although late in 2024 EV sales were pushing up into the high eights in percentage terms, they dropped back into the sevens in the first half of 2025. Then they picked up to 10.5% in the third quarter of 2025, but that jump was expected as the Federal electric vehicle (EV) tax credits ended for all new and used vehicles purchased after September 30, 2025, as part of the "One Big Beautiful Bill Act". People bought earlier than they might have in order to get that tax credit, so the industry is expecting quite a slump in the fourth quarter, but it will be a couple more months before the sales figures are all in. YTD 2025 is still under 8.5%, and is likely to end at under 8%. The trends just do not look like we will get to EVs reaching 12% of US cars being sold in 2027, even with a huge uptick. 30% is just not going to happen. As for which brands are doing better than others, Tesla's sales dropped a lot more than the rest of the market. Brand winners were GM, Hyundai, and Volkswagen. The US experience is not necessarily the experience across the world. For instance, in Norway fully electric vehicles reached 89% of all new cars sold in 2024, largely due to taxes on gasoline powered car purchases. But that is a social choice of the people of Norway, not at all driven by oil availability. With a population of 5.6 million compared to the US with 348 million, and domestic oil production of 2.1 million barrels per day, compared to the US with 13.4 million b/d, Norway has a per capita advantage of almost ten times as much oil per person (9.7 to be more precise). The electrification level of cars is a choice that a country makes. Flying Cars The next two paragraphs are reproduced from last year's scorecard. Flying cars are another category where the definitions have changed. Back when I made my predictions it meant a vehicle that could both drive on roads and fly through the air. Now it has come to mean an electric multi-rotor helicopter that can operate like a taxi between various fixed landing locations. Often touted are versions that have no human pilot. These are known as eVTOLs, for "electric vertical take off & landing". Large valuations have been given to start ups who make nice videos of their electric air taxis flying about. But on inspection one sees that they don't have people in them. Often, you might notice, even those flights are completely over water rather than land. I wrote about the lack of videos of viable prototypes back in November 2022. The 2022 post referred to in the last sentence was trying to make sense of a story about a German company, Volocopter, receiving a $352M Series E investment. The report from PitchBook predicted worldwide $1.5B in revenue in the eVTOL taxi service market for 2025. I was bewildered as I could not find a single video, as of the end of 2022, of a demo of an actual flight profile with actual people in an actual eVTOL of the sort of flights that the story claimed would be generating that revenue in just 3 years. I still can't find such a video. And the actual revenue for actual flights in 2025 turned out to be $0.0B (and there are no rounding errors there — it was $0) and Volocopter has gone into receivership, with a "reorganization success" in March 2025.
In my November 2022 blog post above I talked about another company, Lilium, which came the closest to having a video of a real flight, but it was far short of carrying people and it did not fly as high as is needed for air taxi service. At the time Lilium had 800 employees. Since then Lilium has declared bankruptcy not once (December 2024), but twice (February 2025), after the employees had been working for some time without pay. But do not fear. There are other companies on the very edge of succeeding. Oh, and an edge means that sometimes you might fall off of it. Here is an interesting report on the two leading US eVTOL companies, Archer and Joby Aviation, both aiming at the uncrewed taxi service market; both with valuations in the billions, and both missing just one thing: a for-real, live, working prototype. The story focuses on a pivotal point, the moment when an eVTOL craft has risen vertically, and now needs to transition to forward motion. In particular it points out that Archer has never demonstrated that transition, even with a pilot onboard, and during 2025 they cancelled three scheduled demonstrations at three different air shows. They did get some revenue in 2025 by selling a service forward to the city of Abu Dhabi, but zero revenue for actual operations–they have no actual operations. They promise operations for this year, 2026, with revenue producing flights in the second half of the year. Joby Aviation did manage to demonstrate the transition maneuver in April of 2025. And in November they made a point to point flight in Dubai, i.e., their test vehicle managed to take off somewhere and land at a different place. The fact that there were press releases for these two human piloted, pretty basic capabilities for an air taxi service suggests to me that they are still years away from doing anything that is an actual taxi service (and with three announced designated places to land and take off from, it seems more like a rail network with three stations than a taxi service–again slippery definitions do indeed slip and slide). And many more years away from a profitable service. But perhaps it is naive of me to think that a profitable business is the goal. As with many such technology demonstrators the actual business model seems to be getting cities to spend lots of money on a Kabuki theater technology show, to give credit to the city as being technology forward. Investors, meanwhile, invest in the air taxi company thinking it is going to be a real transportation business. But what about personal transport that you own, not an eVTOL taxi service at all, but an eVTOL that you can individually own, hop into whenever you want, and fly anywhere? In October there was a story in the Wall Street Journal: "I Test Drove a Flying Car. Get Ready, They're Here." The author of the story spent three days training to be the safety person in a one seat Pivotal Helix (taking orders at $190,000 apiece, though not yet actually delivering them; also take a look at how the vehicles lurch as they go through the pilot commanded transition maneuver). It is a one seater so the only person in the vehicle has to be the safety person in case something fails. He reports: After three hellish days in a drooling, Dramamine-induced coma, I failed my check ride. The next month he tried again. This time he had a prescription for the anti-emetic Zofran and a surplus-store flight suit. The flight suit was to collect his vomit and save his clothes.
After four more days of training (that is seven total days of training), he qualified and finally took his first flight, and mercifully did not live up to his call sign of "Upchuck Yeager". $190,000 to buy the plane, train for seven days, vomit wildly, have to dress in a flight suit, and be restricted to taking off, landing, and flying only over privately owned agricultural land or water. This is not a consumer product, and this is not a flying car that is here, despite the true believer headline. Two years ago I ended my review of flying cars with: Don't hold your breath. They are not here. They are not coming soon. Last year I ended my review with: Nothing has changed. Billions of dollars have been spent on this fantasy of personal flying cars. It is just that, a fantasy, largely fueled by spending by billionaires. There are a lot of people spending the money invested in these companies, and for many of them it is a real dream that they want to see succeed. But it is not happening, even at a tiny scale, anytime soon. We are at peak popular hype in all of robotics, AI, and machine learning. In January 1976, exactly fifty years ago, I started work on a Masters in machine learning. I have seen a lot of hype and crash cycles in all aspects of AI and robotics, but this time around is the craziest. Perhaps it is the algorithms themselves that are running all our social media that have contributed to this. But it does not mean that the hype is justified, or that the results over the next decade will pay back the massive investments that are going into AI and robotics right now. The current hype is about two particular technologies, with the assumption that these particular technologies are going to deliver on all the competencies we might ever want. This has been the mode of all the hype cycles that I have witnessed in these last fifty years. One of the current darling technologies is large X models for many values of X (including VLMs and VLAs), largely, at the moment, using massive data sets, and transformers as their context and sequencing method. The other isn't even really a technology, but just a dream of a form for a technology, and that is robots with humanoid form. I have now put these two things in the five topics of my new predictions shared at the beginning of this post and will talk about them explicitly for each of the next ten years. Back in 2018 I did not talk about either of these technologies in my predictions, but rather talked about competences and capabilities. I fear that I may have been overly optimistic about many of these and in the table below I point out that my predicted time of arrival has now come, but the capabilities or competencies have not. I'm sure that many true believers in the two technologies mentioned above will have very short time scales on when they say this will be achieved. I pre-emptively disagree with them. Capabilities and Competences The predictions that are commented upon in the table above are all about when we would see robots and AI systems doing some things that simple creatures can do and others that any child of age nine or perhaps less can do without any difficulty. Even children aged three or four can navigate around cluttered houses without damaging them (that is different from when they may want to damage them). They can get up and down single stairs, and even full staircases, on two legs without stumbling (or resorting to four limb walking as a two year old might).
By age four they can open doors with door handles and mechanisms they have never seen before, and safely close those doors behind them. They can do this when they enter a particular house for the first time. They can wander around and up and down and find their way. One of the many promises about humanoid robots is that they too will be able to do this. But that is not what they can do today. But wait, you say, "I've seen them dance and somersault, and even bounce off walls." Yes, you have seen humanoid robot theater. All those things are done on hard surfaces, and anything specific beyond walking has been practiced and optimized by reinforcement learning, for exactly the situation of the floors and walls as they are. There is no real-time sensing and no ability to wander in previously unseen environments, especially not those with slipping hazards such as towels or sheets of cardboard on the floor. Children can do so easily. While four legged robots are much better at it than humanoid robots, they are wider than people, still have significant foot slipping problems, and cannot open random doors themselves as children can. A nine year old child can do pretty much any task (though with less weighty packages) that any delivery driver can do. That includes climbing out of a van, walking up and down slopes, going up and down previously unseen external staircases, sometimes ending in a dark porch or vestibule area, then putting the package on the ground, or putting it into a drop bin after grasping and pulling on the handle — again never having encountered that particular design of bin and handle. All this can be done immediately upon seeing the scene for the first time. We have not seen anything remotely like that in a lab demo for robots, despite my hope from eight years ago that by now such would have been demonstrated. And again, here a four legged robot might be able to do the walking and stair climbing, but it won't be able to manipulate the package. Also note that humans doing these tasks don't just carry single packages out in front of them with two outstretched arms, but often use their elbows, their hips, and their bellies to support multiple packages as they locomote. Elder care is a commonly quoted target market for robots, and with good reason given the current and growing demographic inversions in much of the world. There are far fewer younger people relative to the number of older people than there have been historically, and so fewer people to provide elder care. In providing care to the very elderly, there is a need to support those people physically, both passively, providing compliant support for them to lean on, and actively, getting people into and out of bed, into and out of bathtubs or shower enclosures, and getting people onto and off of toilets. And sometimes wiping their bums. There are no force sensing and control capabilities on any of today's robots which are remotely capable of doing any of these sorts of things safely and comfortably. And machine learning is not going to provide those capabilities. There are many fundamental design, materials, and engineering problems to solve to make these things possible. A bitter lesson, perhaps, for those who think that more data will solve everything. But the other unresolved capability that I have in my predictions table above is an agent that understands the world in an ongoing way as we all understand it.
That includes knowing what to expect to be the same as it was yesterday, and will be tomorrow, and what has changed about the world since yesterday or is likely to change today or tomorrow. Such an understanding of the world will be important for any deployable systems that can take care of real and vulnerable humans, including the elderly. And the young. And the rest of us. In summary, I thought that more progress would be made on many of these problems than has been achieved over the last eight years. That lack of progress is going to have real, and negative, impact on the quality of life of the newly elderly for the next couple of decades. Ouch! VCs, please take note: there is real pull for technologies that can help the elderly, and being in there first with something that can actually deliver value in the next three to five years will come with a very large upside. World Models Lots of people are talking about world models and their importance, as add ons to LLMs, as mechanisms for agentic AI to exploit, and for allowing robots to do real tasks. These aspirations are probably reasonable to have, and successfully working on them can have real impacts. Unfortunately the talkers are not the doers, and not the deployers, and not the people who have to solve real problems. And so they all have different, and convenient for themselves, understandings of what world models are. That, along with the worship of big data and the belief that machine learning will solve all problems, means we have a big mess, with people jumping to "solutions" before they understand the problems. Some people are even claiming that they will build world models by learning them from having agents play video games. But how do those video games work? They have a coded geometry-based world model, with a little physics engine. It is already built! Using machine learning (and tens of millions of dollars) to extract it rather than just looking at the source code (and perhaps buying or licensing that code) is just wacky. Expect more confusion and lots and lots of reinvention. This fever has quite a ways to go before today's memes and slogans get replaced by the next generation of memes and slogans, with perhaps some good work coming out in a rational interregnum. We can hope. Situatedness vs Embodiment One of the new things that people are getting excited about is Embodied Intelligence. I agree that it is worth being excited about, as it is what I have spent the last forty years working on. It is certainly about robots being in the world. But since 1991 I have made a distinction between two concepts where a machine, or creature, can be either, neither, or both situated and embodied. Here are the exact definitions that I wrote for these back then: [Situatedness] The robots are situated in the world—they do not deal with abstract descriptions, but with the here and now of the world directly influencing the behavior of the system. [Embodiment] The robots have bodies and experience the world directly—their actions are part of a dynamic with the world and have immediate feedback on their own sensations. At first glance they might seem very similar. And they are, but they are also importantly different. And, spoiler alert, I think much of the work at companies, large and small, right now, is trying to abstract out the embodiment of a robot, turning it into a machine that is merely situated.
An algorithm, written as code, to find the greatest common divisor of two numbers, when running, is neither situated nor embodied. A robot that is thrown into the air with just an inertial measurement unit (IMU) as its sensor, that moves its limbs about to zero out rotations and then is caught by a net, is embodied but not situated. A robot that has a physical face that it can make expressions with, a voice synthesizer, cameras, and microphones, and that can talk to a person giving appropriate responses both with its choice of words and with appropriate prosody and facial expressions, to some purpose and in response to how the person talks and moves, is situated but not really embodied. Embodied in its presence yes, but not embodied in any physical interactions with its environment. A robot that can roll around without hitting stationary objects, wherever they are, nor hitting moving people or other vehicles, that can go to a location specified by a warehouse management system, that responds safely to people grabbing it anywhere, and can give a person who grabs its control handle agency over it going wherever the person pushes it with a light touch no matter how much weight it is currently carrying, is both embodied and situated. [And yes, this is what our Carter robots do at Robust.AI.] These are just point examples of the four classes of entities that come from having or not having the two properties of situatedness and embodiment. Real robots that do real work in dynamic human occupied environments must be both situated and embodied. For instance, a robot that is to help with in home elder care needs to be aware of the situation in the world in order to know what to do to help the person. It needs to be able to open doors with different handles and latching mechanisms, and then control the inertia of the closing door so that the environment is both safe and quiet for the person. The robot needs to be able to accommodate the person reaching for it dynamically, looking for support so that they don't fall. The robot needs to be able to take things handed to it by the person, and pass things to the person in a way which is both safe and makes it easy for the person to grasp. Etc., etc. In short the robot needs to control forces and inertias in the world and to be responsive to them, at the same time as it is acting in a way that can be understood as sentient. Being both situated and embodied is still a challenge to robots in the world. [[Now here is the most important sentence of this whole blog post.]] I think the training regimes that are being used for both locomotion and dexterity are either ignoring or trying to zero out the embodiment of physical robots, their inertias and forces, reducing them to merely being situated, just apps with legs and arms, characters in video games, not the reality of real physical beings that the tasks we want them to do require. Dexterous Hands I talked about the challenges for dexterity earlier this year. In the table above I have a new comment this year saying that there has been improvement in the dexterity of suction based grippers but not for articulated grippers. Suction grippers have plastic suction cups which themselves are compliant. Under the force of the suction they can change shape, to a degree, to accommodate unknown shapes in the thing being grasped (sucked up to).
They also allow for a little bit of torsional rotation about the axis of sucking and a bit of rocking of the suction cup in the two degrees of freedom in the plane orthogonal to the suction axis. While suction cups have evolved to better pick things up and so are common for handling packaged goods, the companies that package materials to be shipped through automated systems choose versions of plastics for bags that won't be sheared open by the suction pulling against outer parts of such cups. The result is that the control of the embodied action of grasping can become much more of a simply situated action. Once the pick orientation and vacuum gripper selection have been made it is really open loop, as all the work is done by the indiscriminate force of suction and the mutual compliance of the gripper and the grippee (see the sketch at the end of this section). Above I had argued against doing this with a general purpose humanoid hand. It makes no sense there as the adaptability of the hand is its most precious attribute. But here in a special purpose hand, a suction gripper, it actually simplifies things within the specialization of task, and here a purely situated hand may make sense. And it may be possible to train it with purely visual data. So what does this tell us? It says that there is plenty of room for mechanical design, and simpler computational embodied control, for all sorts of grippers and things in the world that need to be gripped. The end of Moore's Law, at least the version that said we could reduce feature size on silicon by a factor of two every year, opened up a new golden era of chip design. The winners (through early luck and then dogged determination) matched untraditional designs to new problems (machine learning) and achieved speedups (and corporate valuations) that were unheard of. In the last 10 years we have moved from general purpose silicon to special purpose silicon for our most high volume computations. That was not on most people's expectation list twenty years ago. So too today, with stalled capabilities from full human hand emulation efforts through machine learning from visual observation, there is a rich array of more specialized manipulation tasks where special purpose grippers, clever interplay of materials and force applications, geometric planning, specialized sensing, and maybe even some machine learning may lead to enormous application markets. For instance, a specialized robot body, hands (of some sort), arms, and support limbs or wheels that can safely manipulate an elderly human could have enormous impact on elder care around the world. A single human care-giver along with one human-manipulator robot could provide a lot more care for a frail elderly person than the care-giver alone could do. Special purpose manipulators for fruits, or for some range of small mechanical parts, or clothing, could each open enormous markets for automation in particular handling tasks. And countless other specialities. Economic pull is out there. Being the smart academic researcher, entrepreneur, or technology investor here may lead to enormous new types of deployable automation. The new dexterity may turn out to be special purpose. And eventually we may come to understand that just because the hands we know best happen to be our own does not mean that our own hands are the best for the majority of tasks in our human world. Humanoid romanticism may not be our future after all.
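To make concrete what that open loop, purely situated pick looks like, here is a minimal sketch; every object and function name in it is hypothetical and purely illustrative, not code from any real gripper stack.

# A minimal sketch of an open loop suction pick: once vision has chosen a pose
# and a cup, there is no force feedback anywhere in the cycle.
def suction_pick(camera, arm, vacuum)
  pose, cup = plan_pick(camera.capture)  # situated: decided from an image of the here and now
  arm.attach(cup)                        # select the vacuum cup for this item
  arm.move_to(pose)                      # no force sensing on the approach
  vacuum.on                              # the indiscriminate force of suction does the work
  arm.lift                               # the cup's compliance absorbs shape uncertainty
end

Contrast that with the elder care examples above, where forces and inertias have to be sensed and controlled continuously throughout the interaction.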
Looking at the missions and numbers over the last three years it appears that human spaceflight is at a steady plateau, with, by the way, far fewer people going into orbit than in the time of the Space Shuttle. Underneath though, there is a lot of churn, a likely new player, and the return of humans to lunar distances for the first time in 54 years. Below is the updated scoring of my 2018 predictions for human spaceflight. There are six new comments in this table, but no new specific calling of predicted dates as right or wrong. It is now clear to me that I was way too optimistic in regard to my predictions for Mars, even though I was wildly out of step and much more pessimistic than the predictions coming out of SpaceX eight years ago. Given how slow things have turned out trying to land people on the Moon, the hoped for crewed colony on the Moon (think of it as ISS (International Space Station) on the lunar surface) may well slip to what I had predicted for Mars. Mars is going to take much longer than the Moon. Following the table there are the detailed numbers and trends on both orbital crewed flights, and suborbital crewed flights. Things will change from stasis in 2026. A crewed flight to the Moon is scheduled to happen in a matter of weeks, with the vehicle already stacked, now. And suborbital crewed flights may possibly have quite an uptick in 2026. Following those two sections I have more on Boeing's Starliner, SpaceX's Starship, and Blue Origin's New Glenn, NASA and the Moon, and what is going to happen with space stations given the scheduled end of life of the ISS in 2030. Orbital Crewed Flights In both 2024 and 2025 the US put 16 people into orbit and Russia and China put 6 people each into orbit; 28 people total went to orbit in each year. We have gone from a historical low of only eight people going to orbit in 2020 to a steady-ish state of roughly 28 people per year now. That may jump up to over 30 people in 2026 because of the additional Artemis II flight to the Moon, following checkout in LEO (Low Earth Orbit). But even with that bump there may be other pressures which keep it from rising above the high twenties for 2026. We are certainly not seeing steady growth in the number of humans getting launched to orbit, and the numbers are significantly lower than the heyday of Shuttle launches in the nineties and early two thousands. There is no growth trend visible, and the long promised exponential growth of people going to orbital space has not even made a brief guest appearance. Here is a more detailed history for the last six years where the first line in each box says how many crewed launches of the particular vehicle there were, and the second line, in square brackets, says how many people, total, were onboard those flights. Wherever there are three numbers separated by forward slashes you have to sum the numbers to get the total. The three countries with current crewed orbital launch capabilities are the US, Russia, and China. All Chinese flights carry state astronauts (or taikonauts) and all of them go to the Chinese space station. And there are no tourists, so far, on Chinese flights, so we just have single numbers for both launches and people. All the state astronauts for both the US and Russia go to the International Space Station (ISS), but a state player (in Russia) and a non-state player in the US (SpaceX) have also launched tourist flights in the last six years. So for those two countries we have three numbers separated by slashes for both launches and people.
The first of the three numbers refers to purely state launches to the ISS (note that the US and Russia both launch each other's state astronauts to the ISS so that both countries have astronauts up-to-date trained on the other's launch systems, in case of emergencies arising at some point). The second number in the triples is space tourists whose destinations have also been the ISS, while the third number (for both launches and people) is for tourist flights that have been independent of going to the ISS — there have been a total of three of these, all launched by SpaceX. Two of those three flights were purchased personally by Jared Isaacman, who was sworn in as the NASA administrator just two weeks ago. The one year in the last six where Russia has launched space tourists (after being the leaders in this endeavor early in the century) was 2021, when two flights of Soyuz to the ISS had one regular state cosmonaut and two space tourists. And, there was one slightly wobbly other launch of a Soyuz in 2024, not called out in the table, where a flight attendant from the state airline of Belarus was sent as a cosmonaut from that country to the ISS on a Russian Soyuz. That was most likely an event orchestrated by Russia to keep support from Belarus for their war against Ukraine. Ugly. The term tourist needs some explanation. The people (as with suborbital Blue Origin flights) are a mixture of private people paying for the experience (or having some other individual pay for them) or they are astronauts from countries that do not have their own launch capability. In the case of the three tourist flights to the ISS on a SpaceX Dragon, all have been paid for by the company Axiom, with a former NASA astronaut in command. The three others on each of those flights are people in the fledgling astronaut programs of other countries who have paid Axiom for the seats. Axiom has commercial relationships with both SpaceX and NASA for the use of the Falcon 9 launch vehicle, the Dragon craft, and use fees for the ISS. Suborbital Crewed Flights Virgin Galactic is on a multi-year hiatus on flights as they develop new flight vehicles, but they may well fly again in 2026. Thus, for the last year, only Blue Origin has been launching tourists (again a mixture of private individuals and astronauts from other countries that have not yet developed their own crewed launch capability, but may be aiming at doing so) on suborbital flights. Blue Origin also sells uncrewed launches for experiments that need to be exposed to the environment of space and/or operation in microgravity, if only for a few minutes. In 2025 Blue Origin had seven crewed launches, each with six people on board. Previously they had had three crewed launches in each of 2021, 2022, and 2024, each with six people on board, with a hiatus in 2023. Blue Origin has been quite careful with forward projections for both suborbital and orbital flights, so when they say what they intend to do and when, they are likely to come close to achieving that promise. Recently they said that they are going to introduce three new flight vehicles starting in 2026 to run their suborbital flights, that they are looking at developing a second launch site, somewhere other than Texas, and that they believe they have the customer demand to support one flight per week. They do not disclose what they charge for the flights. Nor did they give any firm dates for reaching these goals.
But I think it is likely that we will see a jump in the number of flights in 2026. In December of 2025 I was at an event centered on solar system orbital dynamics and met a sub-orbital tourist there. He has already paid for and flown above the Kármán line on Virgin Galactic. Now he has paid for a Blue Origin sub-orbital flight and is waiting for a launch assignment. There is definitely a market for these flights, but it remains to be seen whether the prices and demand combine in a way that makes it profitable for seat suppliers to keep doing it. Boeing's Starliner (not to be confused with the SpaceX Starship) When it was first announced, in 2010, Boeing's Starliner was originally scheduled to fly a human test crew in 2018. It was supposed to send the crew to the ISS, and then it would be under contract to launch six crews to the ISS, much as SpaceX has already launched 11 regular crews to the ISS. In mid 2024 it delivered a human test crew to the ISS, Barry Wilmore and Sunita Williams, but after much analysis of anomalies it returned to Earth without them. NASA bumped two crew members from the next crew going on a SpaceX flight to the ISS to provide room for their return, on that SpaceX Dragon, which they did after an unexpected extra nine months on top of their originally scheduled week at the ISS. Last year in my yearly update I said: We do not know at this point, but I think it would not be a huge surprise if Starliner never flies again. It turns out it is going to fly again! Including potentially twice in 2026. But there are some changes. The six missions which were contracted to take astronauts on regular assignment to the ISS were called Starliner-1 through Starliner-6. The contract with NASA has been modified to make the last two flights future options rather than sure things. And Starliner-1, scheduled for the first half of 2026, will be uncrewed again. Then the three remaining flights in the modified contract would each take four astronauts on regular rotations to the ISS. There is one little hiccup. Sunita Williams is the only active astronaut, not committed to other current or upcoming missions, who has trained to fly on Starliner. She now has over 600 days in space and another six month mission to the ISS would take her over radiation exposure limits. SpaceX Falcon 9 I gave the statistics for Falcon 9 in the introduction, talking about what has surprised me in the last 8 years. When I made my predictions Falcon 9 had been launched 46 times over 8 years. Only five of those launches re-used a previously flown first stage, and only in the previous year had successful landings of the first stage become reliable. Now Falcon 9s are getting launched at a sustained rate of more than three per week, all attempts at landing boosters are successful, and typically each booster flies over 20 times. Just phenomenal unmatched reliability and performance. NASA, Artemis, and Returning to the Moon I am jumping ahead of Starship (SpaceX) and New Glenn (Blue Origin) to talk about NASA's plan to get people back to the lunar surface, and perhaps setting up a more or less permanent outpost there. That is how the ISS has been continuously occupied for 25 years, rotating crew members in and out twice a year. (China's space station follows the same model, but with only 3 occupants compared to 7 for ISS). 2026 promises to be a big year for humanity and the Moon.
No one has been beyond low Earth orbit (LEO) since the Apollo 17 mission had three people go to lunar orbit and two of them landed in December 1972, fifty three years ago. In November 2022 the first launch of NASA's SLS (Space Launch System) occurred, taking its uncrewed Orion capsule on a looping orbit past the Moon and back. It made a close approach to the Moon in each direction, and then came back to Earth and splashed down. Note that this was the FIRST flight of both the multi-stage rocket, and the habitable capsule. It all worked FIRST time. Everything was built by contractors, but it underwent NASA's methodology to make sure things worked rather than failed. The first stage consists of a liquid fueled rocket using four RS-25 engines, the same engine as the three on the Space Shuttle. It also has two solid fuel boosters strapped on, larger versions of the Space Shuttle solid fuel boosters. The second stage is essentially an off the shelf stage from the past Delta program. There will be a third stage added for the fourth and subsequent flights. This is a derivative vehicle, with a long history of successful use of its components. When Vice President Mike Pence announced the details of the program in 2019 the landing of people on the Moon was supposed to happen in 2024. Things have slipped a little since then. The first crewed mission to the vicinity of the Moon (no landing), Artemis II, had slipped to April 2026, but now it has been pulled forward to February 2026 (next month!). A crew of four will spend over ten days in space on a flight to the Moon approaching to within 4,600 miles, and then in a free return manner (no need for working engines) they will head back towards Earth. All their energy will be removed by heat shields hitting the Earth's atmosphere and then by the use of 11 parachutes, finally splashing down in the ocean. Note that on all 9 flights to the Moon of the Apollo program, the spacecraft came much closer to the Moon than this, and 8 of the flights went into orbit at around 60 to 70 miles above the surface. So this is a more conservative mission than those of Apollo. Things at this stage are looking good for Artemis II to fly in February 2026. The next step of the Artemis program is where things get wobbly. Rather than 2024, the first landing of astronauts on the Moon is currently scheduled for 2027. But that is not going to happen. Here is what the architecture of the mission currently looks like: Here we see the problem with the schedule, even with it currently slipped to landing two astronauts on the Moon in 2027. The architecture uses the SLS and Orion to get the astronauts to lunar orbit. Given there is a lunar flyby with astronauts onboard scheduled for just two months from now (and the rocket is already stacked for that mission), that looks like a reasonable extrapolation from existing progress. The problem with the new plan is the landing vehicle and getting it to lunar orbit. It is all based on SpaceX's Starship. So far, Starship has had 11 flights, six of which have been successful in reaching their own goals, and five of which have been failures. But there has not yet, in eleven flights, been a goal of getting anything into orbit. And just in 2025 two vehicles have been destroyed by failures on the ground when their tanks were being pressure tested. In the section on Starship below I will talk more about what I see as conflicting product requirements which together doom Starship to a very long development process.
For comparison, the Saturn V, which took astronauts to the Moon nine times, had a total of 13 flights, every one of which got its payload to Earth orbit. Two were uncrewed tests (and there were problems with the second and third stages on the second of these test flights). Its very first crewed flight (Apollo 8) took people to the Moon, and a total of 9 launches got people to the Moon. The other two flights were (Apollo 9) a crewed flight to test the Lunar Lander and orbital rendezvous in Earth orbit, and the uncrewed launch of the first US space station, Skylab. Now look again at the plan for the Artemis III mission. It requires multiple (reported numbers range from 14 to somewhere into the twenties) launches of the Starship to orbit. One of those launches uses the Super Heavy Booster and a special version of the second stage, the actual Starship, known as Starship HLS (Human Landing System). That special version is expendable after it lands astronauts on the Moon, hosts them for perhaps two weeks, then brings them back to lunar orbit where they transfer to NASA's Orion. Then it sends itself off into heliocentric orbit for all eternity. The HLS version is special in two ways. First it does not have to get back to Earth and so doesn't need heat shields and does not need the three in-atmosphere Raptors for soft landing on Earth (see the section on Starship below). That is good for all the mass equations. But it does, or might, have a second set of engines for landing on the Moon that are attached halfway up its body so that they cause less lunar dust to fly around as it lands. We have not yet seen a prototype of that version, not even a public rendering as far as I can tell. I have talked to people who are in regular communication with people inside SpaceX. They report not a peep about what work has been done to design or build the lander. That is not good for the current public schedule. BUT the really, really bad thing is that the lunar lander stage will use up most of its fuel getting into Earth orbit — it is the second stage of the rocket after all. So it cannot get to the Moon unless it is refueled. That will be done by sending up regular versions of the Starship second stage, all on reusable Super Heavy Boosters. They too will use up most of their fuel getting to orbit, and will need to keep some to get back to Earth to be reused on another flight. But each will have a little margin, and its extra fuel will be transferred to the lunar landing Starship in orbit. No one has ever demonstrated transfer of liquid fuel in space. Because of the way the numbers work out it takes somewhere in the teens of these refueling operations, and depending on how quickly certified higher performance engines can be developed and tested for both the Super Heavy Booster and Starship itself, that number of refueling flights might range into the twenties. As an engineer this architecture looks to me like trouble, and with an impossible future. I am sure it will not happen in 2027, and I have doubts that it ever will. The acting administrator of NASA, Sean Duffy, who is also the head of the US Department of Transportation, was worried about this too, and in October of 2025 he reopened bidding on the contract for a crewed lander for the Moon that collects and returns its crew from Orion in lunar orbit. The day after this announcement SpaceX said they were working on a simplified architecture to land people on the Moon.
They have given no details of what this architecture looks like, but here are some options proposed by the technical press. A couple of weeks later the President announced the renomination of Jared Isaacman to be the NASA administrator, having withdrawn his nomination a few months before. Isaacman is a private citizen who personally paid for, and flew on, two of the three SpaceX crewed missions which have not flown to the ISS. He was confirmed to the NASA position on December 17 th , 2025, just two weeks ago. At the very least expect turbulence, both political and technical, in getting astronauts landed on the Moon. And see a possible surprise development below. SpaceX Starship (not to be confused with Boeing’s Starliner) Starship is SpaceX’s superheavy two stage rocket, designed to put 150(?) tons of payload into orbit, with components having been under development since 2012, going through extensive redesigns along the way. There have also been three major designs, builds, and tests of the Raptor engines that power both stages. This is how Wikipedia currently introduces them: Raptor is a family of rocket engines developed and manufactured by SpaceX. It is the third rocket engine in history designed with a full-flow staged combustion fuel cycle, and the first such engine to power a vehicle in flight. The engine is powered by cryogenic liquid methane and liquid oxygen, a combination known as methalox. SpaceX’s super-heavy-lift Starship uses Raptor engines in its Super Heavy booster and in the Starship second stage. Starship missions include lifting payloads to Earth orbit and is also planned for missions to the Moon and Mars. The engines are being designed for reuse with little maintenance. Currently the Raptor 3 version is expected to be used for operational Starship launches, and it comes in two versions. There are 33 Raptors in the first stage designed to operate optimally in the atmosphere, along with three such engines in the second stage, which also houses three vacuum optimized Raptors. The first stage engines and the second stage vacuum engines are designed to get payloads to orbit. The vacuum engines on the second stage would also be used for further operations on the way to the Moon and descending towards the surface there. And for non-expendable second stages they would be used for the initial de-orbit burn for landing the second stage Starship back on Earth. After using the heat shields to burn off some more energy  as it enters the atmosphere the second set of engines, the atmosphere optimized Raptors, are used to slow it down to a soft landing. Other systems for returning to Earth have used different tradeoffs. The Space Shuttle used its wings to slow down to very high horizontal landing speed, and then a combination of a drag parachute after touchdown and brakes on the wheels to get down to zero velocity. US capsules, such as Mercury, Gemini, Apollo, Orion, and Dragon have all used heat shields followed by parachutes during vertical fall, and lastly dropped into the sea for dampening the final residual velocity. (Soyuz, Starliner, and New Shepard all use last second retro rockets before hitting the ground, rather than water.) This means that unlike all the other solutions Starship has to carry a complete set of engines into orbit just for use during landing, along with enough fuel and oxidant to land. This is a high performance price for the thing that flies in space, mostly. 
The engines on the Starship first stage, like those on Falcon 9 and Blue Origin's New Glenn, do get to space but never get to more than a small fraction of orbital speed, so returning them to Earth is a much, much lower performance price than Starship's second stage return of engines and fuel. The 2025 flights of Starship were, on average, better than the 2024 flights, but two vehicles destroyed themselves before getting to the flight stage, and still nothing got into orbit. How close is it to working? I don't know. But I do keep tabs on promises that have been made. In November of 2024 the President of SpaceX said "I would not be surprised if we fly 400 Starship launches in the next four years". A year ago today I said in response: "Looking at the success of Falcon 9 it is certainly plausible that I may live to see 400 Starship launches in a four year period, but I am quite confident that it will not happen in the next four years (2025 through 2028)". We are a quarter of the way through her predicted time frame and we have gone from being 400 orbital launches away from her goal down to being a mere 400 away. Blue Origin Gets to Orbit The suborbital tourist flights that Blue Origin operates are not its main business. It has ambitions to compete head to head with SpaceX. But it is almost 600 launches behind; how can it be competitive? In 2025 Blue Origin made clear that it is not to be dismissed. From zero orbital launches at the start of 2025 to having two orbiters on their way to Mars (SpaceX has not yet done that) and showing that it can land a booster that comes very, very close to the performance of Falcon Heavy's three booster configuration when landing all three boosters. And it may well do a soft landing on the Moon in 2026 (SpaceX won't come close to that goal for a number of years). In February Blue Origin launched its first New Glenn rocket. Its first stage is powered by seven BE-4 engines ("Blue Engine 4"), a methane burning engine that is more powerful than the Raptor 3 which will power new versions of SpaceX's Starship. New Glenn reached orbit on its first attempt, and delivered a Blue Origin payload to space (a test version of their Blue Ring for in-space communications). The first stage attempted to land on Blue Origin's Jacklyn landing platform at sea but failed. The BE-4 had previously powered two United Launch Alliance Vulcan rockets to orbit under a deal where Blue Origin sells engines to ULA. The second stage of New Glenn is powered by two BE-3 engines, which are a variant of the single engine used on Blue Origin's New Shepard. In their second launch, in November, Blue Origin not only delivered three paid payloads to orbit (two of which are headed to Mars, where they will orbit the planet and carry out science experiments for UC Berkeley on what happened to Mars' atmosphere), but then the first stage (much larger than the first stage of a Falcon 9) landed on Jacklyn with an unrivaled level of precise control. Blue Origin plans to reduce the time spent hovering in future landings, to reduce the fuel that must be held in reserve, now that it has mastered return-from-orbit vertical landing. (Recall that they have performed dozens of New Shepard vertical landings on return from non-orbital flights.) Soon after this impressive second outing for New Glenn, Blue Origin announced a number of upgrades. They renamed the base vehicle that has now flown twice to be "New Glenn 7×2" where 7 and 2 refer to the number of first stage and second stage engines respectively.
They also announced that those flight engines would be upgraded to levels of thrust and duration that had already been demonstrated in ground tests. These are the new total thrust numbers, in pounds force. Additionally, Blue Origin announced a new version, the "New Glenn 9×4", which is heavier and taller, has a larger payload fairing, and has two extra engines on each stage. Looking up from below the first stage, the engine arrangement goes from the one on the left to the one on the right. And here is how the two variants look compared to the Saturn V, which took humans to the Moon in 1969. The kicker to these successes is that the New Glenn 7×2 with a reusable first stage is very nearly equivalent to the Falcon Heavy when its three first stage boosters are reused. The reusable New Glenn 9×4 beats Falcon Heavy on all measures even when all three of Falcon Heavy's boosters are sacrificed and not recovered. I can't quite get all the numbers but this table makes the comparisons with the numbers I can find. Note that a "tonne" is the British spelling for a metric ton, which is 1,000 kg. That is approximately 2,205 lbs, which is about 205 lbs more than a US ton, and about 35 lbs less than a British ton. Meanwhile expectations are high for another launch of a New Glenn, the 7×2 version, sometime early in the new year. There has been no announcement from Blue Origin, nor any indication of the payload. But there is a general feeling that it may actually be a launch of Blue Origin's Blue Moon Mark 1, an all up single launch mission to soft land on the Moon. It was announced almost a year ago that Blue Origin has a deal to deliver a NASA payload to the Moon in the Blue Moon Pathfinder mission no earlier than 2026. The Mark 1 uses a BE-7 engine to soft land. Here is where things get interesting for a re-appraisal of how NASA astronauts might first land on the Moon again. Blue Origin is already under contract with NASA to land two astronauts on the Moon for a 30 day stay in 2030 using their much larger Blue Moon Mark 2. The Mark 2 and Mark 1 share control systems and avionics, so a successful landing of Mark 1 will boost confidence in the Mark 2. The architecture for the 2030 mission involves multiple launches. A NASA SLS launches a crewed Orion capsule to the vicinity of the Moon. A New Glenn gets a Mark 2 Blue Moon to an orbit that approaches the Moon. A "Cislunar Transporter" is launched separately and it gets fueled in LEO. Then it heads off to the same orbit as the Mark 2 and refuels it. The Mark 2 and the transporter both use three Blue Origin BE-7 engines which are now fully operational. Then the astronauts transfer to the Mark 2 to land on the Moon. Note that this architecture uses in flight refueling as does the SpaceX version, though with far fewer launches involved. BUT, soon after the then acting NASA administrator Sean Duffy announced the re-opening of the contract for the lander for Artemis III, it appeared that he was considering having Blue Origin use their Mark 1 version for the crewed mission. Whether that enthusiasm survives the changing of the guard to Jared Isaacman, the new and current NASA administrator, remains to be seen. And whether Blue Origin can pull off a rendezvous in lunar orbit, to pick up and return the crew members going to the lunar surface, from an orbiting Orion capsule is also an open question. I think the key idea with this option is to remove the need for any in flight refueling for the first crewed landing. There is going to be some stiff competition between SpaceX and Blue Origin. Either might win.
New space stations The International Space Station will be at end of life in 2030 after continuous human habitation for almost thirty years. The other space station currently in orbit is the Chinese Tiangong station. Expect to see a real pickup in the building of space stations over the next few years, in anticipation of the end of the ISS. The Russian Orbital Service Station (ROS) is scheduled to begin construction, by Roscosmos, in orbit in 2027. There is risk to this plan from the deterioration of the Russian economy. India plans to start building their Bharatiya Antariksh Station (BAS) in 2028 and for it to be fully operational in 2035. India has had uncrewed orbital capability since 1980, and sent its first uncrewed mission to Mars in 2013. For BAS it is developing crewed launch capability. In 2025 India sent one of its own astronauts to the ISS on a SpaceX Dragon under an agreement with the company Axiom. A consortium of countries (US, Canada, Japan, European Union, and the United Arab Emirates) is collaborating on building the Lunar Gateway, a space station orbiting the Moon. Launch of the first module is scheduled for 2027 on a SpaceX Falcon Heavy. Blue Origin is competing for additional components and launches for the Gateway. A host of private companies plan on launching smaller private space stations in the near term, with one claiming it will do so in May 2026. This is going to be an active frontier, and may lead to more humans going on orbital flights than the current status quo of about 28 per year. The humanoid robot companies' robots have not demonstrated any practical work (I don't count dancing in a static environment doing exactly the same set of moves each time as practical work). The demonstrated grasping, usually just a pinch grasp, in the videos they show is at a rate which is painfully slow and not something that will be useful in practice. They claim that their robots will learn human-like dexterity but they have not shown any videos of multi-fingered dexterity where humans can and do grasp things that are unseen, and grasp and simultaneously manipulate multiple small objects with one hand. And no demonstrations of using the body along with the hands, which is how humans routinely carry many small things or one or two heavy things. They show videos of non-tele-operated manipulation, but all in-person demonstrations of manipulation are tele-operated. Their current plans for robots working in customer homes all involve a remote person tele-operating the robot. Their robots are currently unsafe for humans to be close to when they are walking. Their robots have no recovery from falling and need human intervention to get back up. Their robots have a battery life measured in minutes rather than hours. Their robots cannot currently recharge themselves. Unlike human carers for the elderly, humanoids are not able to provide any physical assistance to people: no stabilizing support for a person walking, no physical assistance getting into and out of bed, none getting onto and off of a toilet, nor indeed any touch based assistance at all. The CEOs claim that their robots will be able to do everything, or many things, or a lot of things, that a human can do in just a few short years. They currently do none. The CEOs claim a rate of adoption of these humanoid robots into homes and industries that is multiple orders of magnitude faster than any other technology in human history, including mainframe computers, home computers, mobile phones, and the internet.
Many orders of magnitude faster. Here is a CEO of a humanoid robot company saying that they will be in 10% of US households by 2030. Absolutely no technology (even without the problems above) has ever come close to scaling at that rate.

0 views
Jameel Ur Rahman 2 weeks ago

The story of OnlineOTP

A few months ago I faced an annoying problem. I wanted to redeem my Cathay Pacific Miles but I was unable to log into my account. The SMS OTP never arrived. My account was tied to my Sri Lankan number, which I’ve had for many years, yet I never received their OTP. After wasting an inordinate amount of time, first with their chat support, then with the call support, I was told this was a “known problem”. When I browsed through /r/SriLanka I immediately noticed this was a recurring problem that has gone back for more than a year on a number of services. I really wanted to scratch this itch. I knew from past experience that jumping to building was not the best solution, but at the same time I wanted to ride the momentum of this idea. Two weeks later I had an MVP ready to go. Powered by Ruby on Rails, Tailwind, Render, Twilio, Resend, Tally, Hopes and Wishes. The Security guy in me was crying as I built this product. The Pragmatist in me was satisfied that I had a use case when the product flow was broken. The Entrepreneur in me watched me scratch this itch knowing the pitfall I was knowingly putting myself into; after all, I hadn’t validated this product yet. With this I went live! ... in Beta . My hope was that the survey would get me my initial customers and help me validate this product. I was all too willing to keep this product live for a year in case I got a single Beta user. One month later I had 5 survey submissions, had given out 4 beta codes, and got 0 signups that redeemed a code. Fun Fact: Someone created an account on my site before I could 😅 After I went live in Beta, itch scratching satisfied, the Entrepreneur in me finally got a hold of the steering wheel and went to work. I talked to a number of people and one very helpful interested customer who reached out to me on LinkedIn. When trying to list every real world situation where someone might need OnlineOTP, or, less gloriously, "SMS to email", I came up with a surprising number of use cases: expats who lose access to home-country services; professionals who must verify accounts across multiple countries; travelers who need OTP reliably and without roaming fees; people in countries with unreliable carriers; and people with privacy or security concerns. I went on the hunt. I stalked through forums trying to find users who face this problem and to pick up how they solved this problem for themselves. I had some success and some failures. Overall I came to the overwhelming conclusion that I had a problem, but not really a solution that would reliably work. Here’s a snippet from a report I wrote to my coach. The majority of people want to receive OTP from financial institutions like banks. Banks do not like virtual numbers as it somewhat defeats the purpose of multi-factor authentication. Which means as a product calling itself OnlineOTP, I cannot guarantee service quality as banks may not send their OTPs to VOIP numbers even though they accept them during registration. #strike1 The TAM is quite small and will get smaller: the people who want the solution tend to be travellers who feel this momentary pain and are then back home, and as banks move towards the industry standard of two factor auth via apps, passkeys or authenticator apps this will reduce in value. People don’t want to let others get access to OTPs, especially since they’re coming from banks. Trust factor issue.
#strike2 When I initially started the project I got an “OK” from Twilio for my use case, but before I went past Beta I wanted to be doubly sure, and this time there was a lot of pushback and a polite no: this is against their acceptable use policy. Researching other providers I found that almost all of them have terms that imply they won’t be happy with reselling phone numbers or using them to receive OTPs. #strike3 I’m fairly sure a real problem exists, but I don’t think the solution I’ve come up with is the right solution. At the moment there doesn’t seem to technically be a way to provide SMS to Email without becoming a Telecom Provider myself (MVNO specifically), which is not practical. A bit disappointing. That said, I regret nothing. It’s been fun going through this process even though it’s resulting in me shutting down a product a month after launch. I’m just glad no one redeemed a beta code, as I would be honour bound to support the product for at least a year then. With this blog post, I’m closing up OnlineOTP. Excited to see what 2026 holds. Happy New Year!

Cool Logo… Check
Shareworthy Landing page… Check
SEO… Check
Focusing on the problem… Check
Functioning buy a VoIP number and then get SMS to Email… Check
Handling edge cases when buying a number… Check [Entrepreneur: What why?]
Live dashboard showing SMS as you receive it… Check [Entrepreneur: seriously?] [Security guy: mate you're receiving OTPs… you should be self destructing it instead.]
Mandatory FAQ explaining caveats with this product… Check

I'm a UK expat living in Malaysia who needs a UK phone number that can receive SMS while in Malaysia.
I'm a Certified Public Accountant based in the Philippines with clients in Singapore and Hong Kong. I am unable to reliably receive SMS OTP to process payments while sitting in the Philippines.
I'm a Virtual Assistant who manages their client's accounts remotely and needs OTP access to complete tasks.
I'm a Freelancer who needs a local number in multiple countries to access region-specific apps.
I'm a business owner who manages accounts in multiple regions and needs OTPs from each region forwarded to one inbox.
I'm a businessman who wants to receive OTPs on my Canadian phone number without having to pay Roaming Charges while I travel. And I travel frequently.
I'm a backpacker on a tour around the world who uses a temporary number but still needs to reliably access OTPs from their local bank.
I'm a digital nomad who cycles through countries every few months and can't maintain SMS reliability.
I'm a cruise passenger relying on ship WiFi and unable to receive SMS at sea. (Or Flight).
I'm a traveler who temporarily uses a local SIM card but still needs OTPs from my home-country number.
I'm a Sri Lankan who has a local Sri Lankan number who does not reliably receive SMS from Cathay Pacific on my local phone number.
I'm someone living in a rural area where cellular coverage is weak, but email over WiFi works.
People accessing platforms that require local numbers
I'm an online seller/buyer who needs verification codes from marketplaces that only text local numbers. (If I remember correctly, Carousell in Singapore had that issue when I tried to buy something from it when I visited SG)
I'm someone who wants an international virtual number for privacy but needs guaranteed SMS delivery.
I'm someone who frequently relocates and prefers a stable, long-term virtual number.
I'm a business founder who doesn't want to expose their personal number to dozens of SaaS platforms.

0 views
Herman's blog 2 weeks ago

Discovery and AI

I browse the discovery feed on Bear daily, both as part of my role as a moderator, and because it's a space I love, populated by a diverse group of interesting people. I've read the posts regarding AI-related content on the discovery feed, and I get it. It's such a prevalent topic right now that it feels inescapable, available everywhere from Christmas dinner to overheard conversation on the subway. It's also becoming quite a polarising one, since it has broad impacts on society and the natural environment. This conversation also raises the question about popular bloggers and how pre-existing audiences should affect discoverability. As with all creative media, once you have a big-enough audience, visibility becomes self-perpetuating. Think Spotify's 1%. Conveniently, Bear is small enough that bloggers with no audience can still be discovered easily, and it's something I'd like to preserve on the platform. In this post I'll try and explain my thinking on these matters, and clear up a few misconceptions. First off, posts that get many upvotes through a large pre-existing audience, or from doing well on Hacker News, do not spend disproportionately more time on the discovery feed. Due to how the algorithm works, after a certain number of upvotes, more upvotes have little to no effect. Even a post with 10,000 upvotes won't spend more than a week on page #1. I want Trending to be equally accessible to all bloggers on Bear. While this cap solves the problem of sticky posts, there is a second, less pressing issue: If a blogger has a pre-existing audience, say in the form of a newsletter or Twitter account, some of their existing audience will likely upvote, and that post has a good chance of featuring on the Trending page. One of the potential solutions I've considered is either making upvotes available to logged-in users only, or giving Bear account holders extra weight in their upvotes. However, due to how domains work, each blog is a separate website as far as the browser is concerned, and so logins don't persist between blogs. This would require logging in to upvote on each site, which isn't feasible. While I moderate Bear for spam, AI-generated content, and people breaking the Code of Conduct, I don't moderate by topic. That removes the egalitarian nature of the platform and puts up topic rails like an interest-group forum or subreddit. While I'm not particularly interested in AI as a topic, I don't feel like it's my place to remove it, in the same way that I don't feel particularly strongly about manga. There is a hide blog feature on the discovery page. If you don't want certain blogs showing up in your feed, add them to the hidden textarea to never see them again. Similarly to how Bear gives bloggers the ability to create their own tools within the dashboard, I would like to lean into this kind of extensibility for the discovery feed, with hiding blogs being the start. Curation instead of exclusion. This post is just a stream of consciousness of my thoughts on the matter. I have been contemplating this, and, as with most things, it's a nuanced problem to solve. If you have any thoughts or potential solutions, send me an email. I appreciate your input. Enjoy the last 2 days of 2025!

0 views

Can Bundler Be as Fast as uv?

At RailsWorld earlier this year, I got nerd sniped by someone. They asked “why can’t Bundler be as fast as uv?” Immediately my inner voice said “YA, WHY CAN’T IT BE AS FAST AS UV????” My inner voice likes to shout at me, especially when someone asks a question so obvious I should have thought of it myself. Since then I’ve been thinking about and investigating this problem, going so far as to give a presentation at XO Ruby Portland about Bundler performance . I firmly believe the answer is “Bundler can be as fast as uv” (where “as fast” has a margin of error lol). Fortunately, Andrew Nesbitt recently wrote a post called “How uv got so fast” , and I thought I would take this opportunity to review some of the highlights of the post and how techniques applied in uv can (or can’t) be applied to Bundler / RubyGems. I’d also like to discuss some of the existing bottlenecks in Bundler and what we can do to fix them. If you haven’t read Andrew’s post, I highly recommend giving it a read . I’m going to quote some parts of the post and try to reframe them with RubyGems / Bundler in mind. Andrew opens the post talking about rewriting in Rust: uv installs packages faster than pip by an order of magnitude. The usual explanation is “it’s written in Rust.” That’s true, but it doesn’t explain much. Plenty of tools are written in Rust without being notably fast. The interesting question is what design decisions made the difference. This is such a good quote. I’m going to address “rewrite in Rust” a bit later in the post. But suffice it to say, I think if we eliminate bottlenecks in Bundler such that the only viable option for performance improvements is to “rewrite in Rust”, then I’ll call it a success. I think rewrites give developers the freedom to “think outside the box”, and try techniques they might not have tried. In the case of , I think it gave the developers a good way to say “if we don’t have to worry about backwards compatibility, what could we achieve?”. I suspect it would be possible to write a uv in Python (PyUv?) that approaches the speeds of uv, and in fact much of the blog post goes on to talk about performance improvements that aren’t related to Rust. pip’s slowness isn’t a failure of implementation. For years, Python packaging required executing code to find out what a package needed. I didn’t know this about Python packages, and it doesn’t really apply to Ruby Gems so I’m mostly going to skip this section. Ruby Gems are tar files, and one of the files in the tar file is a YAML representation of the GemSpec. This YAML file declares all dependencies for the Gem, so RubyGems can know, without evaling anything, what dependencies it needs to install before it can install any particular Gem. Additionally, RubyGems.org provides an API for asking about dependency information, which is actually the normal way of getting dependency info (again, no eval required). There’s only one other thing from this section I’d like to quote: PEP 658 (2022) put package metadata directly in the Simple Repository API, so resolvers could fetch dependency information without downloading wheels at all. Fortunately RubyGems.org already provides the same information about gems. Reading through the number of PEPs required as well as the amount of time it took to get the standards in place was very eye-opening for me. I can’t help but applaud folks in the Python community for doing this. It seems like a mountain of work, and they should really be proud of themselves.
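To make this concrete, here’s a minimal sketch of looking up a gem’s runtime dependencies via the public RubyGems.org v1 API, without ever fetching the .gem file itself (the gem name is just an example):

```ruby
require "net/http"
require "json"

# Ask RubyGems.org for a gem's metadata -- including its runtime
# dependencies -- without downloading the .gem archive.
def runtime_dependencies(gem_name)
  uri = URI("https://rubygems.org/api/v1/gems/#{gem_name}.json")
  response = Net::HTTP.get_response(uri)
  raise "lookup failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body).fetch("dependencies").fetch("runtime")
end

runtime_dependencies("rails").each do |dep|
  puts "#{dep["name"]} #{dep["requirements"]}"
end
```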
I’m mostly going to skip this section except for one point: Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive. I think this is very very interesting. I don’t know how much time Bundler spends on doing “required Ruby version” bounds checking, but it feels like if uv can do it, so can we. I really love that Andrew pointed out optimizations that could be made that don’t involve Rust. There are three points in this section that I want to pull out: Parallel downloads. pip downloads packages one at a time. uv downloads many at once. Any language can do this. This is absolutely true, and is a place where Bundler could improve. Bundler currently has a problem when it comes to parallel downloads, and needs a small architectural change as a fix. The first problem is that Bundler tightly couples installing a gem with downloading the gem. You can read the installation code here , but I’ll summarize the method in question below: The problem with this method is that it inextricably links downloading the gem with installing it. This is a problem because we could be downloading gems while installing other gems, but we’re forced to wait because the installation method couples the two operations. Downloading gems can trivially be done in parallel since the files are just archives that can be fetched independently. The second problem is the queuing system in the installation code. After gem resolution is complete, and Bundler knows what gems need to be installed, it queues them up for installation. You can find the queueing code here . The code takes some effort to understand. Basically it allows gems to be installed in parallel, but only gems that have already had their dependencies installed. So for example, if you have a dependency tree like “gem depends on gem which depends on gem ” ( ), then no gems will be installed (or downloaded) in parallel. To demonstrate this problem in an easy-to-understand way, I built a slow Gem server . It generates a dependency tree of ( depends on , depends on ), then starts a Gem server. The Gem server takes 3 seconds to return any Gem, so if we point Bundler at this Gem server and then profile Bundler, we can see the impact of the queueing system and download scheme. In my test app, I have the following Gemfile: If we profile Bundle install with Vernier, we can see the following swim lanes in the marker chart: The above chart is showing that we get no parallelism during installation. We spend 3 seconds downloading the gem, then we install it. Then we spend 3 seconds downloading the gem, then we install it. Finally we spend 3 seconds downloading the gem, and we install it. Timing the process shows we take over 9 seconds to install (3 seconds per gem): Contrast this with a Gemfile containing , , and , which have no dependencies, but still take 3 seconds to download: Timing for the above Gemfile shows it takes about 4 seconds: We were able to install the same number of gems in a fraction of the time. This is because Bundler is able to download siblings in the dependency tree in parallel, but unable to handle other relationships. There is actually a good reason that Bundler insists dependencies are installed before the gems themselves: native extensions. 
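As an aside before getting into native extensions: the downloads themselves have no ordering constraint, so a decoupled downloader could fetch everything concurrently. Here’s a minimal sketch of that idea (gem names and versions are just examples; a real implementation would also follow redirects and verify checksums):

```ruby
require "net/http"
require "fileutils"

# Fetch several .gem files concurrently, independent of any install order.
# Each download is IO-bound, so plain Ruby threads overlap the network waits.
def download_gems(gems, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  gems.map { |name, version|
    Thread.new do
      uri  = URI("https://rubygems.org/downloads/#{name}-#{version}.gem")
      data = Net::HTTP.get(uri)
      File.binwrite(File.join(dest_dir, "#{name}-#{version}.gem"), data)
    end
  }.each(&:join)
end

download_gems({ "rake" => "13.2.1", "rack" => "3.1.8" }, "tmp/gem-cache")
```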
When installing native extensions, the installation process must run Ruby code (the extconf.rb file). Since the extconf.rb could require dependencies be installed in order to run, we must install dependencies first. For example depends on , but is only used during the installation process, so it needs to be installed before can be compiled and installed. However, if we were to decouple downloading from installation it would be possible for us to maintain the “dependencies are installed first” business requirement but speed up installation. In the case, we could have been downloading gems and at the same time as gem (or even while waiting on to be installed). Additionally, pure Ruby gems don’t need to execute any code on installation. If we knew that we were installing a pure Ruby gem, it would be possible to relax the “dependencies are installed first” business requirement and get even more performance increases. The above case could install all three gems in parallel since none of them execute Ruby code during installation. I would propose we split installation into 4 discrete steps: download the gem, unpack the gem, compile the gem, and install the gem. Downloading and unpacking can be done trivially in parallel. We should unpack the gem to a temporary folder so that if the process crashes or the machine loses power, the user isn’t stuck with a half-installed gem. After we unpack the gem, we can discover whether the gem is a native extension or not. If it’s not a native extension, we “install” the gem simply by moving the temporary folder to the “correct” location. This step could even be a “hard link” step as discussed in the next point. If we discover that the gem is a native extension, then we can “pause” installation of that gem until its dependencies are installed, then resume (by compiling) at an appropriate time. Side note: , a Bundler alternative , works mostly in this manner today. Here is a timing of the case from above: Let’s move on to the next point: Global cache with hardlinks. pip copies packages into each virtual environment. uv keeps one copy globally and uses hardlinks. I think this is a great idea, but I’d actually like to split the idea in two. First, RubyGems and Bundler should have a combined, global cache, full stop. I think that global cache should be in , and we should store files there when they are downloaded. Currently, both Bundler and RubyGems will use a Ruby-version-specific cache folder. In other words, if you do on two different versions of Ruby, you get two copies of Rails and all its dependencies. Interestingly, there is an open ticket to implement this ; it just needs to be done. The second point is hardlinking on installation. The idea here is that rather than unpacking the gem multiple times, once per Ruby version, we simply unpack once and then hard link per Ruby version. I like this idea, but I think it should be implemented after some technical debt is paid: namely implementing a global cache and unifying Bundler / RubyGems code paths. On to the next point: the PubGrub resolver. Actually, Bundler already uses a Ruby implementation of the PubGrub resolver. You can see it here . Unfortunately, RubyGems still uses the molinillo resolver . In other words you use a different resolver depending on whether you do or . I don’t really think this is a big deal since the vast majority of users will be doing most of the time.
However, I do think this discrepancy is some technical debt that should be addressed, and I think this should be addressed via unification of the RubyGems and Bundler codebases (today they both live in the same repository, but the code isn’t necessarily combined). Let’s move on to the next section of Andrew’s post: Andrew first mentions “Zero-copy deserialization”. This is of course an important technique, but I’m not 100% sure where we would utilize it in RubyGems / Bundler. I think that today we parse the YAML spec on installation, and that could be a target. But I also think we could install most gems without looking at the YAML gemspec at all. Thread-level parallelism. Python’s GIL forces parallel work into separate processes, with IPC overhead and data copying. This is an interesting point. I’m not sure what work pip needed to do in separate processes. Installing a pure Ruby gem is mostly an IO bound task, with some ZLIB mixed in. Both of these things (IO and ZLIB processing) release Ruby’s GVL, so it’s possible for us to do things truly in parallel. I imagine this is similar for Python / pip, but I really have no idea. Given the stated challenges with Python’s GIL, you might wonder whether Ruby’s GVL presents similar parallelism problems for Bundler. I don’t think so, and in fact I think Ruby’s GVL gets kind of a bad rap. It prevents us from running CPU bound Ruby code in parallel. Ractors address this, and Bundler could possibly leverage them in the future, but since installing Gems is mostly an IO bound task I’m not sure what the advantage would be (possibly the version solver, but I’m not sure what can be parallelized in there). The GVL does allow us to run IO bound work in parallel with CPU bound Ruby code. CPU bound native extensions are allowed to release the GVL , allowing Ruby code to run in parallel with the native extension’s CPU bound code. In other words, Ruby’s GVL allows us to safely run work in parallel. That said, the GVL can work against us because releasing and acquiring the GVL takes time . If you have a system call that is very fast, releasing and acquiring the GVL could end up being a large percentage of that call. For example, if you do , and the buffer is very small, you could encounter a situation where GVL bookkeeping is the majority of the time. A bummer is that Ruby Gem packages usually contain lots of very small files, so this problem could be impacting us. The good news is that this problem can be solved in Ruby itself, and indeed some work is being done on it today . No interpreter startup. Every time pip spawns a subprocess, it pays Python’s startup cost. Obviously Ruby has this same problem. That said, we only start Ruby subprocesses when installing native extensions. I think native extensions make up the minority of gems installed, and even when installing a native extension, it isn’t Ruby startup that is the bottleneck. Usually the bottleneck is compilation / linking time (as we’ll see in the next post). Compact version representation. uv packs versions into u64 integers where possible, making comparison and hashing fast. This is a cool optimization, but I don’t think it’s actually Rust-specific. Comparing integers is much faster than comparing version objects. The idea is that you take a version number, say , and then pack each part of the version into a single integer. For example, we could represent as and as , etc.
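Here’s a minimal sketch of what that packing could look like in Ruby; the 16-bits-per-segment layout is an assumption for illustration, not uv’s actual encoding, and prerelease segments are ignored:

```ruby
# Pack a MAJOR.MINOR.PATCH version into one Integer so that comparisons
# become plain integer comparisons: patch in bits 0..15, minor in bits
# 16..31, major in bits 32 and up.
def pack_version(string)
  major, minor, patch = string.split(".").map(&:to_i)
  ((major || 0) << 32) | ((minor || 0) << 16) | (patch || 0)
end

pack_version("1.2.3")                           # => 4295098371
pack_version("10.0.1") > pack_version("9.9.9")  # => true
```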
It should be possible to use this trick in Ruby and encode versions to integer immediates, which would unlock performance in the resolver. Rust has an advantage here: compiled native code comparing u64s will always be faster than Ruby, even with immediates. However, I would bet that with the YJIT or ZJIT in play, this gap could be closed enough that no end user would notice the difference between a Rust or Ruby implementation of Bundler. I started refactoring the object so that we might start doing this, but we ended up reverting it because of backwards compatibility (I am jealous of in that regard). I think the right way to do this is to refactor the solver entry point and ensure all version requirements are encoded as integer immediates before entering the solver. We could keep the API as “user facing” and design a more internal API that the solver uses. I am very interested in reading the version encoding scheme in uv. My intuition is that minor numbers tend to get larger than major numbers, so would minor numbers have more dedicated bits? Would it even matter with 64 bits? I’m going to quote Andrew’s last 2 paragraphs: uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still. pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn’t, largely because backwards compatibility with fifteen years of edge cases takes precedence. But it means pip will always be slower than a tool that starts fresh with modern assumptions. I think these are very good points. The difference is that in RubyGems and Bundler, we already have the infrastructure in place for writing a “fast as uv” package manager. The difficult part is dealing with backwards compatibility, and navigating two legacy codebases. I think this is the real advantage the uv developers had. That said, I am very optimistic that we could “repair the plane mid-flight” so to speak, and have the best of both worlds: backwards compatibility and speed. I mentioned at the top of the post I would address “rewrite it in Rust”, and I think Andrew’s own quote mostly does that for me. I think we could have 99% of the performance improvements while still maintaining a Ruby codebase. Of course if we rewrote it in Rust, you could squeeze an extra 1% out, but would it be worthwhile? I don’t think so. I have a lot more to say about this topic, and I feel like this post is getting kind of long, so I’m going to end it here. Please look out for part 2, which I’m tentatively calling “What makes Bundler / RubyGems slow?” This post was very “can we make RubyGems / Bundler do what uv does?” (the answer is “yes”). In part 2 I want to get more hands-on by discussing how to profile Bundler and RubyGems, what specifically makes them slow in the real world, and what we can do about it. I want to end this post by saying “thank you” to Andrew for writing such a great post about how uv got so fast .

0 views
Xe Iaso 3 weeks ago

Arcane Cheese with Doomtrain Extreme

Spoiler Warning If you want to go through the Final Fantasy 14 duty Hell on Rails (Extreme) blind, don't read this guide as it spoils how to easily solve one of the mechanics in it. If you don't play Final Fantasy 14, most of the words in this article are going to make no sense to you and I will make no attempt to explain them. Just know that most of the words I am saying do have meaning even though they aren't in The Bible.

In phase 4 of Hell on Rails (Extreme), the boss will cast Arcane Revelation, which makes the arena look something like this: There will be a very large circle of bad moving around the arena. One tank and one healer will be marked with an untelegraphed AoE attack that MUST be soaked by at least one other player (or two for healers). Doomtrain will move the circle of bad anywhere from 1-3 times and leave only a small area of the arena safe. Normally you're supposed to solve it something like this: Instead of normal light party groups, break up into two groups: melee and casters. This will allow the melees to keep as much uptime as the mechanics allow, but also let the casters get uptime at a distance. Solving this is pretty easy with practice. However, as a caster this is kinda annoying because when the North side is safe, you have to fall down off the ledge and the only way to get back is by going around the long way with the janky teleporters that are annoying to hit on purpose but very easy to hit on accident. There is an easier way: you just stand in the upper corners so your melees can greed uptime and just soak all of the bad: This looks a lot easier but is actually very technically complicated for nearly every class.

My example solve for this includes the following party members: Tank 1: Warrior (WAR), Tank 2: Gunbreaker (GNB), Healer 1: White Mage (WHM), Healer 2: Sage (SGE), Melee 1: Samurai (SAM), Melee 2: Reaper (RPR), Ranged 1: Dancer (DNC), Ranged 2: Pictomancer (PCT). The light party assignment is as follows: WAR, WHM, SAM, DNC in one group; GNB, SGE, RPR, PCT in the other.

Arcane Revelation can perform up to three hits. In each of the hits you need to mitigate the damage heavily or you will wipe. I've found the most consistent results doing this:

First hit: WAR casts Shake it Off , Reprisal , and Rampart ; WHM casts Plenary Indulgence and Medica III ; SGE casts Kerachole and Eukrasian Prognosis II ; SAM (and RPR) casts Bloodbath and mostly focuses on DPSing as much as possible to heal from the massive damage you will be taking throughout this mechanic; DNC casts Shield Samba . After the hit: heal as much as you can to offset the hit you took. If you're lucky you didn't take much. If you're not: you took a lot. Dancer's Curing Waltz can help here.

Second hit: GNB casts Heart of Light , Reprisal , and Rampart ; SGE casts Holos and Eukrasian Prognosis II ; PCT casts Addle . After the hit: SGE casts a Zoe -boosted Pneuma . Generally you do what you can to heal and maintain DPS uptime. Hopefully you don't have to take another heavy hit.

Third hit: One of the tanks uses a Tank Limit Break 2 , Healers dump as many mits as they have left, and hopefully you won't die, but getting to this point means you got very very unlucky.

Between each of these hits you need to heal everyone up to 100% as soon as possible otherwise you WILL wipe. Most of the damage assumptions in this guide assume that everyone is at 100% health. The melee classes can mostly be left to their own devices to greed as much uptime as possible, but they may need Aquaveil, Taurochole, or other single target damage mitigations as appropriate. By the end of this you will have used up all of your mitigations save tank invulns.
Here's a video of the first time I did this as Sage: That exasperated laugh is because previously Arcane Revelation was my hard prog point as even though I was able to do it consistently, others were not. This caused many wipes 7 minutes into a 10 minute fight. This cheese makes it consistent with random people on Party Finder. One of the tanks will need to soak a stack tower with an invuln. Everyone else runs to the back of the car to enter the next phase and then you continue the fight as normal.

0 views

Does the Internet know what time it is?

Time is one of those things that is significantly harder to deal with than you’d naively expect. It’s common in computing to assume that computers know the current time. After all, there are protocols like NTP for synchronizing computer clocks, and they presumably work well and are widely used. Practically speaking, what kinds of hazards lie hidden here? I’ll start this post with some questions: How often are computer clocks set to the wrong time? How large do these offsets grow? Can we model clock offsets, and make predictions about them? Are out-of-sync clocks a historical concern that we’ve largely solved, or is this still a concern? Some quick definitions: clock skew is the rate at which a clock deviates from a one-second-per-second standard, often measured in parts per million; clock offset is the difference between the displayed time and Coordinated Universal Time (UTC), often measured in seconds. I just checked the system time of my laptop against time.gov , which reports a -0.073s offset. So for an N=1 sample size, I’m cautiously optimistic. There are research papers, like Spanner, TrueTime & The CAP Theorem , that describe custom systems that rely on atomic clocks and GPS to provide clock services with very low, bounded error. While these are amazing feats of engineering, they remain out of reach for most applications. What if we needed to build a system that spanned countless computers across the Internet and required each to have a fairly accurate clock? I wasn’t able to find a study that measured clock offset in this way. There are, however, a number of studies that measure clock skew (especially for fingerprinting). Many of these studies are dated, so it seems like now is a good time for a new measurement. This post is my attempt to measure clock offsets, Internet-wide. When processing HTTP requests, servers fill the HTTP Date header . This header should indicate “the date and time at which the message originated”. Lots of web servers generate responses on-the-fly, so the Date header reveals the server’s clock in seconds. Looks pretty good. I’ll use this as the basis for the measurements. Unfortunately, there are a bunch of challenges we’ll need to deal with. First, resources may get cached in a CDN for some time and the Date header would reflect when the resource was generated instead of the server’s current time reference. Requesting a randomized path will bypass the CDN, typically generating a 404 error. Unfortunately, I found some servers will set the Date header to the last modified time of the 404 page template. I considered performing multiple lookups to see how the Date header advances between requests, but some websites are distributed, so we’d be measuring a different system’s clock with each request. The safest way to avoid this hazard is to only consider Date headers that are offset to the future, which is the approach we’ll use. HTTP responses will take some time to generate; sometimes spanning a couple of seconds. We can’t be sure when the Date header was filled, but we know it was before we got the response. Since we only want to measure timestamps that are from the future, we can subtract the timestamp in the Date header from when we received the response. This gives a lower bound for the underlying clock offset. When performing broad Internet scans you’ll find many servers have invalid or expired TLS certificates. For the sake of collecting more data I’ve disabled certificate validations while scanning. Finally, our own system clock has skew. To minimize the effect of local clock skew I made sure I had a synchronization service running (systemd-timesyncd on Debian) and double checked my offset on time.gov. All offset measurements are given in whole seconds, rounding towards zero, to account for this challenge. The measurement tool is mostly a wrapper around a small Golang snippet. For performance reasons, the code performs an HTTP HEAD request instead of the heavier GET request.
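The Go code isn’t reproduced here, but as a rough Ruby sketch of the same idea (randomized path, HEAD request, certificate checks disabled, future offsets only):

```ruby
require "net/http"
require "openssl"
require "time"

# Measure how far a server's Date header points into the future.
def future_offset(host)
  path = "/" + rand(36**12).to_s(36)            # random path to bypass CDN caches
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # accept invalid/expired certificates
  response = http.head(path)
  received_at = Time.now
  server_time = Time.httpdate(response["Date"])
  offset = (server_time - received_at).to_i     # whole seconds, rounded toward zero
  offset.positive? ? offset : 0
end

puts future_offset("example.com")
```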
Starting in late-November I scanned all domain names on the Tranco top 1,000,000 domains list (NNYYW) . I scanned slowly to avoid any undesired load on third-party systems, with the scan lasting 25 days. Of the million domain names, 241,570 systems could not be measured due to connection error such as timeout, DNS lookup failure, connection refusal, or similar challenges. Not all the domains on the Tranco list have Internet-accessible HTTPS servers running at the apex on the standard port, so these errors are expected. Further issues included HTTP responses that lacked a Date header (13,098) or had an unparsable Date header (102). In all, 745,230 domain names were successfully measured. The vast majority of the measured domains had an offset of zero (710,189; 95.3%). Date headers set to the future impacted 12,717 domains (1.7%). Date headers set to the past will be otherwise ignored, but impacted 22,324 domains (3.0%). The largest positive offset was 39,867,698 seconds, landing us 461 days in the future (March 2027 at scan time). If we graph this we’ll see that the vast majority of our non-negative offsets are very near zero. We also observe that very large offsets are possible but quite rare. I can’t make out many useful trends from this graph. The large amount of data points near zero seconds skews the vertical scale and the huge offsets skew the horizontal scale. Adjusting the graph to focus on 10 seconds to 86,400 seconds (one day) and switching offsets to a log scale provides this graph: This curve is much closer to my expectations. I can see that small offsets of less than a minute have many observances. One thing I didn’t expect were spikes at intervals of whole hours, but it makes a lot of sense in hindsight. This next graph shows the first day, emphasizing data points that exactly align to whole hour offsets. The largest spikes occur at one, three, and nine hours with no clear trend. Thankfully, geography seems to explain these spikes quite well. Here are the top-level domains (TLDs) of domains seen with exactly one hour offset: Germany (.DE), Czech Republic (.CZ), Sweden (.SE), Norway (.NO), Italy (.IT), and Belgium (.BE) are all currently using Central European Time, which uses offset UTC+1. TLDs of domains seen with exactly three hour offset: The country-code top-level domain (ccTLD) for Russia is .RU and Moscow Standard Time is UTC+3. TLDs of domains with exactly nine hour offset: South Korea (.KR) and Cocos (Keeling) Islands (.CC) follow UTC+9. So I strongly suspect these whole-hour offset spikes are driven by local time zones. These systems seem to have set their UTC time to the local time, perhaps due to an administrator who set the time manually to local time, instead of using UTC and setting their timezone. While this type of error is quite rare, impacting only 49 of the measured domain names (0.007%), the large offsets could be problematic. Another anomalous datapoint at 113 seconds caught my attention. Almost all of the data points at the 113 second offset are for domain names hosted by the same internet service provider using the same IP block. A single server can handle traffic for many domain names, all of which will have the same clock offset. We’ll see more examples of this pattern later. Knowing that we have some anomalous spikes due to shared hosting and spikes at whole hour intervals due to timezone issues, I smoothed out the data to perform modeling. Here’s a graph from zero to fifty-nine minutes, aggregating ten second periods using the median. 
I added a power-law trend line, which matches the data quite well (R² = 0.92). I expected to see a power-law distribution, as these are common when modeling randomized errors, so my intuition feels confirmed. The average clock offset, among those with a non-negative offset, was 6544.8 seconds (about 109 minutes). The median clock offset was zero. As with other power-law distributions, the average doesn’t feel like a useful measure due to the skew of the long tail. The HTTP Date header measurement has proven useful for assessing offsets of modern clocks, but I’m also interested in historical trends. I expect that computers are getting better at keeping clocks synchronized as we get better at building hardware, but can we measure it? I know of some bizarre issues that have popped up over time, like this Windows STS bug , so it’s even possible we’ve regressed. Historical measurements require us to ask “when was this timestamp generated?” and measure the error. This is obviously tricky as the point of the timestamp is to record the time, but we suspect the timestamp has error. Somehow, we’ve got to find a more accurate time to compare each timestamp against. It took me a while to think of a useful dataset, but I think git commits provide a viable way to measure historical clock offsets. We’ve got to analyze git commit timestamps carefully as there are lots of ways timestamps can be out of order even when clocks are fully synchronized. Let’s first understand how “author time” and “commit time” work. When you write some code and commit it, you’ve “authored” the code. The git history at this point will show both an “author time” and “commit time” of the same moment. Later you may merge that code into a “main” branch, which updates the “commit time” to the time of the merge. When you’re working on a team you may see code merged in an order that’s opposite the order it was written, meaning the “author times” can be out of chronological order. The “commit times”, however, should be in order. The Linux kernel source tree is a good candidate for analysis. Linux was one of the first adopters of git, as git was written to help Linux switch source control systems. My local git clone of Linux shows 1,397,347 commits starting from 2005. It may be the largest substantive project using git, and provides ample data for us to detect timestamp-based anomalies. I extracted the timing and other metadata from the git history using git log. Here’s a graph of the “commit time”, aggregating 1000 commit blocks using various percentiles, showing that commit times are mostly increasing. While there’s evidence of anomalous commit timestamps here, there are too few for us to find meaningful trends. Let’s keep looking. Here’s a graph of the “author time” showing much more variation: We should expect to see author times vary, as it takes differing amounts of time for code to be accepted and merged. But there are also large anomalies here, including author times that are decidedly in the future and author times that pre-date both git and Linux. We can get more detail in the graph by zooming into the years Linux has been developed thus far: This graph tells a story about commits usually getting merged quickly, but some taking a long time to be accepted. Certain code taking longer to review is normal, so the descending blue data points are expected. There are many different measurements we could perform here, but I think the most useful will be “author time” minus “commit time”.
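As a rough sketch of how that difference can be computed from a local checkout (an assumed approach, not necessarily the exact extraction command used above):

```ruby
# Count "time travelling" commits: those whose author time is after the
# commit time. %at and %ct are git's author and committer epoch timestamps.
log = `git log --pretty=format:%at,%ct`

lines = log.lines.map(&:strip).reject(&:empty?)
travellers = lines.count do |line|
  author, committer = line.split(",").map(&:to_i)
  author > committer
end

printf("%d of %d commits (%.3f%%) have an author time after the commit time\n",
       travellers, lines.size, 100.0 * travellers / lines.size)
```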
Typically, we expect that code is developed, committed, reviewed, approved, and finally merged. This provides an author time that is less than the commit time, as review and approval steps take time. A positive value of author time minus commit time would indicate that the code was authored in the future, relative to the commit timestamp. We can’t be sure whether the author time or the commit time was incorrect (or both), but collectively they record a timestamp error. These commits are anomalous as the code was seemingly written, committed, then traveled back in time to be merged. We’ll refer to these commits as time travelling commits, although timestamp errors are very likely the correct interpretation. Looking at the Linux git repo, I see 1,397,347 commits, of which 1,773 are time travelling commits. This is 0.127% of all commits, a somewhat rare occurrence. Here’s a graph of these timestamp errors: There are some fascinating patterns here! Ignoring the marked regions for a moment, I notice that offsets below 100 seconds are rare; this is quite unlike the pattern seen for HTTP Date header analysis. I suspect the challenge is that there is usually a delay between when a commit is authored and when it is merged. Code often needs testing and review before it can be merged; those tasks absorb any small timestamp errors. This will make modeling historical clock offset trends much more difficult. The region marked “A” shows many errors below 100 seconds, especially along linear spikes. There appear to be two committers in this region, both using “de.ibm.com” in their email address. The majority of authors in region A have “ibm.com” in their email address. So these anomalies appear to be largely due to a single company. These commits appear to have the author timestamp rewritten to a (mostly) sequential pattern. Here are the commits for two of the days: The author dates here are perfectly sequential, with one second between each commit. The commit dates also increase, but more slowly, such that the difference between author date and commit date increases with later commits. I suspect these timestamps were set via some sort of automation software when processing a batch of commits. The software may have initially set both author and commit timestamps to the current time, but then incremented the author timestamp by one with each subsequent commit while continuing to use the current time for the commit timestamp. If the software processed commits faster than one per second, we’d see this pattern. I don’t think these timestamps are evidence of mis-set clocks, but rather an automated system with poor timestamp handling code. The region marked “B” shows many errors near a 15.5 hour offset (with several exactly on the half-hour mark). Looking at the email addresses I see several “com.au” domains, suggesting some participants were located in Australia (.AU). Australia uses several time zones, including UTC+8, UTC+8:45, UTC+9:30, UTC+10, UTC+10:30, and UTC+11… but nothing near 15.5 hours. The GitHub profile for one of the committers shows a current timezone of UTC-5. This suggests that an author in Australia and a committer in the Americas both mis-set their clocks, perhaps combining UTC+10:30 and UTC-5 to reach the 15.5 hour offset. We saw examples of timezone related clock errors when looking at the HTTP Date header; this appears to be an example of two timezone errors combining. The region marked “C” shows many errors around 30 to 260 days, which are unusually large.
The committer for each of these is the same email address, using the “kernel.org” domain name. If we render the author and committer timestamps we’ll see this pattern: I notice that the day in the author timestamp usually matches the month in the committer timestamp, and when it doesn’t it’s one smaller. When the author day and the committer month match, the author month is less than or the same as the committer day. The days in the author timestamp vary between one and nine, while the days in the commit timestamp vary between eight and twenty-one. This suggests that the author timestamp was set incorrectly, swapping the day and month. Looking at these commits relative to the surrounding commits, the commit timestamps appear accurate. If I fix the author timestamps by swapping the day and month, then the data points are much more reasonable. The author timestamps are no longer after the commit timestamps, with differences varying between zero and thirty-six days, and an average of nine days. So it seems these author timestamps were generated incorrectly, swapping month and day, causing them to appear to travel back in time. Git has had code for mitigating these sorts of issues since 2006, like this code that limits timestamps to ten days in the future . I’m not sure why the commits in region “C” weren’t flagged as erroneous. Perhaps a different code path was used? Region “C” doesn’t appear to be related to a mis-set system clock, but instead a date parsing error that swapped day and month. This type of error is common when working between different locales, as the ordering of month and day in a date varies by country . Finally, the region marked “D” shows a relatively sparse collection of errors. This may suggest that git timestamp related errors are becoming less common. But there’s an analytical hazard here: we’re measuring timestamps that are known to time travel. It’s possible that this region will experience more errors in the future! I suspect regions “A” and “C” are due to software bugs, not mis-set clocks. Region “B” may be due to two clocks, both mis-set due to timezone handling errors. It seems unwise to assume that I’ve caught all the anomalies and can attribute the rest of the data points to mis-set clocks. Let’s continue with that assumption anyway, knowing that we’re not on solid ground. The Linux kernel source tree is an interesting code base, but we should look at more projects. This next graph counts positive values of “author time” minus “commit time” for Linux, Ruby, Kubernetes, Git, and OpenSSL. The number of erroneous timestamps is measured per-project against the total commits in each year. It’s difficult to see a trend here. Linux saw the most time travelling commits from 2008 through 2011, each year above 0.4%, and has been below 0.1% since 2015. Git has had zero time travelling commits since 2014, with a prior rate below 0.1%. Digging into the raw data I notice that many time travelling commits were generated by the same pair of accounts. For Kubernetes, 78% were authored by [email protected] and merged by [email protected] , although these were only one second in the future. These appear to be due to the “Kubernetes Submit Queue”, where the k8s-merge-robot authors a commit on one system and the merge happens within GitHub. For Ruby, 89% were authored by the same user and committed by [email protected] with an offset near 30 seconds.
I attempted to correct for these biases by deduplicating commit-author pairs, but the remaining data points were too sparse to perform meaningful analysis. Time travelling commits usually reach their peak two to four years after a project adopts source control, ramping up before and generally falling after. This hints at a project-management-related cause for these spikes. I’ll speculate that this is due to developers initially using Git cautiously as it is new to them, then as they get comfortable with Git they begin to build custom automation systems. These new automation systems have bugs or lack well-synchronized clocks, but these issues are addressed over time. I don’t think I can make any conclusion from this data about system clocks being better managed over time. This data doesn’t support my expectation that erroneous timestamps would reduce over time, and I’ll call this a “negative result”. There are too many challenges in this data set. This analysis explored timestamps impacted by suspected mis-set clocks. HTTP scanning found that 1.7% of domain names had a Date header mis-set to the future. Web server offsets strongly matched a power-law distribution such that small offsets were by far the most common. Git commit analysis found up to 0.65% of commits (Linux, 2009) had author timestamps in the future, relative to the commit timestamp. No clear historical trend was discovered. Timestamps with huge offsets were detected. The largest Linux commit timestamp was in the year 2085 and the largest HTTP Date header was in the year 2027. This shows that while small offsets were most common, large errors will occur. Many underlying causes were proposed while analyzing the data, including timezone handling errors, date format parsing errors, and timestamps being overwritten by automated systems. Many data points were caused by the same group, like IP address blocks used by many domains or Git users (or robots) interacting with multiple commits. Deduplicating these effects left too few data points to perform trend analysis. Synchronizing computer clocks and working with timestamps remains a challenge for the industry. I’m sure there are other data sets that support this kind of measurement. If you’ve got any, I’d love to hear what trends you can discover!

0 views
André Arko 1 months ago

Why are <code>exec</code> and <code>run</code> so confusing?

Originally posted on the Spinel blog . While working on rv , there’s a specific question that has come up over and over again, in many different forms. In the simplest possible form, it boils down to: What is the difference between and ? Why have both? We haven’t finished implementing either or yet, but every time one or the other comes up in a conversation, everything instantly becomes more confusing. This post will summarize the history of and in Bundler, npm, Cargo, and uv. Once we have the history laid out, we can take a look at what we plan to do in rv, and you can give us your feedback . Bundler manages project-specific packages, but not generally available “global” commands. Project-specific packages installed with Bundler can include their own commands. While working on Bundler 1.0, we needed a standard way to do something new: run commands completely scoped inside a project, rather than scoped to the entire Ruby installation on the current machine. We tried both a wrapper command ( ) and generating dedicated scripts in the project’s directory. With binstubs, you could run to get the project rake, and to get the global rake. I personally preferred the binstub approach, but it was that ultimately became the popular way to use commands inside your project. My theory is that it won because you can use it to run any command, including , or , or anything else you want. Somewhat confusingly (inspired by the command explained below) there is a separate command that is not related to Bundler and instead installs and runs a command from a package. RubyGems only manages global packages and commands, so is more of a convenience to make it easier to globally install and run a package with just one command. npm manages both project-specific and global packages, and can install any package so its commands are available either only within a project or globally across every project that uses the same version of Node. The project-focused design of npm expects commands from project packages to be run by first adding the command to in the section, and then run via . This is even more inconvenient than Bundler’s binstubs, and so I think there was pent-up demand to be able to “just run a command directly”. That was eventually provided by and its alias . The setup makes it very easy to run any command, whether the package is in the local project or not, whether a script is set up or not, and even whether the package is installed at all or not. Simply running the command is enough to install the needed package and run its command. It’s especially helpful to have available when you need to create a new project by running an npm package, since it’s a huge pain to create a project and install a package and set up a script just so you can run the package to overwrite your project. The most popular example of this I am aware of is , but it’s a popular idiom for many packages that contain commands. Cargo is similarly a package manager, but unlike Ruby and Node, project-level packages do not include commands, and package commands are installed globally. Library packages are added to a project with , while command packages are installed globally with . Once a package is installed globally, it can simply be available on your $PATH, and Cargo no longer has to be involved in running it. The command is extremely limited in scope, and only allows you to build and run a binary created by your own project – it does not work to run commands from packages, either project-local or global.
In uv, the uv tool run command seems to be most strongly inspired by npx, including having its own short alias, uvx. The uvx command is exclusively for running commands directly from packages, automatically installing them if necessary. To give an example, that means you can use uvx github-backup to install and run the github-backup command from the Python package named github-backup, whether or not that package is included in your current project. Conversely, the uv run command is closer to bundle exec: it installs and configures Python, installs project packages if inside a project, and then runs a command from the project's environment or runs a file. That means uv run can be used both to get a shell with only your project's Python and packages, and to run a script file directly.

To summarize the existing tools:
- npx installs a package if necessary and runs its command.
- cargo run builds and runs a binary created by your own project.
- uv run installs Python and any project packages, then runs your command or script.
- uvx installs Python and the named package, then runs its command.

With all of that background now set up, what should rv run do? What should rv exec do? To be the most like Bundler, we should use exec to run commands for a project. To be the most like npx and uvx, we should use exec to install and run a single package with a command. Today, we're leaning towards an option that includes all of the useful functionality from every command above, and aligns Ruby tooling with JS tooling and Python tooling:
- rv run installs Ruby and any project packages, then runs your command.
- rv exec installs Ruby and the named package, then runs its command.

Between them, the two commands should cover everything you might want to run: commands created by your own package, commands from project packages, commands from $PATH, scripts from files, project-defined script commands (which can themselves call commands from project packages, from $PATH, or from files), and non-project commands from any package.

If we try to combine those two commands into one command, we quickly run into the ambiguity that has been so frustrating to handle in Bundler for all of these years: if you ask rv to run rake, do you want the global rake package? Or do you want the project-local rake package? How can we know which you want? In my opinion, uv solves this relatively elegantly by having uvx always run a package globally, and uv run always run a package locally inside the current project, if one exists. What do you think? Let us know in the GitHub discussion about this post.

0 views
pabloecortez 1 months ago

The Year 2025 for powRSS

One of the fun side projects I've taken on this year has been powRSS , the public RSS feed aggregator for the Indieweb. As 2025 comes to a close, I want to put together a summary of the things that went on. I'm a strong believer of building in public, and that includes talking about the goals, successes and failures. Here is how powRSS did this year, from May 21st when it launched to today, December 14th. Thank you so much to all of you who helped support this project! It's so rewarding to see the response it has gotten in the eight months it has been running. I'm excited about what the next year has in store for us! On Friday May 21, Fred Rocha wrote a blog post titled Small (web) is beautiful in which he talked about digital gardens, the indieweb, and the challenge of discovering new sites and independent voices to follow. I replied to him with a blog post where I put together some of the resources I knew about like Andreas Gohr's Indieplog.page and Viktor Lofgren's Marginalia Search . During this time I had been wanting to get back into the Gemini Protocol, as that project was what introduced me to the small and personal web about five years ago. I loved the ethos and the community aspect of it all. When documentation wasn't available to achieve something, I knew I could ask for help and many kind folks would be glad to offer advice. That evening I put together a quick proof of concept written in Ruby and launched the following morning. I find that the desire to help and build together remains true today with Indieweb communities, and I'm grateful for the comments, advice, and feedback I've received about powRSS since it launched. This version was a static page, set to rebuild every 12 hours with new posts from its list of known feeds. It's actually very similar to the way lettrss works to send out each book chapter :-) During this time all blog submissions were handled via e-mail. I added my e-mail address to my blog and when people came across the project they'd send me links to their RSS feeds. About a week later, as more people began submitting their blogs to be added to the feed, I decided to add categories and a dedicated submissions form. On the afternoon of my birthday, May 31, I came across a post from Joan Westenberg: Independent sites who don’t have the resources to compete with major platforms in visibility and search rankings, lose traffic and, consequently, viability. As a result, entire categories of information and smaller communities become less accessible, hidden behind the algorithms of the dominant, bloated tech giants. I took this quote and shared a link to powRSS on Mastodon, and this is where things got even more exciting! Post by @[email protected] View on Mastodon Westenberg, who has 30k followers, made powRSS visible to a lot more people, and that meant receiving way more submissions and responding to new kinds of feedback. One of the first great suggestions came from Alex White who sent me a message suggesting the addition of a "Random site" feature like StumbleUpon. That seemed really fun to implement, so I wrote another blog post announcing the new feature. With more blogs being added to powRSS, I began spending more time going through submissions. It's important to me that powRSS remains a space for human creativity, independent voices, and the serendipity of coming across people who, like you, understand that the web is indeed beautiful. 
The things we read and interact with inform our decisions and strengthen our convictions, so cultivating a space that enables this type of discovery matters. Today I continue to manually review all submissions. I like knowing that every link on powRSS takes me to the website of another person who took the time and care to build out a space for themselves on the internet. My absolute favorite part of this project has been discovering blogs I would have never come across otherwise and having conversations with those authors. Around November I wanted to give powRSS a more retro feel to better reflect its mission. In this design the two-column layout on desktop was important because I wanted those recently-added blogs to also have some discoverability. As you can imagine, some authors write more frequently than others. Some of you write every few months, and if you were to add your blog to powRSS without a recent blog post, it could take a while before others knew about your blog. The "new to powRSS" column made it easy to find blogs which maybe didn't have recent posts but you also knew were being actively maintained, since each addition to powRSS requires the manual submission from its author. Indeed, some of you told me you felt more excited about blogging again knowing that your posts were definitely going to be seen by others! As you can see, powRSS no longer had categories like before. I thought a while before getting rid of them, and I think in retrospect it was a mistake, so I brought them back with a twist. I do want to explain my reasoning though. By giving blogs a strict category, we end up pigeonholing authors, especially those who have personal sites. I love seeing personal stories along with pictures of a trip or the last book you read even if your blog is mainly about programming or photography or sports. The whole point of the personal blog is to have that freedom. "Can I still share pictures of my dog if I'm in the Technology category?" was a question I received, so I realized site-wide categories weren't the way to go. However, there is of course a benefit to knowing about the blog you're about to visit, so I chose a happy middle ground by adding brief category labels below each blog. This was added in time for the Winter redesign I launched at the beginning of December. Here is what powRSS looks like today: powRSS today Thank you all for making the web more exciting, more vibrant, and more human. Have an excellent rest of the year! Grateful, Pablo Enoc

0 views
Chris Coyier 1 months ago

Media Diet

🎵 Florence + The Machine, Everybody Scream — I have no prior Florence experience but really like this album. The whole “start slow and build a song to a wild ass peak” thing works for me. “You Can Have It All” is a favorite. A little called out by “It must be nice to be a man and make boring music just because you can” in “One of the Greats” but, fortunately, narrowly escape as I only aspire to make music that is good enough to be boring. 🎧 Get Up in the Cool — Podcast from Cameron DeWhitt totally focused on the heavily niche musical interest of Old Time. My friend Darin was on recently. 🎥 Predator: Badlands — What a friggin masterpiece. This is why going to the movies exists. I loved how predator boy shows up at the end and uses like every single thing he learned on the death planet in the final fight. 🕹️ Ball x Pit — Just fun as heck. In the end I didn’t even hate the meta progression city building stuff. Really enjoyed the different characters and upgrades that take some of the monotony away just as you’re starting to feel it. 🎥 Eddington (why don’t all movies have an obvious “this is the official website of the movie” website? If I made a movie you’d better as hell know it would have a banger website.) — I loved how this movie evokes how it feels to observe the foreverbattle of the far right and far left. And how there is a sliding scale of just how crazy any given person on either side can be. When the movie gives way to violence, it felt like a release to me, like obviously this is where things are going. 📺 Stranger Things — Just a fan like everyone else. Anxiously awaiting this story to continue, wasn’t disappointed when it did, and can’t wait for Christmas for the next four. 📕 I am Rebel — A very helpful lady at Barnes & Noble helped Ruby and me find the perfect book to buy with her birthday gift card. We both really enjoyed this story of a dog sticking to his feelings and finding his owner despite changes, both physical and emotional. I wanted a little more about the corrupt king and the revolutionaries’ plan, but that would have aged the book up. 🎵 Western Centuries — What an amazing band, I’m sorry I missed their active period. They lost a guy and must have just called it, understandably. This is just perfection to me:

0 views
Simon Willison 1 months ago

Under the hood of Canada Spends with Brendan Samek

I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, sqlite-utils and pieces of LLM-powered PDF extraction. Here's the video on YouTube. Sections within that video:
02:57 Data sources and the PDF problem
05:51 Crowdsourcing financial data across Canada
07:27 Datasette demo: Search and facets
12:33 Behind the scenes: Ingestion code
17:24 Data quality horror stories
20:46 Using Gemini to extract PDF data
25:24 Why SQLite is perfect for data distribution
Build Canada is a volunteer-driven non-profit that launched in February 2025 - here's some background information on the organization, which has a strong pro-entrepreneurship and pro-technology angle. Canada Spends is their project to make Canadian government financial data more accessible and explorable. It includes a tax sources and sinks visualizer and a searchable database of government contracts, plus a collection of tools covering financial data from different levels of government. The project maintains a Datasette instance at api.canadasbilding.com containing the data they have gathered and processed from multiple data sources - currently more than 2 million rows plus a combined search index across a denormalized copy of that data. The highest quality government financial data comes from the audited financial statements that every Canadian government department is required to publish. As is so often the case with government data, these are usually published as PDFs. Brendan has been using Gemini to help extract data from those PDFs. Since this is accounting data the numbers can be summed and cross-checked to help validate the LLM didn't make any obvious mistakes. See also datasette.io, the official website for Datasette; sqlite-utils.datasette.io; and, for more on Canada Spends, BuildCanada/CanadaSpends on GitHub.
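That cross-checking idea is worth dwelling on: because the extracted numbers are accounting data, a simple reconciliation pass can catch obvious extraction mistakes before anything lands in the database. Here is a minimal sketch of the idea — illustrative only, not the Canada Spends codebase, and the field names are made up:

```ruby
# Reconcile LLM-extracted line items against the statement's reported total.
# Hypothetical record shape; field names are illustrative only.
extracted = {
  "reported_total" => 1_250_000,
  "line_items" => [
    { "label" => "Salaries",  "amount" => 900_000 },
    { "label" => "Transfers", "amount" => 300_000 },
    { "label" => "Other",     "amount" => 50_000 }
  ]
}

sum   = extracted["line_items"].sum { |item| item["amount"] }
total = extracted["reported_total"]

if sum == total
  puts "OK: line items sum to the reported total (#{total})"
else
  warn "Mismatch: items sum to #{sum}, statement reports #{total} (diff #{total - sum})"
end
```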

0 views
Rob Zolkos 1 months ago

Fizzy Design Evolution: A Flipbook from Git

After writing about the making of Fizzy told through git commits , I wanted to see the design evolution with my own eyes. Reading about “Let’s try bubbles” and “Rename bubbles => cards” is one thing. Watching the interface transform over 18 months is another. So I got to work: I went through each day of commits in the Fizzy repository , got the application to a bootable state, seeded the database, and took a screenshot. Then I stitched those screenshots together into a flipbook-style video. Here’s the final result - I hope you enjoy it! Read on below for details about the process and the backing music. The Fizzy repository has 8,152 commits spanning from June 2024 to November 2025. I extracted the git SHA of the last commit from each day, excluding dependabot commits - a good proxy for the state of the app at the end of each work day. With that list I then copied the Fizzy source into dated folders and checked out the corresponding SHA for each. I then wrote and ran a sanity check script to ensure the right Ruby versions were installed, that worked, and that database migrations and seeds completed successfully. Once each folder was ready, the screenshot process began: I worked in batches of 10 checkpoints at a time to verify I was capturing the right page, since routes changed throughout the project’s history. Not every checkpoint was bootable. Schema migrations sometimes broke older seeds. Gems changed. Dependencies shifted—including private gems I didn’t have access to. Some checkpoints required manual fixes to get the application running. But persistence paid off - I captured all the screenshots to show the visual journey from “splats on a windshield” (the original playful blob metaphor) to the polished kanban board that shipped. As I watched the completed flipbook, it lacked any backing music—kinda boring! I remembered Jason Fried mentioning on the Fizzy livestream demo that if you hold Shift while dragging a card to another column, Fizzy plays a little sound—a different one per column. So I did what anyone making a flipbook out of git history screenshots in 2025 would do: I asked Claude to stitch those original sound files from the repo into a soundtrack. Because what else would you use to score a Fizzy flipbook besides Fizzy’s own sound files? Rob: Tell me about the audio files in this repo Claude: There are 30 MP3 audio files in public/audio/, organized by instrument: These appear to be musical note samples, likely used for notification sounds or some kind of audio feedback feature in the app. Rob: From these notes is it possible to make a tune? (not asking you to, just whether the notes themselves are condusive to any sort of melodic arrangement?) Claude: Yes, these notes are well-suited for creating melodies. Here's why: The notes available: C3, D4, E3, F#4, G3, B3 This is essentially a G major pentatonic scale (with some octave displacement): Why it works melodically: Limitations: But for short jingles, notification sounds, or simple tunes - absolutely workable. You could easily create something pleasant and recognizable with these. Rob: and with the variations in instruments I take it it would be possible to make it pretty interesting Claude: Exactly. With 5 distinct timbres (banjo, harpsichord, mandolin, piano, vibes), you could: Layer and contrast: Create call-and-response: Build dynamics: Evoke different moods: The instruments also share similar attack characteristics (plucked/struck, quick decay) so they'd blend well together without clashing sonically. 
Rob: well I want to make a background track for a demo video of this application. it is composed of screenshots of the application as it evolved over time. And then stitched into a flipbook with a 2 second gap between shots. Is this something you can work with and compose a tune with these files? Claude: That sounds like a fun project! Yes, I can work with this. To compose something appropriate, I need a few details: Once I know the duration and mood, I can sketch out a melodic sequence that evolves over the timeline - maybe starting simple and building as the app matures in the screenshots. Rob: a single combined audio file. about 2 minutes long. dramatic would be good but starting off slow tempo and then building into a fast crescendo. reminiscent of early computer demos Claude: Love it. Early computer demo scene vibes - think Amiga/C64 era, building from sparse arpeggios to a frenetic finale. Let me check if ffmpeg is available and explore the audio files a bit more... From there, Claude ensured ffmpeg was installed and got to work. We iterated a few times - “vibe music produced” - until it sounded right. A really fun way to bring some life to what would have otherwise been a silent flipbook. Watching the design evolve frame by frame reveals things the commit history can only hint at: Epicenter design. You can see epicenter design in action—a philosophy that’s been at the core of 37signals’ process for decades. The team started with the most essential elements (the grouping of bubbles/cards) and built outward from there. Navigation, chrome, and secondary features came later. The flipbook shows this progression clearly: core functionality first, polish second. The color exploration. Early versions experimented with bold, vibrant colors. The palette refined over time into something more subdued and professional while retaining the signature fun style 37signals applications are known for. Layout density. The interface started sparse and grew more information-dense as features accumulated, then simplified again in the later months. The bubble-to-card transition. You can see the playful, organic blob shapes gradually give way to more structured rectangular cards. The whimsy of the early windshield metaphor fades into the pragmatism of a task management tool. Typography and spacing. Subtle but constant. Every few frames you notice tighter line heights, adjusted padding, refined font weights. The polish accumulated through hundreds of small commits. Animation. You can’t see the subtle hover animations in a flipbook of static screenshots, but the team experimented with movement and texture throughout development. Here are a few examples: This kind of archaeology is only possible because 37signals kept the full git history intact. Most companies would have squashed or rebased this history into oblivion before open-sourcing. The fact that they didn’t is a gift to anyone curious about how software really gets made. Let me know if you try something similar with another codebase. The process is time-consuming but oddly satisfying. Boot the application Use Playwright to log in and navigate to the correct page Capture the screenshot banjo/ - 6 files (B3, C3, D4, E3, Fsharp4, G3) harpsichord/ - 6 files (same notes) mandolin/ - 6 files (same notes) piano/ - 6 files (same notes) vibes/ - 6 files (same notes) G, B, C, D, E, F# → all notes from G major, missing only A Pentatonic foundation - Pentatonic scales are inherently melodic and almost impossible to make sound "wrong." 
They're used across virtually all musical cultures. No dissonant intervals - There are no half-step clashes. Any combination of these notes will sound consonant together. Strong tonal center - G3 provides a natural "home" note, with B3 and D4 forming a G major triad. Range - Spanning from C3 to F#4 (about an octave and a half) gives enough range for interesting melodic contour. Only 6 notes, so extended melodies might feel repetitive The octave jumps (C3 vs D4, E3 vs F#4) could make smooth stepwise motion tricky No A note, which limits some common melodic patterns Use piano or vibes for a main melody line Add harpsichord for ornamentation or counterpoint Bring in banjo or mandolin for rhythmic texture Piano plays a phrase, vibes answer Or trade off between the plucked strings (banjo/mandolin) and the keyboard instruments Start sparse with a single instrument Layer in others for a fuller sound Drop back down for resolution Vibes alone = mellow, dreamy Harpsichord = baroque, playful Banjo = folksy, upbeat Piano = neutral, versatile Mandolin = bright, energetic How many screenshots/frames are there? (This tells me the total duration - frames × 2 seconds) What's the overall vibe you're going for? Upbeat and celebratory (look how far we've come!) Calm and reflective (a journey through time) Playful and quirky Professional/polished What format do you need the output in? A single combined audio file (I'd need to use ffmpeg or similar to stitch the samples) A script/sequence that describes what plays when (for you to assemble) Something else? Any preference on lead instrument? Piano tends to be the safest "neutral" choice, but if you have a vibe in mind, let me know.
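If you want to try the same archaeology on another repository, the first step — collecting the last non-dependabot commit SHA for each day — can be sketched in a few lines of Ruby. This is an illustrative sketch of that step, not the actual script used for the flipbook:

```ruby
require "date"

# Ask git for every commit, oldest first, as "date sha author".
log = `git log --reverse --date=short --pretty=format:"%ad %H %an"`

last_sha_per_day = {}
log.each_line do |line|
  date, sha, author = line.strip.split(" ", 3)
  next if author.to_s.downcase.include?("dependabot")
  # Later commits overwrite earlier ones, so we keep the last commit of each day.
  last_sha_per_day[Date.parse(date)] = sha
end

last_sha_per_day.sort.each do |date, sha|
  puts "#{date} #{sha}"
end
```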

0 views
DHH 1 months ago

Fizzy is our fun, modern take on Kanban (and we made it open source!)

Kanban is a simple, practical approach to visually managing processes and backlogs by moving work cards from one progress column to another. Toyota came up with it to track their production lines back in the middle of the 20th century, but it's since been applied to all sorts of industries with great effect. And Fizzy is our new fun, modern take on it in digital form. We're certainly not the first to take a swing at this, not even for software development. Since the early 2000s, there's been a movement to use the Kanban concept to track bugs, issues, and ideas in our industry. And countless attempts to digitize the concept over the years.  But as with so much other software, good ideas can grow cumbersome and unwieldy surprisingly quickly. Fizzy is a fresh reset of an old idea. We need more of that.  Very little software is ever the final word on solving interesting problems. Even products that start out with great promise and simplicity tend to accumulate cruft and complexity over time. A healthy ecosystem needs a recurring cycle of renewal. We've taken this mission to heart not just with Fizzy's fun, colorful, and modern implementation of the Kanban concept, but also in its distribution.  Fizzy is available as a service we run where you get 1,000 cards for free, and then it's $20/month for unlimited usage. But we're also giving you access to the entire code base, and invite enterprising individuals and companies to run their own instance totally free of charge. This is done under the O'Saasy License, which is basically the do-whatever-you-want-just-don't-sue MIT License, but with a carve-out that reserves the commercialization rights to run Fizzy as SaaS for us as the creators. That means it's not technically Open Source™, but the source sure is open, and you can find it on our public GitHub repository. That open source is what we run too. So new features or bugs fixes accepted on GitHub will make it into both our Fizzy SaaS offering and what anyone can run on their own hardware. We've already had a handful of contributions go live like this! Ultimately, it's our plan to let data flow freely between the SaaS and the local installations. You'll be able to start an account on your own instance, and then, if you'd rather we just run it for you, take that data with you into the managed setup. Or the other way around! In an age where SaaS companies come and go, pivot one way or the other, I think it's a great reassurance that the source code is freely available, and that any work put into a SaaS account is portable to your own installation later. I'm also just a huge fan of being able to View Source. Traditionally, that's been reserved to the front end (and even that has been disappearing due to the scourge of minimization, transpiling, and bundling), but I'm usually even more interested in seeing how things are built on the backend. Fizzy allows you full introspection into that. Including the entire history of how the product was built, pull request by pull request. It's a great way to learn how modern Rails applications are put together! So please give Fizzy a spin. Whether you're working on software, with a need to track those bugs and feature requests, or you're in an entirely different business and need a place for your particular issues and ideas. Fizzy is a fresh, fun way to manage it all, Kanban style. Enjoy!

0 views
Rob Zolkos 1 months ago

The Making of Fizzy, Told by Git

Today Fizzy was released and the entire source code of its development history is open for anyone to see . DHH announced on X that the full git history is available - a rare opportunity to peek behind the curtain of how a 37signals product comes together. I cloned down the repository and prompted Claude Code: “Can you go through the entire git history and write a documentary about the development of this application. What date the first commit was. Any major tweaks, changes and decisions and experiments. You can take multiple passes and use sub-agents to build up a picture. Make sure to cite commits for any interesting things. If there is anything dramatic then make sure to see if you can figure out decision making. Summarize at the end but the story should go into STORY.md” It responded with: “This is a fascinating task! Let me create a comprehensive investigation plan and use multiple agents to build up a complete picture of this project’s history.” Here is the story of Fizzy - as interpreted by Claude - from the trail of git commits. Enjoy! A chronicle of 18 months of development at Basecamp, told through 8,152 commits. At 1:19 PM on a summer Friday, Kevin McConnell typed the words that would begin an 18-month journey: Within hours, the foundation was laid. The team moved with practiced efficiency: By end of day, the skeleton of a Rails application stood ready. But what would it become? One month after inception, Jason Zimdars introduced the application’s first real identity: A “Splat” — the name evokes something chaotic, impactful, unexpected. Like a bug hitting your windshield on a summer drive. The original data model was simple: The next day brought the visual metaphor that would define the early application: The windshield was the canvas. Splats appeared on it like bugs on glass — colorful, slightly chaotic, each one a piece of information demanding attention. The commits reveal urgency. Something important was coming: The all-hands demo. Approximately one month after project inception, Fizzy (then still called “Splat”) was shown to the entire company. The pressure to polish was evident in the commit messages. Seven days after the windshield metaphor was established, Jason Zimdars typed four words that would reshape the application’s identity: The chaotic “splat” gave way to something gentler — bubbles floating on a windshield , like soap suds catching light. The animation changed from aggressive splattering to gentle floating: Perfect circles gave way to hand-drawn blob shapes. The team was discovering what their product was through the act of building it. A new interaction pattern emerged: When users “boosted” a bubble, it would puff up and float away — like champagne fizz rising. The animation: The metaphor was crystallizing. Bubbles. Fizzing. Effervescence. The name would come soon. In a single day, the application found its final name through two commits: 42 files changed. The model, controllers, views, tests — everything touched. Hours later: Fizzy. The name captured everything: the bubbles, the effervescence, the playful energy of the interface. Visual design had driven product naming — the team discovered what they were building through the act of building it. The flat list of bubbles needed structure: But “Projects” didn’t feel right. Eight days later: Then “Bucket” became “Collection.” Eventually, “Collection” would become “Board.” The terminology dance — Projects → Buckets → Collections → Boards — reveals a team searching for the right mental model. 
They ultimately landed on the familiar “Board” metaphor, aligning with tools like Trello and Linear. David Heinemeier Hansson, creator of Ruby on Rails and co-founder of Basecamp, made his first contribution with characteristic pragmatism: He deleted an unused image file. It was a statement of intent. Within two days, DHH’s fingerprints were everywhere: He upgraded the entire application to Rails 8 release candidate and systematically added HTTP caching throughout. DHH’s most distinctive contribution was his crusade against what he called “anemic” code — thin wrappers that explain nothing and add needless indirection. He used this term 15 times in commit messages: Philosophy: Code should either add explanatory value OR hide implementation complexity. Thin wrappers that do neither are “anemic” and should be eliminated. Then came April 2025. DHH made 323 commits in a single month — 55% of his total contributions compressed into 30 days. This was a surgical strike. He: His commit messages tell the story: In DHH’s philosophy: deletion is a feature, not a bug. After 10 months as “Bubbles,” another transformation: 333 files changed. “Pop” (completing a bubble) became “Closure” (closing a card). The playful metaphor gave way to task management vocabulary. The final architectural piece: Fizzy had become a kanban board . Cards lived in columns. Columns could be customized, colored, reordered. The application had evolved from “bugs on a windshield” to a sophisticated project management tool. Collections became Boards. The transformation was complete: Original (July 2024): Final (November 2025): A Claude-powered AI assistant that could answer questions about project content. Born, restricted to staff, then removed entirely. Perhaps replaced by the more ambitious MCP (Model Context Protocol) integration — making Fizzy AI-native at the protocol level rather than bolting on a chatbot. Emoji reactions for cards and comments. Added. Removed. Then added again. The git history shows healthy debate — not everything that ships stays shipped, and not everything removed stays gone. Saved custom views were replaced by ephemeral quick filters. Complexity gave way to simplicity. Predefined workflows with stages were removed in favor of ad-hoc column organization. Users would create their own structure. The MCP (Model Context Protocol) branch represents cutting-edge AI integration — allowing Claude and other AI assistants to interact with Fizzy programmatically. An manifest advertises Fizzy’s capabilities to AI clients. Status: Removed from main, but the infrastructure remains fascinating. This is one of the earliest explorations of making traditional web applications AI-native. Multiple parallel branches exploring different approaches to mobile column navigation. Scroll snapping. Contained scrolling. Swipeable columns. The problem remains unsolved — there’s no “one true way” for mobile kanban navigation. Making Fizzy work with SQLite in addition to MySQL. Simpler local development. Better portability. The search index was even sharded into 16 tables ( through ) for scale. The proprietary SAAS features were extracted into a separate gem. What remained was a clean, open-source Rails application. After 18 months of development, 8,152 commits, and countless pivots, Fizzy became open source. Jason Zimdars (2,217 commits) — The visual architect. From “Let’s try bubbles” to pixel-perfect polish. Jorge Manrubia (2,053 commits) — The engineering backbone. Consistent, prolific, essential. 
Andy Smith (1,007 commits) — Front-end craftsmanship and UI refinement. Mike Dalessio (875 commits) — Infrastructure, performance, the recent dashboard work. David Heinemeier Hansson (586 commits) — The architectural enforcer. Rails modernization and the war on anemic code. Kevin McConnell (351 commits) — Started it all with “New Rails app.” Jose Farias (341 commits) — Feature development and testing. Stanko K.R. (239 + 54 commits) — Security hardening and webhook restrictions. Jeffrey Hardy (100 commits) — Early infrastructure and modernization. Jason Fried (7 commits) — The occasional “Small copy adjustment” from the CEO. The application moved from v0.1 in July 2024, to v0.2 in September 2024, to v1.0 in November 2025. The story of Fizzy is the story of discovery through building. The team didn’t know they were building a kanban board when they started with “splats on a windshield.” They found out through iteration. Key lessons: Names matter, but they can change. Splat → Bubble → Card. Project → Bucket → Collection → Board. The right name emerges through use. Deletion is a feature. Boosts, Fizzy Ask, custom views, workflows — removing the wrong features is as important as adding the right ones. Architecture evolves. The final column-based kanban system looks nothing like the original flat list of splats. DHH’s philosophy: Remove anemic code. Keep transactions short. Use the latest Rails. Delete more than you add. Design drives naming. “Fizzy” emerged from the visual metaphor of bubbles puffing up and floating away — the design informed the brand. Open source takes extraction. 18 months of SAAS development needed careful separation before the core could be shared. The git history of Fizzy is a masterclass in iterative product development. 8,152 commits. 25+ contributors. 18 months. One application that discovered its identity through the act of creation. “Let’s try bubbles.” — Jason Zimdars, July 31, 2024

Documentary compiled December 2, 2025, based on analysis of the Fizzy git repository.
First Commit: June 21, 2024
Total Commits: 8,152
Contributors: 25+
Lines of Code Changed: Hundreds of thousands
Name Changes: 4 (Splat → Bubble → Card; Project → Bucket → Collection → Board)
Features Removed: At least 4 major ones
DHH Commits in April 2025 Alone: 323

Day one, hour by hour:
- 1:23 PM — Gemfile updated ( )
- 3:47 PM — Rubocop configured ( )
- 4:07 PM — Minimal authentication flow ( )
- 4:29 PM — CSS reset and base styles ( )
- 4:46 PM — Brakeman security scanning added ( )

DHH’s April 2025 blitz:
- Removed the entire Boosts feature ( ) — 299 lines across 27 files, gone
- Eliminated activity scoring ( , , )
- Extracted RESTful controllers from overloaded ones ( , )
- Enforced transaction discipline ( — “No long transactions!”)

The transformation:
- Original (July 2024): Splats on a Windshield
- Final (November 2025): Cards → Columns → Boards → Accounts

Turning points:
- July 24, 2024: “Handful of tweaks before all-hands” — Demo day pressure
- July 31, 2024: “Let’s try bubbles” — The visual pivot
- September 4, 2024: “Splat -> Fizzy” — Finding the name
- April 2025: DHH’s 323-commit refactoring blitz
- October 2025: “Remove Fizzy Ask” — The AI feature that didn’t survive
- November 28, 2025: “Initial README and LICENSE” — Going public

Technical stack:
- Rails 8.x — Always on the latest, sometimes ahead of stable
- Hotwire (Turbo + Stimulus) — No heavy JavaScript framework
- Solid Queue & Solid Cache — Rails-native background jobs and caching
- SQLite + MySQL support — Database flexibility
- Kamal deployment — Modern container orchestration
- UUID primary keys — Using UUIDv7 for time-ordering
- Multi-tenancy — Account-based data isolation

0 views
Rob Zolkos 1 months ago

Fizzy Webhooks: What You Need to Know

Fizzy is a new issue tracker ( source available ) from 37signals with a refreshingly clean UI. Beyond looking good, it ships with a solid webhook system for integrating with external services. For most teams, webhooks are the bridge between the issues you track and the tools you already rely on. They let you push events into chat, incident tools, reporting pipelines, and anything else that speaks HTTP. If you are evaluating Fizzy or planning an integration, understanding what these webhooks can do will save you time. I also put together a short PDF with the full payload structure and example code, which I link at the end of this post if you want to go deeper.

Here are a few ideas for things you could build on top of Fizzy’s events:
- A team metrics dashboard that tracks how long cards take to move from column to column and which assignees or boards close issues the fastest.
- Personal Slack or Teams digests that send each person a daily summary of cards they created, were assigned, or closed.
- A churn detector that flags cards that bounce between columns or get sent back to triage repeatedly.
- A cross-board incident view that keeps a separate dashboard of cards moving into your incident or escalation boards.
- A comment activity stream that ships comment events into a search index or knowledge base so you can search discussions across boards.

If you want to go deeper, you can also build more opinionated tools that surface insights and notify people who never log in to Fizzy:
- Stakeholder status reports that email non-technical stakeholders a weekly summary of key cards: what was created, closed, postponed, or sent back to triage on their projects. You can group by label, board, or assignee and generate charts or narrative summaries from the relevant events.
- Capacity and load alerts that watch for people who are getting overloaded. For example, you could send a notification to a manager when someone is assigned more than N open cards, or when cards assigned to them sit in the same column for too long without activity.
- SLA and escalation notifications that integrate with PagerDuty or similar tools. When certain cards (for example, labeled “Incident” or on a specific board) are not closed within an agreed time window, you can trigger an alert or automatically move the card to an escalation board.
- Customer-facing status updates that keep clients in the loop without giving them direct access to Fizzy. You could generate per-customer email updates or a small status page for cards tagged with that customer’s name, combining card and comment events to show progress and recent discussion.
- Meeting prep packs that assemble the last week’s events for a given board into a concise agenda for standups or planning meetings. You can collate newly created cards, reopened work, and high-churn items, then email the summary to attendees before the meeting.

Here is how to set it up. Step 1. Visit a board and click the Webhook icon in the top right. Step 2. Give the webhook a name and the payload URL, and select the events you want to be alerted to. Step 3. Once the webhook saves, you will see a summary of how it is set up and, most importantly, the webhook secret, which you will need in your handler for securing the webhook. There is also a handy event log showing you when an event was delivered.

Since I like to tinker with these sorts of things, I built a small webhook receiver to capture and document the payload structures. Fizzy sends HTTP POST requests to your configured webhook URL when events occur. Each request includes a header containing an HMAC-SHA256 signature of the request body. The verification process is straightforward: compute the HMAC-SHA256 of the raw request body with your webhook secret, and compare it to the value in that header.
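To make that concrete, here is a minimal Ruby sketch of the check. The exact header name is whatever your Fizzy webhook settings (or the PDF linked at the end) show; nothing below should be treated as the official implementation:

```ruby
require "openssl"

# Returns true if the signature header matches an HMAC-SHA256 of the raw body.
# `signature_header` is the value of the signature header Fizzy sends.
def valid_fizzy_signature?(raw_body, signature_header, webhook_secret)
  expected = OpenSSL::HMAC.hexdigest("SHA256", webhook_secret, raw_body)
  # Constant-time comparison avoids leaking information through timing.
  OpenSSL.secure_compare(expected, signature_header.to_s)
end
```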
Fizzy covers the essential card lifecycle events:
- new card created
- card moved to a column
- assignment changes (assigned / unassigned)
- card moved to Done
- card reopened from Done
- card moved to Not Now
- card moved back to Maybe?
- card moved to a different board
- comment added to a card

The approach was straightforward: I wrote a small Ruby script using WEBrick to act as a webhook receiver. The script listens for incoming POST requests, verifies the HMAC-SHA256 signature (using the webhook secret Fizzy provides when you configure webhooks), and saves each event as a separate JSON file with a timestamp and action name. This made it easy to review and compare the different event types later. To expose my local server to the internet, I used ngrok to create a temporary public URL pointing to port 4002. I then configured Fizzy’s webhook settings with this ngrok URL and selected the event types I wanted to capture. With everything set up, I went through Fizzy’s UI and manually triggered each available event: creating cards, adding comments, assigning and unassigning users, moving cards between columns and boards, marking cards as done, reopening them, postponing cards to “Not Now”, and sending cards back to triage. Each action fired a webhook that my script captured and logged. In total, I captured 13 webhook deliveries covering 10 different action types. The only event I could not capture was “Card moved to Not Now due to inactivity” — Fizzy triggers this automatically after a period of card inactivity, so it was not practical to reproduce during this test.

A few gotchas stood out. Card body content is not included. The card object in webhook payloads only contains the card’s title and metadata, not the full description or body content. Comments include their body in two formats, but cards do not. Since Fizzy doesn’t have a public API ( DHH is working on it ), you can’t fetch the full card content programmatically - you’ll need to follow the card’s URL from the payload to view it in the browser. Column data is only present when relevant. The column object only appears on the events where a card actually moves to a specific column. IDs are strings, not integers. All identifiers in the payload are strings, not numeric IDs.

I created a short webhook documentation based on this research: FIZZY_WEBHOOKS.pdf It includes the full payload structure, all event types with examples, and code samples for signature verification in both Ruby and JavaScript. Hopefully this helps you get up and running with Fizzy’s webhooks. Let me know if you discover additional events or edge cases. Since the source code is available, you can also submit PRs to fix or enhance aspects of the webhook system if you find something missing or want to contribute improvements.
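For reference, a stripped-down receiver along the lines described above might look something like this. It is a sketch of the approach rather than the exact script from the post, and the header and payload field names are assumptions:

```ruby
require "webrick"
require "json"
require "fileutils"
require "openssl"
require "time"

SECRET = ENV.fetch("FIZZY_WEBHOOK_SECRET") # the secret shown when you create the webhook
FileUtils.mkdir_p("events")

server = WEBrick::HTTPServer.new(Port: 4002)

server.mount_proc "/" do |req, res|
  body      = req.body.to_s
  signature = req["X-Webhook-Signature"].to_s # placeholder header name
  expected  = OpenSSL::HMAC.hexdigest("SHA256", SECRET, body)

  if OpenSSL.secure_compare(expected, signature)
    payload = JSON.parse(body) rescue {}
    action  = payload["action"] || "unknown" # assumed field name for the event type
    File.write("events/#{Time.now.utc.iso8601}-#{action}.json", body)
    res.status = 200
  else
    res.status = 401
  end
end

trap("INT") { server.shutdown }
server.start
```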

0 views
DHH 1 months ago

Six billion reasons to cheer for Shopify

Black Friday is usually when ecommerce sets new records. This has certainly been true for Shopify through most of its existence. So much so that the company spends months in advance preparing for The Big Day(s). You'd think after more than twenty years, though, that things would have leveled out. But you'd be wrong. This year, merchants sold an astounding $6.2 billion worth of wares through Shopify on Black Friday. That's up 25% from last year, when the record was ~$5 billion. Just crazy high growth on a crazy big base. The law of big numbers clearly hasn't found a way to apply itself here yet! That volume of orders means the Shopify monolith gets put through its paces. The backend API peaked at 31 million requests per minute. The databases carried 53 million reads and 2 million writes per second. Bonkers. It's this kind of frontier load and criticality that makes Shopify the ideal patron saint of the Rails framework and the Ruby programming language.  Rarely do the stars align to shine so brightly that a single company is stewarded by a still-active programmer with a stellar pedigree of core contributions, saddled with such unceasing success, faced with a constant barrage of novel technical challenges, and willing to contribute everything they learn and build back into the open-source base pillars. But that's Shopify. Ultimately, this is all downstream from being a founder-led business. Tobi Lütke not only served on the Rails core team in the early days, but continues to steer the Shopify ship with a programmer's eye for detail and exploration. The latest release of Omarchy even features his new Try tool. How many CEOs of companies worth two hundred billion dollars still program like that? Despite all this, there's occasionally still some fringe consternation in the Ruby world about Shopify's dominance. In Rails, Shopify employs almost half the core contributors. In Ruby, they have several people on the core team too. Seeing this as anything but a blessing is silly, though. We wouldn't have such battle-tested releases of Rails without Shopify running production on the framework's edge. We wouldn't have gotten YJIT without the years of effort they sunk into improving Ruby's core performance. And we wouldn't have seen the recent production-proving of Ractors without them either. Any programming community should be so lucky as to have a Shopify! Now I'm obviously biased here. Not only have I been friends with Tobi for over twenty years, but I also serve on the board of directors for the company. I'm both socially and economically incentivized to cheer for this extraordinary company. But that doesn't mean it isn't all true too! Shopify is indeed the patron saint of Ruby on Rails. Its infrastructure team is the backbone of our ecosystem, and its continued success the best case study of how far you can take this framework and language. They deserve a gawd damn parade for all they do. So on this Cyber Monday, I say cheers to Tobi, cheers to the thousands of Shopifolk. You're killing it for merchants, shoppers, and all of us working with Ruby on Rails. Bravo.

0 views
Taranis 1 months ago

Datacenters in space are a terrible, horrible, no good idea.

In the interests of clarity, I am a former NASA engineer/scientist with a PhD in space electronics. I also worked at Google for 10 years, in various parts of the company including YouTube and the bit of Cloud responsible for deploying AI capacity, so I'm quite well placed to have an opinion here. The short version: this is an absolutely terrible idea, and really makes zero sense whatsoever. There are multiple reasons for this, but they all amount to saying that the kind of electronics needed to make a datacenter work, particularly a datacenter deploying AI capacity in the form of GPUs and TPUs, is exactly the opposite of what works in space. If you've not worked specifically in this area before, I'll caution against making gut assumptions, because the reality of making space hardware actually function in space is not necessarily intuitively obvious. The first reason for doing this that seems to come up is abundant access to power in space. This really isn't the case. You basically have two options: solar and nuclear. Solar means deploying a solar array with photovoltaic cells – something essentially equivalent to what I have on the roof of my house here in Ireland, just in space. It works, but it isn't somehow magically better than installing solar panels on the ground – you don't lose that much power through the atmosphere, so intuition about the area needed transfers pretty well. The biggest solar array ever deployed in space is that of the International Space Station (ISS), which at peak can deliver a bit over 200kW of power. It is important to mention that it took several Shuttle flights and a lot of work to deploy this system – it measures about 2500 square metres, over half the size of an American football field. Taking the NVIDIA H200 as a reference, the per-GPU-device power requirements are on the order of 0.7kW per chip. These won't work on their own, and power conversion isn't 100% efficient, so in practice 1kW per GPU might be a better baseline. A huge, ISS-sized, array could therefore power roughly 200 GPUs. This sounds like a lot, but lets keep some perspective: OpenAI's upcoming Norway datacenter is intending to house 100,000 GPUs, probably each more power hungry than the H200. To equal this capacity, you'd need to launch 500 ISS-sized satellites. In contrast, a single server rack (as sold by NVIDIA preconfigured) will house 72 GPUs, so each monster satellite is only equivalent to roughly three racks. Nuclear won't help. We are not talking nuclear reactors here – we are talking about radioisotope thermal generators (RTGs) , which typically have a power output of about 50W - 150W. So not enough to even run a single GPU, even if you can persuade someone to give you a subcritical lump of plutonium and not mind you having hundreds of chances to scatter it across a wide area when your launch vehicle explosively self-disassembles. Thermal Regulation I've seen quite a few comments about this concept where people are saying things like, "Well, space is cold, so that will make cooling really easy, right?" Really, really no. Cooling on Earth is relatively straightforward. Air convection works pretty well – blow air across a surface, particularly one designed to have a large surface area to volume ratio like a heatsink, will transfer heat from the heatsink to the air quite effectively. 
If you need more power density than can be directly cooled in this way (and higher power GPUs are definitely in that category), you can use liquid cooling to transfer heat from the chip to a larger radiator/heatsink elsewhere. In datacenters on Earth, it is common to set up cooling loops where machines are cooled via chilled coolant (usually water) that is pumped around racks, with the heat extracted and cold coolant returned to the loop. Typically the coolant is cooled via convective cooling to the air, so one way or another this is how things work on Earth. In space, there is no air. The environment is close enough to a hard, total vacuum as makes no practical difference, so convection just doesn't happen. On the space engineering side, we typically think about thermal management , not just cooling. Thing is, space doesn't really have a temperature as-such. Only materials have a temperature. It may come as a surprise, but in the Earth-Moon system the average temperature of pretty much anything is basically the same as the average temperature of Earth, because this is why Earth has that particular temperature. If a satellite is rotating, a bit like a chicken on a rotisserie, it will tend toward having a consistent temperature that's roughly similar to that of the Earth surface. If it isn't rotating, the side pointing away from the sun will tend to get progressively colder, with a limit due to the cosmic microwave background, around 4 Kelvin, just a little bit above absolute zero. On the sunward side, things can get a bit cooked, hitting hundreds of centigrade. Thermal management therefore requires very careful design, making sure that heat is carefully directed where it needs to go. Because there is no convection in a vacuum, this can only be achieved by conduction, or via some kind of heat pump. I've designed space hardware that has flown in space. In one particular case, I designed a camera system that needed to be very small and lightweight, whilst still providing science-grade imaging capabilities. Thermal management was front and centre in the design process – it had to be, because power is scarce in small spacecraft, and thermal management has to be achieved whilst keeping mass to a minimum. So no heat pumps or fancy stuff for me – I went in the other direction, designing the system to draw a maximum of about 1 watt at peak, dropping to around 10% of that when the camera was idle. All this electrical power turns into heat, so if I can draw 1 watt only while capturing an image, then turn the image sensor off as soon as the data is in RAM, I can halve the consumption, then when the image has been downloaded to the flight computer I can turn the RAM off and drop the power down to a comparative trickle. The only thermal management needed was bolting the edge of the board to the chassis so the internal copper planes in the board could transfer any heat generated. Cooling even a single H200 will be an absolute nightmare. Clearly a heatsink and fan won't do anything at all, but there is a liquid cooled H200 variant. Let's say this was used. This heat would need to be transferred to a radiator panel – this isn't like the radiator in your car, no convection, remember? – which needs to radiate heat into space. Let's assume that we can point this away from the sun. The Active Thermal Control System (ATCS) on the ISS is an example of such a thermal control system. This is a very complex system, using an ammonia cooling loop and a large thermal radiator panel system. 
It has a dissipation limit of 16kW, so roughly 16 H200 GPUs, a bit over the equivalent to a quarter of a ground-based rack. The thermal radiator panel system measures 13.6m x 3.12 m, i.e., roughly 42.5 square metres. If we use 200kW as a baseline and assume all of that power will be fed to GPUs, we'd need a system 12.5 times bigger, i.e., roughly 531 square metres, or about 2.6 times the size of the relevant solar array. This is now going to be a very large satellite, dwarfing the ISS in area, all for the equivalent of three standard server racks on Earth. Radiation Tolerance This is getting into my PhD work now. Assuming you can both power and cool your electronics in space, you have the further problem of radiation tolerance. The first question is where in space? If you are in low Earth orbit (LEO), you are inside the inner radiation belt, where radiation dose is similar to that experienced by high altitude aircraft – more than an airliner, but not terrible. Further out, in mid Earth orbit (MEO), where the GPS satellites live, they are not protected by the Van Allen belts – worse, this orbit is literally inside them. Outside the belts, you are essentially in deep space (details vary with how close to the Sun you happen to be, but the principles are similar). There are two main sources of radiation in space – from our own star, the Sun, and from deep space. This basically involves charged particles moving at a substantial percentage of the speed of light, from electrons to the nuclei of atoms with masses up to roughly that of oxygen. These can cause direct damage, by smashing into the material from which chips are made, or indirectly, by travelling through the silicon die without hitting anything but still leaving a trail of charge behind them. The most common conseqence of this happening is a single-event upset (SEU), where a direct impact or (more commonly) a particle passing through a transistor briefly (approx 600 picoseconds) causes a pulse to happen where it shouldn't have. If this causes a bit to be flipped, we call this a SEU. Other than damage to data, they don't cause permanent damage. Worse is single-event latch-up. This happens when a pulse from a charged particle causes a voltage to go outside the power rails powering the chip, causing a transistor essentially to turn on and stay on indefinitely. I'll skip the semiconductor physics involved, but the short version is that if this happens in a bad way, you can get a pathway connected between the power rails that shouldn't be there, burning out a gate permanently. This may or may not destroy the chip, but without mitigation it can make it unusable. For longer duration missions, which would be the case with space based datacenters because they would be so expensive that they would have to fly for a long time in order to be economically viable, it's also necessary to consider total dose effects . Over time, the performance of chips in space degrades, because repeated particle impacts make the tiny field-effect transistors switch more slowly and turn on and off less completely. In practice, this causes maximum viable clock rates to decay over time, and for power consumption to increase. Though not the hardest issue to deal with, this must still be mitigated or you tend to run into a situation where a chip that was working fine at launch stops working because either the power supply or cooling has become inadequate, or the clock is running faster than the chip can cope with. 
Total dose degradation also means it's necessary to have a clock generator that can throttle down to a lower speed as needed – this can also be used to control power consumption, so rather than the chip ceasing to function, it just gets slower.

The next FAQ is: can't you just use shielding? No, not really, or only up to a point. Some kinds of shielding can make the problem worse – an impact on the shield can cause a shower of secondary particles that then cause multiple impacts at once, which is far harder to mitigate. The very strongest cosmic rays can go through an astonishing amount of solid lead, and since mass is always at a premium, it's rarely possible to deploy significant amounts of shielding, so radiation tolerance must be built into the system (this is often described as Radiation Hardness By Design, RHBD).

GPUs and TPUs, and the high-bandwidth RAM they depend on, are the absolute worst case for radiation tolerance purposes. Small-geometry transistors are inherently much more prone to both SEUs and latch-up, and the very large silicon die area also makes impacts more frequent, since the impact rate scales with area. Chips genuinely designed to work in space are taped out with different gate structures and much larger geometries. The processors typically used have roughly the performance of a PowerPC from 2005. Bigger geometries are inherently more tolerant, both to SEUs and to total dose, and the different gate topologies are immune to latch-up, whilst providing some degree of SEU mitigation via fine-grained redundancy at the circuit level. Taping out a GPU or TPU with this kind of approach is certainly possible, but the performance would be a tiny fraction of that of a current-generation Earth-based GPU/TPU.

There is also a you-only-live-once (my terminology!) approach, where you launch the thing and hope for the best. This is commonplace in small cubesats, and also why small cubesats often fail after a few weeks on orbit. Caveat emptor!

Communications

Most satellites communicate with the ground via radio, and it is difficult to get much more than about 1 Gbps reliably. There is some interesting work using lasers to communicate with satellites, but that depends on good atmospheric conditions to be feasible. Contrast this with a typical server rack on Earth, where 100 Gbps rack-to-rack interconnect would be considered the low end, and it's easy to see that this is another significant gap.

Conclusions

I suppose this is just about possible if you really want to do it, but I think I've demonstrated above that it would be extremely difficult to achieve, disproportionately costly in comparison with Earth-based datacenters, and would offer mediocre performance at best. If you still think this is worth doing, good luck – space is hard. Myself, I think it's a catastrophically bad idea, but you do you.

0 views
Kev Quirk 1 months ago

Static Site Generators

Jan talks about how static site generators are far more complicated than WordPress, despite (ironically) their output being far simpler. Read Post →

I enjoyed this post from Jan once I’d translated it from Dutch and could actually read it. He talks about the irony of how the output of a static site generator is so simple, yet they’re somehow difficult to set up and maintain. Since I’ve been around the block when it comes to blogging platforms, I have some thoughts on this.

While talking about his time testing Jekyll , Jan says:

Ruby dependency mess, this is not what anyone wants, drama.

I get where he’s coming from. Dependency issues with Ruby can be annoying. But honestly, I can’t say I’ve had any real problems with Jekyll while using Ubuntu. For me it’s been as simple as following the install instructions , running a quick , and I was off to the races.

Yes, setting up a static site is definitely more involved than installing WordPress. I completely agree there. But once it’s up and running, I’ve found there’s very little ongoing maintenance. The worst it gets for me is when a gem updates and I need to run before getting back to work.

WordPress, on the other hand, has always been more frustrating for me to manage. There’s just so much noise in the admin UI. So many panels, so many settings, so much stuff most people will never touch. And then you have plugins, their updates, their banners, their upsells. It becomes… messy. That said, if you’re willing to ignore those frustrations, WordPress can be simple. Especially if you stick to an off-the-shelf theme and don’t tinker.

The truth is, it’s great that we have all these options. Different people have different expectations, preferences, and levels of comfort. Some folks want a managed dashboard full of knobs and switches. Others want a quiet folder of Markdown files and a build command. I don’t agree that static site generators are inherently more difficult than WordPress. It really comes down to the use case and the person using the tool.

Anyway, go read Jan’s post — it’s interesting. And if you have opinions, drop me an email or leave a comment below.

Thanks for reading this post via RSS. RSS is great, and you're great for using it. ❤️ Reply to this post by email

0 views
Pat Shaughnessy 1 months ago

Compiling Ruby To Machine Language

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished. Here’s an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I’m still finishing this up… so this content is fresh off the page! It’s been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it’s very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby’s runtime performance.

To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later, Ruby will execute the machine language version instead of the original YARV instructions.

To keep track of these counts, YJIT saves an internal counter alongside the YARV instruction sequence for each function or block. Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT-related values: jit_entry and jit_entry_calls . As we’ll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry , Figure 4-5 also shows jit_entry_calls , YJIT’s internal counter. Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls . Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter will start at zero and increase by 1 each time Range#each calls the block at (3).

When jit_entry_calls reaches a particular threshold, YJIT compiles the YARV instructions into machine language. By default, for small Ruby programs, YJIT in Ruby 3.5 uses a threshold of 30. Larger programs, like Ruby on Rails web applications, use a larger threshold of 120. (You can also change the threshold by passing --yjit-call-threshold when you run your Ruby program.)

While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks . YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program’s behavior, and it can avoid compiling code that your program doesn’t need. As we’ll see next, a single YJIT block doesn’t correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks.

Let’s see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT increments the jit_entry_calls counter again just before Ruby runs the block for the 30th time. Since jit_entry_calls has now reached the threshold value of 30, YJIT triggers the compilation process. YJIT compiles the first YARV instruction, getlocal_WC_1 , and saves machine language instructions that perform the same work into a new YJIT block, shown in Figure 4-6. On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block.
On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1 .

Next, the YJIT compiler continues and compiles the second YARV instruction on the left side of Figure 4-7: getlocal_WC_0 at index 2. The left side of Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6. But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0 .

Let’s take a look inside this new block. YJIT compiles, or translates, the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the machine language instructions shown in Figure 4-8 into the new block. Figure 4-8 shows a closer view of the YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language mnemonics corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. The YARV instructions on the left are getlocal_WC_1 , which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0 , which loads a local variable from the current stack frame and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into registers on my M1 microprocessor: x1 and x9 . If you’re curious and would like to learn more about what the machine language instructions mean and how they work, the section “Adding Two Integers Using Machine Language” discusses the instructions for this example in more detail.

Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: it doesn’t know the types of the addition’s arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types? Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions, and adding or concatenating two strings is an entirely different operation. In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus , it needs to know exactly what types of values the Ruby program might ever add at (3) in Listing 4-1.

You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += i block at (3) is always adding one integer to another. But YJIT doesn’t know this. YJIT uses a clever trick to solve the problem: instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in. YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9.

Figure 4-9 shows the YARV instructions on the left and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-9, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch.
Since this new branch doesn’t point to a block yet, YJIT sets up the branch to point to a branch stub instead.

0 views