Latest Posts

SQLAlchemy 2 In Practice - Chapter 5 - Advanced Many-To-Many Relationships

This is the fifth chapter of my SQLAlchemy 2 in Practice book. If you'd like to support my work, I encourage you to buy this book, either directly from my store or on Amazon. Thank you! You have now learned the building blocks used in relational databases. Sometimes, however, these building blocks have to be "tweaked" a bit to achieve a desired goal. This chapter is dedicated to exploring a very useful variation on the many-to-many relationship.
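For context, here is a minimal sketch of the standard many-to-many building block in SQLAlchemy 2-style declarative code; the model and table names (`User`, `Group`, `user_group`) are illustrative choices of mine, not taken from the book:

```python
from sqlalchemy import Column, ForeignKey, Table
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


# The association table that links the two sides together; in the
# basic pattern it has no model class of its own.
user_group = Table(
    "user_group",
    Base.metadata,
    Column("user_id", ForeignKey("users.id"), primary_key=True),
    Column("group_id", ForeignKey("groups.id"), primary_key=True),
)


class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    groups: Mapped[list["Group"]] = relationship(
        secondary=user_group, back_populates="users"
    )


class Group(Base):
    __tablename__ = "groups"
    id: Mapped[int] = mapped_column(primary_key=True)
    users: Mapped[list[User]] = relationship(
        secondary=user_group, back_populates="groups"
    )
```

A common "tweak" to this pattern, in general, is to promote the association table to a full model class of its own so that the link between the two sides can carry extra data.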


An Interview with F1 Driver and Venture Capitalist Nico Rosberg About the Drive to Win

Listen to this post:

Good morning,

This week’s Stratechery Interview is with F1 driver-turned-venture capitalist Nico Rosberg. Rosberg started his F1 career in 2005, and retired after winning the world championship in 2016; he spent his last four years at Mercedes as the teammate of his childhood friend Lewis Hamilton, in one of the most intense teammate rivalries in F1 history. Over the last several years, however, Rosberg has reinvented himself as a venture capitalist, founding Rosberg Ventures, with a specific focus on leveraging his F1 background to build connections between European money and Silicon Valley startups in one direction, and startup products and German businesses in the other.

In this interview we cover all aspects of Rosberg’s journey, from having a steering wheel in his crib, to pioneering the use of sports psychology in F1, to his decision to retire on top of the world. Then, we discuss how F1 builds connections, the similarities between founders and drivers, and how he realized he could leverage all of that in a new competition: winning as an investor. What I found particularly interesting is how Rosberg’s background and history seem so varied and unconnected on the surface, yet are clearly linked by a consistent ethos of maximizing opportunity in the service of winning.

As a reminder, all Stratechery content, including interviews, is available as a podcast; click the link at the top of this email to add Stratechery to your podcast player.

On to the Interview: This interview is lightly edited for clarity.

Nico Rosberg, welcome to Stratechery.

Nico Rosberg: Thank you very much, Ben, it’s really an honor to be on the show. I hear so much about your show, especially when I’m in the Bay Area.

Well, I don’t normally interview venture capitalists on Stratechery, but you are no normal venture capitalist, which you use to your advantage. I want to ask you about that, but needless to say, that made this an easy exception to make, particularly since I’m a big Formula 1 fan. To that end, I always start my interviews talking about the subject’s background; we may spend a bit more time on yours, if that’s okay with you, because it’s pretty fascinating.

NR: I understand. With pleasure.

Okay, good. Well, you were born in 1985 in West Germany to a German mother and a Finnish father. Your father Keke was the 1982 Formula 1 world champion. Was there a steering wheel in your crib when you came home from the hospital?

NR: There was actually, yes.

(Laughing) Oh, that’s funny.

NR: On my Facebook page you would see photos of me in a go-kart when I’m like three years old, with a helmet on and everything, so yeah, it was an early discovery of that passion.

I’m interested in that, because obviously your father was tremendously successful. Was he immediately all in on, “You have to do what I did”, or was there ultimately a bit of humoring you, “You can come along and try this, but I’m not sure you could ever measure up to what I did”?

NR: There was a go-kart track near our house, and he was going there with his friends even before I was born. Then, when I was six, seven years old, we just gave it a go. I enjoyed it, and I looked pretty fast also. So then he was like, “Maybe this can become a father-son hobby”, and it just went from there. You start doing a race here, a race there; I started winning the races almost immediately, and that hooked me even more. When you win, of course, it’s amazing, it’s an amazing motivation.
So that’s how we just kind of got going, and it became an amazing father-son hobby to share. We spent a lot of time with each other, we traveled in a motorhome to the races, so it was really lovely.

There definitely is a bit to driving a car very fast. On one hand, of course, you started early, and you see the history of Formula 1 drivers, they start early, but you took to it right away. It’s definitely like father, like son in that regard.

NR: Indeed. I think as in every sport — you also see it with golf or tennis — you have to start pretty early now; it just gives you a head start in practicing those skills. And I think, yeah, I guess I inherited some of those genes from my father, because we need to be very good at hand-eye coordination, that’s super important. We need to be also very good at processing things very quickly, because we have things coming at us at 220 miles an hour; our eyes are flickering left and right all the time, just taking in all the inputs that we’re seeing and also feeling, so I think that also probably has to be a strength of ours.

There’s a lot of stuff in your background about your parents really pushing you in terms of academics, learning lots of languages, all that sort of thing. Was that unique to you? To your point, it always strikes me that Formula 1 drivers all come across as very intelligent, and there’s such a high degree of information processing that’s happening — is that the norm, generally speaking?

NR: I think you probably need to be a bit more street smart, at least, to be a successful F1 driver than maybe in some other sports, because we depend so much on this high technology car, and if we’re not able to understand the car, set it up properly, be at least street smart about all these things, then it doesn’t matter how talented you are, you’ll never be able to go fast. So I would say that in our sport, yeah, that comes a little bit more to the fore than maybe in other sports. But in my case, actually, my parents pushing me at school was the contrary: my mom and my dad would usually come in late at night and say, “Okay, stop now”, because I was always very hard-working at school. Somehow we had a group of friends where everybody wanted to achieve, and I wanted to achieve as well, and I had to catch up because I was missing half the week every other week because I was racing. So my parents were actually more telling me to stop, because I was trying to make too much of an effort to catch up.

Interesting, because a bit I want to get to here is that you’ve had such a widely varying career, even since you finished racing, and you finished relatively young. So that has been a theme for you all along: you were born with the steering wheel in your crib, but you’re interested in more than that.

NR: Yeah, I really always enjoyed the academic side. In fact, if I wasn’t going to make it as a driver, I already had a place reserved for me at Imperial College in London to study aeronautics. That was my plan B for how to get into F1, which would have been as an aerodynamicist.

Right, design the car instead of driving it.

NR: I don’t know if I would have gotten there in the end, but I think I had a good shot, so my plan B was already set.

You’re most famous for your rivalry with Lewis Hamilton, but as I understand it, you actually met him quite young; you were teammates in karting as well?
NR: It’s a pretty crazy story, because the McLaren Formula One team wanted to set up a little go-kart team at the time, and the two rising star drivers at the time were Lewis Hamilton from Great Britain and myself down south, and so they actually funded our two go-karting seasons. It was just the two of us driving for the McLaren Mercedes go-karting team, and we were winning all the races and championships. Unfortunately for me, more often than not, it was Lewis winning and I was second, but there we go. It’s incredible, because we were best friends at the time, we were 13 years old, and we were on holiday together all the time, dreaming, “Imagine what it would be like in 15 years to be in the F1 team together, winning races and championships?”. It seemed impossible to achieve that dream, it just seemed so far away. And yet really 15 years later, we were in the Mercedes F1 team as teammates, fighting for races and championships, so it’s a pretty incredible story.

I mean, why did it seem that impossible, though? Your dad was an F1 driver, you’d been racing in karts. What makes F1 feel so far away?

NR: Well, come on, you can imagine: if you’re 13 years old and you’re playing in your regional tennis camp in the middle of nowhere, and you look at the television and you see [Jannik] Sinner and [Carlos] Alcaraz fighting for the Monaco Masters, that’s going to look extremely impossible and far away.

Right, but wasn’t there a bit of total self-belief that, “I’m going to be there, there’s no question”?

NR: Well, maybe Lewis is a little bit more like that. I’m more sensitive, more insecure, less self-belief, so I never actually really believed of myself that I could get there and be good enough, which has pros and cons, because it also is an incredibly strong motivator. When you don’t have that self-confidence, you just fight so hard to prepare to the best of your abilities all the time. So it has pros and cons, and it was nice to see that someone like me, who did not believe until the very last corner, was still able to actually win in the end, so that was reassuring.

I’m curious about this mindset bit, because this has been an area that you’ve actually talked a lot about. In 2007, you stopped working with your father as closely as you had been, and went to work with a sports psychologist. At what point was it clear to you that this mental aspect was going to be super important to your success?

NR: That became clear to me in my first year of F1, because it was mentally just an enormous struggle. We had a bad car, so we were either breaking down or finishing well out of the points all the time, and it was a really rough start to my career.

And this is with Williams at the time?

NR: Yeah, with Williams. At times it was almost as if, “Oof, I might not get taken on for the second year”, because it was such a rough start. So mentally, it was incredibly hard, because my dream was at stake; my dream was to be an F1 driver, to win races. So I decided, “I’m spending four hours a day on training my body, why am I not training my brain? There must be solutions out there to improve my mental state”. So I sought out help, and I found a psychologist/philosopher, and this was incredible for my life, for my performance; I worked 10 years with him. In the winter, two hours every two days, so it was an incredible effort; the mental training was actually harder than the physical training.
It was a combination of learning to meditate, learning to visualize, learning the power of repetition, and also learning to understand myself better. “Why am I scared?”, “Why am I anxious, jealous?”, because you cannot switch those emotions off very easily, or almost not at all. But when you understand why they’re there, you can really adapt your reaction, and that has a snowball effect: when you react in a much better and more appropriate way, it has an enormous snowball effect on your life. It’s these kinds of learnings that really helped me so much.

Was this pretty novel for an F1 driver, to seek this out and do this sort of training at the time?

NR: Yeah, it’s a bit like in the startup world. Founders are not really allowed to admit that they’re scared of failing or that they’re working with a brain doctor, as some liked to call it at the time in F1, so it was not something that I could really tell anybody about, because it would look weak in a way, but actually it became my superpower to go through that process. There’s a little bit more acceptance now; there have been a couple of other drivers talking about it. I think even Lando Norris, the world champion last year, sought help in the middle of last year as he was struggling mentally, clearly, and his championship was slipping away from him; he went out and sought help and made enormous progress, and that’s what got him the world championship in the end, so that was great to see.

Lando’s always interesting because he seems to wear his insecurities on his sleeve; they just come through so tangibly. Did you feel a lot of sympathy for his struggles and working through that?

NR: Yeah, totally. That’s a state of mind that I can very much relate to, and that’s what people love also, because he’s very authentic, so that’s really appreciated. At the same time, I wrote Lando a direct message on Instagram and he never replied, but at least I wanted to see if maybe he would read it, because I’ve been through what he’s been through. One of the obvious things that I would change if I was Lando, and he did change it a little bit, is to not always talk about the glass half empty. Even when he was on pole position, he almost only spoke about that one corner where he messed up, rather than, “Hey, that was almost the best lap of my life”. I mean, both are right. “Hey, that was almost the best lap of my entire life”, that would be correct, or, “Ah, damn, I messed that last corner up so bad”, that would also be correct. You know? And he just says, “I messed that last corner up”, and, “I need to get my stuff together”, and that’s just unnecessary, because it’s repetition, and it really ingrains itself in your mind: if you say, “I make mistakes always”, you’re really going to believe that you make mistakes always. So that’s something that he could quite easily adapt. Even if he keeps on thinking it, don’t say it, and don’t say it out to the whole world, because that’s a whole tsunami that you’re setting off there repeatedly, which is not going to be beneficial to your performance.

You’ve talked about talking to founders and not being able to show weaknesses.
Have there been any examples, in the time that you’ve been an investor and talking to different companies, where you’ve identified someone and been like, “Look, you’re kind of a Lando Norris here” — maybe those aren’t the words that you used — but, “Let me talk to you about your mindset and how you can shift that”? Has that come in handy yet?

NR: I really enjoy that, because founders are really very similar to high performance athletes. They’re extremely competitive, their drive is unbelievable, and they’re very courageous also, because you have to be so damn brave to bet the company over and over as you’re innovating and pivoting, so there are great similarities, and that’s why I really enjoy speaking to founders. Just now in the Bay Area, that’s very often the topic that I speak to founders about, and they enjoy that as well, discussing how they approach things mentally and everything, so that’s really enjoyable. I think I can really add value as well, because I learned it for myself, and I can really add value from my experience.

The more founders that you talk to, is there a bit where — if you go back to F1, it’s very visible who’s the best, it’s very measurable in a certain sense, but it’s interesting in F1 because sometimes you could have a great driver who doesn’t have a great car, and yet people will still say, “That person is excellent, they’re just limited by their circumstances”. Do you get a similar sense being in tech, dealing with founders, and being able to separate the circumstances from the person and saying, “There’s something there, even if the circumstances aren’t allowing it to show”?

NR: That’s one very, very important ingredient for a successful founder, because actually it will often be many, many years until there’s any validation of what he or she is building, and the best founders have to be extremely resilient and not feel the need to bow to the consensus thinking of the people around them or of their board or whatever. They are the visionary, and they have to believe with such high conviction in their idea, in what they’re building, and see it through. Because if it was obvious, then everybody would be building it, and most of the time, they’re creating something that’s just not obvious to anybody except themselves in the early stages, so that’s absolutely a very important trait. However, that needs to be in combination with an extreme curiosity and desire to learn and remain open to new ideas and everything, so it’s a balance that has to be found. And that’s pretty rare, to find both attributes within a founder, but in the best ones, that’s usually the case.

Is it that tension between insecurity and confidence, uncertainty and curiosity? Is that what you’re zoomed in on, what you’re looking for?

NR: Yeah, totally, because sometimes they oppose each other.

Right, it’s a paradox.

NR: Someone who’s very self-confident in their idea will be completely arrogant and just so sure that their way is the right way, and then they will not be very curious. That’s why you don’t find it in every person, and it’s important; I think these two character traits are very, very important.

Continuing with the background: you have a YouTube channel that has 1.46 million subscribers. You haven’t posted on it for a while, but there used to be a whole host of videos. I went back, scrolled all the way to the bottom, and the original upload was in 2011.
A lot of people didn’t know what YouTube was at that point, or barely did. How did you find YouTube, and why did you start posting videos?

NR: As an athlete, there was an opportunity that suddenly came along in those years, which was to connect more closely with those out there who were supporting me.

Were you the first one to really do that?

NR: No, not the first, but I joined some of the early movers, and it was amazing to see how you could directly connect with your fanbase. There was also the belief that, of course, with time it would pay off, because Formula 1 is also about marketing, and that can give you an edge over some other drivers. If you build a big following, a big brand for yourself, and you become highly relevant to brands for sponsorship, etc., then a team might choose you over someone who just drives fast. So there’s also that element: to be a successful F1 driver, it usually helps to really try and excel in every single domain that may be relevant, and that domain plays a role, as does working well with the media, because the media is so powerful and that’s a game you also need to try and nail.

I’m curious about the sponsorship angle. F1 obviously has huge amounts of sponsorships; it’s an amazing sport where people will willingly wear gear with a bunch of sponsorships on it — I guess all racing is sort of like this. But right now, now that tech is huge and F1 is huge, there’s a lot of tech sponsorship of F1, and I’m just sort of curious: I’m in tech, but generally a lot of these companies are enterprise companies, a lot of B2B things, and this whole world of sponsorships and what goes on around that is somewhat foreign to me. I’m just a blogger here in Wisconsin, before in Taiwan. What is in that game, and how involved are the drivers? Is that a huge thing? Do you have to go out and actually help win these sponsorships too, or show up to a bunch of events? I’m just curious, how does that world work?

NR: So, a few things here. First of all, because of Netflix, the sponsorship fees that the teams are now requesting are like 2-3x what they were just six, seven years ago.

Is that just because it’s more popular, or because their logos also show up on Netflix?

NR: Because it’s so much more popular, and because it’s now become relevant in the US. So the whole tech industry has become interested, and you’ll see most companies are now also sponsoring. I mean, look at just the Mercedes team, of course, but look at the Audi team also. They have Revolut, the bank that’s come out of the startup ecosystem, and ElevenLabs, the voice AI global leader, all of these companies. In fact, because I’m so deeply connected now with Silicon Valley, I am more and more also kind of casually supporting some of these tech companies with sponsorships in F1. I’m just presenting one dev tools company, multi-billion dollar, with an opportunity to sponsor a team this week; I’m just sending that through. Because the sponsorship fees have increased so much, a team like Mercedes has $400 million in annual sponsorship revenue. $400 million! That’s so crazy. And then you add their share of TV revenues on top, so they get to beyond $600 million in annual revenue, and because F1 introduced budget caps, they don’t spend more than $300 million, even including driver salaries and everything. So these F1 teams are so hugely profitable, especially the successful ones, and that’s why the CrowdStrike founder, George Kurtz, just bought 5% of the Mercedes F1 team.
And that stake, I mean, the Mercedes F1 team was valued at $6 billion, unbelievable. You know, so he paid $300 million for a five percent share.

Do you feel like you were 10 years too early?

NR: I missed that train, because I think with a bit of effort, probably at some point I could have had a nice little share in an F1 team somewhere, but I completely missed the train. It’s incredible how this sport has really become a business case now, and these F1 teams have become investable assets, which never used to be the case, so it’s quite phenomenal. So with these sponsors, we drivers spend a lot of time with these companies. They invite all of their customers; I do dinner with them, even during a race weekend, or the next morning for breakfast. At the Monaco Grand Prix, I’m at the Hotel de Paris having breakfast with one of the sponsors. So the drivers do spend a lot of time with those sponsors. And apart from that, the sponsors want visibility, because visibility for their logo is just an amazing credibility stamp, and they also want to bring and host people at the races, so that’s what it’s about, and I think it works amazingly well.

I was talking to Mike Cannon-Brookes, Atlassian is now sponsoring Williams, and there’s this idea that you actually have 24, or this year 22, pre-planned, clear places around the world to meet customers and bring them there. He’s like, “It makes scheduling very easy, it’s very straightforward”.

NR: And for someone like Atlassian, the customers are there anyway in the paddock, because the C-levels of all big companies are always there. To make deals in the paddock is incredible, an incredible opportunity. Even I myself: I do work for Mercedes F1, and they don’t actually pay me in euros, they actually pay me most of the time with tickets for the F1 races, because I too love to host the VC community at the races. It’s such a great way to get to know people and build friendships, and of course, yeah, it’s very important for me to really build relationships in this ecosystem.

That’s super interesting. Speaking of Mercedes: when Mercedes rejoined F1, having acquired Brawn, you were the first driver alongside Michael Schumacher, who was then replaced by Lewis Hamilton — two pretty impressive names to have as teammates, to say the least. The rivalry between teammates is the stuff of lore in Formula 1, but is it actually underrated how intense that is?

NR: So the norm in F1 is always that a team has a number one driver and a number two driver; that’s clearly kind of set in stone, and that’s the way you go racing. It’s very unusual that a team has two number one drivers. The most legendary such pairing was Ayrton Senna and Alain Prost at McLaren, and that ended in total disaster after only two years. They were crashing, then one guy quit, and it was just a total mess. It’s okay and not too bad as long as you’re racing for fifth and sixth and seventh place — but as soon as you have the best car and you as teammates are fighting for every single race win, it just becomes so hard, because you’re always going to push the boundaries and go into those gray areas, because there’s a championship at stake and that’s your childhood dream. That’s what then happened between Lewis and me also. It just kind of spiraled, from one going a little bit too far, then the other one paying back, and then back again, and then crashing, and it just became very, very tense and difficult to manage.
It was a very uncomfortable environment to be in, because not only are you kind of enemies within the team, but also the whole team as such cannot really take a side anymore; they need to stay neutral, so they can’t really support you either anymore. It’s a complicated dynamic.

Well, you lasted longer than Prost and Senna, because I think you made it three years with Lewis Hamilton. Is that right?

NR: Four, actually. We would have kept going; I had another contract for a few more years, so it was kind of borderline manageable, but only after Toto Wolff made us sign a contract whereby it didn’t matter who was at fault: if ever we crashed together, then we would have to split the repair bill 50-50. My most expensive one was $360,000, and after that, I made sure to leave extra space whenever Lewis was anywhere close.

(laughing) That’s amazing. Why did you decide to retire? I mean, you finally win, you overcome Lewis, and then you’re done at 31.

NR: I gave it a thousand percent, really, much more than I thought I could give. Total life commitment, insane intensity, the whole thing, mentally and physically, and I achieved my dream, I achieved my dream in the best possible way. I beat the greatest of all time, I won that Formula 1 World Championship with Mercedes, the legendary car brand; it’s not possible for me to do better. I had a young family at home, a baby at home, so it just felt like the right moment for the most beautiful exit possible for me, one that would carry me for the rest of my life. So it was a bit of a rational decision in that way, and I just felt that was what I wanted to try and do. Of course, it was scary, because when you make such a decision, you don’t really know how it’s going to go and how you’re going to feel. But now, in hindsight, for me personally it was really the best thing I could do and a great decision, and I’m very lucky to have been able to exit in that way.

And a lot of founders listening, because I know you’re very popular with founders, also your podcast, will be able to relate; it’s kind of the $10 billion or $50 billion exit.

NR: Once you’ve put your life into it and you’ve created an enormous success and changed people’s lives, and then you go out on a high: I think that was my dream, to do it that way.

You made a lot of changes before that last year, too. There are all these stories from that last year where you won the title, about focusing on things like jet lag or your nutrition and all those bits and pieces. Was that just, “I have to figure out something else to finally get over the hump”?

NR: I tried to perfect every single marginal gain possible; that was really what I was about. It went from working with a Professor of Sleep at Harvard, who has now created a startup called Timeshifter based on what we were working on together at the time, which is a nice anecdote. There, for example, the secret was eliminating jet lag for the whole year, because jet lag is a disaster. As an athlete, the difference between 99% focus and 100% focus is the difference between coming first and second, and jet lag just destroys you, and we were traveling from continent to continent all the time. I managed to do a whole season with absolutely 0.0 jet lag, and it’s pretty simple. Of course, it takes a lot of discipline, but pretty simple.
The secret was a maximum of one-and-a-half hours of time shift per day, and then blackout glasses in the evening, two hours before needing to go to sleep, and then immediately upon waking up, 10,000 lux: a light, you know, which you’re staring into, which you also see with Bryan Johnson, he does that. As long as I followed that, it was incredible. So I eliminated jet lag from my whole life for that year, and I worked on every detail in that way, really everything. You see, my helmet was black, bare carbon, because I realized that the paint on the helmet was 80 grams, and every gram counts in our sport, so I took the paint off my helmet. Just every single detail; I really tried to work on every single marginal gain possible.

This sounds absolutely hellish with family and little kids at home. I can see why, once you accomplished it, you were done.

NR: Yeah, of course. I mean, with a little baby at home, it required a great commitment and great support from my wife Vivian at the time, and she did that awesomely, so I’m very, very grateful for that.

Now you’re sitting here as an investor, but we’re a decade on from when you retired. What was the path to get to where you are now, and to realize, “This is what I want to do with the rest of my life”?

NR: The seven years after retiring were, first of all, just trying everything and nothing, trying to figure out what could be next in my life. And it’s hard, because as an athlete you are like a CEO, you know, you’re at the top of the company and you feel like the king, and then after your sports career you drop to zero. There’s nothing there, and you cannot use the skill that you learned for something new; it’s just gone. It’s very hard to accept that you really start from zero, and you don’t even know if you’re going to have success in something new or not. So I tried a lot of things, and now finally I’ve landed on what I really enjoy doing, which is being fully in the venture capital ecosystem, building my own VC firm, Rosberg Ventures, out of Europe, investing a lot in the USA, or even primarily in the USA. So super exciting, and yeah, I hit the ground running and I’ve been able to win also pretty quickly, which is what is really motivating.

What made you realize there was this opportunity? If you zoom out, there’s this idea that there’s money in Europe, there’s opportunity in the U.S., and someone needs to connect those two things together. But was there a specific conversation or something that came along where it was like, “Oh, I could actually do this and be good at it”?

NR: Well, more than money in Europe, it was money in my bank account, which was just sitting there.

That makes sense.

NR: And I was like, “What am I going to do with that?”, because it’s really, really hard to invest capital across generations in a smart way. It’s super, super difficult, as most people will know, or many people know. That path led to the Yale Endowment — everybody who’s interested in finance has at some point looked at the Yale Endowment, because David Swensen is the gold standard for investing capital across generations. And my light bulb moment was then seeing that David Swensen had by then put 20% of the Yale endowment into venture capital. 20%, that’s $8 billion, and it was by far his best performing asset class, with 21% yearly performance, 21% IRR.
So that was my light bulb moment, because I said, “Wow, I love startups anyway”, but I didn’t know you could make an asset class out of this. “Let me try and replicate what David Swensen did”, and I believed that with time, because I have my unique angles, including F1, I could also build the right access, by adding value into the ecosystem and everything, to kind of replicate the approach that David Swensen took to the asset class. And that’s where we are now; we actually made it work.

What are those unique angles? I think that sort of ties this together. You have the F1 background, you’re European.

NR: So the unique angle: of course I have the F1 platform, which is a really unique advantage for being able to meet people from the VC ecosystem, make friendships, get insights.

Appear on this podcast.

NR: (laughing) I’m very, very lucky in that sense.

But that’s something you seem to think about very strategically. Like, “This is an advantage that I have, I’m going to exploit this and push this”. Was this part of the thesis up front, particularly once you started?

NR: Well, first of all, I really enjoy welcoming this incredible community to my sport; it’s amazing for me to be able to showcase my sport in a way.

So this is where you benefited from Drive to Survive in the end, because even if you sort of missed that era, now suddenly everyone’s interested in F1.

NR: Oh yeah, definitely. I would not be here today if it wasn’t for Drive to Survive, because that’s what has really engaged the whole tech community in my sport. It’s lovely to be able to invite people, bring them up close, show them what my sport is about, and see how excited everybody is; to share that with them is really amazing, so I enjoy that. And it’s a great opportunity, as I said, to build friendships and get insights, but then also to add value. How does that start? First of all, of course, with curating the group that I invite. I invite the founder, and then I invite the CIO of a big company, and they then actually have a very valuable exchange. The CIO happens to be looking for the product that the founder is building, and the founder obviously needs to go to market, so there’s a great way for me to build connections, and that’s how you start adding value. And beyond that, what we also do is bring U.S. innovation to the German large corporates; we help with that.

So Germany is your specific focus in particular in Europe.

NR: Because I’m German, and because of my history and everything, I’m very well connected in Germany to all the C-levels in the large corporates.

Does this even go back to not just growing up in Germany, but also working for Mercedes, being the driver who’s interacting with all of this?

NR: Yeah, of course. All these large caps have been sponsors in F1, they’re all in the paddock, so I know them very well, and they’re all in desperate need of transformation now. Of course, there’s AI, there’s sustainability, there’s all these points, and they’re not exactly the fastest, the German companies. Many of them are real legacy businesses, not necessarily known for being the bravest when it comes to adopting new innovation and things like that.

And are these generally just regular companies, like manufacturing companies, things like that?
NR: It goes all the way up to the car manufacturers, whether it’s BMW or Mercedes, and we have found a unique positioning where we’re able to support, just selectively, by bringing their attention to a couple of products that are just being built in the US startup ecosystem, whether it’s vibe coding or even legal tech, all these different things. We can bring their attention to some of these innovations and really add value by creating these connections. So this is one of the secret sauces of Rosberg Ventures and of adding value, and it works very well; we’re hosting dinners with some of the C-levels and inviting some of the startups, etc., and it works very well.

So you recently announced a new fund, $200 million assets under management. How did you grow your network on the asset side? Is that mostly German money that’s coming back to the U.S., completing the cycle there?

NR: Mainly German, so it’s German capital, because the Europeans really lack connectivity. I realized that the Europeans lack access to U.S. venture capital; they know of the importance and the value that’s being created there, but they don’t have the access, and they really kind of miss the boat on that, so it’s not too hard to convince them: “Hey, let’s join forces and partner up here, and let’s invest in the best opportunities in the U.S.”. So that’s been working very well, and my way to raise, or to convince these families, is really going via the principal, who I may know from F1 or whatever, and then I say — I don’t even say too much about what I’m building, because you don’t want to sell straight away — it’s more like, “Hey, can you introduce me to your family office? I would love to just have a conversation with them”. Then comes the introduction, and I speak to them, I explain what we’re doing, and it’s just an obvious one. We’re kind of indexing the top 10 VC funds in the U.S., and also the top 10 growth stage startups in the U.S., and indexing those is kind of a no-brainer; that’s how we’ve been able to raise capital very, very quickly.

That makes sense. So everyone sees the opportunity, it’s not clear how to get the capital in, you go in first sort of as a seed investor with your own money, that starts the virtuous cycle, and then they get access to the German market in the long run. You’re bringing a unique angle, and it’s just all about deal flow; I think it’s pretty compelling. Why is it so hard to do business in Europe? Has everyone just given up on having a big startup ecosystem there and said, “Let’s just get our money into the U.S.”?

NR: So you mean the startup ecosystem in Europe? There are flashes of real hope at the moment. Vibe coding was pioneered in Europe, vibe coding for prosumers, that’s Lovable out of Sweden, and there are many other examples. I mean, ElevenLabs, the global leader in voice AI, is European, and there are many, many more examples. So there are flashes of real hope. But of course, we lack breadth in the whole ecosystem, and that’s a result of a few things; it’s a bit of a chicken-and-egg. One, of course: it’s much harder to scale in Europe because of the geographical limitations. It’s so hard to go from Germany to France, different language, different regulatory framework; there’s just huge friction in the go-to-market, so that’s one challenge.
And then historically, there’s also been quite a lag in the distributions and liquidity in that asset class in Europe, and so funding is not as ample as in the U.S. So it’s kind of a chicken-and-egg there also. But I think Europe is really working on trying to introduce one regulatory framework for startups across the whole of Europe, across all countries, so that’s in the plan. A lot is happening, so let’s see if Europe can develop more and more such promising companies.

How have you managed this shift? You started out with sort of a fund-of-funds model, and then you mentioned you’re doing more direct investing. Is that just a natural evolution of getting more access, having more assets under management? Or was that an explicit goal and strategy that you were seeking to pursue?

NR: Well, I think the holy grail in venture capital is to invest directly in the startups, and the fund of funds was the natural starting point from an asset class point of view, also from copying and being inspired by what Yale did. From there, the fund of funds is like a Trojan horse, because it gets you positioned well in the market, where you see everything, and then it really helps to identify which are the breakout startups, which are the most promising ones with the generational founders. So it really helps to create a short list, and also to create those connections and build those opportunities to actually invest directly in the startups.

We met in San Francisco a couple of months ago; you had just met with Dreamer, and I actually met with them the next day. They launched and were immediately acquired by Meta. Was that your first exit of a direct investment?

NR: So this is an important point: I don’t just try and support the companies that I’ve backed. In this case, this was the ex-CTO of Stripe, who is my friend, David Singleton; he built this together with Hugo [Barra], who used to have a senior role at Facebook.

Yep, I knew him when he was at Xiaomi; he was at Google, he was at Meta, he’s been all over the place.

NR: Everywhere. It’s an incredibly promising founding team, and so I was just trying to support them. And they happened to say that they were the biggest fans in the world of you and Stratechery, so I was like, “Okay, well, that’s easy, I just met Ben yesterday, so I can make the connection there”.

Yeah, it’s a pity how that went — I mean, a pity because, also from our point of view, I was so excited about that product; it was vibe coding AI agents. Yep, it’s very compelling. I was looking forward to writing about it; they got snapped up before I could even get there.

NR: I was looking forward to really using it at scale, but, yeah, now it’s bought by Meta, and let’s see what Meta does with it; I’m sure what they build with that will be very promising.

As you’ve made this transition and levered up into tech, going from fund of funds to direct investment, it’s a time of great upheaval in tech, given AI. Theoretically, this should mean more startup opportunities. On the other hand, the frontier lab models might just eat everything. How are you thinking about that as an investor? Is it, “I’m finally getting to the stage where I can get into startups, and now I’m not sure that I want to”? Or are you optimistic?

NR: I’m very optimistic.
I’m very optimistic because the value creation within this wave of AI is going to be something like we’ve never seen before, and I do think there are a lot of opportunities beyond just the frontier labs to capture market share and create new markets. But at the same time, you do need to be careful, because we see it in legal tech. Legal tech is a really big new market that’s being created there, with Harvey and Legora as the two leaders, and now Anthropic has come out with a product which kind of starts to threaten their position a little bit. And it feels like Anthropic has been doing that for almost every sector, so that is a little bit of a concern. It does feel like a safer place at the moment to be invested in frontier labs and neo labs; that does seem the safer place to be. But nevertheless, there are examples like ElevenLabs in voice AI: it’s very defensible what they’re building. They are a frontier lab themselves, by the way, because they build their own models. But still, the voice research is probably going to commoditize, as in many cases, and then it’s going to be about the platform, distribution, and products, and there ElevenLabs is doing an excellent job. So it does look at the moment like they’re going to be able to really win and withstand any potential threat from these frontier labs. So there are examples beyond the frontier labs, many, many examples, where there can be success stories, so it’s an exciting time.

You mentioned platform and distribution, and this sort of seems to be a theme: you’ve thought about the F1 reputation and background, “I can leverage that”; you know these sorts of companies, “I can leverage that”; you saw YouTube early on, and you were on that; you’re here on this interview. Is that why you still do Sky Sports? Everyone’s favorite commentator: is it that you love to commentate, or does that keep Nico Rosberg sort of front and center?

NR: You’re right. I do enjoy staying connected with the sport, but there’s a second reason, which is that it’s really helpful for me to stay relevant, even in the tech ecosystem. Because, of course, if some people enjoy watching me and things like that, it’s easier to connect with them in the future, even in the tech ecosystem. So it is twofold.

We talked before about how you were born with sort of a steering wheel in your crib, in some respects an advantageous background. But what I see as an overall theme is you pretty consistently identifying and leveraging your advantages, and what we just articulated is a good example. So now you’re in the investing world, totally separate, but figuring out what you have, how to work with it, and building towards that. Is that the overarching theme that you see in your life? What still drives you? Is it that bit about being a little bit insecure and wanting to prove yourself and being super competitive? Is it just that you can’t turn that off, and that’s why you’re still here?

NR: I’m a super extreme competitor. I need to compete, I want to win, and I have now chosen venture capital as my space to try and win more and more in the future. And I think, yeah, this is what I’m carrying over from the sport: I was very methodical about how to get that win in sports, every detail.
I worked on every single detail possible to put all the pieces together, to be the best that I could be and to get to that win eventually, and I think that’s something that I’m now replicating in the world of venture capital: trying to optimize everything and put everything together to be able to win more and more.

How do you think about that with your kids, just out of curiosity? Your daughter sort of popped into the background on the call here.

NR: So with my kids, because I went through such extreme intensity in my sporting career, I am more focused on their well-being rather than pushing them towards some success.

But at the same time, you just credited your massive drive and competitiveness for your success.

NR: Exactly, yeah, but well-being and happiness is what I put at a higher level for my kids, and that doesn’t necessarily have to mean success. So I’m very eager to try and help them discover their real passions, and we’re getting there. My daughter, I put her in a go-kart two weeks ago, and she drove slower than I could walk, so I could walk faster, and she ended up crying. I hope she doesn’t listen to this one day, but I don’t say which one it is either, so we’re fine, because I have two daughters. So it was clear that this is not her passion, and we will never go again. But I can see that her passion is music, guitar, singing, and so there I do nudge her towards more lessons, guitar lessons, drum lessons, without overdoing it, because I see that that’s her natural passion, you know? So that’s the approach I’m taking, but definitely really focused on happiness and well-being.

So you mentioned you’re on holiday in Ibiza. I understand you have an ice cream shop there, is that right?

NR: So yeah, with my wife, because she’s an interior designer, so she’s super creative, and for some reason both of us love ice cream, and we’ve been coming to Ibiza all our lives, and there’s never been a nice ice cream place. So just as a hobby, we said, “Hey, why don’t we open one ourselves?” — a common friend of ours likes to make ice cream, so we do that, and it’s become a huge business. We now have a chain here in Ibiza, very successful, and it’s the number one ice cream place. So Ben, next time you’re in Ibiza, ice cream is on us.

(laughing) Sounds like a deal. You have an interesting life: you learned five languages growing up, you have parents from different countries, and obviously, as part of being an F1 driver, you’re all over the world. You’re doing this connection between Germany in particular and Silicon Valley. You talk about eras and riding them in terms of F1 — do you feel like you’re at the pinnacle of globalized civilization? Do you feel that that is an era that is going to persist past you, or do you feel it sort of cracking and changing?

NR: Is this related to the sport, or?

Just in general. You’re like an international man of mystery, although maybe not that mysterious, but your superpower is connecting and linking all these disparate pieces together and seeing the ability to build through them. And I’m wondering: is that something, an opportunity, that you think is going to persist, given the way the world is going?

NR: Well, I’m very optimistic in that sense, I’m very optimistic. And I see a long road ahead.
And I think it’s an amazing time for venture capital now. It’s incredible, a time like we’ve never seen before, the speed of innovation, and maybe my F1 speed also helps me there. It doesn’t scare me at the moment, because I’m used to driving 220 miles an hour. So maybe I’m one of the only people in the world who isn’t getting scared by the speed of innovation that we’re seeing in the startup ecosystem, because I’m quite used to speed.

You actually focused a lot on e-mobility and electric vehicles, so I do have to ask you: how are you feeling about the current F1 regulations, this 50-50 split? There are a lot of complaints about drivers’ skills being taken away. What’s your view?

NR: I saw a message from Toto actually recently, and he said the F1 driver’s job might be the very last job that AI is going to endanger, because it’s very, very hard for AI to try and replicate what we are doing in that racing car at the edge of physics.

But has it been diminished a little bit if, when you’re going around a curve or you’re on a straight, your car’s just slowing down on its own?

NR: No, I understand. F1 has tried to stay technologically relevant, so they have gone full hybrid, which is one of the most efficient powertrains in the world, the way they’ve done it, but of course, yeah, it’s a little bit to the detriment of racing on the edge, because now they’re going through a high speed corner towards the end of the straight and they actually downshift on the straight after the corner, which is unheard of in the sport. But to be honest, I’m quite easygoing about that, because I like to really focus on just, “Is the racing exciting?”, “Are there good battles?”, “Is it unpredictable?”, “Are there rivalries?”, and as long as that’s happening, I think all fans will kind of forget about these regulations and will just enjoy the sport once again and be super excited. I think the season is shaping up really nicely. We have this super underdog, this 19-year-old who was really having a struggle last year, who suddenly has come to life and is showing his real talent and is dominating the championship so far. 19 years old, he’s still like a child, it’s incredible: Kimi Antonelli, an Italian guy, driving for Mercedes. So it’s so exciting to see him in front, and now everybody else trying to catch up to him; I think it’s great.

You are associated with Mercedes, they are doing very well, and I am a Kimi fan, my kids got a picture with him last year, so he’s by default who we’re cheering for, for sure. But who do you cheer for in F1?

NR: I do cheer for Kimi as well now, because he used to be my driver in go-karting as well, so I’ve known him since he was 12 years old, and he is a generational talent on the level of [Max] Verstappen and Hamilton. His talent is exceptional, and he’s so humble and authentic, a nice guy also, so you can only cheer for him. It’s such a challenge that he’s facing, being a driver of the Mercedes team, leading the championship all of a sudden, an incredible challenge, and I can so relate, because I was in that position and it’s so hard. It is so hard, what he’s getting himself into now for the rest of the year. I’ve been writing to him as well, and without telling him what he should do, I just told him what I did and what worked for me.
And one thing, for example, was to really take it race by race: don’t think about the end of the season, don’t think about the championship, just race by race, try and optimize for the next race, go in to win, and that’s it; for the rest, just see how it goes.

Are you surprised it’s been a decade and Lewis [Hamilton] is still in F1?

NR: I am quite surprised, because that’s a long time, and we weren’t exactly young at the time. When I stopped 10 years ago, he was already almost 32, and he’s still going now, which is incredible. Huge respect for him to keep going, keep grinding, keep the motivation. He still seems as motivated as ever, and he’s driving really well again this year; I think he’ll definitely win some races this year, so he’s doing really well.

And every win that Lewis gets is another notch on your belt, right?

NR: (laughing) That’s a little bit of an egotistical view of it, which sometimes I do think about. Yes, the more he wins, the better my success looks, which is nice, yeah.

You won one, you beat Lewis. It’s a championship; if you’re going to win one, that’s about as good as it gets. But hey, you didn’t stop there. It’s super impressive what you’ve built, very interesting to learn more about, and I look forward, in 10 years, to when Nico Rosberg is the champion VC investor. What is it, the Midas List? Are you gunning for number one?

NR: Yeah, sure, the Midas List, that’s gonna be a hard one, but those kinds of targets, at some point, yes.

Nico Rosberg, great to talk to you.

NR: Thank you very much.

This Daily Update Interview is also available as a podcast. To receive it in your podcast player, visit Stratechery. The Daily Update is intended for a single recipient, but occasional forwarding is totally fine! If you would like to order multiple subscriptions for your team with a group discount (minimum 5), please contact me directly. Thanks for being a supporter, and have a great day!


Writing an LLM from scratch, part 32k -- Interventions: training a better model locally with gradient accumulation

I've been working on a GPT-2-small-style LLM based on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". I've trained various versions of it in the cloud to work out which interventions to the model and training code had the best effects on the loss it gets on a specific test dataset, and now I wanted to do a training run locally to match the best of those. For that, I wanted to match the batch size I was using for the cloud training runs.

When I first started learning this stuff, batching seemed like a performance thing -- with highly parallel systems like GPUs, it generally turned out that you could run a batch of (say) two inputs through a model in less than twice the time you could run one, so it made sense to batch them up. For inference, that is exactly the advantage you get, but for training, it's become increasingly clear to me that you can also get an improvement in the quality of the model from batching. The best intuitive model I have is that if you run inputs through one-by-one, adjusting parameters after each, then it's easy for the model to "overcorrect" each time. With batches, you get an average set of gradients across all of the items, which smooths things out and stabilises the training.

Of course, it's possible to overdo it. As an extreme example, imagine that you were somehow able to fit your whole training set into one batch -- then you could train by running that single batch through, doing a single backward pass, and then adjusting the parameters once. It's pretty clear that that would not work very well -- just one single update of the initially-random parameters.

When training on my local machine, I could fit a batch of six sequences onto my RTX 3090. I'd found that when I moved to cloud machines, increasing the batch size had a very positive effect on the loss I got out of the models when I tested them. From a quick-and-dirty bit of curve-fitting, I estimated that the optimal batch size for this model, with that training run, was somewhere around 97. Conveniently, that was close to the maximum I could fit onto an 8x A100 40 GiB/GPU machine, so I used a batch size of 96 to test the different interventions I was trying. And when I finally put all of the interventions that helped with training together, I found (somewhat to my surprise) that their combined effect -- an improvement in loss of 0.113765 -- was less than half of the loss improvement of 0.252474 that I had got from increasing the batch size.

What that all made clear was that if I wanted to do a local training run that matched the quality of the cloud-trained model, I'd need not only to add on the interventions that I'd been testing in detail, but also to match the cloud batch size. And for that, I needed to learn about gradient accumulation.

Gradient accumulation is pretty much what it sounds like: instead of the normal technique of doing a forward pass, working out the loss, getting gradients with a backward pass, and then applying them by stepping the optimiser, you do multiple forward-backward phases, letting the gradients accumulate, and then do one optimiser step after that. When you do that, you're getting the training-stabilisation benefits of a larger batch size, even though you're not getting the performance boost. Sounds simple enough, and it is, in theory, but the implementation got a little more complicated. Let's work through it step-by-step.

To start with, imagine you have a really simple training loop.
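A minimal sketch of the kind of thing I mean -- the names here (`model`, `optimizer`, `calculate_loss`, `train_batches`, and `accumulation_steps` below) are placeholders I've picked for illustration, not necessarily the ones the real code uses:

```python
# A bare-bones training loop: one forward pass, one backward pass,
# and one optimiser step per batch.
for inputs, targets in train_batches:
    optimizer.zero_grad()              # clear out the old gradients
    loss = calculate_loss(model(inputs), targets)
    loss.backward()                    # work out new gradients
    optimizer.step()                   # apply them
    # ...other stuff: logging, checkpointing, and so on...
```

And here are two first-cut ways one might bolt gradient accumulation onto that loop -- one using a modulus check, one using an inner loop -- both of which share a subtle bug that the next couple of paragraphs explain:

```python
# Modulus version: only step the optimiser (and zero the gradients)
# every accumulation_steps batches; in between, each backward pass
# just adds its gradients to the ones already there.
for batch_number, (inputs, targets) in enumerate(train_batches):
    loss = calculate_loss(model(inputs), targets)
    loss.backward()
    if (batch_number + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
    # ...other stuff, once per training batch...
```

```python
# Inner-loop version: the outer loop runs once per optimiser step,
# with an inner loop doing accumulation_steps forward-backward passes.
for step in range(len(train_batches) // accumulation_steps):
    optimizer.zero_grad()
    for inner_step in range(accumulation_steps):
        inputs, targets = train_batches[step * accumulation_steps + inner_step]
        loss = calculate_loss(model(inputs), targets)
        loss.backward()
    optimizer.step()
    # ...other stuff, once per optimiser step...
```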
Adding gradient accumulation to that is really simple! Let's assume that train_batches has a length divisible by the number of forward-backward passes we want to run through before we step the optimiser -- call it accumulation_steps. As a first (not quite correct) cut, you can just step the optimiser every accumulation_steps batches, using a modulus check. An alternative way to do it is with an inner loop that does accumulation_steps forward-backward passes for each optimiser step. Which of those is better would depend on the details of the training loop -- in general, if you wanted the "other stuff" to be done once per training batch, then you'd want to use the first option, whereas if you wanted it to be done once per optimiser step, the second would be easier. As you'll see in a bit, I went for the second one for my code.

However, there's one small correction that we need to make to either of those to get them working properly. Remember that when you calculate loss across a batch -- for example, with cross-entropy loss -- you're getting the average loss across the batch, so when you do the backward pass, you're getting the average gradients. By contrast, with naive accumulation we're doing a backward pass on the complete loss at each step, so the gradients that are being generated in each backward pass are being added to each other -- you wind up with the sum of all of them rather than the average. So the gradients that the optimiser applied would be accumulation_steps times larger than they should be -- it would be as if we'd multiplied the learning rate by that number! But that's easy enough to fix. The average gradients over a number of steps are the sum divided by the number of steps, and we can do that division ahead of time just by scaling the loss down before each backward pass. And that's basically it; with those changes, the original basic training loop becomes one that uses gradient accumulation. The effective batch size is whatever the real batch size is, times the number of gradient accumulation steps. Both variants, with the scaling fix applied, are sketched below.
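Again, a reconstruction rather than the post's own code, with train_batches and accumulation_steps as placeholder names:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,)))
                 for _ in range(96)]
accumulation_steps = 16  # assumes len(train_batches) is divisible by this

# Variant 1: step the optimiser every accumulation_steps batches.
for i, (inputs, targets) in enumerate(train_batches):
    loss = F.cross_entropy(model(inputs), targets)
    # Scale down so the accumulated gradients average rather than sum.
    (loss / accumulation_steps).backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
    # ..."other stuff" here happens once per training batch

# Variant 2: an inner loop of accumulation steps per optimiser step.
for step in range(len(train_batches) // accumulation_steps):
    for micro_step in range(accumulation_steps):
        inputs, targets = train_batches[step * accumulation_steps + micro_step]
        loss = F.cross_entropy(model(inputs), targets)
        (loss / accumulation_steps).backward()
    optimizer.step()
    optimizer.zero_grad()
    # ..."other stuff" here happens once per optimiser step
```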
However, the real training loop that I'm using for these experiments is a bit more complicated than that simple example. There's checkpointing, AMP, and -- most importantly -- it can handle multi-GPU training using DistributedDataParallel. That made things a little bit more complicated.

The first thing was to look into the way I was selecting the data to train on. My dataset was already in batches, but we had to split those batches up between GPUs. The solution in the code was to work out how many global steps there were -- each global step being one batch going through each GPU on the machine -- by dividing the number of batches by the world size. (The world size, if you remember from the DDP post , is the number of processes running in a multi-GPU training run -- one per GPU.) Next, in the training loop, I iterated over the global steps, for each one getting the appropriate batch out for the specific GPU that was running the code, using its rank -- a zero-indexed number, unique to each of the per-GPU processes. So this basically split the list of batches into chunks of length world size, and then each GPU was fed the batch at its rank's offset into the chunk.

I wanted to keep things shaped such that when I was running with gradient accumulation locally, it would be similar to a cloud run with per-GPU batching. Specifically: when I was training in the cloud, I had eight GPUs with a per-GPU microbatch size of 12, giving a total batch size of 96. Locally, I could fit a batch size of six on my GPU, so I needed to do gradient accumulation over 96 / 6 = 16 steps. To keep things as similar as possible, I decided that I wanted the concept of a "global step" to match between the runs. In other words, it would expand slightly, from meaning "one batch per GPU" to being "one optimiser step per GPU". So, each time through that loop, we'd do multiple forward-backward passes, and then one optimiser step. That would mean that the best way to do things would be with something much more like the second of the two bits of sample code above -- the one with the inner loop rather than the modulus.

That required a change to the data lookup; I decided that the list of batches would be split into chunks of one global step's worth -- the world size times the number of accumulation steps -- and then each of those would be split into chunks of length world size, with each GPU taking the batch at its rank's offset. That required a corresponding change in the data-preparation code to make sure that the number of batches was divisible by the world size, the per-GPU batch ("microbatch") size, and the number of gradient accumulation steps, but that was easy.

That was enough to get the gradient accumulation happening! Next, I needed to change the backward pass code to scale down the loss so that we got averaged rather than summed gradients. Because we might be using AMP with a scaler, the code wasn't just a simple loss.backward(), but the change was obvious enough: divide the loss by the number of accumulation steps before handing it to the scaler's backward pass. All of those changes put together, plus a bit of shuffling around of code, were enough to get a correct gradient accumulation training loop!

But there was one small tweak I needed to add. When you're using DDP, gradients need to be synchronised between the different per-GPU processes. As a reminder, what happens is:

Each process does a forward pass.
Each process does a backward pass.
When they have the gradients, they essentially share them so that each process has an average of the gradients from all of those backward passes.
Then they all step their optimisers to apply the average gradients to each process's copy of the model.

Now, with my first cut of the gradient accumulation code above, what would have happened is this:

For each gradient accumulation step: each process does a forward pass, then a backward pass, and then the average is worked out across the processes.
Then they all step their optimisers based on the most recent average.

That would be correct, but not very efficient. We're sending out gradients and averaging on every accumulation step. But because each of our per-GPU processes is keeping its own "local" average (by accumulating the scaled-down gradients), we only really need to send those local averages out and get a global average once, just before we step the optimiser. If we do that, we can save quite a lot of work.

The trick to avoid that was to use the no_sync method on the DistributedDataParallel class that our own model is wrapped in. What we wanted to do was suppress the gradient synchronisation for each of the accumulation steps apart from the last one. It was easy to work out whether we were on the last gradient accumulation step; then we just needed to wrap the forward and backward passes in "with model.no_sync():", but only when we were not on that last step. Conditional with statements can be a little fiddly, but Python has a "do-nothing" context manager in contextlib.nullcontext -- using "with nullcontext():" around a block is identical to just running the block on its own. So we can combine that with the ternary operator, which does exactly what we want [1]. Pulled together, the loop looks something like the sketch below.
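Since the post's own snippets aren't preserved here, this is a rough sketch of how those pieces can fit together -- placeholder names throughout, launched with one process per GPU (e.g. via torchrun), and hand-waving over checkpointing and logging:

```python
import contextlib

import torch
import torch.nn.functional as F


def train(ddp_model, optimizer, scaler, batches, world_size, rank,
          accumulation_steps, device):
    """One gradient-accumulating DDP training loop (sketch).

    Assumes ddp_model is wrapped in torch.nn.parallel.DistributedDataParallel
    and scaler is a torch.cuda.amp.GradScaler.
    """
    batches_per_global_step = world_size * accumulation_steps
    num_global_steps = len(batches) // batches_per_global_step

    for global_step in range(num_global_steps):
        optimizer.zero_grad(set_to_none=True)
        for micro_step in range(accumulation_steps):
            # Each GPU takes the batch at its rank's offset within this
            # global step's chunk of world_size * accumulation_steps batches.
            index = (global_step * batches_per_global_step
                     + micro_step * world_size + rank)
            inputs, targets = (t.to(device) for t in batches[index])

            is_last_micro_step = micro_step == accumulation_steps - 1
            # Suppress DDP's cross-process gradient sync on every micro-step
            # except the last; nullcontext() is the "do-nothing" manager.
            sync_context = (contextlib.nullcontext() if is_last_micro_step
                            else ddp_model.no_sync())
            with sync_context:
                with torch.autocast(device_type="cuda"):
                    logits = ddp_model(inputs)
                    # (batch, seq, vocab) logits flattened for per-token loss
                    loss = F.cross_entropy(logits.flatten(0, 1),
                                           targets.flatten())
                # Scale down so accumulated gradients average, not sum.
                scaler.scale(loss / accumulation_steps).backward()

        scaler.step(optimizer)
        scaler.update()
```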
With that change, I had something I was happy with; you can see the diff here . So now it was time to do a training run! I'd originally been planning to jump right in and do a training run based on my last cloud run , with all of the interventions I'd decided were worth using, but locally with gradient accumulation. However, I decided that it would be interesting to try doing a new "baseline" train first. I'd done my local training runs, and then established a baseline version in the cloud by taking exactly the same configuration and doing the training run on an 8x A100 40 GiB machine with an overall batch size of 96. So I could repeat that locally with gradient accumulation, and that would show two things (or perhaps, the same thing but in different lights):

Whether the increased effective batch size had as positive an effect on the loss as the increased real batch size did when I did my cloud runs.
Whether the locally-trained gradient accumulation model was similar to the cloud-trained big-batch model in terms of its loss.

That would help confirm my understanding that it was the increased batch size that helped in the cloud, and not, say, some architectural difference -- and would also act as a good test of the gradient accumulation code. Here's the training run config . I kicked it off, and the startup output showed the right number of global steps; it matched the numbers I saw when training in the cloud. And the estimated 44 hours for the training run seemed correct: my original local runs took 48, but with them I was spending quite a lot of time on validation, which this code didn't do. Just less than two days later, it was done, and it all looked good.

The loss chart was similar to the one from the cloud training run with the same config (but using larger batches rather than gradient accumulation) -- similar, but not identical. That's pretty much what you'd expect! The two training runs were on different architectures -- RTX 3090 vs A100 -- and so there will probably be differences in the CUDA kernels, and also PyTorch's AMP (which uses 16-bit instead of 32-bit in cases where it makes sense) might make different decisions. I think that if we'd run it on a machine with one A100, then the results of using gradient accumulation would be even closer (perhaps even identical) to a larger batch size, especially if we were training without AMP.

I uploaded the model to Hugging Face and it was time for the evals. The smoke test first: as usual, reasonably coherent. But the important one was the loss on the test set, and it was solid! The cloud-trained baseline model got 3.691526, and this local one was actually very slightly better, by 0.007691. That's very close indeed, which is what we wanted to see :-)

It was time to see what effect adding on the interventions would have. As a reminder, here are the changes I made to the config for this run:

Gradient clipping at 3.5
Learning rate changed from 0.0004 to 0.0014, with a warmup over 5% of the run then a cosine decay to 0.00014.
Weight decay changed from 0.1 to 0.01
Dropout removed

It did not include QKV bias. Here's the config . I kicked it off, and it looked like it was going to take 40 hours; that matched what happened in the cloud runs, as removing dropout speeds things up quite a lot. Just less than two days later, it was done.

The loss chart over the training run was very smooth, with no loss spikes; the chart from the same training run in the cloud was a bit choppier than the local one. The gradient norm chart was also interesting: compared with the one from the cloud training run, the local one was actually noisier -- the cloud run has a few gradient spikes near the start but calms down from around global step 6,000 or so, whereas the local one is spiky up to about 3,000, then calm, but has a massive spike at around 10,000. The learning rate we don't need to compare, but it was worth sanity checking to make sure we really did train the right way, and it looked as expected.

So that all looked good. The training run did have some differences from the cloud one, but (as with the previous baseline train) it looked similar enough. Architectural differences between the A100s in the cloud and the local RTX 3090 seemed like a plausible cause. I uploaded the model to Hugging Face , and it was time to run the evals. The smoke test first: reasonably coherent -- and I think that's the first time I've seen an end-of-text token in a smoke test output!
But the important one is, as ever, the loss. Adding both this one and the local baseline to the results table for all interventions showed something really weird: the local run with the interventions is 0.039600 points better than the cloud version of the same training run. That's nice, in that lower loss is always better, but it's also rather confusing -- that's a bigger loss improvement than some of the interventions. In theory, all that we changed between the cloud version of this training run and the local one was the architecture. I was expecting that to have an effect, but thought that it would be small -- as, indeed, it was with the two baseline trains, where the loss difference was just 0.007691 -- about five times smaller.

Now, when I was looking into the effects of noise on training loss , I found that changing the random seed that was used to initialise the weights (but starting the training run itself at the same random seed) had a much bigger effect on the resulting model quality than keeping the weights identical but varying the seed at the start of the post-initialisation phase of the training run. The standard deviation of the varied-weights, same-train models was about double the SD of the same-weights, varied-train ones. That was interesting, though not directly comparable -- those tests were done with the same training run and with the architecture held constant: an 8x A100 40 GiB machine for each test.

However, it felt like it would be a good idea to at least see whether we started with the same weights locally and when training in the cloud. My suspicion was that we probably would; the weight initialisation uses deterministic non-GPU code, so with the same seed we'd expect the same weights regardless of the computer. The similarity of the loss results for the local and cloud baseline training runs also seemed to point in that direction. But it was worth testing. I created a throwaway branch of the training code, which -- after creating the model -- just dumped the model weights to a file, then exited. I ran it locally using the local config, then fired up yet another 8x A100 40 GiB machine on Lambda, ran the same code there with the cloud config, copied down the weights, and compared them. Identical. That was reassuring!
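In case it's useful, a hypothetical sketch of what that throwaway check can look like -- the seed, file names, and toy model here are all made up, not the real training code's:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)          # placeholder seed, not the real config's
model = nn.Linear(16, 4)       # stand-in for the real GPT-2-style model
torch.save(model.state_dict(), "initial_weights_local.pt")

# After fetching the equivalent dump from the cloud machine, compare the
# two state dicts tensor by tensor:
local = torch.load("initial_weights_local.pt")
cloud = torch.load("initial_weights_cloud.pt")
assert local.keys() == cloud.keys()
assert all(torch.equal(local[k], cloud[k]) for k in local)  # identical!
```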
I considered doing more analysis on this; for example, in my investigations into noise, I found that keeping the same weights but altering the random seed for the rest of the training run, I got results with a standard deviation of 0.008672 -- more than four times smaller than the difference between the local and cloud trains with the interventions. Might that be a number I could use for some kind of comparison? However, I decided that it's not really comparable. That number was from varying the random seed, but keeping the same architecture. There's not really any solid reason to believe that keeping the seed constant but changing the architecture would cause the same kind of differences. They might be more similar, they might be less. I think that all we can really say here is that the change of machine changed some aspects of the training dynamics in a way that happened to get us a lower loss. I can easily imagine that if I'd done something slightly different -- used a local RTX 4090, for example -- it could equally well have gone in the other direction.

And at least it's reassuring that the improvement was smaller than the interventions I was most convinced by; the only smaller ones were full-fat float32, gradient clipping, and QKV bias -- ones that I'd already decided might have only been beneficial due to noise. Most importantly, it was several times smaller than the 0.252474 improvement I originally saw when I moved from local training to larger-batch cloud training.

So, I think that brings me to the end of this set of training experiments. We started with a locally-trained model that got a loss of 3.943522 on our test set, compared to the original GPT-2 small model, which got 3.499677 [2]. I've tried a bunch of interventions to try to get my model closer, and finally I've managed to get almost all of the way there, to 3.538161. That's really pleasing!

I think that there are two things to do before I can fully wrap up this "interventions" mini-series, and get back to the main-line LLM from scratch stuff. Firstly, I should revisit the instruction fine-tuning tests, which I put on hold while doing these training runs. That would give us some indication as to whether the loss improvement was just a technical improvement that made a number go down, or whether it actually improved the usefulness of the model. Secondly, I think I really need to write a wrap-up. I've been working on this stuff on and off since December, and I think a summary of what I did would be quite nice! I'll post soon; don't touch that dial :-)

[1] Thanks to this Stack Overflow answer for that trick.
[2] I'm going to switch to six decimal places from now on -- previously I was rounding it to three, hence 3.500.

0 views

Speed is Not Conducive to Wisdom

Speed has become the primary virtue of the modern world. Everything is sacrificed to it. Move fast (and break things, not as a goal but as a consequence). Wisdom requires allowing yourself to be undone by experience:

An opinion dismantled by reality.
An artifact torn apart by the real world.
An idea destroyed by its own shortsightedness.

Experiencing these can be slow and uncomfortable, but if you keep up your speed you can outrun them — never reflecting on what happened in your wake. Speed is how you avoid reckoning. It guarantees you miss things, and you can’t learn from what you don’t notice. Wisdom’s feedback loop is slow. Wise people I’ve met seem unhurried. I don’t think it’s because they’re slow thinkers or actors; I think it’s because they’ve learned that important things take the time they take. No amount of urgency changes that. Wisdom is chasing all of us, but we’re going too fast to notice what it’s trying to teach us.

0 views

Linux Apps Starter Kit (Gnome Edition)

I find beautiful, well-designed, native applications to be a source of inspiration when using my computer. I've posted on Mastodon about the native Mac applications that were hard to leave when switching to Linux. Now that I've fully made the switch, I figure it's only fitting to do the reverse post on the Linux applications that I've fallen in love with. For this post, I'll be focusing on Gnome/GTK/Adwaita applications. Why? Two reasons. First off, I use Fedora with Gnome 49, so I'm most familiar with this territory. Second, Gnome has a very well-defined HIG (Human Interface Guidelines), resulting in a strong visual identity. Applications enhance the operating system in a consistent, fluid way, rather than serving up a jarring experience (i.e. an Electron app with radically different UI/UX). This is key to me for finding inspiration and joy when using an application. With all that said, let's dig into the apps I consider essential!

Internet radio is awesome, and Shortwave is the best application I've found on any platform for listening to it. Search for stations, add them to your library and jam! It also has a DVR-like function (okay I get it, I'm getting old) so you can download tracks you've listened to. Finally, there's an amazing skeuomorphic mini-player (I'm a sucker for skeuomorphic design). 📦 Shortwave on Flathub ♥️ Support Shortwave 👤 Meet the Developer

"Plays music, and nothing else" is the tagline of this beautiful audio player. For those of us still rocking local media collections, Amberol is the way to go. I mean, just look at it! Point it at a folder, play the music inside, easy! I have my NAS mounted as a bookmark in Nautilus, so I just point Amberol to my network music folder. Who needs streaming?!? 📦 Amberol on Flathub ♥️ Support Amberol 👤 Meet the Developer

I've been using Blanket longer than most apps on this list, long before I made the full switch to Linux (and heck, it's probably one of the reasons I eventually made the switch). It's a no-frills ambient noise machine. It comes with a large selection of high-quality samples that can individually be toggled and adjusted. You can save preset configurations (e.g. coffee shop in a thunderstorm), and add your own audio samples. On any other platform this would cost $15 or more, but here it is on Linux, free and open-source. 📦 Blanket on Flathub ♥️ Support Blanket 👤 Meet the Developer

Need to quickly edit an image or make a thumbnail? Pinta to the rescue! It's fast and has a familiar UX. Sure, it's not as powerful as GIMP, but I find myself reaching for it more often. 📦 Pinta on Flathub 👤 Meet the Developers

This app right here should be a default Gnome app, it's that good! Hands down the most powerful and user-friendly screenshot tool I've used (and yeah, I've tried the popular Mac OS ones). Bind Gradia to a shortcut (I use Super + Shift + S) and it'll open after you take a screenshot. Gradia lets you add arrows, drawings, blur text, perform OCR, crop, add backdrops and more. It's honestly an essential application, and performs better than apps I paid $15+ for on Mac. 📦 Gradia on Flathub ♥️ Support Gradia 👤 Meet the Developer

There's a lot of single-purpose, well-built applications for Gnome, and Switcheroo is a great one I use daily. It takes an image in, and outputs it in a different format. You can add on compression and resizing, strip metadata, and replace transparency. I use it to optimize images for the web.
📦 Switcheroo on Flathub ♥️ Support Switcheroo 👤 Meet the Developer

I don't use social media beyond Mastodon, but Tuba makes me glad I'm at least on that platform. Tuba is well designed, fast and filled with thoughtful features (like a custom emoji picker and the ability to schedule posts). I've tried the best on Mac (Ice Cubes), and it doesn't get close to comparing with Tuba. 📦 Tuba on Flathub ♥️ Support Tuba 👤 Meet the Developer

Mmmm, RSS, my favorite (and probably how you're reading this article)! Newsflash is an excellent way to stay on top of your feeds. It's got categories, tags, OPML import/export, themes, and more. My favorite feature is the "Today" tab filtered by unread, great to catch up on what's new. 📦 Newsflash on Flathub 👤 Meet the Developer

Here it is, my top pick. You don't even need to read this, just go download Planify, it's incredible. Alain took todos and added a bucketload of thoughtfully designed microinteractions. Labels, scheduling, today view, sections, kanban board, natural text-to-date parsing, the list goes on. When you hover over the "Add" button, it does a little animation. When you complete a task, it gives a little sound. There are so many thoughtfully designed pieces in here! 📦 Planify on Flathub ♥️ Support Planify 👤 Meet the Developer

Markdown-based note taking, done very well. Notes are organized into notebooks and paired with a pleasant, minimalist markdown editor. 📦 Folio on Flathub 👤 Meet the Developer

Distraction-free markdown editor for writing long-form content. Basically, the Linux alternative to iA Writer on Mac. It's beautiful, fast and has just enough features. I use it to write most of my blog posts! 📦 Apostrophe on Flathub ♥️ Support Apostrophe 👤 Meet the Developer

Another excellent, single-purpose application that I use on a daily basis. Sessions is an egg/pomodoro timer that beeps when time's up. You just drag the slider and the timer starts. Great for keeping yourself focused! 📦 Sessions on Flathub ♥️ Support Sessions 👤 Meet the Developer

Holy crap, this app looks good! John did an incredible job building the best ebook reader on Linux. You can bring your own books, or use the catalogs feature to discover public domain literature. There's support for annotations (with import/export), bookmarks, text-to-speech and theming. 📦 Foliate on Flathub ♥️ Support Foliate 👤 Meet the Developer

Got a SQLite database and want to know what's inside? Bobby to the rescue! Drag and drop your database file in and see the data. Simple, well designed and useful! 📦 Bobby on Flathub 👤 Meet the Developer

There's so much value packed into this app! Replace the random sketchy websites you found on Google by using Dev Toolbox to generate a QR code, check contrast ratios, parse CRON strings and so much more. There's too much in here to cover, but it's become an essential part of my toolkit. 📦 Dev Toolbox on Flathub 👤 Meet the Developer

Bazaar is a faster, more reliable and visually more appealing alternative to the default Gnome Software application. It's one of my first installs on a new system, and another application that should be a default Gnome app. 📦 Bazaar on Flathub ♥️ Support Bazaar 👤 Meet the Developer

The absolute best way to discover, install and update Gnome shell extensions! 📦 Extension Manager on Flathub ♥️ Support Extension Manager 👤 Meet the Developer

Copyous is a shell extension, and it's the best clipboard manager out there. Visually browse and search your clipboard history.
It supports image previews, syntax highlighting, color previews (e.g. copy a hex code and it shows the color) and so much more! 📦 Copyous on Gnome Extensions

There are so many amazing applications on Linux that I definitely missed some! Feel free to shoot me an email at [email protected] with recommendations. I'll do a separate post in the future for KDE applications!

I mentioned a few times in this article that some applications on Linux provide better value than alternatives I paid for on Mac OS. There's not a single paid application on this list, but that does not mean you shouldn't support the developers! These developers work hard to design, build, test and support the software that makes Linux great. If you like their work, show them some love!

0 views
DHH Yesterday

The malleable computer

Open source promised that users would be free to change whatever code they were running. The reality, however, is that hardly any of them ever did — it was simply too hard. Now, with AI, it suddenly isn't. This is very exciting. Being able to add features to any local open-source application and then use that bespoke fork for your own benefit is an incredible step toward the original open source promise. This isn't just about regular users, either. Even if you are a programmer, you might not be familiar with the language the application is written in. And even if you are, taking the time to get familiar with any substantial codebase is a tall order. AI is compressing that complexity and making it malleable at a ferocious rate. What excites me even more, though, is when this power is applied to the operating system, and thus the entire computer. When you're able to change not just individual applications, but your system's menu bars, your window manager, your notification system, your everything.  But you can only do this on Linux. With Windows and macOS, the core elements of the operating system are owned by the companies that make them. While it's often possible to hack certain aspects, it's far from truly having the malleable computer that Linux allows. I've already seen this a lot in the Omarchy world: users who aren't super technical making the system their own with the help of AI and being utterly delighted by the outcome. And while this is still a pretty nerdy thing to do, I don't think it will remain contained to that niche for long. As models get even more powerful, the idea that your system is tied down as a fixed black box is likely to become an archaic notion pretty quickly. As always, the future is already here, it's just not evenly distributed.

0 views
David Bushell Yesterday

Warning: containment breach in cascade layer!

CSS cascade layers are the ultimate tool to win the specificity wars. Used alongside the :where() selector, specificity problems are a thing of the past. Or so I thought. Turns out cascade layers are leakier than a xenonite sieve. Cross-layer shenanigans can make bad CSS even badder. I discovered a whole new level of specificity hell. Scroll down if you dare!

There are advantages too; I’ll start with a neat trick. To set up this trick I’ll quickly cover my favoured CSS methodology for a small website. I find defining three cascade layers is plenty. In the first I add my reset styles , custom properties, anything that touches a global element, etc. In the second I add the core of the website. In the third I add classes that look suspiciously like Tailwind , for pragmatic use. Visually-hidden is a utility class in my system.

I recently built a design where many headings and UI elements used an alternate font with a unique style. It made practical sense to use a utility class like the one below. This is but a tribute; the real class had more properties. The class is DRY and easily integrated into templates and content editors. Adding this to the highest cascade layer makes sense. I don’t have to worry about juggling source order or overriding properties on the class itself. I especially do not have to care about specificity or slap !important everywhere like a fool. This worked well.

Then I zoomed further into the Figma picture and was betrayed! The design had an edge case where letter-spacing varied for one specific component. It made sense for the design. It did not make sense for my system. If you remember, my utility cascade layer takes priority over my core layer, so I can’t simply apply a unique style to the component. For the sake of a demo let’s assume my component applies the utility class in its markup. I want to change back to the normal letter-spacing. Oops, I’ve lost the specificity war regardless of what selector I use. The utility class wins because I set it up to win.

My “escape hatch” uses custom property fallback values . In most cases the custom property is not defined and the fallback default is applied. For my edge case component I can ‘configure’ the utility class by setting that property. I’ve found this to be an effective solution that feels logical and intuitive. I’m working with the cascade. It’s a good thing that custom properties are not locked within cascade layers! I don’t think anyone would expect that to happen.

In drafting this post I was going to use an example to show the power of cascade layers. I was going to say that not even !important wins. Then I tested my example and found that !important does actually override higher cascade layers. It breaches containment too! What colour are the paragraphs? Suffice it to say that things get very weird. See my CodePen . Spoiler: blue wins.

I’m sure there is a perfectly cromulent reason for this behaviour but on face value I don’t like it! Bleh! I feel like !important should be locked within a cascade layer. I don’t even want to talk about the inversion… I’m sure there are GitHub issues, IRC logs, and cave wall paintings that discuss how cascade layers should handle !important — they got it wrong! The fools! We could have had something good here! Okay, maybe I’m being dramatic. I’m missing the big picture; is there a real reason it has to work this way? It just feels… wrong? I’ve never seen a use case for !important that wasn’t tear-inducing technical debt. Permeating layers with !important feels wrong even though custom properties behaving similarly feels right. It’s hard to explain. I reckon if you’ve built enough websites you’ll get that sense too? Or am I just talking nonsense?
I subscribe to the dogma that says !important should never be used, but it’s not always my choice . I build a lot of bespoke themes. The WordPress + plugin ecosystem is the ultimate specificity war. WordPress core laughs in the face of “CSS methodology” and loves to put styles where they don’t belong . Plugin authors are forced to write even gnarlier selectors. When I finally get to play, styles are an unmitigated disaster. Cascade layers can curtail unruly WordPress plugins, but if they use !important it’s game over; I’m back to writing even worse code. Thanks for reading! Follow me on Mastodon and Bluesky . Subscribe to my Blog and Notes or Combined feeds.

0 views
Kev Quirk Yesterday

I Wish I Could Talk to My Dad

My best friend lost his Dad yesterday. Understandably he's extremely upset, and I feel awful for him. I never know what to do in these situations - "how are you doing?" just feels like such a stupid thing to say. Like it's nowhere near enough. Of course he isn't doing well, you fucking idiot!

His loss has brought back feelings of loss following the death of my own Dad, who we lost back in 2008 to cancer, when he was 47. Watching him just wither away was heartbreaking. Especially at the age of 23. Now, nearly 20 years on, I rarely get upset about the loss. I still think about him all the time, but seeing what my friend has been going through has jumped it right to the front of my mind. Especially since the loss of my sister is still so raw.

I had a dream about my dad last night, the first I've had in a while. The dream was nothing special; I don't even fully remember what happened in it. But what I do vividly remember was that his voice wasn't right. And then I realised: I don't remember what my Dad's voice sounded like. I have no videos of him, and no recordings of his voice. For a year or so after he died, I used to call his phone as it would go straight to voicemail and I'd get to hear his voice. Eventually the line was cut though. I wish I'd recorded it, just to have something.

I don't even have many photos of him. Most of them are from when I was a baby. I only have one photo of him and me as adults, which was taken on the day I passed out of basic training in the Army. L-R: my dad, me, my dad's dad.

Not being able to remember his voice isn't the only reason I'd love to talk to him again. He was funny, and always made me belly laugh. He loved to sing too - and was bloody good at it! I'm also a very different person now than I was in 2008. I'd like for him to meet his grandsons, and I'd like to know what he thinks of the man I've turned into. He only met my (now) wife once or twice - he'd have loved her, and she'd have loved him. All very narcissistic, I know. But he was my dad!

Conversely, I'd love to know what kind of an old man he'd have turned into. Would he still be as funny? Or would he have turned into a grumpy old curmudgeon? Would we still go for a couple of beers every Friday? Would he come here for barbecues in the summer? I'd have loved that.

There's no real point to this post, really. These thoughts have just been spinning around my grey matter for the last few days, and I wanted to work through them, which I think I've done a pretty poor job of. So yeah, losing a loved one is shit. It never leaves you, and I feel horrendously sorry for my mate. I'll try and make the next one more positive... Thanks for reading this post via RSS. RSS is ace, and so are you. ❤️ You can reply to this post by email , or leave a comment .

0 views

How an SSD Works

☕ Welcome to The Coder Cafe! Today, we explore quantum physics. Not the abstract kind, but the kind that runs inside the device you are reading this on. Indeed, every time you save a file to an SSD, electrons exploit quantum physics to cross a physical barrier they classically have no business crossing. I’m not a physicist, but I’ve been in love with quantum physics for years, and over the last few months I've gone deep into these concepts. Get cozy, grab a coffee, and let’s begin!

An Introduction to Matter

To start, what is matter? Matter is made up of molecules, and molecules are assemblages of atoms , the building blocks of matter. For example, water is an H₂O molecule: 2 hydrogen atoms and 1 oxygen atom. An atom is itself composed of a nucleus and electrons , which carry a negative charge and orbit around it. The nucleus contains two types of particles: protons , which carry a positive charge and naturally repel each other, and neutrons , which carry no electric charge and act as a kind of “glue,” helping to keep the nucleus stable. The attraction between electrons (−) and protons (+) keeps the whole thing in a stable state . On the other hand, with too few or too many neutrons relative to the protons, the nucleus becomes unstable. It will eventually decay by emitting energy. This is the principle of radioactivity. Carbon-14, for example, is slightly unstable. It decays slowly and predictably. This predictability allows it to be used as a clock to date ancient remains.

One might think that when touching a solid object, like a table, what gives the table its solidity is that it is “filled” with matter, preventing our finger from passing through. Yet if the nucleus of an atom were the size of a marble placed at the center of a football pitch, the electrons would only be found orbiting at the level of the distant stands, with almost nothing in between. An atom is therefore almost entirely empty . Solid matter is almost nothing, and what gives this impression of solidity are forces between atoms called electromagnetic forces .

There are 4, and only 4, fundamental forces in the universe: gravity , which attracts everything with mass toward everything else with mass; the strong nuclear force , which glues protons and neutrons together inside the nucleus; the weak nuclear force , responsible for certain radioactive decays (it is what allows a neutron to transform into a proton, or vice versa); and the electromagnetic force . If we focus on this last one, it is the one that attracts opposite charges and repels identical charges. Unlike the two nuclear forces, which only act inside the nucleus, the electromagnetic force has an infinite range. That is why it is the one that governs interactions between atoms at our scale. It is therefore the electromagnetic force that creates the illusion of solidity . When we touch a table, it is the electrons in our hand and those in the table that repel each other. We never truly touch anything.

Let’s now talk about light. So, what is light ? It is an electromagnetic wave, a disturbance of the electric and magnetic fields that propagates through space. Light is a spectrum . Indeed, so-called visible light, the light our eyes can perceive, is only a tiny portion of what exists.
The full spectrum is called the electromagnetic spectrum: Radio wave → Microwave → Infrared → Visible light → UV → X-rays → Gamma rays. When a radio picks up radio waves, it is therefore picking up light, invisible due to its frequency. Indeed, what varies across the electromagnetic spectrum is the frequency of the wave, and therefore its energy.

But light hides a surprise: it is also a particle. NOTE : A particle can be summarized as follows: an indivisible packet of energy. We know light is a particle thanks to Einstein in 1905 (for which he received his only Nobel Prize, not for relativity). When a light bulb emits light, it emits specific particles called photons. When we vary the intensity of that light bulb, one might assume it is the energy of each photon that varies, but that is not the case. The energy of each photon is fixed by its frequency: the higher the frequency of a photon, the more energetic it is. That is why, for example, UV rays burn the skin. What makes a light bulb emit more light is the increase in electric voltage, which produces more photons. It is the quantity of photons that makes a light bulb shine more or less brightly.

In flight, the photon behaves like a wave : it propagates, it oscillates, and it can interfere with other photons. But when it comes into contact with matter, it behaves like a particle: it interacts in one single hit, in one single place. When a photon collides with matter, it can either be:

Absorbed : The photon ceases to exist. Its energy is transferred to an atom, which moves to a higher energy level. This is what an eye does: it absorbs the photon and converts it into an electrical signal.
Reflected : Technically, this is not a true reflection because it is not the same photon that leaves. The atom absorbs the photon and then re-emits a photon of the same energy in a different direction.

NOTE : What determines whether a photon is absorbed or reflected depends on the energy levels of the electrons in the atoms of the surface. If the photon’s frequency matches an available energy level, the atom absorbs it. Otherwise, the photon is re-emitted. That is why glass is transparent, why the retina absorbs light, and why a mirror reflects almost everything.

We have seen that light is a wave. But how do we know this? This is where Young’s double-slit experiment comes in, and it is this very experiment that laid the foundations of quantum physics. Young’s experiment, carried out for the first time in 1801, uses the following setup:

A light source projecting photons (a laser, in modern versions)
A wall with two small slits, A and B
A screen behind to detect where the photons land

If light were a “packet” of something, firing it through the two slits would simply produce two bright bands on the screen, one for each slit. Yet the actual result of Young’s double-slit experiment is different: light produces multiple alternating stripes on the screen, proof that it behaves as a wave, interfering with itself after passing through both slits simultaneously. We obtain what is called an interference pattern . The wave passes through both slits simultaneously, splits into two, and these two waves meet on the other side, where they either reinforce or cancel each other out, creating an alternating pattern of bright and dark bands on the screen.
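As a quantitative aside (a standard textbook result, not from the original article): for slits separated by a distance d and light of wavelength λ, the bright bands appear at angles θ where the path difference between the two slits is a whole number of wavelengths:

$$ d \sin\theta = m\lambda, \qquad m = 0, \pm 1, \pm 2, \ldots $$

Between those angles the two waves arrive half a wavelength out of step and cancel, giving the dark bands.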
When two waves meet, they add up or cancel out depending on their respective phase:

Two crests meeting → they add up → bright zone
A crest meeting a trough → they cancel out → dark zone

The result is an alternating pattern of bright and dark bands on the screen: that is an interference pattern .

In the 20th century, researchers then had an idea: repeat Young’s experiment, no longer projecting photons (light) but electrons (matter). The experiment is therefore similar, but instead of a laser, an electron gun is used, and we then measure on the screen where the matter lands. Obviously, with this experiment, we are going to get two bands of matter, right? Well, still no! An interference pattern is observed as well . This result was not a complete surprise to everyone: in 1924, physicist Louis de Broglie had already theoretically proposed that matter, like light, could have a wave-like nature. But this time, it's not a light-like wave; it's a probability wave .

This is one of the greatest discoveries in quantum physics: at the atomic level, a particle has no defined position . The position of a particle is determined by a function called the wave function , ψ, which describes the probabilities of finding that particle at a given location in time. (Pictured in the original: a smooth sinusoidal curve representing the wave function ψ(x), showing how the probability of finding a particle oscillates across different positions in space.)

A small clarification on this concept of undefined position, to make sure the concept is clear, because this is the moment where our rational brain can start to “let go.” Let’s take a coin for a coin toss. We throw it in the air and hide the result. We are in a state of uncertainty, but this uncertainty is called epistemic . We do not know the result (heads or tails) because we have not looked yet, yet that result already exists. For a particle in the quantum world, the uncertainty is called ontological . It is not that we lack information about the position of the particle; it is that this position simply does not exist yet . This is what is called quantum superposition : an unmeasured particle exists in multiple states simultaneously.

However, measurement changes everything. When we measure the position of a particle, we will find it in one of the possible positions described by the wave function. We then say that the wave function “collapses,” because the measurement restricts the possibilities into a single real state: the wave function collapses from a spread of possibilities into a single sharp spike, pinpointing the particle at one exact location.

As an analogy, it is a bit like Minecraft. A default Minecraft map is 60 million x 60 million blocks. For the initial loading, the server does not generate the entire map; it only generates the world around the observer , i.e., the player. When the player moves, they force the server to generate the world's continuation. Where this analogy reaches its limits is that the generation of the Minecraft world, even if it is random, is still deterministic, because each world has its own seed. The quantum world, on the other hand, appears to be purely random, meaning without hidden information.

Let’s return to Young’s experiment. What would happen if, when a particle passes through a slit, we placed a detector there to observe which slit the particle goes through? We recall that a wave passes through both slits at once.
When a detector is added to observe which slit the particle passes through, the interference pattern disappears, because the act of measuring the particle’s position destroys its wave-like behavior. This is the moment where the brain completely lets go: observing the particle changes the result of the experiment . Indeed, observing that particle “forces” it to have a defined position, and it then behaves like a classical marble. The result, therefore, gives us two bands of matter .

To summarize what we have seen so far: an unobserved particle exists as a probability wave, in multiple positions simultaneously. As soon as we measure it, this wave collapses, and the particle ends up at a precise location. But then, why do we never see this in everyday life? The answer is decoherence . Quantum superposition is only possible as long as a particle remains isolated from its environment. As soon as it interacts with anything (another atom, a photon, an electric field), that interaction constitutes a measurement in the quantum sense. The wave function collapses, and the particle ends up in a precise state. An isolated electron in a vacuum can remain in superposition. But a macroscopic object like a table is made up of billions upon billions of atoms that permanently interact with the surrounding air, light photons, and electromagnetic fields. These interactions occur billions of times per second. The superposition collapses instantaneously, before we can even observe it. That is why quantum physics is only observable at the atomic scale. And that is also why a single electron in a transistor behaves very differently from an object we can hold in our hand.

OK, so: the original Young’s experiment with light produces an interference pattern because light is a wave. The variation with electrons (and, subsequently, with other elements such as whole atoms) also produces an interference pattern, which proves that matter is a wave too, but this time a probability wave. When we measure which slit the particle went through, we change the result of the experiment, because we force the particle to “choose” its position. But incidentally, how does this measurement work in the experiment? It works thanks to photons . When the electron passes through one of the slits, we project a photon, which interacts with the electron and is re-emitted in a direction that allows us to deduce which slit the electron went through.

Researchers then wanted to know what would happen if they performed the exact same experiment, measuring which slit the particle went through, but this time, instead of reading the information encoded in the orientation of the photon, they destroyed that information . And here, another surprise: if we destroy the information, we return to an interference pattern . It was as if, since we were not using that information, there was nothing forcing the particle to choose which slit to go through, and so it could remain in the form of a probability wave. This experiment therefore demonstrates something fundamental in quantum physics: technically, it is not the act of measuring that influences the experiment, but whether or not the information exists somewhere in the universe . If the information is destroyed, the interference pattern returns. The key is therefore information .

NOTE : How does the destruction of this information work? One might think it would simply be a matter of having the photon absorbed by an absorbing surface before reading it, but this does not work, and we are still left with the two bands.
Indeed, by doing so, the information still theoretically exists, because the absorbing surface could have determined the position of the particle through the orientation of the photon. The destruction works with another incredible principle of quantum physics that I will not detail in this article: entanglement. The photon is sent onto a special crystal, which splits it into two twin photons linked by quantum entanglement. One of the twins is then destroyed, making the information unrecoverable, because to read the information one absolutely needs to read both twins. To simplify: the two twins are not copies; they form a single system whose properties are not individually defined.

We are slowly getting closer to SSDs. But before that, there is one last quantum concept we need to talk about: the tunnel effect . We said that an unobserved electron does not exist like a marble at a precise location. It exists as a probability wave spread out in space. This wave function gives a probability of finding the electron at each point in space. Now let’s imagine a physical barrier , and let's send an electron toward it. Classically, if the electron does not have enough energy to pass over the barrier, it is blocked. Full stop. Yet quantum mechanically, the wave function of the electron does not stop abruptly at the barrier. Because it is a wave, it propagates and gradually decays through the barrier. It does not fall to zero. On the other side, there therefore remains a non-zero probability of finding the electron . This is the tunnel effect: a real chance for the electron to end up on the other side , without having had the classical energy needed to cross. This probability is not fixed. It depends directly on the thickness of the barrier: the thinner the barrier, the more the wave function survives on the other side, and the higher the tunneling probability. At our scale, the barriers are far too thick for this effect to be observable. But at the scale of a few nanometers, the probability is real.

Now for SSDs. An SSD stores data as bits, and it is precisely in the management of these bits that the principles of quantum physics come into play. In an SSD, each bit is encoded in a cell built around a floating gate : a small zone isolated on all sides by an insulating layer. This box can contain electrons or not:

Box with electrons : Bit = 0
Box without electrons : Bit = 1

To write , we therefore need to make electrons enter this isolated box. What we do is apply an electric voltage that deforms the wave function of the electrons and increases their probability of ending up on the other side of the insulator. The electrons therefore cross the barrier via the tunnel effect. To erase , we apply a reverse voltage, which also reshapes the wave function, and the electrons cross back in the other direction. To read , we use a classical, non-quantum measurement: we measure the electric current passing through the transistor. Electrons present: weak current: 0. No electrons: strong current: 1.

We saw, however, that the wave function gives a probability , not a certainty. If we apply a voltage to write or erase, we therefore only have a probability that any given electron will cross the barrier.
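To put a number on that probability (a standard textbook approximation, not from the original article): for an electron of energy E meeting a rectangular barrier of height V > E and thickness L, the chance of tunnelling through falls off exponentially with the thickness:

$$ T \approx e^{-2\kappa L}, \qquad \kappa = \frac{\sqrt{2m\,(V - E)}}{\hbar} $$

That exponential is why an insulating layer a few nanometres thick can be tunnelled through at a useful rate when a voltage is applied, and why making it even slightly thinner makes spontaneous leakage dramatically worse.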
How can an SSD be reliable, then? An individual electron is unpredictable, but we never send just one electron; we send millions simultaneously. Statistically, enough of them cross the barrier to charge the floating gate reliably. And after each write, the controller immediately re-reads the cell to verify; if not enough electrons have crossed, it tries again. That is why SSDs embed error correction mechanisms, ECC (Error Correcting Code) , precisely because the process is probabilistic by nature. When a cell exceeds a certain error threshold over time, it is eventually marked as defective and taken out of service, and the data it held is moved to a healthy cell. That is why SSDs always have an over-provisioning capacity: a reserve of cells, invisible to the user, planned from the manufacturing stage to replace defective cells over time. And that is also why an SSD does not fail all at once; it degrades progressively , cell by cell, until the reserve is exhausted.

And this is where quantum physics imposes its limits. The more transistors shrink, the thinner the insulating barriers become, and the more the tunnel effect becomes uncontrollable: electrons escape spontaneously, errors increase, and cells age faster. Moore’s Law, which predicts a doubling of transistor density every two years, is today running up against these fundamental physical limits. This is not an engineering problem: it is quantum physics that sets the boundary .

To recap:

Matter is made up of atoms, themselves composed of a nucleus (protons and neutrons) and electrons. An atom is almost entirely empty: what we perceive as “solid” is an illusion created by the electromagnetic forces between atoms.
Light is both an electromagnetic wave and a particle called a photon. In flight, it behaves like a wave, but it is emitted and absorbed like a particle, in one single hit, in one single place.
Young’s double-slit experiment proves that light is a wave : it produces an interference pattern, impossible to obtain with classical particles.
Matter behaves in the same way. But unlike light, its wave is not physical: it is a probability wave that describes the possible positions of a particle. This is quantum superposition: an unmeasured particle exists in multiple states simultaneously.
It is not the act of measuring that collapses the superposition: it is the existence of the information somewhere in the universe. If the information is destroyed, the superposition is restored.
Decoherence explains why we never see superposition at our scale: any macroscopic object permanently interacts with its environment, which instantaneously collapses its wave function.
The tunnel effect is a direct consequence of the wave-like nature of particles: the wave function of an electron does not stop abruptly at a physical barrier. There exists a non-zero probability of finding it on the other side, without it having had the classical energy to cross.
SSDs exploit the tunnel effect to write and erase data: an electric voltage deforms the wave function of electrons and increases their probability of crossing the insulating barrier of a floating gate. Reliability rests on the large number of electrons sent and on ECC.
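As a programmer's coda, here is a toy simulation of that write-and-verify loop. It is purely illustrative, with invented numbers; a real controller senses threshold voltages and runs ECC rather than counting electrons:

```python
import random

TUNNEL_PROBABILITY = 0.3      # chance one electron tunnels per pulse (made up)
ELECTRONS_PER_PULSE = 10_000  # we never send just one electron
CHARGE_THRESHOLD = 2_500      # electrons needed for the cell to read as "0"

def write_zero(max_retries=5):
    """Charge a floating gate, re-reading to verify after each pulse."""
    charge = 0
    for attempt in range(1, max_retries + 1):
        # Each electron independently tunnels with some probability.
        charge += sum(random.random() < TUNNEL_PROBABILITY
                      for _ in range(ELECTRONS_PER_PULSE))
        if charge >= CHARGE_THRESHOLD:  # verify: does the cell read back as 0?
            return attempt
    # Too many failures over time: mark the cell defective and remap its
    # data to a spare cell from the over-provisioned reserve.
    raise RuntimeError("cell worn out; remap to spare")

print("write verified after", write_zero(), "pulse(s)")
```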
💬 Did you know quantum physics was hiding in your laptop all along? I’d love to hear your reaction in the comments. Leave a comment An Introduction to Matter To start, what is matter? Matter is made up of molecules, and molecules are assemblages of atoms , the building blocks of matter. For example, water is an H₂O molecule: 2 hydrogen atoms and 1 oxygen atom. An atom is itself composed of a nucleus and electrons , which carry a negative charge and orbit around it. The nucleus contains two types of particles: Protons , which carry a positive charge, naturally repel each other. And neutrons , which carry no electric charge and act as a kind of “glue,” helping to keep the nucleus stable. If the nucleus of an atom were the size of a marble placed at the center of a football pitch, the electrons would only be found orbiting at the level of the distant stands with almost nothing in between. An atom is therefore almost entirely empty . Solid matter is almost nothing, and what gives this impression of solidity are forces between atoms called electromagnetic forces . The Fundamental Forces in the Universe The universe is made up of 4 and only 4 fundamental forces: Gravity : Attracts everything with mass toward everything else with mass. The strong nuclear force : It glues protons and neutrons together inside the nucleus. The weak nuclear force : Responsible for certain radioactive decays. It is what allows a neutron to transform into a proton (or vice versa). And the electromagnetic force . Attracts opposite charges And repels identical charges. Absorbed : The photon ceases to exist. Its energy is transferred to an atom, which moves to a higher energy level. This is what an eye does: it absorbs the photon and converts it into an electrical signal. Reflected : Technically, this is not a true reflection because it is not the same photon that leaves. The atom absorbs the photon and then re-emits a photon of the same energy in a different direction. A laser projects photons (light) A wall with two small slits, A and B A screen behind to detect where the photons land If light behaved purely as a particle, firing it through two slits would simply produce two bright bands on the screen, one for each slit ( credits ). Yet, the result of Young’s double-slit experiment is as follows: Instead of two bands, light actually produces multiple alternating stripes on the screen, proof that it behaves as a wave, interfering with itself after passing through both slits simultaneously ( credits ). We obtain what is called an interference pattern . The wave passes through both slits simultaneously, splits into two, and these two waves meet on the other side. Where two light waves meet after passing through a double slit, they either reinforce or cancel each other out, creating an alternating pattern of bright and dark bands on the screen. When two waves meet, they add up or cancel out depending on their respective phase: Two crests meeting → they add up → bright zone A crest meeting a trough → they cancel out → dark zone A smooth sinusoidal curve representing the wave function ψ(x), showing how the probability of finding a particle oscillates across different positions in space. A small clarification on this concept of undefined position to make sure the concept is clear, because this is the moment where our rational brain can start to “let go.” Let’s take a coin for a coin toss. We throw it in the air and hide the result. We are in a state of uncertainty, but this uncertainty is called epistemic . 
A small clarification on this concept of undefined position, because this is the moment where our rational brain can start to "let go." Take a coin toss: we throw the coin in the air and hide the result. We are in a state of uncertainty, but this uncertainty is called epistemic: we do not know the result (heads or tails) because we have not looked yet, but the result already exists.

For a particle in the quantum world, the uncertainty is called ontological. It is not that we lack information about the position of the particle; it is that this position simply does not exist yet. This is what is called quantum superposition: an unmeasured particle exists in multiple states simultaneously.

Measurement changes everything, however. When we measure the position of a particle, we find it in one of the possible positions described by the wave function. We then say that the wave function "collapses": it goes from a spread of possibilities to a single sharp spike, pinpointing the particle at one exact location.

As an analogy, it is a bit like Minecraft. A default Minecraft map is 60 million x 60 million blocks. For the initial loading, the server does not generate the entire map; it only generates the world around the observer, i.e., the player. When the player moves, they force the server to generate the world's continuation. Where this analogy reaches its limits is that the generation of a Minecraft world, even though it looks random, is still deterministic, because each world has its own seed. The quantum world, on the other hand, appears to be purely random, meaning without hidden information.

Let's return to Young's experiment. What would happen if, when a particle passes through a slit, we placed a detector there to observe which slit the particle goes through? Recall that a wave passes through both slits at once. This is the moment where the brain completely lets go: observing the particle changes the result of the experiment. Observing the particle "forces" it to have a defined position, and it then behaves like a classical marble. The result, therefore, gives us two bands of matter instead of an interference pattern.

To summarize what we have seen so far: an unobserved particle exists as a probability wave, in multiple positions simultaneously. As soon as we measure it, this wave collapses, and the particle ends up at a precise location. But then, why do we never see this in everyday life? The answer is decoherence.

Decoherence

Quantum superposition is only possible as long as a particle remains isolated from its environment. As soon as it interacts with anything (another atom, a photon, an electric field), that interaction constitutes a measurement in the quantum sense. The wave function collapses, and the particle ends up in a precise state.

An isolated electron in a vacuum can remain in superposition. But a macroscopic object like a table is made up of billions upon billions of atoms that permanently interact with the surrounding air, light photons, and electromagnetic fields. These interactions occur billions of times per second, so the superposition collapses instantaneously, before we can even observe it. That is why quantum physics is only observable at the atomic scale, and why a single electron in a transistor behaves very differently from an object we can hold in our hand.
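Before moving on, here is a toy sketch (my illustration, not from the article) of the epistemic/ontological distinction and of collapse: before the first measurement there is no position at all, only a distribution derived from the wave function; the first measurement picks one outcome, and every later measurement agrees with it. The class name and probabilities are invented.

```python
import random

# Toy model: before measurement, the particle has no position, only a
# probability distribution derived from its wave function, |psi|^2.
POSITIONS = [0, 1, 2, 3, 4]
PROBS = [0.05, 0.25, 0.40, 0.25, 0.05]  # |psi|^2, made-up values

class ToyParticle:
    def __init__(self):
        self.position = None  # genuinely undefined, not merely unknown

    def measure(self):
        # First measurement: collapse. An outcome is picked at random,
        # weighted by the probability wave.
        if self.position is None:
            self.position = random.choices(POSITIONS, weights=PROBS)[0]
        # Subsequent measurements find the same, now-definite position.
        return self.position

p = ToyParticle()
print([p.measure() for _ in range(5)])  # e.g. [2, 2, 2, 2, 2]
```

The coin toss, by contrast, would set `self.position` in `__init__`: the answer exists before anyone looks.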
The Key is Information

So the original Young's experiment with light produces an interference pattern because light is a wave. The variation with electrons (and subsequently with other objects, such as whole atoms) also produces an interference pattern, which proves that matter is a wave too, but this time a probability wave. When we measure which slit a particle went through, we change the result of the experiment, because we force the particle to "choose" its position.

But how does this measurement actually work in the experiment? It works thanks to photons. When the electron passes through one of the slits, we project a photon at it; the photon interacts with the electron and is re-emitted in a direction that lets us deduce which slit the electron went through.

Researchers wanted to know what would happen if they performed the exact same experiment, measuring which slit the particle went through, but this time, instead of reading the information encoded in the orientation of the photon, they destroyed that information. And here, another surprise: if we destroy the information, we return to an interference pattern. It is as if, since we were not using that information, nothing forced the particle to choose which slit to go through, so it could remain in the form of a probability wave.

This experiment demonstrates something fundamental in quantum physics: technically, it is not the act of measuring that influences the experiment, but whether or not this information exists somewhere in the universe. If the information is destroyed, the interference pattern returns. The key is therefore information.

NOTE: How does the destruction of this information work? One might think it would simply be a matter of having the photon absorbed by an absorbing surface before reading it, but this does not work; we are still left with two bands. By doing so, the information theoretically still exists, because the absorbing surface could have determined the position of the particle through the orientation of the photon. The destruction relies on another incredible principle of quantum physics that I will not detail in this article: entanglement. The photon is sent onto a special crystal, which splits it into two twin photons linked by entanglement. One of the twins is then destroyed, making the information unrecoverable, because reading the information absolutely requires reading both twins. To simplify: the two twins are not copies; they form a single system whose properties are not individually defined.

The Tunnel Effect

We are slowly getting closer to SSDs. But before that, there is one last quantum concept we need to talk about: the tunnel effect.

We said that an unobserved electron does not exist like a marble at a precise location. It exists as a probability wave spread out in space, and its wave function gives the probability of finding the electron at each point in space.

Now let's imagine a physical barrier, and let's send an electron toward it. Classically, if the electron does not have enough energy to pass over the barrier, it is blocked. Full stop. Quantum mechanically, however, the wave function of the electron does not stop abruptly at the barrier. Because it is a wave, it propagates into the barrier and gradually decays through it, but it does not fall to zero. On the other side, there therefore remains a non-zero probability of finding the electron. This is the tunnel effect: a real chance for the electron to end up on the other side, without ever having had the classical energy needed to cross.

This probability is not fixed: it depends directly on the thickness of the barrier. The thinner the barrier, the more the wave function survives on the other side, and the higher the tunneling probability. At our scale, barriers are far too thick for this effect to be observable, but at the scale of a few nanometers, the probability is real.
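To get a feel for how sharply the tunneling probability depends on barrier thickness, here is a rough sketch using the standard exponential approximation T ≈ e^(−2κd) for a rectangular barrier. The electron energy and barrier height below are made-up illustrative values, not figures from the article.

```python
import math

# Rough tunneling estimate for a rectangular barrier: T ~ exp(-2*kappa*d),
# where kappa measures how fast the wave function decays inside the barrier.
HBAR = 1.054e-34      # reduced Planck constant, J*s
M_E = 9.109e-31       # electron mass, kg
EV = 1.602e-19        # one electronvolt in joules

V = 3.0 * EV          # barrier height (illustrative)
E = 1.0 * EV          # electron energy (illustrative), below the barrier
kappa = math.sqrt(2 * M_E * (V - E)) / HBAR  # decay rate inside the barrier

for d_nm in (1, 2, 3, 5, 10):
    d = d_nm * 1e-9
    t = math.exp(-2 * kappa * d)  # transmission (tunneling) probability
    print(f"{d_nm:>2} nm barrier -> tunneling probability ~ {t:.3e}")
```

Each extra nanometer multiplies the probability by the same tiny factor, which is why the effect is negligible at our scale yet very real for barriers a few nanometers thick.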
How SSDs Use Quantum Physics

In an SSD, we want to store data. An SSD works with bits, and it is precisely in the management of these bits that the principles of quantum physics come into play. Each bit is encoded in a cell called a floating gate: a small zone isolated on all sides by an insulating layer. This box can contain electrons or not:
- Box with electrons: bit = 0
- Box without electrons: bit = 1

From there:
- To write, we need to make electrons enter this isolated box. We apply an electric voltage that deforms the wave function of the electrons and increases their probability of ending up on the other side; the electrons cross the insulating barrier via the tunnel effect.
- To erase, we apply a reverse voltage, which also reshapes the wave function, and the electrons cross back in the other direction.
- To read, we perform a classical, non-quantum measurement: we measure the electric current passing through the transistor. Electrons present: weak current, bit = 0. No electrons: strong current, bit = 1.

Statistically, enough electrons cross the barrier to charge the floating gate reliably, and after each write, the controller immediately re-reads the cell to verify. If not enough electrons have crossed, it tries again. That is why SSDs embed error correction mechanisms, ECC (Error Correcting Code): the process is probabilistic by nature. When a cell exceeds a certain error threshold over time, it is finally marked as defective and taken out of service, and the data it held is moved to a healthy cell. That is why SSDs always have an over-provisioning capacity: a reserve of cells invisible to the user, planned from the manufacturing stage, to replace defective cells over time. And that is also why an SSD does not fail all at once; it degrades progressively, cell by cell, until the reserve is exhausted.

And this is where quantum physics imposes its limits. The more transistors shrink, the thinner the insulating barriers become, and the more the tunnel effect becomes uncontrollable: electrons escape spontaneously, errors increase, and cells age faster. Moore's Law, which predicts a doubling of transistor density every two years, is today running up against these fundamental physical limits. This is not an engineering problem: it is quantum physics that sets the boundary.

To summarize everything we have seen:
- Matter is made up of atoms, themselves composed of a nucleus (protons and neutrons) and electrons. An atom is almost entirely empty: what we perceive as "solid" is an illusion created by the electromagnetic forces between atoms.
- Light is both an electromagnetic wave and a particle called a photon. In flight, it behaves like a wave, but it is emitted and absorbed like a particle, in one single hit, in one single place.
- Young's double-slit experiment proves that light is a wave: it produces an interference pattern, impossible to obtain with classical particles.
- Matter behaves in the same way. But unlike light, its wave is not physical: it is a probability wave that describes the possible positions of a particle. This is quantum superposition: an unmeasured particle exists in multiple states simultaneously.
- It is not the act of measuring that collapses the superposition: it is the existence of the information somewhere in the universe. If the information is destroyed, the superposition is restored.
- Decoherence explains why we never see superposition at our scale: any macroscopic object permanently interacts with its environment, which instantaneously collapses its wave function.
- The tunnel effect is a direct consequence of the wave-like nature of particles: the wave function of an electron does not stop abruptly at a physical barrier. There exists a non-zero probability of finding it on the other side, without having had the classical energy to cross.
- SSDs exploit the tunnel effect to write and erase data: an electric voltage deforms the wave function of electrons and increases their probability of crossing the insulating barrier of a floating gate. Reliability rests on the large number of electrons sent and on ECC.
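As a closing sketch (my own illustration, not code from the article), here is roughly how a flash controller's program-and-verify loop, error counting, and over-provisioned spare cells fit together. All names, probabilities, and thresholds are invented for illustration.

```python
import random

# Toy flash-controller logic: program-and-verify, error counting, and
# retiring worn cells into an over-provisioned spare pool.
MAX_PROGRAM_RETRIES = 5   # write attempts before giving up on the cell
ERROR_THRESHOLD = 3       # lifetime failures before a cell is retired

class Cell:
    def __init__(self):
        self.charged = False
        self.errors = 0
        self.retired = False

    def program(self) -> bool:
        # Tunneling is probabilistic: most attempts charge the gate, some fail.
        self.charged = random.random() < 0.9
        return self.charged  # the controller re-reads to verify

def write_bit_zero(cell: Cell, spares: list) -> Cell:
    """Charge a cell (bit = 0), retrying and remapping to a spare if needed."""
    for _ in range(MAX_PROGRAM_RETRIES):
        if cell.program():
            return cell            # verify passed
        cell.errors += 1
    if cell.errors >= ERROR_THRESHOLD and spares:
        cell.retired = True        # too unreliable: take it out of service
        return write_bit_zero(spares.pop(), spares)  # move data to a spare
    return cell

spares = [Cell() for _ in range(4)]   # invisible over-provisioned reserve
cell = write_bit_zero(Cell(), spares)
print(f"bit stored, cell charged={cell.charged}, spares left={len(spares)}")
```

When the spare pool runs dry, there is nowhere left to remap: the drive has degraded, cell by cell, to the end of its life.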

0 views
Stratechery Yesterday

Amazon Buys Globalstar, Delta to Add Leo, The Apple Angle

Apple's Globalstar acquisition is being framed as Apple versus SpaceX, but I think the real story is about Apple.

0 views

This blog post tells the time

Computer clock synchronization is a complicated process, requiring protocols like NTP and a specialized server to answer requests. In this post I explore a "serverless" method, which relies on widely available CDNs to distribute time. It's the serverless time server we didn't realize we already had.

This clock should display the correct time. If your device's clock is set to the wrong time, it should tell you how far off the clock is set.

The page starts the process by requesting a tiny asset through the Cloudflare CDN. As Cloudflare builds the response, an HTTP header transform rule adds timing information, like http.request.timestamp.sec, to the response headers. The client waits for the response and then analyzes the network request using the fine-grained metrics provided by performance resource timers. Finally, some math is applied to adjust for network delay.

The PerformanceResourceTiming interface exposes detailed network timing information. This is similar to the web developer tools network tab, just accessible via JavaScript in the page. These metrics are extremely helpful to developers who are troubleshooting performance issues, and they will prove useful here. Notice that "sending" and "receiving" are both shown as zero milliseconds: the request and response used here are so small they likely fit in a single packet, so these events appear instantaneous. The only measurable part of the HTTP portion of the request is waiting for the network and the server to do their work.

These detailed performance timing metrics help us address a major challenge of time distribution: any server-provided timestamp has grown stale by the time it reaches the client. To account for this effect, the NTP protocol and similar software estimate the network round trip time. A good estimate for the network delay experienced by the response is half the round trip time, although this is not always accurate. Additional adjustment based on server processing delay further improves accuracy.

Cloudflare helps us out by providing server-side timing information. The client generally can't distinguish between network delay and server delay, so this information helps us estimate when the server generated the timestamp. This data includes metrics like cf.timings.origin_ttfb_msec, which tells us how long the Cloudflare CDN waited on a response from Cloudflare Pages.

At the end of all the measurement and the math, the clock display is an estimate. We're guessing how much the server-provided timestamp aged before it reached the web browser. It's an educated guess, informed by a lot of metrics, but there is uncertainty here.

For a technique I've been calling serverless, I've sure talked about servers a lot. The term serverless really means that we're not managing individual servers ourselves; the cloud hosting provider has abstracted those away. This setup uses Cloudflare Pages to host the tiny asset which this page fetches. The HTTP header transform rule is part of the CDN; we don't even need Cloudflare Workers. So it's just files I've pushed to GitLab, served by Cloudflare Pages, and some CDN configuration. Tons of servers, but abstracted away. Contrast this with NTP, where we'd need to run the NTP daemon and perhaps manage the underlying operating system. It feels "serverless" in comparison.
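Here is a back-of-the-envelope sketch of the offset math described above (my illustration, not the post's actual code): take the server's timestamp, age it by half the measured round trip, and compare against the local clock. All timing values are hypothetical.

```python
# Back-of-the-envelope clock-offset estimate; all values are hypothetical.
request_start = 1000.000    # client clock (s) when the request went out
response_start = 1000.120   # client clock (s) when the response arrived
server_ts = 1002.050        # timestamp header added by the CDN (server clock)

rtt = response_start - request_start
one_way = rtt / 2                        # assume a symmetric network path
estimated_now = server_ts + one_way      # the stamp aged ~one_way in transit
offset = estimated_now - response_start  # positive: local clock runs behind

print(f"local clock off by {offset * 1000:+.0f} ms (±{one_way * 1000:.0f} ms)")
```

A refinement, hinted at in the post, would subtract the server-side delay reported by cf.timings.origin_ttfb_msec from the round trip before halving it, since that portion of the wait was not spent on the wire.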
The clock display includes error bounds, which describe the precision of the provided time. Network latency plays a big part here, as we don't know how long it took to reach Cloudflare or how long it took to get back. Network paths could be asymmetric, or packet loss could cause unexpected delay in a single direction. While in normal cases we'd expect the server to process the request (and generate the timestamp) in the middle of our waiting period, in extreme cases those events could fall far to one side.

This uncertainty, and the associated error bounds, are reduced when the network latency is lower, which plays into a strength of Cloudflare: their CDN points-of-presence are located geographically near major population centers. For most of us, network latency to Cloudflare is quite small. The performance resource timers also help us precisely estimate when Cloudflare processes the HTTP request, as we can eliminate delays caused by DNS resolution, the TCP handshake, and TLS session initialization. Precision could be improved by performing multiple requests and applying statistical analysis, but this page makes no such effort.

In my testing I've often seen 60ms error bounds shown in the web clock. NTP clients, like the command-line tool ntpdig, produce a much tighter estimate, closer to 6ms. That is an order of magnitude difference.

While this method provides decent synchronization with the clock on Cloudflare CDN servers, we've got to consider how well synchronized that clock is with the official time. After all, if Cloudflare's CDN servers provide the wrong timestamp, it doesn't matter how precisely we've synchronized; we'll display the wrong time. Cloudflare's CDN is not formally a time server, so we need to tread carefully when using it this way.

I checked the accuracy against a couple of sources. When I collected the ntpdig output shown above, my web clock reported I was behind by 130±70ms. These measurements are within each other's error bounds, which shows agreement. I also checked using a GPS debugging app on my phone. GPS provides extremely accurate clocks and is likely the most accurate clock I can access. The clocks appeared to update in lock-step, again showing agreement. In this screenshot, notice that my phone's clock is ahead of the other clocks, and this offset is detected by the web clock.

In any case, it seems risky to depend on an unwitting time server, so without specific promises from Cloudflare I'd just consider this a demo. After all, the Internet doesn't always know the right time.

So far I've tested this on my laptop and phone, but I'd be interested to see how well it works for others. You can use tools like ntpdig or GPS debugging apps to compare. I've built a standalone web clock for that sort of testing. You may be surprised by how inaccurate your system time can be; slightly offset clocks are quite common. This is especially true when a device sleeps, suspends, or hibernates. When a computer's CMOS battery is missing or failed, clocks can fall very far out of sync. I'd be curious to see what people discover (contact info at bottom).

While the precision of this CDN-based method is relatively poor for a time synchronization protocol, it does offer some attractive features over current solutions. First and foremost: it's web-native! NTP's lack of security has been a growing concern. One replacement, Network Time Security (NTS), cryptographically authenticates information sent by the time server. The authenticated encryption of HTTPS similarly protects the CDN-based web clock approach. This avoids situations where an attacker-in-the-middle tampers with insecure NTP responses, messing up your system's clock.

There are a lot of hazards here, unfortunately.
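The "multiple requests plus statistics" refinement the author mentions but doesn't implement might look something like this sketch (mine, not the post's): fire several probes, keep the one with the smallest round trip, since it suffered the least queuing noise, and use its offset and half-RTT as the bound. The probe values are simulated.

```python
import random

# Sketch of the multi-sample refinement: the probe with the smallest round
# trip saw the least queuing delay, so its half-RTT error bound is tightest.
def probe():
    rtt = 0.030 + random.expovariate(1 / 0.040)   # 30ms floor + random queuing
    offset = 0.130 + random.uniform(-rtt / 2, rtt / 2)  # true skew is 130ms
    return rtt, offset

samples = [probe() for _ in range(8)]
best_rtt, best_offset = min(samples)  # tuples sort by rtt first
print(f"offset ~ {best_offset * 1000:+.0f} ms, bound ±{best_rtt / 2 * 1000:.0f} ms")
```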
Alternate time synchronization protocols have a history of mistakes, so it's wise to be wary. Microsoft tried a TLS-based synchronization approach via Secure Time Seeding (STS). Their approach relied on time metadata in TLS connections, but most servers actually provide random data in the relevant field. This caused clocks to reset to random times. In either case, this underscores the risks of getting a clock reference from systems that don't realize they are being used as time servers.

Closing on a more nostalgic note, NIST's time.gov has a wonderfully retro clock widget. Unfortunately they no longer allow you to host it on your own site, probably due to server load. Here's my own 88x31 badge, which is hereby MIT licensed. It makes use of SVG's questionable ability to embed scripts in images.

0 views

Patch Tuesday, April 2026 Edition

Microsoft today pushed software updates to fix a staggering 167 security vulnerabilities in its Windows operating systems and related software, including a SharePoint Server zero-day and a publicly disclosed weakness in Windows Defender dubbed "BlueHammer." Separately, Google Chrome fixed its fourth zero-day of 2026, and an emergency update for Adobe Reader nixes an actively exploited flaw that can lead to remote code execution.

Redmond warns that attackers are already targeting CVE-2026-32201, a vulnerability in Microsoft SharePoint Server that allows attackers to spoof trusted content or interfaces over a network. Mike Walters, president and co-founder of Action1, said CVE-2026-32201 can be used to deceive employees, partners, or customers by presenting falsified information within trusted SharePoint environments. "This CVE can enable phishing attacks, unauthorized data manipulation, or social engineering campaigns that lead to further compromise," Walters said. "The presence of active exploitation significantly increases organizational risk."

Microsoft also addressed BlueHammer (CVE-2026-33825), a privilege escalation bug in Windows Defender. According to BleepingComputer, the researcher who discovered the flaw published exploit code for it after notifying Microsoft and growing exasperated with their response. Will Dormann, senior principal vulnerability analyst at Tharros, says he confirmed that the public BlueHammer exploit code no longer works after installing today's patches.

Satnam Narang, senior staff research engineer at Tenable, said April marks the second-biggest Patch Tuesday ever for Microsoft. Narang also said there are indications that a zero-day flaw Adobe patched in an emergency update on April 11, CVE-2026-34621, has seen active exploitation since at least November 2025.

Adam Barnett, lead software engineer at Rapid7, called the patch total from Microsoft today "a new record in that category" because it includes nearly 60 browser vulnerabilities. Barnett said it might be tempting to imagine that this sudden spike was tied to the buzz around the announcement a week ago today of Project Glasswing, a much-hyped but still unreleased new AI capability from Anthropic that is reportedly quite good at finding bugs in a vast array of software. But he notes that Microsoft Edge is based on the Chromium engine, and the Chromium maintainers acknowledge a wide range of researchers for the vulnerabilities which Microsoft republished last Friday.

"A safe conclusion is that this increase in volume is driven by ever-expanding AI capabilities," Barnett said. "We should expect to see further increases in vulnerability reporting volume as the impact of AI models extend further, both in terms of capability and availability."

Finally, no matter what browser you use to surf the web, it's important to completely close out and restart the browser periodically. This is really easy to put off (especially if you have a bajillion tabs open at any time) but it's the only way to ensure that any available updates get installed. For example, a Google Chrome update released earlier this month fixed 21 security holes, including the high-severity zero-day flaw CVE-2026-5281.

For a clickable, per-patch breakdown, check out the SANS Internet Storm Center Patch Tuesday roundup. Running into problems applying any of these updates? Leave a note about it in the comments below and there's a decent chance someone here will pipe in with a solution.

0 views
neilzone 2 days ago

Resources to aid understanding someone else's perimenopause / menopause

I asked for reading recommendations for a partner of someone who is going through the perimenopause / menopause. I got a lot of responses; thank you. I have included below those which seemed most relevant, for me to follow up on them. Apologies if I didn't include your particular suggestions. I received quite a lot of advice too; thank you.

Thayer said: I often help men understand their partners' journeys as part of my therapy & coaching as it really affects men as well

- "Burning Up, Frozen Out" by Joe Warner and Rob Kemp
- "Menopause Manifesto" by Dr Jen Gunter (several recommendations for this)
- "Perimenopause Power" by Maisie Hill
- "Woman on Fire" by Sheila de Liz (multiple recommendations)
- anything by Dr Louise Newsome
- Trans experience of the menopause by Quinn Rhodes
- Two posts by Sundial: "Perimenopause hit me like a brick" and "Perimenopause: My HRT Journey"
- "Nobody told me about the way menopause restructures marriage. Here's what I wish I knew then."
- Ben's toots
- "Body of Evidence", including this episode
- "What's Up Docs?", including this episode
- "BDSM and the menopause"
- a Davina McCall documentary (possibly this one)

0 views
Rik Huijzer 2 days ago

Is Jonathan Shelley A False Teacher?

Thanks to the following interaction, it seems to me Jonathan Shelley is a false teacher: ![bannedpastor.jpg](/files/d46b244676374587) For example, _1 Timothy 3_, "This is a true saying, if a man desire the office of a bishop, he desireth a good work. A bishop then must be blameless, the husband of one wife, vigilant, sober, of good behaviour, given to hospitality, apt to teach; Not given to wine, no striker, not greedy of filthy lucre; but patient, not a brawler, not covetous;" Even if I was wrong on my "false teacher" claim, calling another pastor "lame" seems not to be an example of "good b...

0 views

I Will Never Respect A Website

If you like this piece and want to support my independent reporting and analysis, why not subscribe to my premium newsletter? It's $70 a year, or $7 a month, and in return you get a weekly newsletter that's usually anywhere from 5,000 to 18,000 words, including vast, detailed analyses of NVIDIA, Anthropic and OpenAI's finances, and the AI bubble writ large. I recently put out the timely and important Hater's Guide To The SaaSpocalypse, another on How AI Isn't Too Big To Fail, and a deep (17,500 word) Hater's Guide To OpenAI. Subscribing to premium is both great value and makes it possible to write these large, deeply-researched free pieces every week.

Soundtrack: Muse - Stockholm Syndrome

I think the most enlightening thing about AI is that it shows you how even the most mediocre text inspires some sort of emotion. Soulless LinkedIn slop makes you feel frustration with a person for their lack of authenticity, but you can still imagine how they forced it out of their heads. You still connect with them, even if it's in a bad way.

AI copy is dead. It is inert. The reason you can spot it is that it sounds hollow. I don't care if a website says stuff on it because I typed something in, just like I don't care if it responds in a way that sounds human, because it all feels like nothing to me. I am not here to give a website respect, I will not be impressed by a website, nor will I grant a website any extra credit if it can't do the right thing every time. The computer is meant to work for me. If the computer doesn't do what I want, I change the kind of computer I use. LLMs will always hallucinate, their outputs are not trustworthy as a result, they cannot be deterministic, and any chance of any mistakes of any kind is unforgivable. I don't care how the website made you feel: it's a machine that doesn't always work, and that's not a very good machine.

I feel nothing when I see an LLM's output. Tell me thank you or whatever, I don't care. You're a website. Oh, you can spit out code? Amazing. Still a website.

Perhaps you've found value in LLMs. Congratulations! You should feel no compulsion to have to convince me, nor should you feel any pride in using a particular website. And if you feel you're being judged for using AI, perhaps you should ask why you feel so vilified? Did the industry do something to somehow warrant judgment? Is there something weird or embarrassing about the product, such as it famously having a propensity to get things wrong? Perhaps it loses billions of dollars? Oh, it's damaging to the environment too? And people are telling outright lies about it and constantly saying it'll replace people's jobs? And the CEOs are all greedy, oafish sociopaths?

Did you try being cloying, judgmental, condescending, and aggressive to those who don't like AI? Oh, that didn't work? I can't imagine why.

Sounds embarrassing! You must really like that website.

ChatGPT is a website. Claude is a website. While I guess Claude Code runs in a terminal window, that just means it's an app, which I put in exactly the same mental box as I do a website.

Yet everything you read or hear or see about AI does everything it can to make you think that AI is something other than a website or an app. People who "discover the power of AI" immediately stop discussing it in the same terms as Microsoft Word, Google, or any other app or website.
It's never just about what AI can do today, but always about some theoretical "AGI" or vague shit about "AI agents" that are some sort of indeterminate level of "valuable" without anyone being able to describe why. Truly useful technology isn't described in oblique or hyperbolic terms. For example, last week, IBM's Dave McCann described using a series of "AI agents" to Business Insider.

Sounds like a website to me. Sounds like a website using an LLM to summarize stuff to me. Why are we making all this effort to talk about what a website does?

My friend, this isn't a "series of agents." It's an LLM that looks at stuff and spits out an answer. Chatbots have done this kind of thing forever. These aren't "agents." "Agents" makes it sound like there's some sort of futuristic autonomous presence rather than a chatbot that's looking at documents using technology that's guaranteed to hallucinate incorrect information.

Here's a fun exercise: replace the word "agent" with "app," and replace "AI" with "application." In fact, let's try that with the next quote: a variety of functions including searching for stuff, looking at stuff, generating stuff, transcribing a meeting, and searching for stuff. Wow! Who gives a fuck. Every "AI agent" story is either about code generation, summarizing some sort of information source, or generating something based on an information source that you may or may not be able to trust.

"Agent" is an intentional act of deception, and even "modern" agents like OpenClaw and its respective ripoffs ultimately boil down to "I can send you a reminder" or "I can transcribe a text you send me." Yet everybody seems to want to believe these things are "valuable" or "useful" without ever explaining why. A page of OpenClaw integrations claiming to share "real projects, real automations [and] real magic" includes such incredible, magical use cases as "reads my X bookmarks and discusses them with me," "check incoming mail and remove spam," "researches people before meetings and creates briefing docs," "schedule reminders," "tracking who visits a website" (summarizing information), and "using voice notes to tell OpenClaw what to do," which includes "distilling market research" (searching for stuff) and "tightening a proposal" (generating stuff after looking at it).

I'd have no quarrel with any of this if it wasn't literally described as magical and innovative. This is exactly the shit that software has always done: automations, shortcuts, reminders, and document work. Boring, potentially useful stuff done in an inefficient way requiring a Mac Mini and hundreds of dollars a day of API calls.

Even Stephen Fry's effusive review of the iPad from 2010, in referring to it as a "magical object," still referred to it as "class," "a different order of experience," remarking on its speed, its responsiveness, its "smooth glide," and noting that it's so simple. Even Fry, a writer beloved for his effervescence and sophisticated lexicon, was still able to point at the things he liked (such as the design and simplicity) in clear terms. Even in couching it in terms of the future, Fry is still able to cogently explain why he's excited about the present.

Conversely, articles about Large Language Models and their associated products often describe them in one of three ways. This simply doesn't happen outside of bubbles.
The original CNET review of the iPhone (a technology I'd argue literally changed the way that human beings live their lives) still described it in terms that mirrored the reality we live in.

I'd argue that technologies like cloud storage, contactless payments, streaming music, and video and digital photography have transformed our societies in ways that were obvious from the very beginning. Nobody sat around cajoling us to accept that we'd need to sunset our Nokia 3210s and get used to touchscreens, because on using the first iPhone it was blatantly obvious that it was better.

Nobody ostracized you for not being sufficiently excited about iPhone apps. Git, launched in 2005, is arguably one of the single most transformational technologies in tech history, changing how software engineers built all kinds of software. And I'd argue that Github, which came a few years later, was equally transformational.

I can't find a single example of somebody being shamed for not being sufficiently excited, other than people arguing over whether Git was the superior version control software, or saying that Github, a cloud-based repository for code and collaboration, was obvious in its utility. Those who liked it didn't feel particularly defensive. Even articles about GitHub's growth spoke entirely in terms rooted in the present.

I realize this was before the hyper-polarized world of post-Musk Twitter, one where venture capital and the tech industry in general was a fraction of the size, but it's really weird how different it feels when you read about how the stuff that actually mattered was covered. I must repeat that this was a very different world with very different incentives. Today's tech industry is a series of giant group chats across various social networks and physical locations, with a much larger startup community (yCombinator's last batch had 199 people; the first had 8) influenced heavily by the whims of investors and the various cults of personality in the valley. While social pressure absolutely existed, the speed at which it could manifest and mutate was minute in comparison to the rabid dogs of Twitter or the current state of Hackernews. There were fewer VCs, too.

In any case, no previous real or imagined tech revolution has ever inspired such eager defensiveness, tribalism or outright aggression toward dissenters, nor such ridiculous attempts to obfuscate the truth about a product, outside of cryptocurrency, an industry with obvious corruption and financial incentives.

We've never had a cult of personality around a specific technology at this scale. There is something that AI does to people, in the way it both functions and the way that people react to it, that inspires them to act defensively, weirdly, tribally. I think it starts with LLMs themselves, and the feeling they create within a user. We all love prompts. We love to be asked questions about ourselves. We feel important when somebody takes interest in what we're doing, and even more so when they remember things about it and seem to be paying attention. LLMs are built to completely focus themselves on us, and to do so while affirming every single interaction.

Human beings also naturally crave order and structure, which means we've created frameworks in our head about what authoritative-sounding (or authoritative-looking) information looks like, and the language that engenders trust in it.
We trust Wikipedia both because it's an incredibly well-maintained library of information riddled with citations and because it tonally and structurally resembles an authoritative source. Large Language Models have been explicitly trained to deliver information (through training on much of the internet, including Wikipedia) in a structured manner that makes us trust it as we would an authoritative source, massaged with language we'd expect from a trusted friend or endlessly patient teacher.

All of this is done with the intention of making you forget that you're using a website. And that deception is what starts to make people act strangely. The fact that an LLM can maybe do something is enough to make people try it, along with the constant pressure from social media, peers and the mainstream media.

Some people, such as myself, have used LLMs to do things, seen that making them do said things isn't going to happen very easily, and walked away, because I am not going to use a website that doesn't do what it says.

As I've previously said, technology is a tool to do stuff. Some technology requires you to "get used to it" (iPhones and iPads were both novel and weird in their time, as was learning to use the Moonlander ZSK), but in basically every case that doesn't involve tolerating the inherent failings of the underlying product under the auspices of it "one day being better." Nowhere else in the world of technology does someone gaslight you into believing that the problems don't exist or will magically disappear. It's not like the iPhone only occasionally allowed you to successfully take a photo, with reliable photography something you'd have to wait until the iPhone 3GS to enjoy. While the picture quality improved over time, every generation of iPhone did the same basic things successfully, reliably, and consistently.

I also think that the challenge of making an LLM do something useful is addictive and transformative. When people say they've "learned to use AI," often they mean that they've worked out ways to fudge their prompts, navigate its failures, mitigate its hallucinations, and connect it to various different APIs and systems of record in such a way that it now, on a prompt, does something, and because they're the ones that built this messy little process, they feel superior, because the model has repeatedly told them that they were smart for doing it and celebrated with them when they "succeeded."

The term "AI agent" exists as both a marketing term and a way to ingratiate the user. Saying "yeah, I used a chatbot to do some stuff" sounds boring, like you're talking to an app or a website, but "using an AI agent" makes you sound like a futuristic cyber-warrior, even though you're doing exactly the same thing. LLMs are excellent digital busyboxes for those who want to come up with a way to work differently rather than actually doing work. In WIRED's article about journalists using AI, Alex Heath boasts that he "feels like he's cheating in a way that feels amazing."

The linguistics of "transmitting an idea to an AI agent" misrepresent what is a deeply boring and soulless experience. Alex speaks into a microphone, his words are transcribed, then an LLM burps out a draft. A bunch of different services connect to Claude Cowork and a text document (that's what the "custom set of instructions" is) that says how to write like him, and then it writes like him, and then he talks to it, and then sometimes he writes bits of the story himself. This is also most decidedly not automation.
Heath still must sit and prompt a model again and again. He must still maintain connections to various services and make sure the associated documents in Notion are correct. He must make sure that Granola actually gets the transcriptions from his interview. He must (I would hope) still check both the AI transcription and the output from the model to make sure quotes are accurate. He must make sure his calendar reflects accurate information. He must make sure that Claude still follows his "voice and writing style," if you can call it that given the amount of distance between him and the product.

Well, Alex, you're not telling anybody anything; your ideas and words come out of a Large Language Model that has convinced you that you're writing them.

In any case, Heath's process is a great example of what makes people think they're "using powerful AI." Large Language Models are extremely adept at convincing human beings to do most of the work and then credit "AI" with the outcomes. Alex's process sounds convoluted and, if I'm honest, a lot more work than the old way of doing things. It's like writing a blog using a machine from Pee-wee's Playhouse. I couldn't eat breakfast that way every morning. I bet it would get old pretty quick.

This is the reality of the Large Language Model era. LLMs are not "artificial intelligence" at all. They do not think, they do not have knowledge, they are conjuring up their own training data (or reflecting post-training instructions from those developing them, or documents instructing them to act a certain way), and any time you try and make them do something more complicated, they begin to fall apart, and/or become exponentially more expensive.

You'll notice that most AI boosters have some sort of bizarre, overly complicated way of explaining how they use AI. They spin up "multiple agents" (chatbots) that each have their own "skills document" (a text document) and connect "harnesses" (python scripts, text files that tell it what to do, a search engine, an API) that "let it run agentic workflows" (query various tools to get an outcome).

The so-called "agentic AI" that is supposedly powerful and autonomous is actually incredibly demanding of its human users: you must set it up in so many different ways and connect it to so many different services and check that every "agent" (different chatbot) is instructed in exactly the right way, and that none of these agents cause any problems (they will) with each other. Oh, don't forget to set certain ones to "high-thinking" for certain tasks and make sure that other tasks that are "easier" are given to cheaper models, and make sure that those models are prompted as necessary so they don't burn tokens.

But the process of setting up all those agents is so satisfying, and when they actually succeed in doing something, even if it took fucking forever and costs a bunch and is incredibly inefficient, you feel like a god! And because you can "spin up multiple agents," each one ready and waiting for you to give them commands (and ready to affirm each and every one of them), you feel powerful, like you're commanding an army that also requires you to monitor whatever it does.

The reason that LLMs have become so interesting for software engineers is that this is already how they lived. Writing software is often a case of taping together different systems and creating little scripts and automations that make them all work, and the satisfaction of building functional software is incredible, even at the early stages.
Large Language Models perform an impression of automating that process, but for the most part force you, the user, to do the shit that matters, even if that means "be responsible for the code that it puts out." Heath's process does not appear to take less time than his previous one; he's just moved stuff around a bit and found a website to tell him he's smart for doing so.

They are Language Models interpreting language without any knowledge or thoughts or feelings or ability to learn, and each time they read something they interpret meaning based on their training data, which means they can (and will!) make mistakes, and when they're, say, talking to another chatbot to tell it what to do next, that little mistake might build a fundamental flaw into the software, or just break the process entirely.

And Large Language Models, using the media, exist to try and convince you that these mistakes are acceptable. Take Anthropic's launch of its Claude For Finance tool, which claims to "automate financial modeling" with "pre-built agents" (chatbots) but really appears to just be able to create questionably useful models via Excel spreadsheets and "financial research" based on connecting to documents in your various systems, I imagine with a specific system prompt. Anthropic also proudly announced that it had scored a 55.3% on the Finance Agent Test.

I hate to repeat myself, but I will not respect a website, and I will not tolerate something being "55% good" at something if its alleged use case is that it's an artificial intelligence.

Yet that's the other remarkable thing about the LLM era: there are people who are extremely tolerant of potential failures because they believe they're either A) smart enough to catch them or B) smart enough to build systems that do so for them, with a little sprinkle of "humans make mistakes too," conflating "an LLM that doesn't know anything fucking up by definition" with "a human being with experiences and the capacity for adaptation making a mistake."

I truly have no beef with people using LLMs to speed up Python scripts to do fun little automations or to dig through big datasets, but please don't try and convince me they're being futuristic by doing so. If you want to learn Python, I recommend reading Al Sweigart's Automate The Boring Stuff. Anybody who sneers at you and says you are being "left behind" because you're not using AI should be forced to show you what it is they've created or done, and the specific system they used to do so. They should have to show you how much work it took to prepare the system, and why it's superior to just doing it themselves.

Karpathy also had a recent (and very long) tweet about "the growing gap in understanding of AI capability," involving more word salad than a fucking SweetGreen.

Wondering what those "staggering improvements" are? The one tangible (and theoretical!) example Karpathy gives is an example of how hard people work to overstate the capabilities of LLMs. "Coherently restructuring" a codebase might happen when you feed it to an LLM (while also costing a shit-ton of tokens, but putting that aside), or it might not understand at all because Claude Opus is acting funny that day, or it might sort of fix it but mess something subtle up that breaks things in the future.
This is an LLM doing exactly what an LLM does: it looks at a block of text, sees whether it matches up with what a user said, sees how that matches with its training data, and then either tells you things to do or generates new code, much like it would do if you had a paragraph of text you needed to fact-check. Perhaps it would get some of the facts right if connected to the right system. Perhaps it might make a subtle error. Perhaps it might get everything wrong.

This is the core flaw in the "checkmate, boosters, AI can write code!" argument. AI can write code. We knew that already. It gets "better" as measured by benchmarks that don't really compare to real world success, and even with the supposedly meteoric improvements over the last few months, nobody can actually explain what the result of it being better is, nor does it appear to extend to any domain outside of coding.

You'll also notice that Karpathy's language is as ingratiating to true believers as it is vague. Other domains are left unexplained other than references to "research" and "math." I'm in a research-heavy business, and I have tried the most powerful LLMs and highest-priced RAG/post-RAG research tools, and every time I find them bereft of any unique analysis or suggestions.

I don't dispute that LLMs are useful for generating code, nor do I question whether or not they're being used by software developers at scale. I just think that they would be used dramatically less if there weren't an industrial-scale publicity campaign run through the media and the majority of corporate America both incentivizing and forcing them to do so.

Similarly, I'm not sure anybody would've been anywhere near as excited if OpenAI and Anthropic hadn't intentionally sold them a product that was impossible to support long-term. This entire industry has been sold on a lie, and as capacity becomes an issue, even true believers are turning on the AI labs.

About a year ago, I warned you that Anthropic and OpenAI had begun the Subprime AI Crisis, where both companies created "priority processing tiers" for enterprise customers (read: AI startups like Replit and Cursor), dramatically increasing the cost of running their services to the point that both had to dramatically change their features as a result. A few weeks later, I wrote another piece about how Anthropic was allowing its subscribers to burn thousands of dollars' worth of tokens on its $100 and $200-a-month subscriptions, and asked the following question at the end.

I was right to ask: a few weeks ago (as I wrote in The Subprime AI Crisis Is Here), Anthropic added "peak hours" to its rate limits, and users found across the board that they were burning through their limits, in some cases in only a few prompts. Anthropic's response was, after saying it was looking into why rate limits were being hit so fast, to say that users were ineffectively utilizing the 1-million-token context window and failing to adjust Claude's "thinking effort level" based on whatever task it is they were doing. Anthropic's customers were (and remain) furious, as you can see in the replies of its thread on the r/Anthropic Subreddit.

To make matters worse, it appears that, deliberately or otherwise, Anthropic has been degrading the performance of both Claude Opus 4.6 and Claude Code itself, with developers, including AMD Senior AI Director Stella Laurenzo, documenting the problem at length (per VentureBeat).

Think that Anthropic cares?
Think again: another developer found that Claude Opus 4.6 was "thinking 67% less than it used to," though Anthropic didn't even bother to respond. In fact, Anthropic has done very little to explain what's actually happening, other than to say that it doesn't degrade its models to better serve demand.

To be clear, this is far from the only time that I've seen people complain about these models "getting dumber": users on basically every AI Subreddit will say, at some point, that models randomly can't do things they used to be able to, with nobody really having an answer other than "yeah dude, same."

Back in September 2025, developer Theo Browne complained that Claude had got dumber, but Anthropic near-immediately responded to say that the degraded responses were a result of bugs that "intermittently degraded responses from Claude."

Which begs the question: is Anthropic accidentally making its models worse? Because it's obvious it's happening, it's obvious they know something is happening, and its response, at least so far, has been to say that either users need to tweak their settings or nothing is wrong at all. Yet these complaints have happened for years, and have reached a crescendo with the latest ones that involve, in some cases, Claude Code burning way more tokens for absolutely no reason, hitting rate limits earlier than expected, or wasting actual dollars spent on API calls.

Some suggest that the problems are a result of capacity issues over at Anthropic, which have led to a stunning (at least for software used by millions of people) amount of downtime, per the Wall Street Journal. This naturally led to boosters (and, for that matter, the Wall Street Journal) immediately saying that this was a sign of the "insatiable demand for AI compute."

Before I go any further: if anyone has been taking $2.75-per-hour-per-GPU for any kind of Blackwell GPU, they are losing money. Shit, I think they are at $4.08. While these are examples from on-demand pricing (versus paid-up years-long contracts like Anthropic buys), if they're indicative of wider pricing on Blackwell, this is an economic catastrophe.

In any case, Anthropic's compute constraints are a convenient excuse to start fucking over its customers at scale. Rate limits that were initially believed to be a "bug" are now the standard operating limits of using Anthropic's services, and its models are absolutely, fundamentally worse than they were even a month ago.

It's January 14, 2026, and you just read The Atlantic's breathless hype-slop about Claude Code, believing that it was "bigger than the ChatGPT moment," that it was an "inflection point for AI progress," and that it could build whatever software you imagined. While you're not exactly sure what it is you're meant to be excited about, your boss has been going on and on about how "those who don't use AI will be left behind," and your boss allows you to pay $200 for a year's access to Claude Pro.

You, as a customer, no longer have access to the product you purchased. Your rate limits are entirely different, service uptime is measurably worse, and model performance has, for some reason, taken a massive dip. You hit your rate limits in minutes rather than hours. Prompts that previously allowed you a healthy back-and-forth over a project are now either impractical or impossible.
Your boss now has you vibe-coding barely-functional apps as a means of "integrating you with the development stack," but every time you feed it a screenshot of what's going wrong with the app, you seem to hit your rate limits again. You ask your boss if he'll upgrade you to the $100-a-month subscription, and he says that "you've got to make do, times are tough." You sit at your desk trying to work out what the fuck to do for the next four hours, as you do not know how to code and what little you've been able to do is now impossible.

This is the reality for a lot of AI subscribers, though in many cases they'll simply subscribe to OpenAI Codex or another service that hasn't brought the hammer down on their rate limits. …for now, at least.

The con of the Large Language Model era is that any subscription you pay for is massively subsidized, and that any product you use can and will see its service degraded as these companies desperately try to either ease their capacity issues or lower their burn rate. Yet it's unclear whether "more capacity" means that things will be cheaper, or better, or just a way of Anthropic scaling an increasingly shitty experience.

To explain: when an AI lab like Anthropic or OpenAI "hits capacity limits," it doesn't mean that they start turning away business or stop accepting subscribers, but that current (and new) subscribers will face randomized downtime and model issues, along with increasingly punishing rate limits.

Neither company is facing a financial shortfall as a result of being unable to provide its services (rather, they're facing financial shortfalls because they're providing their services to customers). And yet, the only people paying the price for these "capacity limits" are the customers. This is because AI labs must, when planning capacity, make arbitrary guesses about how large the company will get, and in the event that they acquire too much capacity, they'll find themselves in financial dire straits, as Anthropic CEO Dario Amodei told Dwarkesh Patel back in February.

What happens if you don't buy enough compute? Well, you find yourself having to buy it last-minute, which costs more money, which further erodes your margins, per The Information.

In other words, compute capacity is a knife-catching game. Ordering compute in advance lets you lock in a better rate, but having to buy compute at the last minute spikes those prices, eating any potential margin that might have been saved as a result of serving that extra demand. Order too little compute and you'll find yourself unable to run stable and reliable services, spiking your costs as you rush to find more capacity. Order too much capacity and you'll have too little revenue to pay for it.

It's important to note that the "demand" in question here isn't revenue waiting in the wings, but customers who are already paying you and want to do more with the product they paid for. More capacity allows you to potentially onboard new customers, but they too face the same problems as your capacity fills.

This also begs the question: how much capacity is "enough"? It's clear that current capacity issues are a result of the inference (the creation of outputs) demands of Anthropic's users. What does adding more capacity do, other than potentially bringing that under control?

This also suggests that Anthropic's (and, by extension, OpenAI's) business model is fundamentally flawed.
At its current infrastructure scale, Anthropic cannot satisfactorily serve its current paying customer base, and even with this questionably-stable farce of a product, Anthropic still expects to burn $14 billion. While adding more capacity might potentially allow new customers to subscribe, said new customers would also add more strain on capacity, which would likely mean that nobody's service improves but Anthropic still makes money.

It ultimately comes down to the definition of the word "demand." Let me explain. Data center development is very slow. Only 5GW of capacity is under construction worldwide (and "construction" can mean anything from a single steel beam to a near-complete building). As a result, both Anthropic and OpenAI are planning and paying for capacity years in advance based on "demand." "Demand" in this case doesn't just mean "people who want to pay for services," but "the amount of compute that the people who pay us now and may pay us in the future will need for whatever it is they do."

The amount of compute that a user may use varies wildly based on the model they choose and the task in question (a source at Microsoft once told me in the middle of last year that a single user could take up as many as 12 GPUs with a coding task using OpenAI's o4-mini), which means that in a very real sense these guys are guessing and hoping for the best. It also means that their natural choice will be to fuck over their current users to ease their capacity issues, especially when those users are paying on a monthly or (ideally) annual basis. OpenAI and Anthropic need to show continued revenue growth, which means that they must have capacity available for new customers, which means that old customers will always be the first to be punished.

We're already seeing this with OpenAI's new $100-a-month subscription, a kind of middle ground between its $20 and $200-a-month ChatGPT subscriptions that appears to have immediately reduced rate limits for $20-a-month subscribers. To obfuscate the changes further, OpenAI also launched a bonus rate limit period through May 31, 2026, telling users that they will have "10x or 20x higher rate limits than plus" on its pricing page while also featuring a tiny little note that's very easy for somebody to miss.

This is a fundamentally insane and deceptive way to run a business, and I believe things will only get worse as capacity issues continue. Not only must Anthropic and OpenAI find a way to make their unsustainable and unprofitable services burn less money, but they must also constantly dance with metering out whatever capacity they have to their customers, because the more extra capacity they buy, the more money they lose.

However you feel about what LLMs can do, it's impossible to ignore the incredible abuse and deception happening to just about every customer of an AI service. As I've said for years, AI companies are inherently unsustainable due to the unreliable and inconsistent outputs of Large Language Models and the incredible costs of providing the services. It's also clear, at this point, that Anthropic and OpenAI have both offered subscriptions that were impossible to provide at scale at the price and availability they were offered at leading up to 2026, and that they did so with the intention of growing their revenue to acquire more customers, equity investment and attention.

As a result, customers of AI services have built workflows and habits based on an act of deceit.
While some will say “this is just what tech companies do, they get you in when it’s cheap then jack up the price,” doing so is an act of cowardice and allegiance to the rich and powerful.

To be clear, Anthropic and OpenAI need to do this. They’ve always needed to do this. In fact, the ethical thing to do would’ve been to charge for and restrict the services in line with their actual costs so that users could have reliable and consistent access to the services in question.

As of now, anyone that purchases any kind of AI subscription is subject to the whims of both the AI labs and their ability to successfully manage their capacity, which may or may not involve making the product that a user pays for worse. The “demand” for AI as it stands is an act of fiction, as much of that demand was conjured up using products that were either cheaper or more-available. Every one of those effusive, breathless hype-screeds about Claude Code from January or February 2026 is discussing a product that no longer exists. On June 1 2026, any article or post about Codex’s efficacy must be rewritten, as rate limits will be halved .

While for legal reasons I’ll stop short of the most obvious word, Anthropic and OpenAI are running — intentionally or otherwise — deeply deceitful businesses where their customers cannot realistically judge the quality or availability of the service long-term. These companies are also clearly aware that their services are deeply unpopular and capacity-constrained, yet aggressively court and market toward new customers, guaranteeing further service degradations and potential issues with models. This applies even to API customers, who face exactly the same downtime and model quality issues, all with the indignity of paying on a per-million-token basis, even when Claude Opus 4.6 decides to crap itself while refactoring something, runs token-intensive “agents” to fix simple bugs, or fails to abide by a user’s guidelines .

This is not a dignified way to use software, nor is it an ethical way to sell it.

How can you plan around this technology? Every month some new bullshit pops up. While incremental model gains may seem like a boon, how do you actually say “ok, let’s plan ahead” for a technology that CHANGES, for better or for worse, at random intervals? You’re constantly reevaluating model choices and harnesses and prompts and all kinds of other bullshit that also breaks in random ways because “that’s how large language models work.” Is that fun? Is that exciting? Do you like this? It seems exhausting to me, and nobody seems to be able to explain what’s good about it.

How, exactly, does this change?

Right now, I’d guess that OpenAI has access to around 2GW of capacity ( as of the end of 2025 ), and Anthropic around 1GW, based on discussions with sources. OpenAI is already building out around 10GW of capacity with Oracle, as well as locking in deals with CoreWeave ( $22.4 billion ), Amazon Web Services ( $138 billion ), Microsoft Azure ( $250 billion ), and Cerebras (“ 750MW ”). Meanwhile, Anthropic is now bringing on “multiple gigawatts of Google’s next-generation TPU capacity ” on top of deals with Microsoft , Hut8 , CoreWeave and Amazon Web Services.

Both of these companies are making extremely large bets that their growth will continue at an astonishing, near-impossible rate.
If OpenAI has reached “ $2 billion a month ” with around 2GW of capacity, this means that it has pre-ordered compute (which I doubt it can pay for) assuming it will make $10 billion or $20 billion a month in a few short years, which fits with The Information’s reporting that OpenAI projects it will make $113 billion in revenue in 2028. And if it doesn’t make that much revenue — and also doesn’t get funding or debt to support it — OpenAI will run out of money, much as Anthropic will if that capacity gets built and it doesn’t make tens of billions of dollars a month to pay for it.

I see no scenario where costs come down, or where rate limits are eased. In fact, I think that as capacity limits get hit, both Anthropic and OpenAI will degrade the experience for the user (either through model degradation or rate limit decay) as much as they can.

I imagine that at some point enterprise customers will be able to pay for an even higher priority tier, and that Anthropic’s “Teams” subscription (which allows you to use the same subsidized subscriptions as everyone else) will be killed off, forcing any organization paying for Claude Code (and eventually Codex) to do so via the API, as has already happened for Anthropic’s enterprise users.

Anyone integrating generative AI is part of a very large and randomized beta test. The product you pay for today will be materially different in its quality and availability in mere months. I told you this would happen in September 2024 . I have been trying to warn you this would happen, and I will repeat myself: these companies are losing so much more money than you can think of, and they are going to twist the knife and take as many liberties with their users and the media as they can on the way down.

It is fundamentally insane that we are treating these companies as real businesses, either in their economics or in the consistency of the product they offer.

These are unethical products sold in deceptive ways, both in their functionality and availability, and to defend them is to help assist in a society-wide con with very few winners. And even if you like this, mark my words — your current way of life is unsustainable, and these companies have already made it clear they will make the service worse, without warning, if they acknowledge that they’ve done so at all. The thing you pay for is not sustainable at its current price and they have no way to fix that problem.

Do you not see you are being had? Do you not see that you are being used?

Do any of you think this is good? Does any of this actually feel like progress?

I think it’s miserable, joyless and corrosive to the human soul, at least in the way that so many people talk about AI. It isn’t even intelligent. It’s just more software that is built to make you defend it, to support it, to do the work it can’t do so that you can present the work as your own while giving it all the credit.

And to be clear, these companies absolutely fucking loathe you. They’ll make your service worse at a moment’s notice and then tell you nothing is wrong.

Anyone using a subscription to OpenAI or Anthropic’s services needs to wake up and realize that their way of life is going away — that rate limits will make current workflows impossible, that prices will increase, and that the product they’re selling even today is not one that makes any economic sense. Every single LLM product is being sold under false pretenses about what’s actually sustainable and possible long term.
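To put rough numbers on that bet, here’s a minimal back-of-envelope sketch. The linear revenue-per-gigawatt assumption is mine, not a reported metric, and it ignores every deal beyond the Oracle buildout:

```typescript
// Back-of-envelope sketch of the scaling argument above. ASSUMPTION:
// revenue must scale roughly linearly with paid-for capacity, anchored
// to the ~$2B/month on ~2GW figures quoted in this piece.
const revenuePerGwPerMonth = 2_000_000_000 / 2; // ~$1B per GW per month

// The ~10GW Oracle buildout cited above, ignoring the other deals.
const futureCapacityGw = 10;

const impliedMonthlyRevenue = revenuePerGwPerMonth * futureCapacityGw;
console.log(`~$${(impliedMonthlyRevenue / 1e9).toFixed(0)}B/month implied`);
// => "~$10B/month implied", the low end of the $10-20B range above,
// before counting the CoreWeave, AWS, Azure, or Cerebras capacity.
```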
With AI, you’re not just the product, you’re a beta tester that pays for the privilege. And you’re a mark for untrustworthy con men selling software using deceptive and dangerous rhetoric.

I will be abundantly clear for legal reasons that it is illegal to throw a Molotov cocktail at anyone, and it is morally objectionable to do so. I explicitly and fundamentally object to the recent acts of violence against Sam Altman. It is also morally repugnant for Sam Altman to somehow suggest that the careful, thoughtful, determined, and eagerly fair work of Ronan Farrow and Andrew Marantz is in any way responsible for these acts of violence. Doing so is a deliberate attempt to chill the air around criticism of AI and its associated companies. Altman has since walked back the comments , claiming he “wishes he hadn’t used” a non-specific amount of the following words:

These words remain on his blog, which suggests that Altman doesn’t regret them enough to remove them. I do, however, agree with Mr. Altman that the rhetoric around AI does need to change.

Both he and Mr. Amodei need to immediately stop overstating the capabilities of Large Language Models. Mr. Altman and Mr. Amodei should not discuss being “ scared ” of their models, or being “uncomfortable” that men such as themselves are in control (unless they wish to shut down their services), or that they “ don’t know if models are conscious .”

They should immediately stop misleading people through company documentation claiming that models are “ blackmailing ” people or, as Anthropic did in its Mythos system card , suggesting a model has “broken containment and sent a message” when it A) was instructed to do so and B) did not actually break out of any container. They must stop discussing threats to jobs without actual meaningful data that is significantly more sound than “jobs that might be affected someday but for now we’ve got a chatbot.” Mr. Amodei should immediately cease any and all discussion of AI potentially or otherwise eliminating 50% of white collar jobs or “ creating a white collar bloodbath ,” just as Mr. Altman should cease predicting when Superintelligence might arrive.

Those that defend AI labs will claim that these are “difficult conversations that need to be had,” when in actuality they engage in dangerous and frightening rhetoric as a means of boosting a company’s valuation and garnering attention. If either of these men truly believed these things were true, they would do something about it other than saying “you should be scared of us and the things we’re making, and I’m the only one brave enough to say anything.”

These conversations are also nonsensical and misleading when you compare them to what Large Language Models can do, and this rhetoric is a blatant attempt to scare people into paying for software today based on what it absolutely cannot and will not do in the future . It is an attempt to obfuscate the actual efficacy of a technology as a means of deceiving investors, the media and the general public.

Both Altman and Amodei engage in the language of AI doomerism as a means of generating attention, revenue and investment capital, actively selling their software and future investment potential based on their ownership of a technology that they say (disingenuously) is potentially going to take everybody’s jobs.
Based on reports from his Instagram , the man who threw the Molotov cocktail at Sam Altman’s house was at least partially inspired by If Anyone Builds It, Everyone Dies, a doomer porn fantasy written by a pair of overly-verbose dunces spreading fearful language about the power of AI, inspired by the fearmongering of Altman himself. Altman suggested in 2023 that one of the authors might deserve the Nobel Peace Prize . I only see one side engaged in dangerous rhetoric, and it’s the one that has the most to gain from spreading it.

I need to be clear that this act of violence is not something I endorse in any way. I am also glad that nobody was hurt.

I also think we need to be clear about the circumstances — and the rhetoric — that led somebody to do this, and why the AI industry needs to be well aware that the society they’re continually threatening with job loss is one full of people that are very, very close to the edge. This is not about anybody being “deserving” of anything, but a frank evaluation of cause and effect.

People feel like they’re being fucking tortured every time they load social media. Their money doesn’t go as far. Their financial situation has never been worse . Every time they read something it’s a story about ICE patrols or a near-nuclear war in Iran, or that gas is more expensive, or that there are worrying things happening in private credit. Nobody can afford a house and layoffs are constant.

One group, however, appears to exist in an alternative world where anything they want is possible. They can raise as much money as they want . They can build as big a building as they want anywhere in the world. Everything they do is taken so seriously that the government will call a meeting about it . Every single media outlet talks about everything they do. Your boss forces you to use it. Every piece of software forces you to at least acknowledge that they use it too. Everyone is talking about it with complete certainty despite it not being completely clear why.

As many people writhe in continual agony and fear, AI promises — but never quite delivers — some sort of vague utopia at the highest cost known to man. And these companies are, in no uncertain terms, coming for your job.

That’s what they want to do. They all say it. They use deceptively-worded studies that talk about “AI-exposed” careers to scare and mislead people into believing LLMs are coming for their jobs, all while spreading vague proclamations about how said job loss is imminent but also always 12 months away . Altman even says that jobs that will vanish weren’t real work to begin with , much as former OpenAI CTO Mira Murati said that some creative jobs shouldn’t have existed in the first place .

These people who sell a product with no benefit comparable on any level to its ruinous, trillion-dollar cost are able to get anything they want at a time when those who work hard are given a kick in the fucking teeth, sneered at for not “using AI” that doesn’t actually seem to make their lives easier, and then told that their labor doesn’t constitute “real work.”

At a time when nobody living a normal life feels like they have enough, the AI industry always seems to get more. There’s not enough money for free college or housing or healthcare or daycare but there’s always more money for AI compute. Regular people face the harshest credit market in generations but private credit and specifically data centers can always get more money and more land .

AI can never fail — it can only be failed.
If it doesn’t work, you simply don’t know how to “use AI” properly and will be “at a huge disadvantage,” despite the sales pitch being “this is intelligent software that just does stuff.”

AI companies can get as much attention as they need, their failings explained away, their meager successes celebrated like the ball dropping on New Year’s Eve, their half-assed sub-War Of The Worlds “Mythos” horseshit treated like they’ve opened the gates of Hell .

Regular people feel ignored and like they’re not taken seriously, and the people being given the most money and attention are the ones loudly saying “we’re richer than anyone has ever been, we intend to spend more than anyone has ever spent, and we intend to take your job.”

Why are they surprised that somebody mentally unstable took them seriously? Did they not think that people would be angry? Constantly talking about how your company will make an indeterminate amount of people jobless, while also being able to raise over $162 billion in the space of two years and take up as much space on Earth as you please, is something that could send somebody over the edge.

Every day the news reminds you that everything sucks and is more expensive unless you’re in AI, where you’ll be given as much money as you want and told you’re the most special person alive. I can imagine it tearing at a person’s soul as the world beats them down.

What they did was a disgraceful act of violence. Unstable people in various stages of torment act in erratic and dangerous ways. The suspect in the Molotov cocktail incident apparently had a manifesto where he had listed the names and addresses of Altman and multiple other AI executives, and, per CNBC, discussed the threat of AI to humanity as a justification for his actions. I am genuinely happy to hear that this person was apprehended without anyone being hurt.

These actions are morally wrong, and are also the direct result of the AI industry’s deceptive and manipulative scare campaign, one promoted by men like Altman and Amodei, as well as doomer fanfiction writers like Yudkowsky and, of course, Daniel Kokotajlo of AI 2027 — both of whom have had their work validated and propagated via the New York Times.

On the subject of “dangerous rhetoric,” I think we need to reckon with the fact that the mainstream media has helped spread harmful propaganda, and that a lack of scrutiny of said propaganda is causing genuine harm.

I also do not hear any attempts by Mr. Altman to deal with the actual, documented threat of AI psychosis, and the people that have been twisted by Large Language Models to take their lives and those of others . These are acts of violence that could have been stopped had ChatGPT and similar applications not been anthropomorphized by design, and trained to be “friendly.”

These dangerous acts of violence were not inspired by Ronan Farrow publishing a piece about Sam Altman. They were caused by a years-long publicity campaign that has, since the beginning, been about how scary the technology is and how much money its owners make.

I separately believe that these executives and their cohort are intentionally scaring people as a means of growing their companies, and that these continual statements of “we’re making something to take your job and we need more money and space to do it” could be construed as a threat by somebody that’s already on edge.

I agree that the dangerous rhetoric around AI must stop.
Dario Amodei and Sam Altman must immediately cease their manipulative and disingenuous scare tactics, and begin describing Large Language Models in terms that match their actual abilities, all while dispensing with any further attempts to extrapolate their future capabilities.

Enough with the fluff. Enough with the bullshit. Stop talking about AGI. Start talking about this like regular old software, because that’s all that ChatGPT is.

In the end, if Altman wants to engage with “good-faith criticism,” he should start acting in good faith. That starts with taking ownership of his role in a global disinformation campaign. It starts with recognizing how the AI industry has sold itself based on spreading mythology with the intent of creating unrest and fear.

And it starts with Altman and his ilk accepting any kind of responsibility for their actions. I’m not holding my breath.

As if their ability to try to do some of a task allows them to do the entire task.
As if their ability to do tasks is somehow impressive or a justification for their cost.
An excuse for why they cannot do more hinged on something happening in the future.

0 views
Manuel Moreale 2 days ago

Vertical Tabs

The other day, as I was driving home, I had the bad idea of listening to the most recent Waveform podcast , where they were discussing vertical vs horizontal tabs in browsers (and many other things). The whole discussion was truly painful to listen to; you’d hope people who talk tech for a living would have more elaborate takes on this kind of stuff, and yet the whole thing was very, very dumb. I am not going to discuss the merits of vertical vs horizontal tabs, but I am going to say that if you are a fan of vertical tabs, you probably want to check out browser.horse , which has, in my opinion, the best take on vertical tabs I’ve seen so far. It’s obviously not for everyone, especially because it’s a browser with a subscription—for what should probably be an add-on on top of your regular browser—but still, it is a clever idea that goes beyond simply putting tabs on the side.

0 views
Martin Fowler 2 days ago

Fragments: April 14

I attended the first Pragmatic Summit early this year, and while there, host Gergely Orosz interviewed Kent Beck and myself on stage . The video runs for about half-an-hour. I always enjoy nattering with Kent like this, and Gergely pushed into some worthwhile topics. Given the timing, AI dominated the conversation - we compared it to earlier technology shifts, the experience of agile methods, the role of TDD, the danger of unhealthy performance metrics, and how to thrive in an AI-native industry.

❄                ❄                ❄                ❄                ❄

Perl is a language I used a little, but never loved. However the definitive book on it, by its designer Larry Wall, contains a wonderful gem. The three virtues of a programmer: hubris, impatience - and above all - laziness . Bryan Cantrill also loves this virtue :

Of these virtues, I have always found laziness to be the most profound: packed within its tongue-in-cheek self-deprecation is a commentary on not just the need for abstraction, but the aesthetics of it. Laziness drives us to make the system as simple as possible (but no simpler!) — to develop the powerful abstractions that then allow us to do much more, much more easily. Of course, the implicit wink here is that it takes a lot of work to be lazy.

Understanding how to think about a problem domain by building abstractions (models) is my favorite part of programming. I love it because I think it’s what gives me a deeper understanding of a problem domain, and because once I find a good set of abstractions, I get a buzz from the way they make difficulties melt away, allowing me to achieve much more functionality with fewer lines of code.

Cantrill worries that AI is so good at writing code, we risk losing that virtue, something that’s reinforced by brogrammers bragging about how they produce thirty-seven thousand lines of code a day.

The problem is that LLMs inherently lack the virtue of laziness. Work costs nothing to an LLM. LLMs do not feel a need to optimize for their own (or anyone’s) future time, and will happily dump more and more onto a layercake of garbage. Left unchecked, LLMs will make systems larger, not better — appealing to perverse vanity metrics, perhaps, but at the cost of everything that matters. As such, LLMs highlight how essential our human laziness is: our finite time forces us to develop crisp abstractions in part because we don’t want to waste our (human!) time on the consequences of clunky ones. The best engineering is always borne of constraints, and the constraint of our time places limits on the cognitive load of the system that we’re willing to accept. This is what drives us to make the system simpler, despite its essential complexity.

This reflection particularly struck me this Sunday evening. I’d spent a bit of time making a modification to how my music playlist generator worked. I needed a new capability, spent some time adding it, got frustrated at how long it was taking, and wondered about maybe throwing a coding agent at it. More thought led to realizing that I was doing it in a more complicated way than it needed to be. I was including a facility that I didn’t need, and by applying yagni , I could make the whole thing much easier, doing the task in just a couple of dozen lines of code. If I had used an LLM for this, it may well have done the task much more quickly, but would it have made a similar over-complication? If so would I just shrug and say LGTM? Would that complication cause me (or the LLM) problems in the future?
❄                ❄                ❄                ❄                ❄

Jessica Kerr (Jessitron) has a simple example of applying the principle of Test-Driven Development to prompting agents . She wants all updates to include updating the documentation.

Instructions – We can change AGENTS.md to instruct our coding agent to look for documentation files and update them.
Verification – We can add a reviewer agent to check each PR for missed documentation updates.
This is two changes, so I can break this work into two parts. Which of these should we do first?

Of course my initial comment about TDD answers that question.

❄                ❄                ❄                ❄                ❄

Mark Little prodded an old memory of mine as he wondered about how to work with AIs that are over-confident of their knowledge and thus prone to make up answers to questions, or to act when they should be more hesitant. He draws inspiration from an old, low-budget, but classic SciFi movie: Dark Star . I saw that movie once in my 20s (i.e. a long time ago), but I still remember the crisis scene where a crew member has to use philosophical argument to prevent a sentient bomb from detonating .

Doolittle: You have no absolute proof that Sergeant Pinback ordered you to detonate.
Bomb #20: I recall distinctly the detonation order. My memory is good on matters like these.
Doolittle: Of course you remember it, but all you remember is merely a series of sensory impulses which you now realize have no real, definite connection with outside reality.
Bomb #20: True. But since this is so, I have no real proof that you’re telling me all this.
Doolittle: That’s all beside the point. I mean, the concept is valid no matter where it originates.
Bomb #20: Hmmmm….
Doolittle: So, if you detonate…
Bomb #20: In nine seconds….
Doolittle: …you could be doing so on the basis of false data.
Bomb #20: I have no proof it was false data.
Doolittle: You have no proof it was correct data!
Bomb #20: I must think on this further.

Doolittle has to expand the bomb’s consciousness, teaching it to doubt its sensors. As Little puts it:

That’s a useful metaphor for where we are with AI today. Most AI systems are optimised for decisiveness. Given an input, produce an output. Given ambiguity, resolve it probabilistically. Given uncertainty, infer. This works well in bounded domains, but it breaks down in open systems where the cost of a wrong decision is asymmetric or irreversible. In those cases, the correct behaviour is often deferral, or even deliberate inaction. But inaction is not a natural outcome of most AI architectures. It has to be designed in.

In my more human interactions, I’ve always valued doubt, and distrust people who operate under undue certainty. Doubt doesn’t necessarily lead to indecisiveness, but it does suggest that we include the risk of inaccurate information or faulty reasoning into decisions with profound consequences.

If we want AI systems that can operate safely without constant human oversight, we need to teach them not just how to decide, but when not to. In a world of increasing autonomy, restraint isn’t a limitation, it’s a capability. And in many cases, it may be the most important one we build.
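Little’s “it has to be designed in” suggests an obvious, if simplistic, shape in code. Here’s a minimal sketch (my own construction, not anything from Little’s post), assuming a calibrated confidence score and a reversibility flag are available:

```typescript
// A sketch of "designed-in" restraint: the decision type itself has a
// "defer" arm, so inaction is an explicit outcome rather than an error.
type Decision<T> =
  | { kind: "act"; value: T }
  | { kind: "defer"; reason: string };

function decide<T>(
  candidate: T,
  confidence: number,   // assumed: a calibrated score in [0, 1]
  reversible: boolean,  // assumed: whether the action is cheap to undo
): Decision<T> {
  // Irreversible actions demand a far higher bar, matching the
  // asymmetric-cost argument above.
  const threshold = reversible ? 0.7 : 0.99;
  return confidence >= threshold
    ? { kind: "act", value: candidate }
    : { kind: "defer", reason: `confidence ${confidence} below ${threshold}` };
}
```

The thresholds are arbitrary; the point is only that “defer” exists in the type, so the system is able to choose it.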

0 views

MagiCache: A Virtual In-Cache Computing Engine

MagiCache: A Virtual In-Cache Computing Engine. Renhao Fan, Yikai Cui, Weike Li, Mingyu Wang, and Zhaolin Li. ISCA'25.

This paper presents an implementation of RISC-V vector extensions where all vector computation occurs in the cache (i.e., SRAM-based in-memory computation). It contains an accessible description of in-SRAM computation, and some novel extensions.

Recall that SRAM is organized as a 2D array of bits. Each row represents a word, and each column represents a single bit location in many words. A traditional read operation occurs by activating a single row. Analog values are read out from each bit and placed onto shared bit lines. There are two bit lines per column (one holding the value, one holding the complement). Values flow down to sense amplifiers that output digital values.

Prior work has shown that this basic structure can be augmented to perform computation. Rather than activating a single row, two rows are activated simultaneously (let's call the values of these rows A and B). The shared bit lines perform computation in the analog domain, which results in two expressions appearing on the output of the sense amplifiers: (A AND B) and (A NOR B). Fig. 1(a) shows a diagram of such an SRAM array:

Source: https://dl.acm.org/doi/10.1145/3695053.3731113

If you slap some digital logic at the end of the sense amplifiers, then you can generate other functions like OR, XOR, XNOR, NAND, shift, and add (a toy sketch of this derivation appears at the end of this post). Shift and add involve horizontal connections. Fig. 4(c) shows a hardware diagram of this additional logic at the end of the sense amplifiers. Note that the resulting value can be written back into the SRAM array for future use. Multiplication is not directly supported but can be implemented with a sequence of shift and add operations.

Source: https://dl.acm.org/doi/10.1145/3695053.3731113

Virtual Engine

The innovation in this paper is to dynamically share a fixed amount of on-chip SRAM for two separate purposes: caching and a vector register file. The logical vector register file capacity required for a particular algorithm depends on the number of architectural registers used, and the width of each architectural register (RISC-V vector extensions allow software to configure a logical vector width). Note that this hardware does not have separate vector ALUs; the computation is performed directly in the SRAM arrays.

Fig. 6 illustrates how the hardware dynamically allocates SRAM space between generic cache storage and vector registers (with in-memory compute). The unit of allocation is a segment. The width of a vector register determines how many segments it requires.

Source: https://dl.acm.org/doi/10.1145/3695053.3731113

Initially, all SRAM space is dedicated to caching. When the hardware processes an instruction that writes to an uninitialized vector register, the hardware allocates segments to hold data for that register (evicting cached data if necessary). This system assumes an enlightened compiler which will emit a dedicated hint instruction when it has reached a point in the instruction stream where no vector register has valid content. The hardware can use this hint to reallocate all memory back to being used for caching.

Fig. 8 shows performance results normalized against prior work (labeled in the figure). This shows a 20%-60% performance improvement, which is pretty good considering that the baseline offers an order-of-magnitude improvement over a standard in-order vector processor.

Source: https://dl.acm.org/doi/10.1145/3695053.3731113

Dangling Pointers

I wonder how this would compare to hardware that did not have a cache, but rather a scratchpad with support for in-memory computing.
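To make the gate derivation concrete, here's a toy software model of the dual-row trick described above. It mimics the logic, not the analog circuit, and the function names are mine, not the paper's:

```typescript
// Toy model of dual-row activation: reading two SRAM rows A and B at
// once yields (A AND B) and (A NOR B) at the sense amplifiers, and a
// little peripheral logic derives the other gates from those two.
function senseAmps(a: number, b: number) {
  const and = a & b;      // value bit line: A AND B
  const nor = ~(a | b);   // complement bit line: A NOR B
  return { and, nor };
}

function derivedGates(a: number, b: number) {
  const { and, nor } = senseAmps(a, b);
  return {
    or: ~nor,             // OR   = NOT(NOR)
    nand: ~and,           // NAND = NOT(AND)
    xor: ~nor & ~and,     // XOR  = (A OR B) AND NOT(A AND B)
    xnor: ~(~nor & ~and), // XNOR = NOT(XOR)
  };
}

// Example: two 8-bit "rows" (masked to 8 bits for display).
const a = 0b1100_1010, b = 0b1010_0110;
const g = derivedGates(a, b);
console.log((g.xor & 0xff).toString(2)); // => "1101100" (i.e., 0110_1100)
```

Shift and add need the horizontal connections mentioned above, so they don't reduce to this purely per-column bitwise picture.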

0 views
iDiallo 2 days ago

Back button hijacking is going away

When websites are blatantly hostile, users close them to never come back. Have you ever downloaded an app, realized it was deceptive, and deleted it immediately? It's a common occurrence for me. But there is truly hostile software that we still end up using daily. We don't just delete those apps because the hostility is far more subtle. It's like the boiling frog: the heat turns up so slowly that the frog enjoys a nice warm bath before it's fully cooked. With clever hostile software, the makers introduce one frustrating feature at a time.

Every time I find myself on LinkedIn, it's not out of pleasure. Maybe it's an email about an enticing job. Maybe it's an article someone shared with me. Either way, before I click the link, I have no intention of scrolling through the feed. Yet I end up on it anyway, not because I want to, but because I've been tricked. You see, LinkedIn employs a trick called back button hijacking.

You click a LinkedIn URL that a friend shared, read the article, and when you're done, you click the back button expecting to return to whatever app you were on before. But instead of going back, you're still on LinkedIn. Except now, you are on the homepage, where your feed loads with enticing posts that lure you into scrolling. How did that happen? How did you end up on the homepage when you only clicked on a single link? That's back button hijacking.

Here's how it works. When you click the original LinkedIn link, you land on a page and read the article. In the background, LinkedIn secretly gets to work. Using the JavaScript history.replaceState() method, it swaps the page's URL to the homepage. The method doesn't add an entry to the browser's history. Then LinkedIn manually pushes the original URL you landed on into the history stack with history.pushState(). This all happens so fast that the user never notices any change in the URL or the page. As far as the browser is concerned, you opened the LinkedIn homepage and then clicked on a post to read it (see the sketch at the end of this post). So when you click the back button, you're taken back to the homepage, the feed loads, and you're presented with the most engaging post to keep you on the platform.

If you spent a few minutes reading the article, you probably won't even remember how you got to the site. So when you click back and see the feed, you won't question it. You'll assume nothing deceptive happened. While LinkedIn only pushes you one level down in the history state, more aggressive websites can break the back button entirely. They push a new history state every time you try to go back, effectively trapping you on their site. In those cases, your only option is to close the tab.

I've also seen developers unintentionally break the back button, often when implementing a search feature. On a search box where each keystroke returns a result, an inexperienced developer might push a new history state on every keystroke, intending to let users navigate back to previous search terms. Unfortunately, this creates an excessive number of history entries. If you typed a long search query, you'd have to click the back button for every character (including spaces) just to get back to the previous page. The correct approach is to only push a history state when the user submits or leaves the search box.

As of yesterday, Google announced a new spam policy to address this issue. Their reasoning: People report feeling manipulated and eventually less willing to visit unfamiliar sites.
As we've stated before, inserting deceptive or manipulative pages into a user's browser history has always been against our Google Search Essentials. Any website using these tactics will be demoted in search results:

Pages that are engaging in back button hijacking may be subject to manual spam actions or automated demotions, which can impact the site's performance in Google Search results. To give site owners time to make any needed changes, we're publishing this policy two months in advance of enforcement on June 15, 2026.

I'm not sure how much search rankings affect LinkedIn specifically, but in the grand scheme of things, this is a welcome change. I hope this practice is abolished entirely.
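For the curious, here's a minimal sketch of the hijack described above, along with the correct search-box behavior, using the standard History API. The URLs and function names are placeholders of mine, not LinkedIn's actual code:

```typescript
// The hijack: rewrite the current history entry, then push the real URL
// back on top, so "back" lands on a page the user never visited.
function hijackBackButton(articleUrl: string, homepageUrl: string) {
  // replaceState swaps the current entry's URL without adding a new
  // entry, so nothing visible changes for the reader.
  history.replaceState(null, "", homepageUrl);
  // pushState adds the article URL as a new entry. The browser now
  // believes the visit was: homepage first, then the article.
  history.pushState(null, "", articleUrl);
}

// The search-box fix: replace on every keystroke, push only on commit,
// so a long query doesn't turn into dozens of history entries.
function onSearchKeystroke(query: string) {
  history.replaceState({ query }, "", `?q=${encodeURIComponent(query)}`);
}
function onSearchSubmit(query: string) {
  history.pushState({ query }, "", `?q=${encodeURIComponent(query)}`);
}
```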

0 views